This paper examines the problems of access to non-patent literature and the development and implications of Open Access literature with comparisons to patent literature. It also provides a guide to enhancing the efficiency and cost-effectiveness of discovery and retrieval of Open Access literature. Significant strides have been made in, on the one hand, maintaining and developing the IP system worldwide and, on the other, in easing access to enormous amounts of scientific and engineering disclosures. Open Access is presented as a significant and healthy growth of accessible publishing. Although the business model for non-patent literature has been different traditionally from that of patents, recently there has been a move to favour Open Access. It appears that adequate income is still being generated for non-patent literature publishers who use an Open Access strategy.
A pharmaceutical company P owned a patent, whereas another company C was granted approval by a National Pharmaceutical Control organisation to market tablets with the same active ingredient. Company P filed suit, claiming that C's acts of importing, manufacturing, offering for sale, selling and stocking the tablets for the purpose of sale or offer for sale infringed its patent. Company C counterclaimed for a declaration that the patent was invalid and filed newly searched prior art documents to prove this. The High Court held all claims of the patent to be invalid for lack of novelty or inventive step. Does this sound familiar? Uncertainty about the legal, technical or commercial viability of patented technology can turn patent sales and licensing into "A Market for Lemons."1 As Akerlof points out, this type of market is shaped by asymmetric information: the purchaser or licensee has difficulty in determining the legal validity or the technical or commercial viability of a patent, or would find it too time-consuming and costly to do so. The asymmetry could be reduced if searching for relevant documents were more reliable and if patent offices and third parties had low-cost access to comprehensive, full-text-searchable legal, technical and commercial information, as well as free-of-charge translation.
Information from patent applications and patents is now freely available for study: these documents are searchable, full-text copies can be downloaded free of charge, and free online machine translations are available, at least into English. The title of this article is derived from "Free as in Freedom", a book detailing Richard Stallman's crusade for free software. A derivative of that movement, Open Source software, has gained acceptance and some respectability as an alternative business strategy. Private initiatives such as Google Books have made many books searchable online, but not available as full-text copies. One monopolistic bastion that has been harder to crack is the copyright protection of scientific and engineering journals, especially the cost of gaining access to articles in high-impact journals. An open question to be discussed is whether Open Access, by making scholarly outputs openly available, can provide concrete benefits by reducing the information asymmetry relevant to the licensing and sale of patents.
This paper examines the problems of access to non-patent literature and the development and implications of Open Access literature; it then provides a guide to enhancing the efficiency and cost-effectiveness of discovery and retrieval of Open Access literature.
Advanced industrial countries have knowledge-based economies in which R&D teams in academic or research organisations, or in for-profit or not-for-profit organisations, generate scientific and engineering results and publish them in technical or academic journals, books and patent applications, or make information available to the public through poster sessions, videos, seminars, presentations, etc. These economies also market scientific and technical products. Further growth in technical publications is driven by the continuing expansion of scientific and technical products and by growth in Internet-based product support. For example, employment of technical writers, who produce such product documentation, is projected to grow 10 percent from 2014 to 2024, faster than the average for all occupations (source: United States Department of Labor). All these may be described as "disclosures."
Such disclosures can be described as either "patent literature" or "non-patent literature." Patent literature (PL) is published by patent offices. Non-patent literature (NPL) is more diverse and is published by a variety of different organisations. NPL includes books, monographs, trade journals, company disclosures such as product documentation, academic journals, conference proceedings, standards, master's and PhD theses, and commercial databases.2
Information disclosed in patents does not always appear in non-patent literature. PL is still not often used by academics as a source of scientific and technical information, despite the fact that patent applications must meet the requirement of "sufficiency" (Europe: EPC Article 83; U.S.: enablement, 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112), i.e. the information in the description and drawings must be sufficient for a skilled person to reproduce the invention. Much of NPL is moving in the same direction, as research funders increasingly require that the data produced by research be made openly available so that results can be reproduced. Accordingly, PL and NPL can be valuable independent or complementary sources of information.
If patent offices (e.g. EPO, USPTO, SIPO, JPO, KIPO) carry out a full substantive examination for novelty (e.g. EPC Article 54; U.S.: anticipation, 35 U.S.C. 102) and inventive step (e.g. EPC Article 56; U.S.: obviousness, 35 U.S.C. 103), a search must be performed. The search is vitally important to the quality of the whole patenting procedure, and without a high-quality search there is no legal certainty.3 For patent attorneys, IP attorneys at law and corporate IP specialists (any of whom may be involved in patent litigation, opposition, patent valuation or licensing), or for licensing executives (e.g. confronted with the question of the value of a patent and the royalty fees to be considered), it can be important to obtain an independent assessment of the validity of a patent. Individual inventors and R&D or marketing managers may want the same. Hence efficient, precise and economical methods of searching are important if one does not wish to rely solely on patent offices.
In accordance with the EPC and EPO practice, for example, any information made available to the public by any means can be used in such an examination and hence should, in theory, be searched. However, not all disclosures can be searched easily or economically, even by large patent offices, let alone by individual inventors or licensing executives. Ideally, one could at any time, regardless of the technology and of the amount or nature of the disclosures, drill down quickly and economically to a few highly relevant hits. This would require full-text searching, advanced Boolean search operators, perhaps translation support and even the application of Artificial Intelligence. One stumbling block to such an ideal search is the way that NPL is made available to the public.
Patent applications and patents are classified using highly detailed classification systems, e.g. the International Patent Classification or national classifications. The International Patent Classification (IPC) was established by the Strasbourg Agreement of 1971 and provides a hierarchical system of language-independent symbols for classifying patents and utility models according to the areas of technology to which they pertain. In combination with full-text searching and powerful Boolean operators, this makes rapid and accurate searching possible. Generally, only fee-based commercial databases provide this for a significant number of national and regional patent databases. The newer Cooperative Patent Classification (CPC) has been jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO). The CPC is based on the previous European classification system (ECLA), which was itself a more specific and detailed version of the IPC. The CPC has achieved worldwide acceptance. NPL generally has no such system of classification or controlled vocabulary, except in some specialist areas such as medicine.4 Instead, "keywords" may be assigned to each document.
For both NPL and PL, searching only abstracts is rarely economical. If the feature one is searching for is more likely to appear in the full text than in the abstract, one is forced to use broad search terms and then consult the full texts of the many hits to see whether the feature is present. In that case "free" databases become expensive because of the time taken to analyse the many hits. In addition, for NPL one may have to pay a membership and/or per-article fee to examine the full text of a journal article. This can increase the cost of a search considerably.
Foreign databases pose a further problem of language. For example, the JSTcopy service supplies copies of Japanese non-patent literature. The cost of one copy, including copyright fee and labour cost, is said to be $20-30 on average, depending on the length of the document. On the other hand, the patent database Espacenet provides free machine translations, e.g. into English, for many foreign languages. These translations are often difficult to understand but usually provide enough information to decide whether a human translation would be worthwhile. Machine translation is expected to improve as Artificial Intelligence, neural networks and context-based translation methods become more widely available and refined.
PL has become part of the research literature: it includes scientific and engineering information in the descriptions of patents, and the published patent search reports analyse and link to the relevant background art.5 Non-patent literature is cited less often than patent literature in patent searches, but it is still an important component of the search. In 2015 the EPO stated that it had the world's largest collection of patent and non-patent literature documents, containing more than 540 million records in over 130 databases and updated daily, as well as online access to more than 6,000 journals via the EPO Virtual Library, and tools and services such as machine translation to extend the range of easily accessible information.6 However, of the granted European patents that are opposed, roughly one third are revoked, one third are maintained in amended form and one third are maintained unamended. So searching could still be improved, and more could be done to improve the legal certainty of the examination process. Also, these EPO databases are not available to third parties, for example to a defendant in patent litigation.
Although some suggest that NPL provides access to scientific and engineering information, this is not true if the cost of gaining access restricts the availability of this information. Whereas some patent databases allow free full-text access, a further difficulty with NPL is the charge that many scientific journals levy for viewing, copying or printing full-text articles. The need for journals to earn money from copies runs counter to the idea of quick, cheap full-text searching. However, the Open Access movement offers a solution to the access problem, and this important development therefore deserves a thorough and detailed review.
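The hierarchy that makes classification-based searching so effective can be sketched in Python. This is a minimal illustration, assuming the common printed form of an IPC or CPC symbol (section letter, two-digit class, subclass letter, then main group/subgroup, e.g. "A61K 6/02"); the helper names `parse_ipc` and `falls_under` and the example symbols are hypothetical and chosen only to show the format, not their meaning. Because the nesting of subgroups below a main group is expressed by dots in the classification scheme rather than in the symbol itself, the sketch resolves levels only down to the main group.

```python
import re
from typing import NamedTuple

# Section letter A-H (IPC); Y is additionally used by the CPC.
# Then a two-digit class, a subclass letter and "main group/subgroup".
SYMBOL_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class ClassificationSymbol(NamedTuple):
    section: str     # e.g. "A"
    klass: str       # e.g. "A61"
    subclass: str    # e.g. "A61K"
    main_group: str  # e.g. "A61K6/00"
    group: str       # e.g. "A61K6/02" (the full symbol, spaces removed)


def parse_ipc(symbol: str) -> ClassificationSymbol:
    """Split a printed IPC/CPC symbol into its hierarchical levels."""
    match = SYMBOL_RE.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised symbol: {symbol!r}")
    section, klass, letter, main, sub = match.groups()
    subclass = section + klass + letter
    return ClassificationSymbol(
        section=section,
        klass=section + klass,
        subclass=subclass,
        main_group=f"{subclass}{main}/00",
        group=f"{subclass}{main}/{sub}",
    )


def falls_under(symbol: str, broader: str) -> bool:
    """True if `symbol` sits at or below the level named by `broader`."""
    levels = parse_ipc(symbol)
    target = broader.replace(" ", "").upper()
    return target in levels  # a NamedTuple supports membership tests


if __name__ == "__main__":
    # Hypothetical documents tagged with example symbols.
    docs = {"D1": "A61K 6/02", "D2": "A61C 13/00", "D3": "B29C 64/00"}
    hits = [d for d, sym in docs.items() if falls_under(sym, "A61")]
    print(hits)  # ['D1', 'D2']
```

A query at class level ("A61") thus retrieves every document classified anywhere beneath it, independent of the language or wording of the documents, which is the property that free-text keywords in NPL lack.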
Some definitions
One of the early and seminal definitions of Open Access (OA) occurs in the statement of the Budapest Open Access Initiative (BOAI), which was first published in 2002, then re-affirmed 10 years later in 2012 (http://www.budapestopenaccessinitiative.org/read). The statement is long and radical:
By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
A more concise definition is provided by Peter Suber: Open Access literature is "digital, online, free of charge, and free of most copyright and licensing restrictions."7
These definitions are in contrast to the traditional method of obtaining access to research outputs: Toll Access (TA), by means of institutional or personal subscription to journals, or aggregations of content, or by means of paying publishers for access to individual articles.
Intellectual property laws in the UK and elsewhere offer limited "fair dealing" or "fair use" exemptions. With OA a distinction is made between Gratis and Libre. Gratis OA is free of charge to access but subject to the limits of fair dealing; it removes toll barriers but not permission barriers. Libre OA is both free of charge and free of at least some legal and licensing restrictions; it removes toll barriers and at least some permission barriers; the BOAI definition is libre.
Green OA
A further distinction is made between Green and Gold OA, Green being long established, Gold being more recent.
Green OA is delivered through self-archiving: authors deposit manuscripts in institutional or disciplinary repositories. It relies on the relatively recent but well-established infrastructure of institutional and subject repositories in academic institutions. The repositories can be discovered through the Registry of Open Access Repositories (ROAR, http://roar.eprints.org/), which aims "to promote the development of open access by providing timely information about the growth and status of repositories throughout the world." In August 2017 it listed 3,847 repositories, the majority being institutional or departmental. It is fully searchable and browsable.
Green OA is easy and cheap: each article incurs only a very small portion of the overhead costs of setting up and running repositories (estimated by Swan at between £6 and £15 per article).8 It does not incur the overheads of peer review, yet deposited articles may be, and most often have been, peer-reviewed for publication in TA journals. It is compatible with subscription journal publishing: scholars can publish in TA journals and, through self-archiving, still make their articles OA. Publishers generally impose an embargo period of six or 12 months, and authors generally need to obtain rights from publishers to deposit articles and make them available. Because patent applications worldwide generally remain unpublished for 18 months from the priority date, such an embargo period is fully compatible with the patent system: third parties usually first receive information about a patent application when it is published after 18 months. What is required as a minimum is that all relevant patent and non-patent documents are available once the patent application in question is published. Finally, Green OA is hospitable to many other types of document, notably theses and multimedia offerings.
Current thinking on maximising the deposit of OA papers is that they should be deposited on acceptance by a publisher. This practice does not contravene publishers' embargo requirements. The deposit step is a separate action from making an article openly available and the publisher has no sanction over it. The aim is to get authors to deposit their articles as they are accepted for publication, which is the moment they are dealing with the paper for the last time in practical terms. So long as a paper is deposited, the author, and support staff, need not worry about it any longer: if it is under a publisher's embargo the repository software automatically opens the article and makes it public at the end of the embargo period.9
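The deposit-on-acceptance workflow can be modelled in a few lines to show why the author need not act again after depositing. This is a minimal sketch, not the behaviour of any particular repository platform: it assumes that the embargo runs from the publication date (publishers' terms vary), that an item is never opened before its publication date is known, and the names `Deposit` and `add_months` are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Same day `months` later, clamped to the last day of a shorter month."""
    index = start.month - 1 + months
    year, month = start.year + index // 12, index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class Deposit:
    title: str
    accepted: date                    # deposited on acceptance
    published: Optional[date] = None  # recorded once the publisher releases it
    embargo_months: int = 0           # e.g. 6 or 12, set by the publisher

    def release_date(self) -> Optional[date]:
        """Date the item opens (assumption: embargo counted from publication)."""
        if self.published is None:
            return None
        return add_months(self.published, self.embargo_months)

    def is_open(self, today: date) -> bool:
        """Checked by the repository software; no action by the author needed."""
        release = self.release_date()
        return release is not None and today >= release


if __name__ == "__main__":
    paper = Deposit("Example paper", accepted=date(2017, 3, 1),
                    published=date(2017, 8, 31), embargo_months=6)
    print(paper.release_date())  # 2018-02-28 (clamped to the end of February)
```

Once the publication date is recorded, the opening date follows mechanically from the embargo length, which is why a single deposit action at acceptance is sufficient.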
Gold OA
Gold OA has been delivered mainly through journals: these may be completely OA or hybrid, where some articles are OA and others toll access. Articles, in both OA and hybrid journals, are paid for by the authors or their institutions or funders (according to a Jisc survey in 2016, the average article processing charge (APC) paid by UK universities was about GBP 1,800).10 Articles are peer-reviewed for publication; thus Gold OA may incur much the same costs for the editorial and peer review process as TA journal publishing. Gold OA is always immediate, while Green OA is often subject to time embargoes imposed by subscription journal publishers. It provides access to the published version of an article, while Green OA generally provides access only to the author's final peer-reviewed manuscript, without the formatting or pagination of the published version. By its nature Gold OA is confined to post-prints and generally obtains rights and permissions directly from the rights-holder (usually the author). Both Green and Gold OA are gratis. Green OA generally is only gratis; Gold OA may be libre.
We noted above that Green OA offers a cheap and easy route, based on the peer review practice of traditional TA journals. Gold OA is more problematic. Like TA publishing, it has to bear the costs of editorial boards, peer review, production and marketing. However, its costs are met not through TA's well-established market route of subscriptions, which are now generally paid by institutions at the wholesale level of the Big Deals, but generally through individual article processing charges (APCs). Payment and administration by institutions therefore take place at a very granular level, especially when compared with the Big Deals.
Despite these complications and difficulties, and perhaps a general reluctance to adopt new means of publishing, there is evidence of a continuing rise in Gold OA. In 2011 a study by Mikael Laakso et al.11 (p.8) identified three earlier cycles in the development of Open Access publishing: the "Pioneering Years" (1993 to 1999), the "Innovation Years" (2000 to 2004), and the "Consolidation Years" (2005 to 2009).
The Pioneering Years were characterised by innovation by individuals or small groups of scholars, using simple technologies. There was rapid growth from, obviously, a small base: in 1993, it is estimated that 20 open access journals published 247 articles; by 2000, 741 journals are estimated to have published 35,519 articles. Many of these early journals did not survive.
The Innovation Years coincided with the wholesale movement of journal content to electronic delivery. In terms of OA they were characterised by burgeoning advocacy of OA and the development of economic models for Gold OA, notably article processing charges (APCs). BioMed Central and PLoS demonstrated the viability and high quality of Gold OA. There was significant growth of both titles and articles: by 2005, 2,837 journals published 90,720 articles, an increase of 155 percent on 2000.
The Consolidation Years saw the growth of infrastructure to support OA, such as Open Source publishing software, the DOAJ and Creative Commons licences. Discovery was enhanced and enabled by Google and Google Scholar. Growth was not as spectacular, but still very strong: in 2009, 4,767 journals published 191,851 articles, an increase of 111 percent on 2005. One might add that the Consolidation Years also saw the adoption by funders of policies on deposit and public availability of the results of the research they fund. The first was the Wellcome Trust, followed by the National Institutes of Health and many others.
In August 2017 the Directory of Open Access Journals (DOAJ—https://doaj.org/), "an online directory that indexes and provides access to high quality, open access, peer-reviewed journals" recorded 9,822 Gold OA journals from 123 different countries, comprising 2,568,248 articles. There has therefore been a doubling in the number of OA journals in the eight years since 2009. The Jisc survey already mentioned shows a rise in the number of APCs paid by 10 UK universities from 800 in 2013 to about 2800 in 2016, a rise of 250 percent.
We may therefore conclude that there is significant and healthy growth in Gold OA publishing, and that it is increasingly generating income.
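The growth figures quoted in this section can be checked directly from the counts given above. The helper `pct_increase` is simply the percentage change, (new - old) / old x 100; the numbers are those reported in the text.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


# Counts as reported in the text above.
articles = {2000: 35_519, 2005: 90_720, 2009: 191_851}
journals = {2009: 4_767, 2017: 9_822}
apcs_paid = {2013: 800, 2016: 2_800}  # Jisc survey, 10 UK universities

if __name__ == "__main__":
    print(f"Articles 2000-2005: +{pct_increase(articles[2000], articles[2005]):.0f}%")  # +155%
    print(f"Articles 2005-2009: +{pct_increase(articles[2005], articles[2009]):.0f}%")  # +111%
    print(f"OA journals 2009-2017: x{journals[2017] / journals[2009]:.2f}")            # x2.06
    print(f"APCs paid 2013-2016: +{pct_increase(apcs_paid[2013], apcs_paid[2016]):.0f}%")  # +250%
```

The results agree with the 155 and 111 percent increases in articles, the roughly twofold rise in DOAJ journals since 2009, and the 250 percent rise in APCs paid.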
Fostering Growth in OA
The most important driver of OA growth is the set of policies adopted by funders and individual institutions. Recognising this, the European Commission's FP7 Project PASTEUR4OA (http://www.pasteur4oa.eu/) "aimed to support the European Commission's Recommendation to Member States of July 2012 that they develop and implement policies to ensure Open Access to all outputs from publicly-funded research." As part of the work of PASTEUR4OA, the database of Open Access policies, ROARMAP (http://roarmap.eprints.org/), was extended and elaborated. It now records, and links to, every known policy's conditions under an exhaustive set of categories, and is fully searchable. This database as a whole provides a rich source of data to analyse when studying policy effectiveness.
The project also looked at the mandatory policies in place at over 120 universities around the world and assessed the effectiveness of each policy. This was measured in terms of the percentage of Open Access material available from each institution compared to the total number of articles published from those institutions each year.
Using regression analysis, the project determined that the important elements of a policy, whether of a funder or an institution, are as follows:12
The critical elements of a policy are:
A policy that includes the elements above and is implemented properly by funders and institutions will succeed in gathering a large volume of Open Access content. The requirement to deposit, and the insistence that this step cannot be waived for any reason, ensures that authors deposit their work.
There is evidence from the PASTEUR4OA project that the adoption of strong policies by funders drives the adoption of policies, particularly aligned policies, in institutions.13 Most importantly, the policy for the European Commission's Horizon 2020 research funding programme is also of this type, so institutions adopting this type of policy are aligning their own policy with that of the European funding programme. This matters because researchers within the institution may be funded under this programme and will then find that their funder's and their institution's policies have matching requirements, so that they can comply with both through one set of actions.
An excellent example of a policy linking deposit of articles to research evaluation is provided by HEFCE (Higher Education Funding Council for England). The policy insists that "to be eligible for submission to the post-2014 REF [the next research assessment exercise], authors' outputs must have been deposited in an institutional or subject repository" (http://www.hefce.ac.uk/pubs/Year/2014/201407/). Deposit must also take place on acceptance by a publisher. There is evidence from individual institutions that this policy is already increasing the number and proportion of OA deposits. At UCL (University College London), for instance, the repository contained 10,000 OA outputs in 2011 and 14,000 OA papers in 2013; OA content then increased sharply to 22,500 papers by September 201514 and to more than 40,000 in 2017.
There is as yet little evidence of funders preferring Gold OA over Green OA. At most their policies are neutral, but they do state that Gold OA publishing costs may be included in bids for funding. Obviously there will be a considerable delay in such Gold OA articles being published—generally only after completion of the research project. There has, however, also been a recently extended European Commission initiative: a pilot, run by OpenAIRE, to provide retrospective Gold OA funding for FP7 projects. In summary according to the website (https://www.openaire.eu/postgrantoapilot):
Some individual institutions, such as UCL, have themselves made funds available for Gold OA.
OA Monographs
Obviously journals are of primary importance as relevant NPL, but for the sake of completeness it is worth noting that OA scholarly monographs, the staple of subjects in the arts and humanities, are increasingly available. With a typical print-run of 200-250 and low financial returns for all involved in traditional publication of scholarly monographs, OA is an obvious avenue to be tested and developed. Gold OA monographs are starting to appear, through publishers such as Ubiquity (http://www.ubiquitypress.com/) and Open Book Publishers (http://www.openbookpublishers.com/), especially in the humanities and social sciences.
The number of OA articles is large (Heather Morrison's estimate for September 2017 is 70 million)15 and growing, and of course brings the problem of cheaply and efficiently identifying the most relevant NPL. There are four issues to discuss: the sophistication of searching enabled by the major search engines; databases of OA articles; browser extensions that identify OA alternatives to toll access articles; and "request-a-copy" buttons.
Search Engines16
As is generally known, searching relies mainly on the Boolean operators AND, OR and NOT. AND specifies that all the search terms must be present in the results, e.g. cloning AND ethics AND human.17 OR specifies that at least one of the search terms must be present, e.g. cloning OR ethics OR human. NOT specifies that a term must be absent from the results, e.g. cloning NOT sheep. If no operator is given, search engines will assume AND. Because engines differ in the order in which they evaluate mixed operators, brackets are useful: they group terms explicitly and so allow more sophisticated searches, e.g. ethics AND (cloning OR reproductive techniques).
Google is the most used search engine, being searched billions of times each day.18 It searches trillions of web pages; this scope is admirable but also means that the user can easily be swamped with results. It is, however, possible to refine searches with tools that are not obvious to the general user.
To give an example, polymerization shrinkage of dental materials remains a major concern in restorative dentistry, so the assessment of shrinkage behaviour during curing (or polymerization) is a major topic in the research and development of dental materials. Searching for shrinkage of dental materials yields 204,000 results in which the search terms are all present, but not necessarily together. Phrase searching is also possible: enclosing the search terms in double quotes ("shrinkage of dental materials") yields a more manageable 30,000 results.
More sophisticated searching is provided through a related (but not well publicised) Google site: https://www.google.com/advanced_search. This site offers a number of filters, for instance by language, country URL (e.g. .fr) or format (e.g. pdf). More importantly, it has filters that restrict searches to a particular site (such as wikipedia.org) or domain (such as .ac.uk), to the position of the search terms in a page (e.g. the title), or to usage rights (e.g. free to use or share).
Thus, restricting the phrase search "shrinkage of dental materials," which yields 30,000 results, to appearances in the title of the page yields only four results. Similarly, restricting by the licence condition free to use or share yields only three results.
Another member of the Google family is Google Scholar (https://scholar.google.com/intl/en-US/scholar/help.html). Scholar purports to include "journal and conference papers, theses and dissertations, academic books, pre-prints, abstracts, technical reports and other scholarly literature from all broad areas of research." It covers not only the commercial publishers but also the majority of the academic institutional repositories, which yield many Open Access materials. Our phrase search "shrinkage of dental materials" yields 39 results. One very useful feature is the inclusion, in a separate column, of URLs for Open Access versions of the articles.
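How these operators combine can be illustrated with a toy search engine. The sketch below is an illustration under stated assumptions, not how Google or any database actually works: it builds an inverted index (term to set of document ids) over a hypothetical four-document corpus, treats adjacent terms as an implicit AND, and adopts one common precedence convention (NOT binds most tightly, then AND, then OR), with brackets overriding it. It does no stemming, so "human" does not match "humans".

```python
import re

OPERATORS = {"AND", "OR", "NOT", "(", ")"}


def tokenize(query: str) -> list:
    """Split a query into brackets, upper-case operators and lower-cased terms."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    return [t if t in OPERATORS else t.lower() for t in tokens]


class BooleanSearch:
    """Toy engine: precedence NOT > AND > OR; adjacent terms mean AND."""

    def __init__(self, docs: dict):
        self.all_ids = set(docs)
        self.index = {}  # inverted index: term -> ids of documents containing it
        for doc_id, text in docs.items():
            for word in re.findall(r"\w+", text.lower()):
                self.index.setdefault(word, set()).add(doc_id)

    def search(self, query: str) -> set:
        self.tokens, self.pos = tokenize(query), 0
        result = self._or_expr()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected token {self.tokens[self.pos]!r}")
        return result

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _or_expr(self) -> set:
        result = self._and_expr()
        while self._peek() == "OR":
            self.pos += 1
            result = result | self._and_expr()  # union
        return result

    def _and_expr(self) -> set:
        result = self._not_expr()
        # Continue while another operand follows, with or without an explicit AND.
        while self._peek() not in (None, "OR", ")"):
            if self._peek() == "AND":
                self.pos += 1
            result = result & self._not_expr()  # intersection
        return result

    def _not_expr(self) -> set:
        if self._peek() == "NOT":
            self.pos += 1
            return self.all_ids - self._not_expr()  # complement
        return self._atom()

    def _atom(self) -> set:
        token = self._peek()
        if token in (None, "AND", "OR", ")"):
            raise ValueError(f"expected a term or '(' but found {token!r}")
        self.pos += 1
        if token == "(":
            result = self._or_expr()
            if self._peek() != ")":
                raise ValueError("missing closing bracket")
            self.pos += 1
            return result
        return set(self.index.get(token, set()))


# Hypothetical corpus used only for illustration.
DOCS = {
    1: "Human cloning raises ethics questions",
    2: "Dolly the sheep: cloning a mammal",
    3: "Ethics of reproductive techniques in humans",
    4: "Reproductive techniques and cloning ethics",
}

if __name__ == "__main__":
    engine = BooleanSearch(DOCS)
    for q in ["cloning AND ethics AND human", "cloning NOT sheep",
              "ethics AND cloning OR sheep", "ethics AND (cloning OR sheep)"]:
        print(q, "->", sorted(engine.search(q)))
```

With this corpus, `cloning AND ethics AND human` matches only document 1 and `cloning NOT sheep` matches documents 1 and 4. Brackets change the outcome: under this convention `ethics AND cloning OR sheep` is read as `(ethics AND cloning) OR sheep` and returns documents 1, 2 and 4, whereas `ethics AND (cloning OR sheep)` returns only 1 and 4.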
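Several of these Advanced Search filters have counterparts that can be typed directly into the ordinary search box. The sketch below assumes the long-standing operators `intitle:`, `site:` and `filetype:` and the standard `https://www.google.com/search?q=` URL; the helper names `build_query` and `search_url` are hypothetical, and we do not claim that the Advanced Search form emits exactly these strings. Result counts will differ from those quoted above because the index changes constantly, and the usage-rights filter is not reproduced because we do not rely on any particular URL parameter for it.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query(phrase: str, in_title: bool = False, site: Optional[str] = None,
                filetype: Optional[str] = None) -> str:
    """Compose a phrase search with optional title, site and format filters."""
    quoted = f'"{phrase}"'
    parts = [f"intitle:{quoted}" if in_title else quoted]
    if site:
        parts.append(f"site:{site}")          # restrict to a site or domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a format, e.g. pdf
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query for the ordinary Google results page."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_query("shrinkage of dental materials", in_title=True)
    print(q)  # intitle:"shrinkage of dental materials"
    print(search_url(q))
```

Keeping the query as a string makes it easy to record and rerun a search exactly, which matters when a search has to be documented, for example in an invalidity argument.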
A word of caution: Google Scholar is not comprehensive, and is indexed by machines rather than humans. It should therefore be used in conjunction with other databases such as Web of Science or Scopus. However, it is particularly useful for those without access to the commercial databases.
Databases of OA Articles
We have already noted that Google Scholar indexes many of the individual institutional repositories.
There is also one major composite database that is the first port of call for many researchers seeking Open Access materials: BASE (Bielefeld Academic Search Engine—https://www.base-search.net/about/en/) "provides more than 100 million documents from more than 5,000 sources"; about 60 percent are Open Access. There is a high degree of quality control; for instance: resources are selected "intellectually"; "only document servers that comply with the specific requirements of academic quality and relevance are included."
Searching is sophisticated. Our standard search for "shrinkage of dental materials" yielded 505 results, of which 179 are known to be Open Access. It is possible to sort the search results, for instance by date or author, and to filter the results, for instance by subject term.
In the medical field there is PubMed Central (https://www.ncbi.nlm.nih.gov/pmc/), "a free archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine." In late 2017 it comprised 4.6 million articles, but none on our standard topic "shrinkage of dental materials."
Browser Extensions
There are two well-known browser extensions that facilitate the discovery of (legal) OA versions of toll access articles.
Unpaywall (http://unpaywall.org/) gathers "content from thousands of open-access repositories worldwide." Once installed it works automatically; any article URL open in a browser window receives an OA (green or gold unlocked padlock) or closed access (grey locked padlock) symbol on the extreme right of the window. Clicking on the green symbol redirects to the repository. Unpaywall claims to "find full text for 50-85 percent of articles, depending on their topic and year of publication."
Open Access Button (https://openaccessbutton.org/) works slightly differently. After installation one needs to visit the required article page on the journal's website and click the OA button in the browser. A search points to any OA versions.
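The look-up that Unpaywall performs can also be run without the browser extension through Unpaywall's public REST API, which takes a DOI and an email address. The sketch below uses only the Python standard library; the field names (`is_oa`, `best_oa_location`, `host_type`, `url`, `url_for_pdf`) are those we understand version 2 of the API to return and should be checked against Unpaywall's documentation. Mapping a publisher-hosted copy to the Gold route and a repository copy to the Green route is our own simplification of the definitions given earlier; Unpaywall's own `oa_status` field distinguishes more categories. The helper names and the sample record are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_BASE = "https://api.unpaywall.org/v2/"


def lookup_url(doi: str, email: str) -> str:
    """Build the request URL for one DOI (the slash in a DOI is kept)."""
    return API_BASE + urllib.parse.quote(doi) + "?" + urllib.parse.urlencode({"email": email})


def fetch_record(doi: str, email: str, timeout: float = 10.0) -> dict:
    """Download the JSON record for a DOI (requires network access)."""
    with urllib.request.urlopen(lookup_url(doi, email), timeout=timeout) as response:
        return json.load(response)


def summarise(record: dict) -> dict:
    """Reduce a record to a route (green/gold/closed) and a link, if any."""
    best: Optional[dict] = record.get("best_oa_location")
    if not record.get("is_oa") or not best:
        return {"route": "closed", "url": None}
    route = "gold" if best.get("host_type") == "publisher" else "green"
    return {"route": route, "url": best.get("url_for_pdf") or best.get("url")}


if __name__ == "__main__":
    # Live use (hypothetical DOI and address):
    # print(summarise(fetch_record("10.xxxx/example", "you@example.org")))
    sample = {"is_oa": True,
              "best_oa_location": {"host_type": "repository",
                                   "url": "https://repository.example.ac.uk/123"}}
    print(summarise(sample))
```

Run over a list of DOIs, for example the NPL citations in a search report, this gives a quick indication of which documents can be read free of charge before any fees are paid.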
Request a Copy Button
A further useful feature of Open Access Button is the incorporation of a facility to request a copy from the corresponding author where no OA version has been found. Minimal information is required (identity, email address, interest…).
Many individual institutional repositories also incorporate similar features. Cambridge University, for instance, redirects to a webpage incorporating a request form if an item is embargoed. This facility has proved popular, with currently about 3000 requests a year.19
It should be noted that Open Access, as used above, refers to alternative ways of providing access to information such as documents while still respecting copyright where appropriate.
Significant strides have been made in, on the one hand, maintaining and developing the IP system worldwide and, on the other, in easing access to enormous amounts of scientific and engineering disclosures. It has been a process of continuous evolution rather than one of revolution and destruction. This covers both patent literature and non-patent literature. Search capabilities have also improved, but the free-of-charge databases often offer only a limited range of Boolean operators and filters. Commercial search engines fill this gap.
We may conclude therefore that there is significant and healthy growth of accessible publishing. For patent literature the publications are financed by the fees paid by the applicants. The business model for non-patent literature has been different traditionally but recently there has been a move to favour Open Access. It appears that adequate income is still being generated for non-patent literature publishers who use an Open Access strategy.
We look forward to a further reduction of asymmetric information in IP licensing and sales as one of the concrete benefits of making scholarly outputs openly available. ■
Available at Social Science Research Network (SSRN): https://ssrn.com/abstract=3164352