Cutting-edge technologies being adopted in new sectors (automotive, health, security, etc.), thanks to the Internet of Things (IoT) and various standards, are offering unique experiences to users and economic benefits to manufacturers. However, inexperienced licensees who enter license negotiations on patents required for standardized products and services struggle to understand the value of the patent portfolio they are seeking to license. This has created a market for computer platforms offering automated patent valuation. While these tools can provide an overview of a patent portfolio, it is important that both users and service providers be aware of the drawbacks and limitations of these automated methods. This paper analyzes the most common factors used by popular platforms and correlates them with the existing evidence about their reliability as value indicators.
Companies from different sectors (automotive, construction, health, agriculture, security and more) are enhancing their products and services with standardized cutting-edge technologies. Nowadays, thanks to cellular standards, consumers benefit from products with high reliability, low latency and high-speed connectivity, which allows cars to respond to a crash situation automatically,2 a fridge to order food directly from the supermarket, smart meters to help reduce energy consumption, and so on. As a result, some companies can significantly increase their revenues. For instance, "the total revenue from connectivity-enabled products and services in the automotive sector was calculated to grow from $223 billion in 2018 to $483 billion in 2023 for a subset of existing revenue pools, with forecasts predicting as much as $2 trillion by 2030."3 Moreover, companies can enjoy additional benefits in their production methods, thanks to smart transportation and smart factories.4 The technologies implementing these cellular standards are typically covered by patents (so-called standard essential patents, or SEPs) that are generally licensed on Fair, Reasonable and Non-Discriminatory (FRAND) terms and conditions. In addition to SEPs, innovative products increasingly incorporate non-essential patents, which makes them even more complex. As a result, inexperienced licensees struggle to understand the value of the patent portfolios for which they may need a license. This has created a market for computer platforms offering automated patent valuation.
Indeed, companies and individuals invest heavily in evaluating patents. One approach is to use data analysis, or analytics, which is the automated processing of large quantities of information with the intention of finding relevant knowledge that can be useful for decision-making.5 Nowadays, data analysis is frequently used in fields such as healthcare, security, marketing and even politics. Analytics has also been applied in online platforms to help assess the value of intellectual property (IP) assets. In the case of patents, the platforms offer a wide variety of functionalities that range from "smart" patent searches to patent valuation, rankings, landscaping, classification and more. The results are targeted at decision-makers along the value chain of inventions who need information for technology licensing, R&D management, patent monetization and investment strategy. The analysis is performed on pieces of information that can be obtained in an automated manner from patent databases and online sources. These pieces of patent information are called factors.
Despite being quicker and less expensive than analysis by human experts, automated analysis produces results that are far from perfect, often overlooking aspects necessary to properly assess the value of a patent. In fact, although algorithms for automated statistical analysis have developed at a fast pace in recent years, their implementation in the legal field has been limited, mainly because legal analysis requires legal knowledge and other forms of advanced abstract thinking.6 That said, automated analysis is known to have been used as input to the policy-making process on issues of patent and standardization policy.7 For example, the European Commission and the German Ministry of Economy commissioned studies on patent landscaping that were prepared using automated platforms. For inexperienced stakeholders, there is a risk that this is interpreted as an implied endorsement by governmental entities of the reliability of this methodology, leading these stakeholders to rely on these platforms in their licensing negotiations (to assess the strength and value of patent portfolios) without being aware of their drawbacks and limitations. This paper analyzes the most common factors used by popular platforms and correlates them with the existing evidence about their reliability as value indicators.
To better assess the performance of IP valuation platforms, the first important question to be answered is: What data do these platforms analyze? All the information contained in a patent is publicly accessible, but this does not mean that the information is readily available for an algorithm to analyze. Some pieces of data are easily obtainable and classifiable, while others require interpretation of written language. We often hear terms such as keyword analysis, text mining and big data: they all refer to the acquisition of useful and measurable information from a source that is not easy for a machine to process. Several approaches have been developed in an attempt to make automated platforms more capable of understanding human language.8 Nevertheless, substantial challenges remain.
The second question is: Are the analyzed factors truly indicators of patent value? The term high-quality patent may have a different meaning for someone focused on R&D than for a licensing or litigation expert. It is therefore important to define what is considered valuable and how to determine the quality of a patent. It is possible to examine the underlying assumptions about the factors, even though the platforms usually keep secret exactly how the information is processed.9
There are several obstacles to overcome when a computer tries to analyze a patent, starting with the fact that patents are, by definition, technically unique and shaped by the peculiarities of jurisdiction-specific laws and regulations. The results obtained from such a heterogeneous source will depend on the parameters used.
For example, as more and more parameters are added to the analysis, there is a risk of producing an algorithm that corresponds precisely to the data set used for training but does not work with other data sets. Algorithms are typically developed (or trained)10 with the intention of generating accurate results over a wider data population, and their performance is measured in a test with a control data set. The algorithm may eventually pinpoint the "correct" patents within the control data set accurately, yet fail to identify any "correct" patents at all when faced with a different group of patents. This happens because the program was built to identify details and nuances specific to the control test, which are not general rules for identifying correctness. This phenomenon is known as overfitting and is often observed when analyzing data sets that were created for purposes other than software analysis, such as patent databases.11 The opposite effect, underfitting, can also occur, causing the algorithm to fail to obtain the desired results. This is usually the result of an oversimplified model that leaves important parameters out of the analysis.
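To make the distinction concrete, the following is a minimal Python sketch of overfitting built on invented toy data: each patent has a hypothetical forward-citation count and a hypothetical "valuable" label. The memorizing model is a deliberately extreme case that fits the training set perfectly by recording every patent ID, while the threshold model learns a single general rule. Neither model, nor the choice of forward citations as the input, is meant to represent how any actual platform works.

```python
# Invented toy data: (patent_id, forward_citations, labelled_valuable)
TRAIN = [("P1", 2, False), ("P2", 15, True), ("P3", 8, False),
         ("P4", 30, True), ("P5", 12, False)]
TEST = [("Q1", 3, False), ("Q2", 25, True), ("Q3", 9, False), ("Q4", 18, True)]


def fit_memorizer(rows):
    """Overfitted 'model': memorizes the label of every training patent by ID."""
    table = {pid: label for pid, _, label in rows}
    # Unseen patents fall back to "not valuable": no general rule was learned.
    return lambda pid, cites: table.get(pid, False)


def fit_threshold(rows):
    """Simple model: one citation threshold, chosen to maximize training accuracy."""
    def correct_count(t):
        return sum((cites > t) == label for _, cites, label in rows)
    best_t = max((cites for _, cites, _ in rows), key=correct_count)
    return lambda pid, cites: cites > best_t


def accuracy(model, rows):
    """Share of patents whose predicted label matches the given label."""
    return sum(model(pid, cites) == label for pid, cites, label in rows) / len(rows)


for name, model in [("memorizer", fit_memorizer(TRAIN)), ("threshold", fit_threshold(TRAIN))]:
    print(name, "train:", accuracy(model, TRAIN), "held-out:", accuracy(model, TEST))
```

Both models score perfectly on the training set (1.0), but on the held-out patents the memorizer drops to 0.5 while the threshold rule keeps its accuracy: a perfect score on the data used for development says nothing about performance on new patents.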
Further obstacles lie in the rationale behind the processing: even when the computing works well, the theoretical assumptions behind it might be inaccurate. For instance, when analyzing the number of patents cited as prior art in a patent document (backward citations), some may argue that a high number reflects a high value because it shows that the invention is closely related to developed technology and would be easy to implement. Others may assume, however, that a low number is a better indicator of value because it reflects less competition in the market for that technology. Depending on which assumptions are used to create the formula, the results will change drastically. In other words, the mathematical part of the model may correctly classify patents according to the rules implemented, but that classification only reflects the hypotheses about the factors. Factors are frequently presented as direct indicators of value (or of its absence) based on certain simplifying, and sometimes undisclosed, assumptions. The relation between a factor and the value of a patent is usually much more complicated.
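The effect of the underlying assumption can be shown with a short Python sketch (invented citation counts, hypothetical patent IDs). The same backward-citation data, scored under each of the two opposing hypotheses, produces exactly reversed rankings, even though the arithmetic is correct in both cases.

```python
# Invented data: patent ID -> number of backward citations
BACKWARD_CITATIONS = {"A": 42, "B": 7, "C": 19}


def rank(scores):
    """Return patent IDs ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothesis 1: many backward citations signal a thorough, valuable filing.
score_more_is_better = {pid: n for pid, n in BACKWARD_CITATIONS.items()}
# Hypothesis 2: few backward citations signal "white-space" technology.
score_fewer_is_better = {pid: -n for pid, n in BACKWARD_CITATIONS.items()}

print(rank(score_more_is_better))   # prints ['A', 'C', 'B']
print(rank(score_fewer_is_better))  # prints ['B', 'C', 'A']
```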
In an attempt to allow computers to solve the problems without dealing with the challenges of human language, the platforms reduce the information to numerical values collected through the patent factors. The intention is to break down the information from a patent into a group of less complex pieces of information, making it easier for a computer to analyze them.12 Several different rules and statistical calculations are created using theoretical arguments and then computed for each one of the factors. As a result, a computer can relatively quickly analyze hundreds or thousands of patents.13 The following factors are the most commonly used and studied.
III. A. Backward Citations
The backward citations factor refers to the number of patents cited as prior art in a patent document. These citations are often used to assess the patentability and legitimacy of the patent claims.14
Some argue that a high number of backward citations means that the applicant has conducted a more extensive prior art search, which indicates meticulousness motivated by the belief that the patent is valuable.15 However, some platforms suggest that the opposite is true: fewer citations would indicate that the invention was created in a "white space" of technology instead of a densely populated area, and therefore faces less competition and has more possibilities to dominate the market, making it more valuable.16 Certain empirical studies have found that this factor produces contradictory and ambiguous results, leading to the conclusion that it is not a reliable indicator of value.17
Remarkably, despite the empirical studies finding it unreliable, this factor is broadly used by IP valuation platforms. This may be due to the simplicity of extracting the relevant information from databases and of manipulating it via algorithms.
III. B. Non-Patent Literature Citations
The citations of non-patent literature factor counts the number of scientific sources cited as prior art in a patent. It has been proposed that a high number suggests closeness of the invention to fundamental knowledge, and that it may indicate interaction between scientific developments and industrial technology.18
Empirical tests have found this factor difficult to implement because most patents do not cite non-patent literature, and when they do, the number of such citations varies heavily depending on the field and the jurisdiction.19 Moreover, non-patent citations are so rare that the factor cannot be tested in a statistically significant way; therefore, it has not yet been established as a reliable indicator.
III. C. Forward Citations
Some argue that a patent receiving a high number of citations from later patents (forward citations) signals the importance of the technology, because the patent is considered to be the basis for further inventions.20 This would imply that the industry is highly interested in the teachings of the patent.
Forward self-citations, i.e., those made by the same patent holder, have received polarized opinions. Some say that they are a good indicator of value because they suggest that the invention is a continuation of the development of a company's technology.21 Others, however, argue that a high ratio of self-citations to external citations indicates over-enthusiasm or over-investment in the invention.22 Still others exclude self-citations from the score entirely, on the view that they say nothing about value.23
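A minimal Python sketch of how the three positions translate into different scores for the same patent. The function name and owner names are hypothetical, and real platforms may define and weight self-citations differently.

```python
def forward_citation_variants(owner, citing_owners):
    """Score one patent's forward citations under three competing conventions.

    owner: holder of the cited patent; citing_owners: the holder of each later
    patent that cites it (one entry per citation). Names are hypothetical.
    """
    total = len(citing_owners)
    self_cites = citing_owners.count(owner)
    return {
        # View 1: self-citations show continued in-house development; count all.
        "count_all": total,
        # View 2: a high self-citation share may signal over-investment,
        # so the share is reported for a platform to penalize.
        "self_share": self_cites / total if total else 0.0,
        # View 3: self-citations say nothing about value; drop them.
        "external_only": total - self_cites,
    }


print(forward_citation_variants("Acme", ["Acme", "Acme", "Acme", "Beta"]))
```

For a patent cited three times by its own holder and once by a third party, the three conventions yield 4 citations, a 75% self-citation share, and 1 citation, respectively.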
In the United States, reliance on forward citations, among other factors, to establish a reasonable royalty rate in patent infringement litigation has been accepted by several district courts.24 That said, this generally favorable treatment of forward citations as evidence was not embraced in the recent ruling of the District Court for the Central District of California in TCL v. Ericsson.25 In TCL, the court, although recognizing the "economic logic" of relying on forward citations, went on to dismiss them as a relevant indicator of the value of standard-essential patents.26
One known bias in the application of this factor is that a patent that has been published for a longer period of time will have an opportunity to be cited more than a more recently published patent, tending to overinflate the score of earlier-published patents and rendering the factor unusable for the most recent innovations. Usually, it is necessary to wait five years or more after the publication of a patent for reliable results to be obtained.27 Another known bias is that forward citations are more common in some jurisdictions, like the United States, than in others, tending to overinflate the score of U.S.-heavy patent portfolios. Moreover, while certain studies conclude that the most valuable patents will show a high deviation from average, there is little empirical evidence establishing how much more valuable those patents actually are.28
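One way to mitigate both biases is to compare each patent only with patents from the same publication year and patent office. The Python sketch below assumes that approach: the five-year cut-off follows the text, while the cohort definition, the function name and the data are illustrative assumptions rather than any platform's disclosed method.

```python
from collections import defaultdict
from statistics import mean

MIN_AGE_YEARS = 5  # the text notes that about five years are usually needed


def cohort_normalized_citations(patents, current_year):
    """Relative forward-citation score per patent (assumed cohort method).

    patents: list of (patent_id, publication_year, office, forward_citations).
    Each count is divided by the mean count of its cohort, i.e. the input
    patents with the same publication year and office. Patents younger than
    MIN_AGE_YEARS receive None instead of a score.
    """
    cohorts = defaultdict(list)
    for _, year, office, cites in patents:
        cohorts[(year, office)].append(cites)

    scores = {}
    for pid, year, office, cites in patents:
        if current_year - year < MIN_AGE_YEARS:
            scores[pid] = None  # too recent to have accumulated citations
            continue
        cohort_mean = mean(cohorts[(year, office)])
        scores[pid] = cites / cohort_mean if cohort_mean else 0.0
    return scores


sample = [  # invented data
    ("US-1", 2010, "US", 30), ("US-2", 2010, "US", 10),
    ("EP-1", 2010, "EP", 6), ("EP-2", 2010, "EP", 2),
    ("US-3", 2019, "US", 12),
]
print(cohort_normalized_citations(sample, current_year=2020))
```

In this toy data, EP-1, with 6 citations, receives the same relative score (1.5) as US-1 with 30, because each is compared with its own cohort; US-3 is left unscored because it is only a year old.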
III. D. Claims
Some believe that a high number of claims provides more protection to the invention.29 Because some patent offices charge fees that increase with the number of claims, it has been assumed that applicants pay for additional claims only when they consider the invention valuable, so that a higher number of claims indicates a higher value.30 However, as practitioners have pointed out, the cost per claim is relatively low, so the number of claims is a weak indicator of value.31
Another approach to analyzing the claims is to count the words in them, on the assumption that a low number of words in the shortest independent claim32 indicates a broader legal scope.33
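The word-count factor can be sketched in Python as follows. Identifying independent claims reliably requires legal judgment; the sketch assumes a crude, hypothetical heuristic (any claim mentioning "claim N" is treated as dependent), which, for example, misclassifies an independent claim in a different category that refers to another claim. That limitation is itself an instance of the problem discussed here.

```python
import re

# Hypothetical heuristic: a claim that mentions "claim N" or "claims N" is
# treated as dependent. This is NOT a legal test of dependency.
DEPENDENT_REF = re.compile(r"\bclaims?\s+\d+", re.IGNORECASE)


def shortest_independent_claim_words(claims):
    """Word count of the shortest claim that does not refer to another claim."""
    independent = [c for c in claims if not DEPENDENT_REF.search(c)]
    if not independent:
        return None  # no claim passed the heuristic
    return min(len(c.split()) for c in independent)


claims = [  # invented claim text
    "A device comprising a radio and a processor configured to select a channel.",
    "The device of claim 1, wherein the radio is a cellular modem.",
    "A method comprising selecting a channel.",
]
print(shortest_independent_claim_words(claims))  # prints 6
```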
While it is true that the legal breadth of the claims is a relevant aspect of value, this legal breadth is not reflected in the number of claims or their word count. Studies have established only that patents with more claims are more likely to be involved in lawsuits.34 Furthermore, patent attorneys also report that in-house guidelines and drafting style have a bigger influence on the final number and length of the claims than value or scope.
III. E. Essentiality
Some platforms35 include this factor, which refers to patents protecting the technology needed to implement a standard, i.e., SEPs. However, what the platforms frequently identify as "essential" are those patents and patent applications that their owners have declared as potentially essential to a Standard Development Organization (SDO), that is, patents and patent applications that, to the owners' best knowledge, are or might become essential to the standard. The purpose of the declarations is not to define essentiality but rather to ensure that, if any of these patents (or patent applications) ever become SEPs, they will be available on FRAND terms and conditions.36 In practice, a large number of these potentially essential patents and patent applications will not become SEPs. For instance, (1) a patent application may not be granted, (2) it may be granted with amendments such that the granted patent is no longer essential, (3) the patent may be revoked, or (4) it may be invalidated. Furthermore, (5) a company may have an incentive to declare patents (or patent applications) "essential" as soon as there is any likelihood of them becoming essential. This happens for a variety of reasons, including to create the perception that the company holds more essential patents.
At this time, a rigorous essentiality determination cannot be performed by algorithms. It remains a highly complex exercise that requires tens of hours of in-depth internal review by engineers and specialized lawyers.37 Since the platforms collect not actual SEPs but only declared, potential SEPs, the factor is currently implemented by analytics platforms in a highly unreliable manner.
III. F. Family Size
A patent family is a set of granted patents and pending applications that protect a single invention and share a common priority application. However, there are differences in how family size may be counted. For example, multiple patents in the same country may share the same priority but differ in technical content, as is the case with some divisional applications and continuations-in-part.38 Some sources count "family size" as the number of countries where protection is available (each country counts only once, even when it has several applications claiming the same priority). Other sources count the number of patent applications (each patent application counts, including divisionals and continuations, even when they are in the same country).
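The two counting conventions can diverge considerably for the same family, as the following Python sketch with a hypothetical family shows (the publication numbers are placeholders, not real documents).

```python
# Hypothetical family: every member claims the same priority application.
# (publication_number, country, filing_type); numbers are placeholders.
FAMILY = [
    ("US-A", "US", "original"),
    ("US-B", "US", "continuation"),
    ("US-C", "US", "divisional"),
    ("JP-A", "JP", "original"),
    ("CN-A", "CN", "original"),
]


def family_size_by_country(members):
    """Convention 1: each country counts once, however many filings it has."""
    return len({country for _, country, _ in members})


def family_size_by_application(members):
    """Convention 2: every application counts, including continuations and divisionals."""
    return len(members)


print(family_size_by_country(FAMILY), family_size_by_application(FAMILY))  # prints 3 5
```

A family reported as "3" by one platform and "5" by another may therefore be the very same family, which is why the user needs to know which definition is applied.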
A large patent family may suggest that the applicant is particularly interested in the technology or expects it to be broadly adopted.39 Filing a patent in several offices is expensive, so this factor also suggests that the applicant expects the patent to be valuable. Studies have found this factor to be a good indicator of value, especially when the applicant is an expert in the technology and broad adoption of the technology can be predicted before wide filing decisions are made.40 A large family size implies additional filing costs, delays, translation costs, etc., that may be burdensome. Thus, owners will have an incentive to engage in the process only if they consider it a good investment.41
Some platforms suggest that using the Patent Cooperation Treaty (PCT) system42 indicates value, because it would imply that the applicant intends to seek protection in a very large number of countries.43 However, the few studies available have not shown a direct correlation.44 As patent professionals and the existing literature point out, there is a wide range of reasons to use the PCT that are not necessarily related to the perceived value of a patent.45
Family size is a factor that is easy to extract from most patent databases and, in general, does seem to provide an indication of value. Still, it is very important that the user knows which method is used by the platforms for the calculation of this factor.
III. G. Grant-Lag
Grant-lag means the number of days elapsed between the filing and the grant of a patent. Some believe that a short grant-lag indicates that the patent received a high level of attention from the applicant, who may have been meticulous in preparing the documents or may have used available fast-track examination procedures, a treatment that would only be given to the most valuable patents.46
Others argue that a long grant-lag is better, since a longer prosecution could be a sign that the applicant made a bigger effort and a higher investment by engaging in a more cumbersome or more expensive procedure, and remained persistent because the patent was perceived as valuable.47
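Computing the factor itself is trivial, as this minimal Python sketch shows (the dates and the function name are invented for illustration); the difficulty lies entirely in interpretation, since the two views above assign opposite signs to the same number.

```python
from datetime import date


def grant_lag_days(filing_date, grant_date):
    """Grant-lag: number of days elapsed between filing and grant."""
    if grant_date < filing_date:
        raise ValueError("grant date precedes filing date")
    return (grant_date - filing_date).days


lag = grant_lag_days(date(2015, 3, 2), date(2018, 9, 14))  # invented dates

# The same number feeds opposite scores, depending on the assumed hypothesis:
score_if_short_is_better = -lag  # view 1: fast grant signals a prioritized filing
score_if_long_is_better = lag    # view 2: long prosecution signals persistence
print(lag, score_if_short_is_better, score_if_long_is_better)
```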
In practice, a patent may be granted quickly because its claims are narrow, while controversial claims often lead to slower grants, albeit with a broader scope. There are also usually big differences in prosecution times between patent offices, and even between departments within the same office. There is no evidence that this factor is correlated with the value of a patent.
III. H. Legal Status
This factor identifies whether the patent is granted, pending or abandoned. This provides strong information about the ability to enforce the patent rights now or in the future. An abandoned application loses all its value, while granted patents and pending patent applications act as deterrents to the use of the invention by competitors.48
This factor is closely related to the payment of maintenance fees,49 and more specifically to the number of years for which the fees have been paid. If the maintenance fees are not paid, the patent lapses and loses its value. Active patents may be valuable and are therefore subject to a full analysis; inactive patents generally are not.
The legal status of a patent is useful because it is the basis for other factors, such as family size or essentiality. If the legal status is not up to date, other factors will lose significance or provide inaccurate information. Moreover, empirical data has shown that when a patent is granted, the perceived value of the invention increases.50 However, it has been suggested that, in order to use it as an indicator, the focus should be on predicting the behavior of the patent holder; that is, to predict the chances that maintenance fees will be paid in the future.51
III. I. Licensing
This factor specifies whether the patent has been licensed. The fact that the patent has generated or is generating income is a clear sign of value.52 One challenge with using this factor as a value indicator is the limited accessibility of the information, since licenses are not usually publicly disclosed.
Due to the limited availability of the relevant information, this factor has been addressed only in the academic literature and has not been implemented by the platforms.53 Moreover, to determine value properly, not only the number of licensees but in particular the revenue generated by the patents, which is often not known, may be needed. Therefore, when information about licenses is at hand, it can reveal a lot about the value of a patent portfolio, but it is unrealistic to try to apply this factor within an automated valuation method.
III. J. Opposition and Litigation
Opposition and litigation procedures have been linked to the value of the patent,54 as the patent owner may wish to stop the unauthorized use of the invention, or a manufacturer may fear that the patent locks up a certain part of the market. Both procedures are expensive and require significant effort, so they are only pursued when someone's business activities are threatened.55
In general, the fact that a patent survived opposition/litigation can be a reliable indicator of value. The complication is that there is only a limited amount of data related to this factor because, in practice, very few patents are opposed or litigated.56 For the case of litigation, an additional complication is that the case-level information from the courts is not always available. Even when the information about such procedures is publicly available, it requires a significant effort to compile that information into a useful database.57
There are several difficulties in compiling data about this factor, but a proper assessment can provide insight into a patent's value. As explained above, the mere fact that a patent survived such procedures, even with amendments, can already indicate value. But it should be kept in mind that the absence of opposition or litigation does not, per se, indicate low value.
III. K. Ownership
This factor rates a patent based on who the holder is. This has been assessed in several different ways. For instance, one approach is to calculate a score depending on the size of the patent holder, under the assumption that large companies invest more in their technologies and therefore produce more valuable inventions.58 Exceptions to this assumption are not hard to find: many valuable innovations have been made by startups, or even by lone inventors.
Academics have approached this topic from different points of view, and there are two general observations:
First, that small and medium enterprises (SMEs) have fewer resources for IP protection than large companies; and second, that SMEs are more likely to use patents for reputation purposes (i.e., to increase company value or to convince investors of the value of a technology).59 Since SMEs need to maximize the allocation of their limited resources, they must take particular care to limit their patenting efforts to only the most valuable inventions. It is therefore assumed that SMEs hold fewer patents, but that the average value of those patents is higher than in large firms,60 which contradicts the assumption that large firms have more valuable patents.
The empirical evidence available regarding this factor is limited, but it shows that SMEs are more active in patenting emerging technologies and that their patents are less often refused by patent offices. However, patents filed by large companies receive more forward citations and have, on average, a larger family size.61 No correlation with value has been established.
III. L. Remaining Term
This factor represents the number of days left until the patent lapses. Patents typically have a limited life of 20 years from the filing date, after which the exclusivity over the invention disappears. Assuming that all other factors and conditions are equal, would a buyer pay less for a patent that is near the end of its term than for the same patent at the beginning of its term? Probably yes, because a patent at the beginning of its term has more time left in which it can be infringed, and therefore enforced or licensed. However, the relation between value and term should not be oversimplified: the value of the patent may change over time for other reasons.
There are no relevant economic studies about this factor, probably because it is difficult to isolate the effect of the term on the value of patents. As such, it is not possible to state the potential usefulness of this factor based on external studies.
Nevertheless, this factor is very easy to calculate from the information that is available in the patent databases and is usually included in the platforms. The information may be complemented by another factor: legal status. The composite of the two provides basic useful information about a patent, but not much about its value.
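A minimal Python sketch of the composite of remaining term and legal status, under stated simplifications: a nominal 20-year term from the filing date, no term adjustments or extensions, and a status vocabulary (granted, pending, abandoned) taken from the text. The function names are hypothetical.

```python
from datetime import date

NOMINAL_TERM_YEARS = 20  # simplification: no term adjustments or extensions


def add_years(d, years):
    """Shift a date by whole years; 29 February maps to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def remaining_term_days(filing_date, legal_status, today):
    """Days until nominal expiry, combined with legal status (hypothetical helper).

    Abandoned or otherwise inactive patents have no remaining term to exploit,
    so they score 0 regardless of their filing date. For pending applications
    the term still runs from filing, but rights are enforceable only after grant.
    """
    if legal_status not in {"granted", "pending"}:
        return 0
    expiry = add_years(filing_date, NOMINAL_TERM_YEARS)
    return max((expiry - today).days, 0)


today = date(2020, 6, 1)
print(remaining_term_days(date(2010, 6, 1), "granted", today))
print(remaining_term_days(date(2010, 6, 1), "abandoned", today))
```

As the text notes, the result says whether and for how long a patent can matter, not how much it is worth.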
III. M. Technological Scope
Technological scope refers to the technological breadth of the invention protected by the patent. Patent offices classify applications according to their technological field to make searching easier. Currently, the most widely used system is the International Patent Classification (IPC), and it is therefore the one commonly employed to calculate this factor.62 All the reviewed platforms count this factor as the number of distinct four-character IPC subclasses (e.g., H04W) in which a patent was classified.
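Counting distinct subclasses amounts to truncating each IPC symbol to its first four characters (section, class and subclass) and counting the unique values. The Python sketch below assumes symbols are stored in the usual "H04W 72/04" form; the function name is hypothetical.

```python
def technological_scope(ipc_symbols):
    """Number of distinct IPC subclasses among a patent's classifications.

    Assumes symbols like "H04W 72/04", whose first four characters
    (section, class, subclass) identify the subclass.
    """
    return len({s.strip().upper()[:4] for s in ipc_symbols if s.strip()})


print(technological_scope(["H04W 72/04", "H04W 24/02", "H04L 5/00", "G06F 16/00"]))  # prints 3
```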
Some believe that inventions with a wider area of application are more valuable, and that this can be measured by the IPC technical classes.63 The reasoning is that a patent that is classified in more subclasses has a wider applicability and could potentially cover a wider market, while a narrow scope would mean that the invention is a solution for a specific technology and would cover a smaller portion of the market. On the other hand, a downside of a broad scope is that it becomes easier to find prior art that affects the novelty or inventive step, therefore increasing the chances of rejection or invalidation. Moreover, a narrower, more specialized patent may have better technical performance than a broad patent; in this case the value of the patent would be higher for the buyer/licensee.
Additionally, some suggest that specific technology fields are more valuable than others because patents in those fields are licensed more often.64 However, even if this is true, this determination only works when comparing technologies in different fields and not for comparison of patents within the same technical field.
The results from empirical tests have been inconclusive so far: the correlation has not been found to be significant, and the studies have limited scope.65
However, the information is relatively easy to extract from patent databases, so there are ongoing attempts to use it.
With the incorporation of standardized connectivity, stakeholders from different sectors now face the challenge of determining the value of information and communications technology patent portfolios. Lacking the necessary experience, some of them risk relying on automated valuation systems. When these platforms are analyzed, the evidence shows that some of the factors are useful for patent valuation in specific contexts, while others are not. The circumstances under which they work best vary. For some factors, the information is difficult to extract from patent databases; for others, the information remains unavailable for long periods of time. A summary of the analysis can be seen in Table 1 below.
Currently, most of the platforms use a weighted sum approach: each factor's score is multiplied by a "weight" reflecting its assumed importance, and the results are added up to obtain a final score. A relevant question to ask is: How much weight should be given to each factor? Each platform provides a different answer to this question, and none of them has been established as consistently reliable.
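A minimal Python sketch of the weighted sum approach, assuming factor scores have already been normalized to a common 0 to 1 range. The factor values and the two weight sets are invented; they stand in for two hypothetical platforms that weight identical inputs differently.

```python
# Invented factor scores, assumed already normalized to a common 0..1 range.
PATENTS = {
    "X": {"forward_citations": 0.9, "family_size": 0.2, "claims": 0.8},
    "Y": {"forward_citations": 0.3, "family_size": 0.9, "claims": 0.4},
}

# Two hypothetical platforms weighting the same factors differently.
WEIGHTS_A = {"forward_citations": 0.5, "family_size": 0.2, "claims": 0.3}
WEIGHTS_B = {"forward_citations": 0.1, "family_size": 0.7, "claims": 0.2}


def weighted_score(factors, weights):
    """Weighted sum: multiply each factor score by its weight and add them up."""
    return sum(weights[name] * value for name, value in factors.items())


def ranking(weights):
    """Patent IDs ordered from highest to lowest weighted score."""
    return sorted(PATENTS, key=lambda pid: weighted_score(PATENTS[pid], weights), reverse=True)


print(ranking(WEIGHTS_A))  # prints ['X', 'Y']
print(ranking(WEIGHTS_B))  # prints ['Y', 'X']
```

Under the first set of weights, X ranks first (0.73 against 0.45); under the second, Y ranks first (0.74 against 0.39). The inputs are identical; only the weights, which platforms rarely disclose, changed.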
Unfortunately, no factor is reliable in all cases. Even those with the strongest connection to value suffer from issues that prevent their application in certain circumstances. This variation in the reliability of the factors makes human involvement necessary for a trustworthy result. Moreover, users should know that a platform applying a higher number of factors does not necessarily provide more accurate results. For example, if weight is placed on unreliable factors or the wrong algorithm is used, the results will be misleading.
For users, the best way to assess the quality of a platform is to check (1) whether the factors used are reliable as indicators of value, and (2) how the information is interpreted. This means understanding how the platform weights each factor (i.e., how relevant it considers each factor to be for the final score), and how the algorithms actually compute the scores (e.g., whether the algorithms filter information in order to normalize the data sets). Therefore, transparency regarding the methodologies used to calculate the factors, as well as the exact definitions these platforms use, is a must. Unfortunately, the platforms are typically secretive about the way they calculate their scores.
Given these uncertainties, the results reached by existing platforms should be carefully scrutinized. In this way, the recently increased risk of relying on conclusions produced by platforms without the appropriate know-how can be reduced. For instance, a recent report66 and articles published by IPLytics about 5G have generated controversy due to an incorrect application of the essentiality factor.67 In both cases, the focus was placed solely on the patent holders' declarations to the relevant SDO of potentially essential patents, whereas the actual (technical) essentiality of the patents for practicing the standard was not considered.68 This has led to highly inaccurate results.
These concerns are all the more relevant in the context of patent litigation, where courts are called upon to decide the economic value of an infringed patent, a decision that is binding upon the parties and may set a precedent regarding the value of a given patent or set of patents. Judges should treat evidence of patent value relying exclusively on automated analyses with caution when forming their opinions. In short, automated analysis cannot, by itself, substitute for human expertise in patent licensing, litigation or policy-making. ■
Available at Social Science Research Network (SSRN): https://ssrn.com/abstract=3658631