Chapter

Towards Automated Identification of Technological Trajectories

Affiliation:
  • Federal Research Centre "Computer Science and Control" of the Russian Academy of Sciences (FRC CSC RAS)

Abstract

The paper presents a text mining approach to identifying technological trajectories. The main problem addressed is the selection of documents related to a particular technology; these documents are needed to identify the technology's trajectory. Two different methods were compared: one based on word2vec embeddings and one based on lexical-morphological and syntactic search. The aim of the developed approach is to retrieve more information about a given technology and about technologies that could affect its development. We present the results of experiments on a dataset of over 4.4 million documents drawn from the USPTO patent database. Self-driving car technology was chosen as an example. The results show that the developed methods are useful for automated information retrieval as the first stage of the analysis and identification of technological trajectories.
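The abstract does not detail the retrieval step. As a rough illustration of how a word2vec model can expand a seed technology query and select related patents, the sketch below uses gensim on toy data with illustrative parameters; it is not the authors' actual pipeline.

    # Illustrative sketch (not the authors' code): expand a seed technology query with
    # word2vec and keep patents that match enough of the expanded vocabulary.
    from gensim.models import Word2Vec

    # Toy corpus of tokenised patent texts; in practice this would be the USPTO collection.
    corpus = [
        ["autonomous", "vehicle", "navigation", "sensor"],
        ["lane", "keeping", "assist", "camera", "vehicle"],
        ["battery", "charging", "circuit"],
    ]

    model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, workers=4)

    # Expand the seed query with distributionally similar terms.
    seed_terms = ["autonomous", "vehicle"]
    expanded = set(seed_terms)
    for term in seed_terms:
        expanded.update(word for word, _ in model.wv.most_similar(term, topn=10))

    # Select documents that share at least `threshold` terms with the expanded query.
    def is_relevant(tokens, query_terms, threshold=2):
        return len(query_terms.intersection(tokens)) >= threshold

    relevant_docs = [doc for doc in corpus if is_relevant(set(doc), expanded)]
    print(relevant_docs)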

Chapter
The paper presents a text mining approach to identifying and analyzing technological trajectories. The main problem addressed is the selection of documents related to a particular technology; these documents are needed to detect the trajectory of a technology. The approach includes a new keyword and keyphrase detection method, a word2vec-embedding-based similar-document search method, and a fuzzy-logic-based methodology for revealing technology dynamics. The USPTO patent database, containing more than 4.7 million documents from 1996 to 2020, was used for the experiments. Self-driving car technology was chosen as an example. The results of the experiment show that the developed methods are useful for effectively searching and analyzing information about given technologies.
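The similar-document search mentioned here could, for example, be realised by comparing averaged word2vec vectors of documents. The following minimal sketch assumes that averaging approach (gensim and NumPy, toy data); it is not necessarily the chapter's exact method.

    # Illustrative sketch: similar-document search with averaged word2vec vectors.
    import numpy as np
    from gensim.models import Word2Vec

    docs = [
        ["self", "driving", "car", "control", "system"],
        ["autonomous", "vehicle", "path", "planning"],
        ["method", "for", "brewing", "coffee"],
    ]
    model = Word2Vec(sentences=docs, vector_size=50, min_count=1)

    def doc_vector(tokens):
        # Average the vectors of in-vocabulary tokens.
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    query = doc_vector(["self", "driving", "car"])
    ranking = sorted(range(len(docs)), key=lambda i: cosine(query, doc_vector(docs[i])), reverse=True)
    print(ranking)  # document indices, most similar first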
Conference Paper
Full-text available
Patent engineers spend significant time analyzing patent claim structures to grasp the range of technology covered or to compare similar patents within the same patent family. Though patent claims are the most important section of a patent, they are hard for a human to examine. In this paper, we propose an information-extraction-based technique to grasp the patent claim structure. We confirmed that our approach is promising through empirical evaluation of the entity mention extraction and relation extraction methods. We also built a preliminary interface to visualize patent structures, compare patents, and search for similar patents.
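As a rough illustration of entity-mention and relation extraction over claim text, the sketch below runs a generic spaCy pipeline on an invented claim; the paper's own models, labels and relation schema will differ.

    # Illustrative sketch (generic spaCy pipeline, not the paper's models):
    # candidate claim elements as noun chunks, crude relations from dependency arcs.
    # Requires: python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    claim = ("A vehicle control system comprising a sensor unit that detects obstacles "
             "and a processor that generates a steering command based on the sensor output.")
    doc = nlp(claim)

    elements = [chunk.text for chunk in doc.noun_chunks]            # candidate entity mentions
    relations = [(tok.head.text, tok.dep_, tok.text)                 # head-dependent pairs
                 for tok in doc if tok.dep_ in ("nsubj", "dobj", "pobj")]

    print(elements)
    print(relations)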
Article
Full-text available
Many challenges still remain in the processing of explicit technological knowledge documents such as patents. Given the limitations and drawbacks of existing approaches, this research sets out to develop an improved method for searching patent databases and extracting patent information, to increase the efficiency and reliability of the nanotechnology patent information retrieval process, and to empirically analyse patent collaboration. A tech-mining method was applied and the subsequent analysis was performed using the Thomson Data Analyzer software. The findings show that nations such as Korea and Japan are highly collaborative in sharing technological knowledge across academic and corporate organisations within their national boundaries, while China presents, in some cases, a great illustration of effective patent collaboration and co-inventorship. This study also analyses key patent strengths by country, organisation and technology.
Conference Paper
Full-text available
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
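A minimal example of fitting LDA and reading off the per-document topic mixtures, using gensim's variational implementation on toy data (illustrative only, not the paper's experimental setup):

    # Illustrative sketch: LDA on a toy corpus with gensim's variational inference.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    texts = [
        ["autonomous", "vehicle", "sensor", "lidar"],
        ["neural", "network", "training", "gradient"],
        ["vehicle", "lidar", "mapping", "sensor"],
    ]
    dictionary = Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(t) for t in texts]

    lda = LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)

    # Top words per topic.
    for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
        print(topic_id, [w for w, _ in words])

    # Per-document topic mixture: the explicit document representation mentioned above.
    print(lda.get_document_topics(bow_corpus[0]))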
Conference Paper
Biomedical literature comprises an ever-increasing number of publications in natural language. Patents are a relevant fraction of those, being important sources of information due to all the curated data from the granting process. However, their unstructured data makes searching for information a challenging task. To overcome this, biomedical text mining (BioTM) provides methodologies to search and structure those data. Several BioTM techniques can be applied to patents. Among them, Information Retrieval is the process by which relevant data are obtained from collections of documents. In this work, a patent pipeline was developed and integrated into @Note2, an open-source computational framework for BioTM. This integration allows further BioTM tools, including Information Extraction processes such as Named Entity Recognition or Relation Extraction, to be run over the patent documents.
Article
Understanding the evolution of a technological field over time is a key task in technology analysis. Analysts in research institutions as well as in companies need to know which topics are relevant to the respective technological field, which topics are emerging, which traditional topics have been deepened over time and which have been abandoned. For this purpose we suggest a patent lane analysis. Patent lanes can be seen as the deployment of patent clusters over time. We use a method based on semantic similarities to develop patent lanes. A case study focuses on the application of carbon fibers in bicycle technology; it is used to demonstrate our method, i.e. to establish patent lanes for this case and characterize them by repeated use of a tf-idf measure. Despite some limitations, patent lanes enable deep insights into the development of patent-friendly technological fields.
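Characterising each patent lane by its most salient terms can be approximated with a tf-idf ranking over per-cluster pseudo-documents. The sketch below uses scikit-learn on toy data and is only an approximation of the approach described above.

    # Illustrative sketch: characterise each patent cluster ("lane") by its top tf-idf terms.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    # One pseudo-document per cluster: concatenated texts of the patents assigned to it.
    cluster_texts = {
        "frames_lane": "carbon fiber bicycle frame layup resin stiffness frame tube",
        "wheels_lane": "carbon fiber rim wheel spoke aerodynamic braking surface",
    }

    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(cluster_texts.values())
    terms = np.array(vectorizer.get_feature_names_out())

    for name, row in zip(cluster_texts, tfidf.toarray()):
        top_terms = terms[row.argsort()[::-1][:5]]
        print(name, list(top_terms))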
Chapter
The paper presents "Exactus Expert", a search and analytical engine. The system aims to provide comprehensive tools for the analysis of large-scale collections of scientific documents for experts and researchers. The system addresses many tasks, among them full-text search, search for similar documents, automatic quality assessment, term and definition extraction, result extraction and comparison, detection of scientific directions and analysis of references. These features help to aggregate information about different aspects of scientific activity and can be useful for the evaluation of research projects and groups. The paper discusses the general architecture of the system, the implemented methods of scientific publication analysis and some experimental results.
Article
Patent search is a substantial basis for many operational questions and scientometric evaluations. We consider it as a sequence of distinct stages. The “patent wide search” involves a definition of system boundaries by means of classifications and a keyword search producing a patent set with a high recall level (see Schmitz in Patentinformetrie: Analyse und Verdichtung von technischen Schutzrechtsinformationen, DGI, Frankfurt (Main), 2010 with an overview of searchable patent meta data). In this set of patents a “patent near search” takes place, producing a patent set with high(er) precision. Hence, the question arises how the researcher has to operate within this patent set to efficiently identify patents that contain paraphrased descriptions of the sought inventive elements in contextual information and whether this produces different results compared to a conventional search. We present a semiautomatic iterative method for the identification of such patents, based on semantic similarity. In order to test our method we generate an initial dataset in the course of a patent wide search. This dataset is then analyzed by means of the semiautomatic iterative method as well as by an alternative method emulating the conventional process of keyword refinement. It thus becomes obvious that both methods have their particular “raison d’être”, and that the semiautomatic iterative method seems to be able to support a conventional patent search very effectively.
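The iterative "patent near search" can be imitated by repeatedly adding documents whose similarity to the current set exceeds a threshold. The sketch below uses tf-idf cosine similarity as a stand-in for the paper's semantic measure, with toy data and a hand-picked threshold.

    # Illustrative sketch: iterative expansion of a seed patent set by similarity,
    # using tf-idf cosine as a stand-in for the paper's semantic measure.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    patents = [
        "autonomous braking system using radar",        # 0: seed from the patent wide search
        "radar based obstacle detection for vehicles",  # 1
        "obstacle detection with stereo cameras",       # 2
        "espresso machine with pressure control",       # 3
    ]
    sim = cosine_similarity(TfidfVectorizer().fit_transform(patents))

    selected = {0}       # seed set
    threshold = 0.2      # hand-picked; in practice reviewed by the analyst each round
    for _ in range(3):   # a few expansion rounds
        new = {j for i in selected for j in range(len(patents))
               if j not in selected and sim[i, j] >= threshold}
        if not new:
            break
        selected |= new

    print(sorted(selected))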
Article
The procedures and the nature of “technologies” are suggested to be broadly similar to those which characterize “science”. In particular, there appear to be “technological paradigms” (or research programmes) performing a similar role to “scientific paradigms” (or research programmes). The model tries to account for both continuous changes and discontinuities in technological innovation. Continuous changes are often related to progress along a technological trajectory defined by a technological paradigm, while discontinuities are associated with the emergence of a new paradigm. One-directional explanations of the innovative process, and in particular those assuming “the market” as the prime mover, are inadequate to explain the emergence of new technological paradigms. The origin of the latter stems from the interplay between scientific advances, economic factors, institutional variables, and unsolved difficulties on established technological paths. The model tries to establish a sufficiently general framework which accounts for all these factors and to define the process of selection of new technological paradigms among a greater set of notionally possible ones. The history of a technology is contextual to the history of the industrial structures associated with that technology. The emergence of a new paradigm is often related to new “Schumpeterian” companies, while its establishment often also involves a process of oligopolistic stabilization.
Article
We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields. © 2010 Wiley Periodicals, Inc.
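A simplified view of the weighted fusion of heterogeneous similarity matrices followed by clustering, with hand-set weights standing in for the paper's information-based weighting scheme (scikit-learn, random toy matrices):

    # Illustrative sketch: weighted fusion of two similarity matrices (e.g. text-based and
    # citation-based) followed by spectral clustering; hand-set weights stand in for
    # the paper's information-based weighting scheme.
    import numpy as np
    from sklearn.cluster import SpectralClustering

    n = 6
    rng = np.random.default_rng(0)

    def random_similarity(n):
        m = rng.random((n, n))
        m = (m + m.T) / 2          # symmetrise
        np.fill_diagonal(m, 1.0)
        return m

    text_sim = random_similarity(n)   # toy stand-in for text-based similarity
    cite_sim = random_similarity(n)   # toy stand-in for citation-based similarity

    w_text, w_cite = 0.6, 0.4         # placeholder source weights
    fused = w_text * text_sim + w_cite * cite_sim

    labels = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0).fit_predict(fused)
    print(labels)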
Semantic-syntactic analysis of natural languages. Part II. Method for semantic-syntactic analysis of texts
  • I V Smirnov
Exactus Patent: sistema patentnogo poiska i analiza (Exactus Patent: a patent search and analysis system)
  • G S Osipov
Opredelenie svyazannosti nauchno-technicheskikh dokumentov na osnove kharakteristiki tematicheskoy znachimosti (Determination of the connectedness of scientific and technical documents based on the characteristics of thematic significance)
  • R E Suvorov
  • I V Sochenkov
Metod sravneniya textov dlya resheniya poiskovo-analiticheskikh zadatch (Text comparison method for solving search and analytical tasks)
  • I V Sochenkov
Servisy polnotekstovogo poiska v informacionno-analiticheskoy sisteme (chast 1) (Full-text search services in the information and analytical system (part 1))
  • I V Sochenkov
  • R E Suvorov