
A methodology for technology trend monitoring: the case of semantic technologies

Authors:
  • Federal Institute of Industrial Property Russia

Abstract

This paper introduces a systematic technology trend monitoring (TTM) methodology based on an analysis of bibliometric data. Among the key premises for developing a methodology are: (1) the increasing number of data sources addressing different phases of the STI development, and thus requiring a more holistic and integrated analysis; (2) the need for more customized clustering approaches particularly for the purpose of identifying trends; and (3) augmenting the policy impact of trends through gathering future-oriented intelligence on emerging developments and potential disruptive changes. Thus, the TTM methodology developed combines and jointly analyzes different datasets to gain intelligence to cover different phases of the technological evolution starting from the ‘emergence’ of a technology towards ‘supporting’ and ‘solution’ applications and more ‘practical’ business and market-oriented uses. Furthermore, the study presents a new algorithm for data clustering in order to overcome the weaknesses of readily available clusterization tools for the purpose of identifying technology trends. The present study places the TTM activities into a wider policy context to make use of the outcomes for the purpose of Science, Technology and Innovation policy formulation, and R&D strategy making processes. The methodology developed is demonstrated in the domain of “semantic technologies”.
Scientometrics
An International Journal for all
Quantitative Aspects of the Science of
Science, Communication in Science and
Science Policy
ISSN 0138-9130
Volume 108
Number 3
Scientometrics (2016) 108:1013-1041
DOI 10.1007/s11192-016-2024-0
Oleg Ena, Nadezhda Mikova, Ozcan
Saritas & Anna Sokolova
Oleg Ena (1) · Nadezhda Mikova (1) · Ozcan Saritas (1) · Anna Sokolova (1)
Received: 24 February 2015 / Published online: 25 June 2016
© Akadémiai Kiadó, Budapest, Hungary 2016
Oleg Ena, Nadezhda Mikova, Ozcan Saritas and Anna Sokolova have contributed equally to this paper.
Anna Sokolova: avsokolova@hse.ru
Oleg Ena: ovena@hse.ru
Nadezhda Mikova: nmikova@hse.ru
Ozcan Saritas: osaritas@hse.ru
(1) National Research University Higher School of Economics, Moscow, Russian Federation
Keywords: Trend monitoring · Bibliometrics · Technology mining · Foresight · Semantic technologies · Russia
Introduction
Recent years have witnessed major advancements in science, technology and innovation
(STI). The new global context is marked by increased financial, trade and investment flows,
leading to a more interconnected and interdependent world, a process accelerated by rapid
technological progress in areas such as ICTs, biotechnologies, fuel cells and
nanotechnologies. Meanwhile, severe social and economic instabilities have been witnessed due to
the economic recession, lack of fresh water, food and energy supply, climate change,
regional conflicts, and the resulting population movements. In such a rapidly changing and
complex environment, full of opportunities and threats, it becomes crucial to identify
emerging trends as the weak signals of potential changes and as indicators of future shocks
and surprises in the form of wild cards. A number of studies have been undertaken by a
wide variety of institutions for the purpose of identifying and monitoring trends. These
involved:
• Large international organizations including the European Commission (Mrakotsky-Kolm
  and Soderlind 2009), Organisation for Economic Co-operation and Development
  (OECD 2007), International Telecommunications Union (ITU 2014), and International
  Energy Agency (IEA 2013).
• National research centers including RAND (Antón et al. 2001; Silberglitt et al. 2006),
  US National Research Council (NRC 2005), US Office of Naval Research (ONR 2014)
  and UK Government Office for Science (2010).
• Universities and research institutions including Manchester Institute of Innovation
  Research (iKNOW 2014), National Institute of Science and Technology Policy (NISTEP
  2010), Fraunhofer Institute for Systems and Innovation Research (Fraunhofer ISI 2014).
• Corporations including Shell (2007), IBM (2013) and Microsoft-Fujitsu (2011).
• Consulting companies including Battelle (2014), Gartner (2013), Z-Punkt (2014), Lux
  Research (2014), Deloitte (2012), TechCast (2014) and TrendHunter (2014).
TTM has been frequently used by inter-governmental organisations to monitor global
technology trends and to set international standards; by the private sector to develop
corporate Research and Development (R&D) strategies; by consultancy companies to provide
technology intelligence to their clients; and by research and academic institutions to keep
track of S&T advancements and to identify new research topics and collaboration networks.
For instance, ITU has launched a Technology Watch programme with the major objective
of monitoring rapidly changing ICTs, providing international standards in the domain, and
tackling issues related to global ICT development (ITU 2014). The US Navy’s Global
Technology Watch aims at developing a wide variety of approaches capable of
identifying the key technological trends in the defence sector (Kostoff 1999, 2003).
The majority of these activities aim to utilize technology trend monitoring (TTM) for
collecting and analysing data to explore trends and provide early indications of potential
changes and developments for more anticipatory policy and strategy making. The anticipatory
intelligence gathered through the TTM work provides public STI policy makers and private
R&D strategy developers with tools for prioritising potential opportunities and threats and
allocating resources to increase the ability to capitalise on, protect against, or mitigate the
impact of potential disruption.
The process typically involves the collection and analysis of data from structured and
unstructured sources to extract ‘hidden patterns’, which may be indications of emerging
trends and developments. A wide variety of approaches has been used for this purpose.
These range from simple keyword searches to more sophisticated applications of
scientometrics and technology mining using qualitative and quantitative methods. Among the
qualitative methods used for trend monitoring are literature review, scenarios, expert
panels, interviews and others. For example, the key methods used in the RAND studies
‘Global technology revolution’ were literature review (foresights, articles, outlooks and
S&T journals), assessment of real progress in research and development (R&D) as well as
investment attractiveness, interviews with RAND experts, and gathering collective opinions
on S&T development from a broad spectrum of individuals (Antón et al. 2001). More
recently, the advancement of computing technologies has allowed the use of quantitative
methods through the analysis of large amounts of data. Amongst the most intensively
discussed and developed quantitative methods are (1) ‘bibliometric/scientometric analysis’
(e.g. Chao et al. 2007; Chen 2006; Cobo et al. 2011; Guo et al. 2011; Kajikawa et al. 2008;
Kostoff et al. 2008; Morris et al. 2002; Porter and Cunningham 2005; Shibata et al. 2008;
Smalheiser 2001; Upham and Small 2010); and (2) ‘patent analysis’ (e.g. Abraham and Moitra
2001; Campbell 1983; Corrocher et al. 2003; Dereli and Durmusoglu 2009; Kim et al.
2009, 2008; Lee et al. 2009, 2011; Porter and Cunningham 2005; Tseng et al. 2007; Wang
et al. 2010; Yoon and Park 2004). As the internet has become an important source of
information, methods for web-based information retrieval have also been developed by a
number of researchers (Palomino et al. 2013).
Although technology trend monitoring (TTM) has been acclaimed by a number of
leading policy and strategy makers, and methodologies have been developed, particularly in
the last 15 years, with an increasing use of information technologies, there is still room for
improvement, particularly in the use of data sources, the methodology for data analysis, and
the positioning of TTM in the policy and strategy making process.
First of all, the analysis of the literature reveals that most of the work in the TTM field
draws upon the analysis of publication and patent databases, and in most cases on only one
of them. However, the analysis of publications and patents alone is not enough to understand
the full cycle of technology development pathways. At present, data sources are rich
and diverse. Besides publications and patents, for example, Foresight reports, conference
and business presentations, and social networks may provide equally important information
for understanding rapid technological changes. All these different sources may provide
valuable input for addressing different phases of STI development. Some of these
sources, such as academic and scientific publications, may provide evidence on the ‘supply’
side of technologies, from which information regarding R&D and early stages of technological
development can be derived. Similarly, the ‘demand’ side dynamics may be captured in
other information sources, such as academic and sectoral conferences and presentations.
The integrated use of these data sources will give a more complete picture of the
emergence and evolution of STI developments.
The use of diverse data sources in a complementary way requires more flexible and
adaptive data analysis and clustering approaches. Most of the tools currently used for
clustering generate clusters that are similar to domain hierarchy levels. Such clusters do not
allow for the identification of trends, but only yield broad labels such as ‘energy sector’,
‘shipbuilding’, and ‘aerospace’. Therefore, the present study posits that there is a need for
a more customized clustering approach, particularly for the purpose of identifying trends.
Finally, it is important to ensure that the TTM work generates impactful results for
policy and strategy making. The present study strives to design a process that links
TTM to policies and strategies within an explicit and systematic framework. Current
work in this field, and the methodologies developed, treat TTM as a stand-alone
activity and generate reports on outputs without a systematic translation of the technology
intelligence into policy and strategy. The proposed TTM process aims at augmenting the
policy impact of TTM. A range of information sources is analysed to extract a number of
established and emerging technologies and technology application areas, along with
their individual, institutional, geographical and temporal attributions. Then, the
relationships between the technologies and application areas are revealed. It is through these
systemic relationships that technology clusters are generated, which then lead to the
identification of the technology trends by studying the dynamics of technologies over time.
The trends are described in detail by using a set of parameters, which indicate future
opportunities and threats for policy makers, corporations, research bodies and other potential
users. For the purpose of demonstration, the present study focuses on one of the Russian STI
priorities, Information and Communication Technologies, and more specifically on ‘Semantic
technologies’.
Thus, the ‘Theoretical background’ section of the paper begins with a review of TTM
and presents examples of the studies conducted in the field. This allows benchmarking
the proposed approach against other similar efforts. The ‘Methodology’ section describes the
methodology across its five stages and introduces the use of existing and newly developed
tools. The results of the study conducted in the area of semantic technologies, and how the
overall TTM work is integrated into STI policy making for Russia, are presented in the
“‘Semantic technologies’ case study” section. The paper concludes with lessons
learned and questions for further research in the last section.
Theoretical background
A technology trend can be considered as a continuously growing technology area with a
certain pattern. For the pattern to be identified as a trend, it should have existed for a
certain period of time, usually about 5 years, with a good prospect of continuing its
development over the next 5–10 years or beyond. During the TTM work, scientists have to
deal with both structured and unstructured data. Various methods and approaches have
been developed for this purpose. Besides purely bibliometric/scientometric analysis of patents
and publications, approaches have been developed, for instance, to investigate emerging
research fronts on the basis of citation networks of scientific publications and patents in
order to discover uncommercialized gaps by comparing them (Shibata et al. 2010, 2011).
Daim et al. (2006) propose a methodology for forecasting emerging technologies by
identifying hidden patterns and trends. This study complements patent and bibliometric
analysis with forecasting methods such as scenario development, analysis of growth
curves, analogies, etc. Further, the analysis of textual data has become possible with the
introduction of text mining methods, which have become popular for handling large
amounts of documents (Lee et al. 2009). Kostoff et al. (1997) propose database
tomography for information retrieval as an analytical system to work with large databases.
Kostoff et al. (2004, 2008) develop a systematic approach with the aim of identifying
disruptive technology roadmaps by using a literature-based discovery process.
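The working definition of a trend given at the start of this section (a pattern of growth sustained for about 5 years) can be illustrated with a minimal sketch. The function name, the input format (yearly document counts) and the reading of ‘continuously growing’ as ‘never decreasing and higher at the end than at the start’ are assumptions made for illustration only; the paper does not specify a numeric test.

```python
from typing import Dict


def is_candidate_trend(counts_by_year: Dict[int, int], min_years: int = 5) -> bool:
    """Return True if the latest `min_years` consecutive years show sustained growth.

    Assumption (not from the paper): "continuously growing" means that yearly
    document counts never decrease within the window and end above the start.
    """
    years = sorted(counts_by_year)
    if len(years) < min_years:
        return False
    window = years[-min_years:]
    # Require consecutive calendar years so that gaps do not pass as growth.
    if window[-1] - window[0] != min_years - 1:
        return False
    values = [counts_by_year[year] for year in window]
    non_decreasing = all(later >= earlier for earlier, later in zip(values, values[1:]))
    return non_decreasing and values[-1] > values[0]


# Hypothetical yearly counts of documents mentioning a technology area.
growing = {2010: 12, 2011: 15, 2012: 21, 2013: 30, 2014: 44}
flat = {2010: 12, 2011: 12, 2012: 12, 2013: 12, 2014: 12}
too_short = {2013: 5, 2014: 9}

print(is_candidate_trend(growing))    # True
print(is_candidate_trend(flat))       # False
print(is_candidate_trend(too_short))  # False
```

A check of this kind only flags candidates; the prospect of continued development over the following 5–10 years still has to be judged by other means, such as expert assessment.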
Table 1 A comparative analysis of TTM studies

Porter (2005)
  Aim: To develop an approach to quick data processing for monitoring emerging technologies
  Trend type: emerging technologies
  Coverage (subject area): ‘Solid oxide fuel cells’
  Methodology stages: Spell out the focal questions and decide how to answer them; Get
    suitable data; Search (iterate); Import into text mining software (VantagePoint); Clean
    the data; Analyze and interpret; Represent the information; Standardize and semi-automate
    where possible
  Methods: Bibliometric analysis; Patent analysis; Expert procedures
  Information sources: Patent databases (Derwent World Patent Index); Bibliometric databases
    (ISI WOK); Web (research institution web-sites)
  Type of result: 2 main clusters: Nano-surfaces; Rare-earth materials
  Integration into STI policy (stakeholders): For all players: Information providers;
    Information professionals; Technology analysts; Researchers, technologists, and managers;
    Decision-makers

Kostoff et al. (2008)
  Aim: To propose a generic methodology for identifying potential discovery candidates
  Trend type: potential discoveries
  Coverage (subject area): ‘Raynaud’s phenomenon (RP)’, ‘Cataracts’, ‘Parkinson’s disease
    (PD)’, ‘Multiple sclerosis (MS)’, ‘Water purification (WP)’
  Methodology stages: Retrieve core literature of target problem; Characterize core
    literature; Expand core literature; Generate potential discovery
  Methods: Literature review; Bibliometric analysis; Expert procedures
  Information sources: Bibliometric databases (SCI, MEDLINE)
  Type of result: 130 potential treatment discoveries for ‘RP’; hundreds of potential
    discoveries for ‘Cataracts’; 16 clusters for ‘PD’; 7 potential discoveries for ‘MS’;
    6 potential discoveries for ‘WP’
  Integration into STI policy (stakeholders): To allow the technologically advanced nations
    to remain competitive with the developing nations, which have large well-trained low-cost
    labor pools

Lee et al. (2009)
  Aim: To propose an approach for creating and utilizing keyword-based patent maps for use in
    new technology creation activity
  Trend type: white spots
  Coverage (subject area): ‘Personal digital assistant (PDA) technologies’
  Methodology stages: Development of patent map; Identification of patent vacancy; Testing
    vacancy validity
  Methods: Patent analysis; Expert procedures
  Information sources: Patent databases (USPTO)
  Type of result: 6 patent vacancies (no names): High value vacancy (1); Medium value
    vacancy (3); Low value vacancies (2)
  Integration into STI policy (stakeholders): Innovative ideas for new product development
    (NPD) and new technology creation (NTC) processes

Shibata et al. (2008)
  Aim: To perform a comparative study in two research domains in order to develop a method of
    detecting emerging knowledge domains
  Trend type: emerging knowledge domains
  Coverage (subject area): ‘Gallium nitride (GaN)’ and ‘Complex networks’
  Methodology stages: Data collection; Statistical methods; Clustering; Extracting the role
    of each paper; Topic detection by natural language processing (NLP)
  Methods: Bibliometric analysis; Expert procedures
  Information sources: Bibliometric databases (ISI WOS)
  Type of result: 3 clusters for ‘GaN’ (no names); 9 clusters for ‘complex networks’: Support
    or Disease (Social); Network analysis (Social); Support (Social); Small-World (Physics);
    HIV (Social); Child development (Social); General (Social); City (Social); Water (Physics)
  Integration into STI policy (stakeholders): For R&D managers and policy makers overviewing
    scientific activities and detecting emerging research domains (tool for future ‘Research
    on Research (R on R)’: incremental and branching innovation)

Tseng et al. (2007)
  Aim: To automate the whole process of creating final patent maps for topic analyses and
    improves other patent analysis tasks such as patent classification, organization,
    knowledge sharing, and prior art searches
  Trend type: emerging topics
  Coverage (subject area): Multidisciplinary (collection of NSC patents from all areas)
  Methodology stages: Text segmentation; Text summarization; Stop words and stemming; Keyword
    and phrase extraction; Term association; Topic clustering; Topic mapping
  Methods: Patent analysis; Expert procedures
  Information sources: Patent databases (USPTO)
  Type of result: 6 topic clusters: Chemistry; Electronics and semi-conductors; Generality;
    Communication and computers; Material; Biomedicine
  Integration into STI policy (stakeholders): Automatic tools to assist patent engineers or
    decision makers in patent analysis: To make decisions about future technology development;
    To inspire novel solutions; To predict business trends

Yoon and Park (2004)
  Aim: To describe the overall process of developing patent network for analyzing up-to-date
    trends of high technologies and identifying promising avenues for new product development
  Trend type: promising technologies
  Coverage (subject area): ‘Wavelength division multiplexing (WDM)’
  Methodology stages: Collect raw patent documents; Transform into structured data; Analyze
    relation among patents; Develop patent network; Executive patent analysis
  Methods: Patent analysis; Expert procedures
  Information sources: Patent databases (USPTO)
  Type of result: 7 technology keyword clusters for ‘WDM’
  Integration into STI policy (stakeholders): For R&D managers, academicians and policy makers

Some more systematic methodological frameworks have also been developed for
monitoring technology trends. For instance, Porter and Newman (2011) suggest a
systematic five-stage ‘technology mining’ process. The proposed stages include: (1) literature
review, (2) research profiling, (3) technology mining, (4) structured knowledge discovery,
and (5) literature-based discovery.
The ‘Quick Technology Intelligence Process’ (QTIP) is also used for the empirical
analysis of S&T publication and patent data in rapid technology analysis (Porter
2005). This methodology employs the ‘VantagePoint’ software, which can fulfill the
functions of data cleaning, statistical analysis, trend analysis, and information
visualization. Electronic applications and automated tools of this type are in great demand.
There are several other techniques that help engineers conduct patent analysis using text
mining, such as data extraction, cluster generation, topic identification, and information
mapping and visualization (Tseng et al. 2007).
Visualisation is a crucial part of the process. There have been attempts to provide
visualisation techniques for the systematic analysis of technology trends (Chen 2006; Cobo
et al. 2011; Kim et al. 2008; Lee et al. 2009; Morris et al. 2002; Shibata et al. 2008; Tseng
et al. 2007; Wang et al. 2010; Zhu and Porter 2002; Yoon and Park 2004). The important
task of these studies is the quick generation of helpful knowledge from text in the form of
two-dimensional maps. Other studies evaluate software tools (such as Thomson Data
Analyzer, Aureka and others) for analyzing patent documents in structured and unstructured
formats, and report on the advantages and weaknesses of their application
(e.g. Ruotsalainen 2008).
Table 1 presents a comparative summary of the aforementioned TTM studies and of the
different approaches they use.
Considering the aims of the studies and the methods for attaining those aims, Table 1
illustrates that the studies usually focus on well-defined narrow technology areas and
monitor trends in those areas through bibliometric and patent analysis. These methods
primarily consider the frequency of occurrence and co-occurrence of keywords/terms,
which are usually obtained from statistical data (i.e. the number of times a particular
keyword/term was used and co-occurred with other keywords/terms). Because the analyses
look at frequency data, clusters are usually generated under the most frequently
used keywords/terms, which are in most cases generic terms. This process typically
produces ‘broad tendencies’ rather than more specific ‘technology trends’. Thus, there is a
need for more customized approaches, which may go deeper into the text itself, consider the
context of the keywords/terms and extract more refined ‘patterns’ of technologies using
fuzzy clustering.
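The frequency-based statistics described above can be sketched in a few lines. This is a generic illustration, not the procedure of any study in Table 1: the function name and the toy keyword sets are hypothetical, and each document is assumed to be already reduced to a set of keywords.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def term_statistics(docs: Iterable[Set[str]]) -> Tuple[Counter, Counter]:
    """Count in how many documents each term occurs and each term pair co-occurs."""
    term_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for terms in docs:
        term_freq.update(terms)
        # Sort so that each unordered pair is always stored under the same key.
        pair_freq.update(combinations(sorted(terms), 2))
    return term_freq, pair_freq


# Hypothetical keyword sets extracted from four documents.
docs = [
    {"semantic", "ontology", "system"},
    {"semantic", "system", "search"},
    {"system", "ontology", "reasoning"},
    {"system", "method"},
]

term_freq, pair_freq = term_statistics(docs)
print(term_freq.most_common(1))            # [('system', 4)]
print(pair_freq[("ontology", "system")])   # 2
```

In this toy sample the generic term ‘system’ has the highest frequency and takes part in every frequent pair, which is the effect noted above: clusters built on raw frequency tend to gather around generic labels.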
The table also shows that among the data sources used for TTM are scientific
publications, patents and, more recently, the web. In most cases the studies focus on only one
of the data sources, without a joint analysis of different sources. However, these sources are
limited to the ‘R&D’ and ‘emergence’ stages of the technology life cycle. If a technology trend
is considered in a broad sense, not only as an emerging technology but also covering the
earlier ‘blue sky research’ stage and the later ‘application’ and ‘social impact’ stages, then
there is a need to expand the information sources to capture the full cycle. In this case, using
different databases makes the analysis of technology development more complete
and multifaceted. Martino (2003) suggests that sources such as newspaper articles and
business and popular press can be used to capture the ‘application’ and ‘social impact’ stages
of technology development, respectively. Furthermore, the results of foresight exercises, which
are frequently based on the synthesis of research with a long-term time horizon and expert
opinions, can be considered a significant source of information for ‘blue sky research’,
which gives the first weak signals of future technologies. Thus, the current paper posits that
for policy and strategy-oriented studies, it is not enough to look only at
publications or patents; a wider range of data sources should be included, which may give
complementary information to capture the entire spectrum of supply and demand
dynamics. The TTM activities will then be able to provide more valuable intelligence for
decision making processes, which brings us to the third key premise of the approach presented
in this paper: the action orientation of trend monitoring activities.
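The correspondence between source types and life-cycle stages described above can be written down as a small lookup table, which also makes gaps in coverage easy to spot. The stage labels follow the text; the source-type keys, function names and sample records are hypothetical and serve only as a sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Life-cycle stages in the order used in the text.
STAGES: List[str] = ["blue sky research", "R&D and emergence", "application", "social impact"]

# Source type -> stage it mainly informs. The keys are illustrative names only.
SOURCE_STAGE: Dict[str, str] = {
    "foresight_report": "blue sky research",
    "publication": "R&D and emergence",
    "patent": "R&D and emergence",
    "newspaper_article": "application",
    "business_press": "social impact",
    "popular_press": "social impact",
}


def group_by_stage(records: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (document id, source type) records by the stage their source informs."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc_id, source_type in records:
        # Source types missing from the table are kept visible, not dropped.
        grouped[SOURCE_STAGE.get(source_type, "unmapped")].append(doc_id)
    return dict(grouped)


def missing_stages(grouped: Dict[str, List[str]]) -> List[str]:
    """List the life-cycle stages for which no document was collected."""
    return [stage for stage in STAGES if stage not in grouped]


# A corpus built only from publications, patents and one newspaper article.
records = [("d1", "publication"), ("d2", "patent"), ("d3", "newspaper_article")]
grouped = group_by_stage(records)
print(grouped)                  # {'R&D and emergence': ['d1', 'd2'], 'application': ['d3']}
print(missing_stages(grouped))  # ['blue sky research', 'social impact']
```

A corpus limited to publications and patents would leave three of the four stages empty in such a check, which is the limitation the paragraph above describes.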
Analysis of the studies represented in the table indicates that they are typically
undertaken as stand-alone work. They are usually published and shared individually,
without clear reference to their implications for policy and strategy. There is
no clear discussion of how to achieve impact on policy and strategy making. Hence,
TTM work requires more systematic approaches, beginning with a broader scanning phase
across a wide variety of information sources and databases, processing data by using multiple
tools and techniques in an integrated way, and generating results with a further discussion
of their implications for public and corporate policy and strategy. The TTM approach
presented in the following sections is developed to address these points.
Methodology
The TTM methodology proposed in the current paper aims at addressing the key
expectations from the trend monitoring activities discussed above, including:
1. Linking the TTM with policy and strategy making processes, and increasing the
impact of the activity on action.
2. Broadening the possible sources of data to address various stages of technological
development.
3. Combining clusterization methods in a complementary way to create a flexible and
adaptive approach with more refined output for the purpose of identifying technology
trends.
In order to meet the first expectation of linking the TTM with policy and strategy, a five-
phase systematic process was designed as illustrated in Fig. 1.
The phases presented in the figure aim at providing a ‘systematic’ framework for
undertaking the TTM activity and ‘explicitly’ linking the activity with the
implementation of its results (Saritas 2013). Thus, the first phase, ‘intelligence’, is mainly
concerned with scanning and surveying activities. First, the scope of the study is defined in
line with the overall objectives of the TTM activity. The area under investigation is described
in detail and key terms are generated with the help of the domain experts. Next, a set of
relevant data sources is identified. The sources are scanned through the use of
quantitative and qualitative methods such as Bibliometric Analysis, Patent Analysis, and
Literature Review.
In the ‘immersion’ phase, the input data obtained in the first phase are ingested, transformed
and normalised through sorting, mapping and further analysis. The aim is to capture the
overall technology development patterns revealed through the analysis. Here, data
clustering techniques are used to identify trends. Software such as VantagePoint,
VOSviewer and Carrot can be used in this phase. The present study developed an additional
clustering algorithm because of the weaknesses of the existing approaches in generating
clusters and identifying trends. This is detailed below.
Following the identification of trends, the ‘integration’ phase considers the networks of
trends, key actors, institutions and countries, and examines the dynamic relationships between
them. Through a timeline analysis, this phase reveals the technology development
pathways, and thus the nature of technology trends.
The data on trends are then analysed and described in the ‘interpretation’ phase. This
phase benefits from expert opinions to capture the diversity of multiple interpretations and
different viewpoints. The narratives of emergence pathways, alternative future trajectories
in each cluster and the impacts of weak signals and wild cards on those clusters can be
explained in the course of inclusive discussions.
Finally, the ‘intervention’ phase is concerned with the translation of the key messages
arising from the TTM process into policies and actions. It therefore covers the
identification of priorities, actions, capacity requirements and organisational structures.
These are used for the formulation of Science, Technology and Innovation (STI) policies
and Research and Development (R&D) strategies. An ideal follow-up step to the
intervention phase is the evaluation of the findings and a re-iteration of the TTM process.
The second feature of the proposed TTM methodology is to make use of broader
sources of data in a complementary way. As discussed earlier, different types of
information sources provide diverse knowledge about the development trajectories of
emerging, pacing, key and base technologies. This information can be extracted from
structured data (such as publications and patent databases), semi-structured data (such as
data coming from social networks and web forums), or unstructured data (such as free text
and presentations). This multi-source approach provides a more complete picture of the
technology trend, its drivers, supply and demand dynamics and further impacts. In order to
gain this valuable intelligence, flexible and adaptive tools are needed that are capable of
analysing a wide variety of information sources, which brings us to the third key feature of
the methodology: the development and use of flexible and adaptive tools.

Fig. 1 The TTM methodology

The TTM methodology developed in the paper involves the use of complementary
clusterization methods. This new approach was used to overcome the shortcomings of
currently available clusterization tools, such as the inability to refine clusters
beyond pre-defined attributes, the exclusion of the time dimension from clustering, the
impossibility of turning off unnecessary clusters in the clustering results, and the lack of core
clustering metrics such as characterized vectors and descriptors of both documents and
clusters. The clusterization methods used in the present study are: (1) an ‘on-the-fly’
clusterization of documents based on full-text processing with the Carrot software; and (2)
an algorithm developed by the Higher School of Economics (HSE) that combines three
elements: fuzzy clustering using topic modelling and other methods; the intelligent
application of stop-lists based on frequent keywords that are common to document samples of
the same genre, such as ‘patent’, ‘invention’, and ‘apparatus’; and the selection of topics that
are relevant to all genres of documents in the same research area. The application of
traditional clustering tools leads to the generation of clusters that are similar to domain
hierarchy levels, which does not allow for the identification of trends. Moreover, the majority
of the clusters generated are too broad, such as ‘energy sector’, ‘shipbuilding’, and
‘aerospace’. The two-step clusterization combines the clusters identified by Carrot with the
fuzzy clusters generated by the HSE algorithm to create more refined trends, while cleaning
general and ‘noisy’ concepts from the clusters. The details of how the HSE algorithm was
developed and implemented are demonstrated through a case study on ‘semantic technologies’.
‘Semantic technologies’ case study
The key notion of semantic technologies is to automatically extract meanings and
knowledge from unstructured text and store it in a machine-readable form that computers
can access and interpret. ‘Semantic technologies’ have been developing for over 15 years
and are characterized by a broad set of research and development, technologies, products
and services. They are used by a number of other fields from natural to social sciences.
Despite the long-term evolution and wide areas for potential applications, there are not yet
many well-established technologies, product categories, industry standards and benchmark
companies associated with semantic technologies. All these factors make the area
attractive as a subject for TTM. The process of TTM in semantic technologies is presented
in the next sections.
Stage 1: Formation of a list of terms
The first two stages of the methodology are concerned with collecting input and gathering
intelligence for TTM. First of all, a set of terms and keywords is formed to be used for
searches. Terms or keywords are commonly used to define a technology area or development.
Typically, about 30–50 keywords are used in the TTM work to describe each
technology sub-area in a complete manner. On the one hand, the list of keywords should not be
too large, so that the search effort stays manageable; on the other hand, the list should be
large enough to provide a unique description of the area under investigation. At this stage,
it is important
that domain experts are involved in the process for the identification and refinement of the
keywords to best represent the area under investigation. A list of keywords was created for
the pilot area ‘semantic technologies’ in consultation with the domain experts, including:
‘semantic intelligence,’ ‘semantic business,’ ‘semantic BI,’ ‘semantic infrastructure,’
‘Resource Description Framework (RDF),’ ‘Web Ontology Language (OWL),’ and ‘SPARQL
Protocol and RDF Query Language (SPARQL)’.
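As an illustration of how such a keyword list might be turned into a search expression, the following minimal sketch joins the terms into a single quoted OR query. The function name `build_query` and the generic boolean syntax are assumptions made for this example; each database has its own query syntax, which is not described here.

```python
def build_query(terms):
    """Join keywords into one OR query, quoting every term.

    Hypothetical helper: the generic "a" OR "b" syntax is an assumption and
    has to be adapted to the query language of the target database.
    """
    seen = set()
    cleaned = []
    for term in terms:
        term = " ".join(term.split())      # collapse stray whitespace
        key = term.lower()
        if term and key not in seen:       # drop empty and duplicate terms
            seen.add(key)
            cleaned.append(term)
    return " OR ".join(f'"{t}"' for t in cleaned)


keywords = ["semantic intelligence", "semantic  BI", "RDF", "rdf", "SPARQL"]
query = build_query(keywords)
print(query)
```

Deduplicating case-insensitively keeps the list compact, which matters when the list is meant to stay within a few dozen terms.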
Stage 2: Data scanning in databases and collections
At this stage of the TTM methodology, a set of databases and collections were identified
for the analysis. The present study benefits from a wide range of information sources in
order to capture the full cycle of technological emergence and development process. The
use and relevancy of information sources can be determined according to the scope and
objectives of the TTM study. Below, some of the potentially useful information sources are
listed and described (Table 2).
The information sources and databases, and how they were used for the TTM
process in the semantic technologies domain, are described briefly in the following sections.
For the purpose of the current study, collections from the above-mentioned databases were
generated covering a 10-year period (i.e. 2002–2012).
Scientific articles
This is undoubtedly one of the most frequently used sources of information in most of the
TTM work, as the latest scientific advancements are commonly discussed and shared in the
scientific literature. Web of Science (WoS) and Scopus databases are frequently used for the
generation of a collection of articles. The articles are identified by using the list of key-
words generated in the first stage of the TTM process through bibliometric analysis. Data
on article titles, abstracts, keywords, author names, affiliations, and locations, and funders/
sponsors are collected for analysis. The collection of articles is useful for studying the
Table 2 Information sources and databases used for TTM

| No. | Information source | Database |
|---|---|---|
| 1 | Scientific articles | ISI Web of Science, Scopus |
| 2 | Patents | EPO (a), USPTO (b), JPO (c) |
| 3 | Media | Factiva |
| 4 | Foresight exercises | European Foresight Monitoring Network, European Foresight Platform |
| 5 | Conferences | Conference websites |
| 6 | EC projects | CORDIS Europa |
| 7 | The Internet | Websites |
| 8 | Dissertations | ProQuest |
| 9 | Academic/non-academic presentations | SlideShare database |

(a) European Patent Office; (b) United States Patent and Trademark Office; (c) Japan Patent Office
dynamics of S&T development and growth of interest in certain areas as well as for the
detection of highly cited fields. Bibliometric analysis for the pilot area ‘semantic technologies’
was performed using the WoS platform, which not only provides information about
the ratings of scientists, countries and research directions, but also offers additional features
such as ‘highly cited papers,’ ‘hot papers,’ and ‘research fronts.’ The search for ‘semantic
technologies’ in WoS returned 4994 publications.
Patents
Patents play a pivotal role in TTM as inventions patented represent important evidence of
scientific and engineering advancements in certain areas of S&T. Patents reflect the ability
of an individual or organization to transform scientific results into technological applications.
They are also a necessary condition for the economic use of research results and, therefore,
play a central role in the analysis of economic potential and the determination of the
most promising sectors and actors such as individuals, organizations and countries. Patents
not only play a role in legally protecting inventions, but they are also the first indications of the
introduction of new artefacts and services, which makes it possible to detect innovation
breakthroughs. Major patenting organizations including the European Patent Office,
United States Patent and Trademark Office, and Japan Patent Office provide patent data.
Analysing a variety of patent databases ensures larger global coverage
and benchmarking between countries. The Derwent Innovations Index (DII) database
developed on the Web of Knowledge platform was used to create a collection of patents for
the semantic technologies area. The DII database is designed for quick and accurate
search for patents granted in different countries and also provides additional descriptive
information about the importance of the patent, as well as analysis of its relations with
other documents. In the case of semantic technologies, the patent collection includes 623
documents.
Media
Media is a source of understanding S&T supply and demand dynamics in a wide variety of
socio-economic areas. Analyzing media provides the opportunity to monitor leading sci-
ence and technological news from business sites, and transcripts of essential news chan-
nels. As a large database, Factiva can be used to extract data. Factiva provides access to
more than 2000 newspapers (including ‘New York Times,’ ‘Wall Street Journal,’ ‘Fi-
nancial Times,’ and Russian newspaper ‘Vedomosti’), over 3000 magazines (including
‘The Economist,’ ‘Time,’ and ‘Forbes’), and more than 500 news feeds (including ‘Dow
Jones,’ ‘Reuters,’ and ‘The Associated Press’). For the semantic technologies area, a collection
of 13,885 news items was created by using the Factiva database. As in the case of articles
and patents, the keyword search method was used to retrieve information from this
database.
Foresight projects
As forward-looking activities, Foresight projects, with a time horizon of 5–100 (or more)
years, are valuable sources of information for detecting technology trends and priorities. In
the scope of the TTM methodology, the European Foresight Platform (EFP), formerly
European Foresight Monitoring Network (EFMN), database was used. Since the initiation
of the EFMN as a European Commission 7th Framework Programme, the project mapped
over 2000 Foresight exercises. The issue analysis function of the EFMN mapping
methodology provided help with the identification and analysis of key emerging issues
relevant for the future of European S&T development. The EFP was used as the main source
of information for creating the Foresight collection. The official website of the platform
provides short presentations (briefs) of the major Foresight studies conducted in different
countries of the world. As a result, 25 Foresight projects were included in the collection for
semantic technologies.
Conferences
International conferences, seminars and forums are potentially useful sources of infor-
mation and can be beneficial for TTM. These events may be good outlets for introducing
novel technologies and major technological areas, and assessing dynamics and prospects of
their implementation. In this respect, business conferences are of great
interest as they pay attention to current issues and reflect the opinions of key experts, that is,
representatives from specific areas of knowledge. These experts may not only be directly involved
in the development of technologies, but may also be interested in their implementation. One of
the goals of this sort of business activity is knowledge sharing via the presentation of significant
results obtained in R&D activities. Best practices of leading industry companies
(key players) are also commonly shared. This may provide valuable information to fill
the summary tables in the framework of the methodology. For the semantic technologies area, a
list of conferences provided by an expert group was used to create a collection of 2434
records, including conference programs with brief descriptions of manuscripts and
presentations.
EC projects
The European Commission (EC) Framework Programme (FP) is one of the largest research
funding schemes in the world. Currently in its seventh iteration, with an eighth one
about to start, it has funded a large number of research projects.
The ‘CORDIS Europa’ database can be used for collecting data about the projects. A keyword
search can be conducted in the ‘projects’ section, where results can be classified by
thematic areas and based on certain time periods. Useful information can be obtained on
long-standing and emerging technology trends, demand side and markets, and on S&T
performance of different countries. In a similar way, the database of the National Science
Foundation in the US can provide extremely useful information on technology trends. In
the case of semantic technologies, the collection of EC projects created using the ‘CORDIS
Europa’ database included 76 projects.
World wide web
The Internet is the largest storage of useful information about S&T development. The main
advantage of the Internet as a source of information is its immediacy and wide scope of
available data on S&T developments, which may be found in public and private web sites,
news portals, articles, blogs and forums. However, it should be noted that besides extremely
useful information, there is a large amount of ‘hype’ on the Internet. Web scraping
can be used as a method to obtain data about technology trends in various fields from the
Internet resources. These Internet resources are initially discussed with experts to narrow
World Wide Web information to data of specific interest. The full-text search library ‘Lucene’,
combined with the web spider ‘Nutch’, can be proposed as a basis for Internet search using
the web-scraping method. This open source search engine is well known and fully documented.
It already provides the basic infrastructure (an inverted index, a search
robot, and parsers for various document formats such as HTML and PDF) and has a user-friendly
interface. For this reason, the important part of the process is to configure ‘Nutch’ in an
appropriate way: how deep the spider should go in one iteration, how many documents it should
download at each level, which links it should take into account, and how many pages it can
download simultaneously from one website. The ‘Nutch’ web spider was used for
creating a collection of 994 web documents in the area of semantic technologies.
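The effect of the crawl settings listed above (depth, number of documents per level, link filtering) can be illustrated with a minimal sketch of a breadth-first crawl over an in-memory link graph. This is not the Nutch configuration itself: the function `crawl`, its parameters `max_depth`, `max_per_level` and `allow`, and the toy link graph are all hypothetical names chosen for the example, and no network access is performed.

```python
def crawl(links, seeds, max_depth, max_per_level, allow=lambda url: True):
    """Breadth-first traversal of a link graph given as {url: [outgoing urls]}.

    max_depth     -- number of levels (iterations) the spider descends
    max_per_level -- cap on pages fetched at each level
    allow         -- predicate deciding which links are taken into account
    Pages beyond the per-level cap are dropped, not deferred.
    """
    seen = set(seeds)
    frontier = list(seeds)
    fetched = []
    for _ in range(max_depth):
        batch = frontier[:max_per_level]
        fetched.extend(batch)
        next_frontier = []
        for url in batch:
            for target in links.get(url, []):
                if target not in seen and allow(target):
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return fetched


# Toy link graph standing in for real web pages (illustrative data only).
toy_links = {"a": ["b", "c", "d"], "b": ["e"], "c": ["f"]}
print(crawl(toy_links, ["a"], max_depth=2, max_per_level=2))
```

Raising `max_depth` or `max_per_level` widens coverage at the cost of more irrelevant pages, which is why these settings are worth agreeing with the experts who narrow the list of Internet resources.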
Dissertations
International dissertation databases are also considered to be a useful source of information
about global technology trends to identify emerging and disruptive S&T areas. In this case,
electronic libraries of theses and abstracts can be used to search for data provided by
different scientists from all over the world. Some dissertations are presented in these
databases in full-text form, whereas the others can be accessed only in the form of
abstracts. The proposed method suggests using the ProQuest database, which includes the
most recent publications as well as archives covering the period from 1971 to the present.
ProQuest is composed of multiple databases on various subjects and includes the latest editions of
publications such as ‘American Sociological Review’, ‘Econometrica’, and ‘Journal of Political
Economy’. The advanced search option, which is used for the selection of scientific
literature, provides the capability to filter, display and store the results received. For
semantic technologies, a collection of 6000 dissertations was created by using the ProQuest
database.
SlideShare presentations
The SlideShare presentation database is potentially an extremely useful data source to
access presentations related to various topics in a wide variety of S&T areas. SlideShare
features a vibrant professional and educational community that comments on, ‘favorites’
and downloads content on a regular basis. SlideShare content spreads virally
through blogs and social networks such as LinkedIn, Facebook, and Twitter. Individuals
and research organizations upload documents to SlideShare for sharing ideas, conducting
research, connecting with each other and generating decisions for their businesses. In most
cases, presentations are prepared for conferences, business events and presale activities. In
addition, since time of presentation is typically limited, they contain the most concentrated
information about concrete technologies, their applications, and innovation ideas. In this
sense, they can act as a separate business and technology-oriented data source. In order to
form a collection in the area of semantic technologies, a list of keywords discussed with
experts was used to scan the SlideShare website (www.slideshare.net) with the purpose of
finding all relevant documents. The searches resulted in 110 presentations in PDF format,
which were then converted into text format for analysis with text mining tools.
In addition to the above-mentioned information sources, the analysis of academic programmes
and curricula of the leading academic institutions in the world, international
and national S&T policy documents, personal contacts with experts and stakeholder
communities specialised in certain areas of S&T, and crowdsourcing approaches would be
immensely useful for TTM work.
Following the extraction of data from a wide variety of information sources, the third
stage of the proposed TTM methodology is concerned with the clustering of data obtained.
Stage 3: Data clusterization
This stage is concerned with sorting and mapping as part of the immersion process. At this
stage, the input extracted from different data sources was used for clusterization in order to
identify areas to formulate technology trends. As mentioned earlier, the purpose of using a
variety of data sources was not to limit research in a certain phase of technological
development, but to cover the entire spectrum of technological evolution as much as
possible. Thus, processing nine collections of different genres instead of a single document
collection helps to extract information on technological trends that covers
research and scientific aspects (e.g. through scientific articles and dissertations),
technology and production aspects (e.g. through patents and projects), as well as business
and marketing aspects (e.g. through conferences and presentations). Certainly, some data
sources, such as Foresight exercises, media and the Internet, provide information on two or
three of these aspects. With regard to this, the parallel processing of the nine collections
in the current TTM study was intended to:
• Provide a diverse scope of the subject area “semantic technologies”.
• Identify specific technological trends from the specific collections.
• Avoid the suppression of business trends within the semantic technologies domain, as
well as of trends from different genres.
Each collection was processed as follows:
1. Empirical study with the involvement of experts for the selection of the control
parameters for clustering.
2. Clustering of the collections with software.
3. Creation of trends for each collection.
4. Harmonization of trends from different collections.
5. Formation of the final technology trends.
The empirical study for the selection of the control parameters for the clustering algorithms was
carried out by varying the following control parameters and having experts validate the
results:
• Maximum top-level clustering passes.
• Cluster size.
• Merge threshold.
• TF label scorer weight.
The clustering results obtained for each set of control parameters were validated by
“semantic technologies” domain experts.
Keywords and phrases that the experts defined as interim trends were flagged with special
markers. The control parameters were varied in both directions in order to
maximize the number of interim trends, and the list of trends for each
collection was compiled once the maximum number of interim trends was reached.
Then, eight interim lists were harmonized by experts within the entire set of information
sources. The final list of trends was formed from the most frequent names among the harmonized
technological trends.
During this process data clusterization appeared to be a problematic area. This was
mainly due to the challenges of data refinement for the purpose of extracting useful
information such as on people, institutions and locations. Various publicly available or
commercial clusterization methods have been developed. However, most of them are
currently far from the precise clusterization of raw data. They have their pros and cons,
which may be assessed in line with the nature of the work undertaken. For the purpose of
the TTM study presented in the current paper, two methods of clusterization were selected.
The first is a proprietary software tool for data clusterization, namely the ‘Carrot’ data
clusterization tool.¹ In addition, a second clusterization approach was developed and
applied by the programmers of the Higher School of Economics (HSE). The process of
clusterization began with data preparation and filtering as described in the next section.
Data preparation and pre-filtering
Any clusterization process involves some preparatory work in order to make the input data
ready for the analysis. This process tends to take longer as the diversity of databases to be
analyzed increases. As presented above, the current study makes use of a large set of
information sources, which involve structured data such as publication and patent databases,
and semi-structured and unstructured data such as SlideShare, conference programs and
dissertations.
First, all data was converted into text format to provide compatibility. However, each
collection was processed separately in order to take into account different stages of
technological development, i.e. to capture blue sky (emerging) trends, or different levels of
technological development such as the research and development, technology, product and market
stages. Upon the completion of the clusterization process, technology trends were grouped
under different stages of development. How this process was undertaken is
explained step by step below.
Clustering with Carrot software
Using the Carrot method, a number of clusters can be generated with the use of various
cluster tuning attributes. The clusters in Fig. 2 were obtained by refining the clustering
procedure through stop lists, maximum cluster size and count, the weights of capitalized words
and of words in document titles, the TF/IDF ratio and other tuning parameters.
To refine the composition of the clusters and their interconnections, clusterization
results were rendered with a variety of visualization tools that are integrated into the
software package Carrot2 Document Clustering Workbench. For example, Fig. 3 shows the
selected cluster ‘Social Semantic Web’ with its connections to other clusters,
which are indicated with small circles.
The results of the trend extraction using the Carrot software for the separate collections,
and the results of the trend harmonization, are shown in Table 3. The table presents four
selected collections with the top three clusters for each of the six trends identified.
Table 3 demonstrates the variety of results generated from different collections. The
results of the Carrot clusterization process confirmed one of the key premises of the
paper that ‘utilizing different information sources would provide intelligence on the
¹ http://carrot2.org.
different levels of technological evolution’. Some evidence to support this proposition
can be found in Table 3. For instance, regarding the first trend, linked open data, the analysis
of scientific articles revealed the “Data linkage” cluster. This is a ‘supporting’ technology
for semantic technologies on the supply side, and it is at a lower level in the technological
architecture. Following the first row to the right, it can be seen that the patents database
generated the “ontology derivation” cluster. This is a ‘solution’ approach, which ensures,
for instance, the success of the semantic web and structures that are implemented (Wouters
et al. 2002). Moving to conferences and presentations, it is seen that the “Web of Data”
concept emerges, which is a ‘practical’ application that makes semantic knowledge of data
accessible and semantic services available.
The clusterization process using the Carrot algorithm generated six clusters; however, it
left a large number of documents unanalyzed in the ‘others’ (i.e. trash) category.
Furthermore, the clusterization algorithm used in Carrot is hierarchical and tends to
generate broader and more stable clusters such as “semantic technologies” and similar sub
Fig. 2 Carrot clustering results
clusters under them, such as “semantic web”. Therefore, it is difficult to explore emerging
new trends, which are usually merged under the general clusters or left out of the analysis. The
following section details the main motivations for the development of a new algorithm by the
HSE and how it compares with and complements the other clusterization tools.
Clustering with the HSE clustering algorithms
The clusterization algorithm developed by the HSE, based on earlier work undertaken by
Kuznetsov (2001), provides a number of important features, which are considered to be
missing in the other commonly used tools. Firstly, the limitations of current tools are
related to their restricted features for multilevel clusterization. This is the main reason why
the analyses frequently generate stable clusters containing broad thematic
areas like “semantic web” or “ontology modeling,” which do not necessarily lead to any
specific technological trends. A second and perhaps more significant shortcoming arises
from the complex conceptual nature of technology trends. The existing clusterization
tools assign each record (e.g. document) to a specific cluster and do not allow the use of the
same record for the analysis of cross-cutting clusters. This is because those tools have been
developed for the purpose of thematic clustering; therefore they lead to the suppression of
characteristic uni- and bigrams, ensuring division of the document set into thematic subdomains:
“semantic technologies” → (“semantic web”, “ontology modelling”, “knowledge
formalization”, etc.), even with hierarchical clustering. However, document
systematization for the purpose of identifying technology trends is different from
thematic clustering.
Fig. 3 Carrot clustering results in the form of an Aduna cluster map
The algorithm proposed by the HSE aims to enable a more granular processing of
the document collections and to identify novel trends. Among the key features of the HSE
clustering algorithm are:
• ‘On the fly’ selection of the clustering elements: bigrams, trigrams, n-grams.
• Use of different metrics of clustering: TF/IDF or TF only.
• Compatibility with a variety of source document formats (pdf, doc, ppt, rtf).
• Availability of tools for weighting meaningful linguistic markers for the formation of new
cluster centroids.
With these capabilities, the HSE algorithm aims to address some of the limitations of the
Carrot software, which were identified through the study and considered to limit the
possibility of identifying unique trends.
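To make the first two features more concrete, the sketch below shows one common way of extracting n-grams of a selectable length and weighting them either by TF alone or by TF/IDF. It is a generic textbook formulation, not the HSE implementation, whose internals are not given here; the function names `ngrams` and `term_vectors` and the unsmoothed IDF formula log(N/df) are assumptions of this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as space-joined strings) of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_vectors(docs, n=2, use_idf=True):
    """Build one {ngram: weight} vector per document.

    docs    -- list of token lists
    n       -- n-gram length (2 = bigrams, 3 = trigrams, ...)
    use_idf -- True for TF/IDF weights, False for TF only
    """
    grams = [Counter(ngrams(doc, n)) for doc in docs]
    df = Counter(g for counts in grams for g in counts)  # document frequency
    vectors = []
    for counts in grams:
        total = sum(counts.values()) or 1
        vec = {}
        for gram, count in counts.items():
            tf = count / total
            idf = math.log(len(docs) / df[gram]) if use_idf else 1.0
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


docs = [["linked", "open", "data"], ["linked", "open", "source"]]
print(term_vectors(docs, n=2, use_idf=False)[0])
print(term_vectors(docs, n=2, use_idf=True)[0])
```

With the unsmoothed IDF used here, an n-gram that occurs in every document receives a weight of zero, which is one reason why a TF-only mode can be useful for small or homogeneous collections.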
In order to address the problems encountered with the existing applications, the HSE
algorithm was equipped with a multi-level clustering scheme with an intelligent expansion
of the clusterization stop-list. The new approach for automatic stop-list extension was
developed as a module in the new software. The need to include this method in the HSE
Table 3 Obtained results of the trend extraction and results of the trends harmonization
Articles Patents Conferences Presentations Harmonized

Linked data
Data linkage
Open data
Linked open data
Semantic interpretation
Ontology derivation
Linked data
Web of data
Metadata
Linked data
Graph database
Web of data
Linked open data (LOD)

Semantic web
Folksonomies
Social bookmarking
Social data
Semantic graph
Contextual workspaces
Social web
Social media
Social software
Social media
Social semantics
Semantic wiki
Social semantic web

Ontology mapping
Interoperability
Knowledge representation
Information integration
Record linkage
Interoperability
Content mapping
Semantic interoperability
Metadata management
Semantic repositories
Semantic interoperability
Archetypes metadata
Semantic interoperability

Bioinformatics ontology
Bioinformatics platform
Bioinformatic resources
–
Bioinformatics
Bioinformatics e-resources
Bioinformatics technology
Semantic bioinformatics
Knowledge system
Biological systems
Semantic bioinformatics

Mobile semantic
Multimodal mobile interfaces
Mobile environments
Mobile web
Smart devices
Ubiquitous computing
Mobile semantic
Semantic agents
Mobile devices
Mobile semantic

Semantic digital libraries
Affiliation disambiguation
Automated ontology
Digital resources
Digital documents
Digital library
Library cloud
Semantic digital libraries
Library collections
Data repositories
Semantic digital libraries
Library system
Library services
Semantic digital libraries
clustering algorithm arose because keywords and phrases that are not related to technology
trends should automatically expand the basic clusterization stop-list. Otherwise, during the
analysis of a particular collection of documents (such as patents or theses), high-frequency
terms such as ‘patent’ and ‘invention’ form the cluster characteristic vectors. To reduce the
impact of these terms on the formation of clusters, each of the nine document collections was
supplemented by a specific parity document collection of the same genre.
In order to reduce the ‘noise’ of frequent terms, N-gram frequency ratings were used.
These were compared online at different levels (i.e. the whole document collection, and
the topics and sub-topics of clusters). A term is automatically identified as noise and included
in the stop-list if it has a high frequency and the null hypothesis holds that it
is distributed identically over the document collection and the clusters’ topics or sub-topics,
rather than being concentrated in the particular cluster where it emerged. Finally, all the
stop-lists are combined to
form a unified ‘master stop-list’. The terms identified were included in the vocabulary
following the ‘bag-of-words’ model. Then, frequency characteristics were generated from
the adjusted document collection specifics, based on TF/IDF. The term vectors of the
documents were formed by taking the terms as document attributes and frequency char-
acteristics as values of the attributes. The HSE algorithm is capable of customizing parity
collections for different areas. With this feature the HSE clustering algorithm became one
of the most flexible and adaptable approaches for clusterization for trend identification. All
clustering parameters and tuning options in the algorithm are customizable, which provides
different tuning options such as changing TF/IDF ratio and other granular customization of
various results with a wide set of clustering metrics.
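The stop-list extension described above can be sketched as a simple statistical check. The statistical test is not named in the text, so the example assumes a chi-square goodness-of-fit test against a distribution proportional to cluster size; the function names, the `min_count` threshold and the sample counts are likewise hypothetical. The critical value 5.991 is the standard 5% threshold for two degrees of freedom (three clusters).

```python
def chi_square(observed, cluster_sizes):
    """Chi-square statistic of a term's per-cluster counts against the counts
    expected if the term were spread proportionally to cluster size.
    Assumes positive cluster sizes and at least one occurrence of the term."""
    total_obs = sum(observed)
    total_size = sum(cluster_sizes)
    stat = 0.0
    for obs, size in zip(observed, cluster_sizes):
        expected = total_obs * size / total_size
        stat += (obs - expected) ** 2 / expected
    return stat


def extend_stop_list(term_counts, cluster_sizes, min_count, critical):
    """Return frequent terms whose distribution over clusters is
    indistinguishable from uniform (null hypothesis not rejected)."""
    stop = set()
    for term, observed in term_counts.items():
        frequent = sum(observed) >= min_count
        if frequent and chi_square(observed, cluster_sizes) < critical:
            stop.add(term)
    return stop


# Illustrative counts for three equally sized clusters (not data from the study).
sizes = [1000, 1000, 1000]
counts = {"patent": [50, 48, 52], "ontology": [90, 5, 5], "rare": [1, 1, 1]}
print(extend_stop_list(counts, sizes, min_count=30, critical=5.991))
```

In this toy example ‘patent’ is frequent and evenly spread, so it is added to the stop-list, whereas ‘ontology’ is concentrated in one cluster and is kept as a potentially meaningful term.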
The second improvement is related to the development of the fuzzy (soft) clusterization
algorithm. This method implements topic modelling (Blei and Lafferty 2007; Griffiths
and Steyvers 2004; Steyvers and Griffiths 2007) and binds each source
document simultaneously to a large number of clusters. Combined with
intelligent stop-list expansion, this increases the weight of the topics corresponding to
implicit technological trends.
For the purpose of clustering, the HSE algorithm benefits from various methods
including k-means with Euclidean distance metric, Manhattan distance metric and Pearson
correlation metric along with latent Dirichlet allocation (LDA) modeling with an extension
for N-gram support and formal concept analysis (FCA) based on the lattice theory. Using
the algorithm, it is possible to manage representation, structure and different analytical
dimensions of clustering results. In such customization, there are possibilities to change
and refine core clustering functions producing results based only on TF (without IDF)
calculation, getting unigram, bigram and trigram lists and other modifications of the core
and supplementary algorithms. To weight co-occurrence of categorical attributes (key-
words) and the individual patterns (documents) the FCCM fuzzy clustering algorithm was
used with modifications for large text corpora (Kummamuru et al. 2003).
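The three distance metrics named above, and the idea of binding one document to several clusters at once, can be illustrated with the following minimal sketch. The soft membership formula is the standard fuzzy c-means one, used here only as a stand-in: it is not the FCCM algorithm of Kummamuru et al. (2003), and the function names, the fuzzifier `m` and the sample vectors are assumptions of the example.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """1 minus the Pearson correlation of two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # correlation undefined for constant vectors
    return 1.0 - cov / (norm_a * norm_b)


def soft_memberships(doc, centroids, distance=euclidean, m=2.0):
    """Fuzzy c-means style membership of one document in every cluster.
    The memberships sum to 1; m > 1 controls how soft the assignment is."""
    dists = [distance(doc, c) for c in centroids]
    if 0.0 in dists:  # document coincides with one or more centroids
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d / other) ** power for other in dists) for d in dists]


# Toy document vector and two cluster centroids (illustrative values only).
print(soft_memberships([1.0, 0.0], [[0.0, 0.0], [3.0, 0.0]]))
```

Unlike a hard k-means assignment, every document keeps a non-zero weight in each cluster, which is what allows the same record to contribute to cross-cutting trends.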
Figure 4 illustrates the results of the clusterization through the HSE algorithm described
above.
As highlighted in the figure, this algorithm helped to reveal additional and more novel
trends that were not identified by the other tools, including “Semantic Business Intelligence”
and “Semantic e-Health” along with other e-applications for commerce, government, and
learning. The results also confirmed some of the trends that were found previously,
such as “Linked Data”.
Consultations and validations of the results generated through the Carrot and HSE
algorithms revealed eight trends in total, which are described in the next section of the
paper. In summary, the analysis has shown that standard clusterization tools may not reveal
all the trends. Combined and customized use of the clustering tools provides more diverse
outputs, particularly if the purpose is not only to generate clusters, but to discover hidden
patterns in data.
Stage 4: Identification and description of trends
This stage begins with the integration of the output generated to identify technology trends.
For the pilot area, ‘semantic technologies,’ a wide variety of collections was generated,
including 6000 dissertation abstracts; 4994 abstracts of scientific articles; 2434 conference
materials (programs, call texts and manuscripts); 623 patents; 994 web articles; 110
SlideShare presentations; 76 abstracts from FP7 projects; and 25 Foresight projects. In the
framework of this study, both clustering methods described earlier were used to gradually
process each document collection coming from different information sources through
continuous expert consultations. As a result of this process, the following list of eight
trends was identified:
1. Linked open data (LOD).
2. Social semantic web.
3. Semantic business intelligence.
4. Semantic interoperability.
5. Semantic bioinformatics.
6. Mobile semantic.
7. Semantic digital libraries.
8. Semantic-based e-Apps (semantic e-commerce, e-government, e-learning, e-health).
Fig. 4 HSE clustering results
These results were presented in a final workshop with the participation of broader
stakeholders to discuss and prioritize the list of trends. The workshop resulted in a reduced
number of trends, which were considered to be the most relevant and promising. The
shorter list of trends included:
1. Linked open data (LOD).
2. Social semantic web.
3. Mobile semantic.
4. Semantic digital libraries.
5. Semantic-based e-Apps (semantic e-commerce, e-government, e-learning, e-health).
The final list of trends was taken to the next step for further elaboration and database
creation.
Stage 5: Creation of a trends database
After the final list of technology trends was generated, the interpretation stage elaborated
the trends further to provide a thorough description for their further use. A template was
designed to describe the most determinative features of trends in order to assess their
significance in terms of their impact on global socio-economic development in the long
term, as well as their potential impacts on Russia. For understanding the state-of-the-art in
the subject area, a brief description of scientific and technological context is included.
Promising products/services associated with the trend indicate the potential for exploitation.
Further socio-economic benefits are indicated with a brief description of new consumer
properties (e.g. portability, multi-functionality, energy efficiency, etc.), disruptive capacity,
expected effects of its development (social, economic, ecological, political, etc.), as well as
key areas of application. One of the core features of trend descriptions is the life cycle
stage at which major work is carried out (basic research, applied research, prototypes, mass
production), which in turn may indicate approximate year of technology implementation
and expected market volume. Mechanisms for the efficient introduction of the technologies
were indicated with the development of other interconnected technologies as well as
alternative directions of technological development that have their comparative advantages
and disadvantages. Information on key players and leading countries in the subject area
was considered to be helpful to determine Russia’s position in global development of the
technology. In addition, assessing the technology development should take into account
drivers that stimulate it (for example, the need to reduce emissions, the need to process
large amounts of information, legal requirements, etc.), as well as barriers, risks,
and uncertainties that may adversely affect the technology introduction. Additional
literature, web-links, and other background information were also considered to be useful for
further elaboration and update of information on technology trends. Consequently, the final
template included the following features for the description of identified trends:
1. Name (full title of the technology trend).
2. Short description.
3. Promising products/services associated with the trend.
4. New consumer properties.
5. Expected effects (the most important results of technology application—in society,
economy, ecology, etc.)
6. Life cycle stage (the stage of technology development: basic research, applied
research, prototype, mass production).
7. Year of technology implementation.
8. Areas of application.
9. Market volume.
10. Competitive technologies (that can substitute studied technology, their advantages
and disadvantages).
11. Associated technologies related to studied trend.
12. Leading countries in the area (the most successful countries in technology
development).
13. Key players (including companies, universities, research organizations, or other
institutions that are mainly engaged in technology development actions).
14. Drivers (factors that can accelerate innovation process).
15. Disruptive capacity (potentials to be a game changer).
16. Barriers, risks and uncertainties against the trend.
17. Data sources and additional information (literature, web-links, and other
background information).
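The 17-item template listed above maps naturally onto a record type. The following is a minimal sketch of such a record, assuming Python dataclasses; the class name `TrendRecord`, the field names and the field types are hypothetical choices made for illustration and are not taken from the TTM database itself.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional

# Life cycle stages as named in the template (item 6).
LIFE_CYCLE_STAGES = ("basic research", "applied research", "prototype", "mass production")


@dataclass
class TrendRecord:
    """One trend description; field order follows the 17-item template."""
    name: str
    short_description: str = ""
    promising_products_services: List[str] = field(default_factory=list)
    new_consumer_properties: List[str] = field(default_factory=list)
    expected_effects: str = ""
    life_cycle_stage: Optional[str] = None
    implementation_year: Optional[int] = None
    application_areas: List[str] = field(default_factory=list)
    market_volume: str = ""
    competitive_technologies: List[str] = field(default_factory=list)
    associated_technologies: List[str] = field(default_factory=list)
    leading_countries: List[str] = field(default_factory=list)
    key_players: List[str] = field(default_factory=list)
    drivers: List[str] = field(default_factory=list)
    disruptive_capacity: str = ""
    barriers_risks_uncertainties: List[str] = field(default_factory=list)
    data_sources: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject stages that the template does not define.
        if self.life_cycle_stage is not None and self.life_cycle_stage not in LIFE_CYCLE_STAGES:
            raise ValueError(f"unknown life cycle stage: {self.life_cycle_stage!r}")


# Only the name is filled in; the remaining fields await expert input.
record = TrendRecord(name="Linked open data (LOD)")
missing = [f.name for f in fields(record) if getattr(record, f.name) in ("", None, [])]
```

Listing the still-empty fields, as `missing` does, is one simple way to track which parts of a description remain to be completed during consultations.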
Once the descriptions were completed, a global trends database was generated with all the
trends mapped. The TTM methodology described above will be extended to cover other
sectors and topics at three levels: upper, intermediate and bottom. At the upper level, the
aforementioned six priority areas for the Russian Government will be considered. Each of
the priority areas will then be divided into five to six technological sub areas at the
intermediate level. Then five to eight technology trends will be identified at the bottom level.
The TTM case study described above focused on the ‘ICTs’ sector at the upper level and
‘semantic technologies’ at the intermediate level, and identified five technology trends at the
bottom level. The database generated will be used to map the entire set of trends in all
priority areas.
Uses of the TTM results
Besides raising awareness on emerging trends, the TTM study and the trends database are
considered to be a useful source of information for a number of further efforts such as
national, regional or sectoral level Foresight projects and STI policy formulation and
corporate R&D strategy making. This is the final stage of the TTM process, which is
entitled ‘intervention’. Impact with the TTM work may be achieved in various
ways:
• The analysis of publication, patent, media and other databases reveals trends in
scientific, technological and policy domains.
• The trend monitoring study helps to identify the ‘weak signals’ of possible future
developments with potential opportunities and threats. Once identified, it will be
possible to prioritise trends and weak signals in the most promising areas of STI.
• Through the bibliometric analyses, the results of the TTM study indicate leading
countries, institutions and individuals in certain STI domains. This makes it possible to
build domestic and international collaboration networks with the use of the TTM results.
• A ‘gap analysis’ can be conducted by comparing and contrasting the results in the world
and in Russia. Strengths and weaknesses can be identified at the global and national
levels and future collaborators can be identified by using the networks of leading
countries, institutions and individuals.
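One possible way to operationalise the gap analysis mentioned above is to compare the share of each trend in the national document set with its share in the global set. The sketch below assumes this simple share-difference measure, which is not specified in the paper; the function names are hypothetical and the document counts are invented purely for illustration.

```python
def shares(counts):
    """Convert raw document counts per trend into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {trend: 0.0 for trend in counts}
    return {trend: n / total for trend, n in counts.items()}


def gap_analysis(world_counts, national_counts):
    """Return (trend, national share minus world share), largest deficit first."""
    world = shares(world_counts)
    national = shares(national_counts)
    gaps = {trend: national.get(trend, 0.0) - world[trend] for trend in world}
    return sorted(gaps.items(), key=lambda item: item[1])


# Made-up counts for illustration only; they are not results of the study.
world_counts = {"Linked open data": 400, "Social semantic web": 350, "Mobile semantic": 250}
national_counts = {"Linked open data": 10, "Social semantic web": 25, "Mobile semantic": 15}
ranked = gap_analysis(world_counts, national_counts)
```

Negative values would point to trends that are under-represented nationally relative to the world, and positive values to relative strengths. Any real analysis would need to account for database coverage and language bias.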
In order to serve these purposes, a TTM database has been generated. This relational
database covers all the attributes used in the trend descriptions and allows searching for
trends by keywords, sectors, institutions and countries, as well as by the associated grand
challenges and the social, technological, economic, environmental and political domains they
may be related to. Advanced reporting functions have been added to the database. The
reporting function helps to generate reports based on the requirements of the database
users. For instance, regular TRENDletters are currently being issued on the
national priority areas identified for Russia (footnote 2). Examples of TRENDletters released
so far can be seen on the dedicated web page, in Russian (footnote 3). The TRENDletters are
distributed to different user groups, including individuals, government institutions, business
firms and research institutions, among others, which are engaged in STI policy and
strategy processes.
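The kind of attribute-based search that a relational trends database allows can be sketched with an in-memory SQLite database. The schema, the table and column names, and the helper functions below are assumptions made for illustration; the actual structure of the TTM database is not described in the paper. The trend names and keywords are taken from the lists above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trend (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    sector TEXT NOT NULL
);
CREATE TABLE trend_keyword (
    trend_id INTEGER NOT NULL REFERENCES trend(id),
    keyword TEXT NOT NULL
);
""")


def add_trend(name, sector, keywords=()):
    """Insert a trend and its keywords; return the new trend id."""
    cur = conn.execute("INSERT INTO trend (name, sector) VALUES (?, ?)", (name, sector))
    conn.executemany(
        "INSERT INTO trend_keyword (trend_id, keyword) VALUES (?, ?)",
        [(cur.lastrowid, kw) for kw in keywords],
    )
    return cur.lastrowid


def find_trends(keyword=None, sector=None):
    """Return trend names matching every filter that is given."""
    sql = ("SELECT DISTINCT t.name FROM trend AS t "
           "LEFT JOIN trend_keyword AS k ON k.trend_id = t.id WHERE 1 = 1")
    params = []
    if keyword is not None:
        sql += " AND k.keyword = ?"
        params.append(keyword)
    if sector is not None:
        sql += " AND t.sector = ?"
        params.append(sector)
    sql += " ORDER BY t.name"
    return [row[0] for row in conn.execute(sql, params)]


add_trend("Linked open data", "ICT", ["linked data"])
add_trend("Semantic-based e-Apps", "ICT",
          ["e-commerce", "e-government", "e-learning", "e-health"])
```

Further attributes such as institutions, countries or grand challenges could be added as additional link tables in the same way, and the query results could feed a reporting layer.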
Furthermore, the results of the TTM activity have been used in the Russian STI
Foresight 2030 exercise, which was addressed by President Vladimir Putin at the
Federal Assembly of the Russian Federation (footnote 4). The TTM studies are currently
being used for the on-going Foresight 2040 exercise.
Conclusions and discussion
This paper has presented a TTM methodology, which aims to detect and describe a set of
technology trends in selected domains of interest. Conceptually, the process begins with
intelligence gathering, which involves comprehensive understanding
and scoping activities to best describe the area under investigation. After scoping the field
with selected keywords and terms, a wide variety of information sources are analysed
through the use of a combination of quantitative and qualitative techniques. This process
helps to extract a number of established and emerging new technologies, technology
application areas, along with their individual, institutional and geographical attributions.
The TTM process has validated one of the premises of the paper that using multiple
sources of data gives a more complete picture of the technological evolution process. As
the focus of the TTM process moved from publications towards patents and then to
conferences and presentations, it was observed that there is a move from ‘supporting’
technologies to ‘solution’ approaches and then to more ‘practical’ business-oriented
applications. Each data source is considered to indicate a different stage of the
technological development life cycle (the emergence, growth, maturity or saturation phases),
with implications for the supply of and demand for new technologies.
Then, relationships between the technologies and application areas were investigated
through the clusterization processes to identify the emerging trends in the domain. The
analyses with various readily available clusterization tools led to the conclusion that
revealing technology trends requires a more customised approach than merely identifying
clusters. A proprietary clustering algorithm was developed by the HSE to overcome
the shortcomings of the existing tools. With its improved properties, the new algorithm
helped to identify additional technology trends, which represented two of the five final
clusters shortlisted through consultations.
Footnote 2: The priority areas include Information and Communication Technologies, Living
Systems and Biotechnologies, Nanotechnologies, Transportation and Aerospace Technologies,
Technologies for the Rational Use of Natural Resources, and Energy Efficiency and Energy
Saving Technologies.
Footnote 3: http://issek.hse.ru/trendletter/. Last visited on October 27, 2015.
Footnote 4: http://www.hse.ru/data/2014/03/03/1330240475/Foresight%202030.pdf. Last visited on October 27, 2015.
The trends identified were described in detail by using a wide variety of parameters to
cover future opportunities and threats for policy makers, corporations, research bodies and
other potential users. A database is currently being prepared with advanced search and
reporting options to enable the use of TTM results for multiple purposes, for example as an
input for future national, regional and sectoral Foresight studies.
The review of the technology monitoring, technology watch and technology mining
literatures reveals that the methodology developed is comparable to other similar efforts,
with potentially useful new features. It fulfils the expectations of diversity in the use of
information sources, a systematic process, transparency, inclusivity and scalability, and
is therefore considered to be useful for other TTM efforts. Moreover, the proposed TTM
methodology has several features which distinguish it from similar efforts such as the ones
developed by Daim et al. (2006), Porter and Newman (2011), and Shibata et al.
(2008, 2011). First of all, a systemic approach is proposed for the TTM process. This is a
more holistic process than the other methods, and is achieved by integrating the
comprehensive technology intelligence process with implementation and policy making
processes. The results are not provided as raw materials, but instead involve precise strategies
and policies through the identification of threats, weak signals of change, collaboration
networks and gap analysis. Therefore, the operationalization of results and
action-orientation are considered to be among the most prominent features of the approach
described in this paper.
Secondly, the proposed TTM approach does not rely only on quantitative methods (i.e.
bibliometric analysis, scientometrics, or technology mining). The process benefits from a
combination of quantitative and qualitative methods, which go beyond mere data
collection and analysis. Hence, the process is more inclusive, creative and strategy focused
through the use of expert consultations, the possibility of integrating scenario methods
formulated around the weak signals of future changes, and the use of gap analysis and
strategic roadmaps for long, medium and short term strategies.
Thirdly, the proposed TTM approach benefits from a wide variety of data and infor-
mation sources. Most of the existing studies rely merely on the analysis of patent or
publication data. However, the approach described in the present paper draws upon a wide
variety of sources to gather intelligence, for instance research and development intelligence
through the analysis of publications; technology and product intelligence through the
analysis of patents; and market intelligence through media analysis. All these provide lenses
for the analysis of different levels of development through the STI life cycle.
Furthermore, the TTM study described above made use of a new clustering algorithm,
which has been developed and tested in the framework of several research projects
undertaken by the Higher School of Economics. The integrated use of currently available
algorithms with the more customized one for trend analysis helped to explore more future-
oriented and novel content for discussion. The joint use of clustering tools also increased
the reliability of the outputs. The process of consultations and workshops indicated that the
experts found the trends identified complete and well-spotted without the need for adding
any further trends to the list. The final list of technology trends was considered to be
valid and representative of the dynamics of the technology area under investigation.
As a whole, the proposed methodology contributes to the TTM research with a novel
approach, which made it possible to develop tools and processes for exploring all relevant
information sources with an original and flexible clusterization mechanism, and provides a
systematic framework to translate results into practice. Future steps of the study will involve
the extended use of TTM methodology in different domains, while improving the capa-
bilities of extracting the hidden patterns in data through quantitative and qualitative
improvements, and better and further integration with policy and strategy making
processes.
Acknowledgments The article was prepared within the framework of the Basic Research Program at the
National Research University Higher School of Economics (HSE) and supported within the framework of a
subsidy by the Russian Academic Excellence Project ‘5-100’. The authors are grateful for the immense help
of Sergey Kuznetsov’s team (Higher School of Economics) in the development of clustering algorithms and
Mr. Evgeny Klochickin (Ph.D. candidate at the Manchester Institute of Innovation Research, Manchester
Business School) in the process of extracting and analysing the data.
References
Abraham, B. P., & Moitra, S. D. (2001). Innovation assessment through patent analysis. Technovation,
21(4), 245–252.
Antón, P. S., Silberglitt, R., & Schneider, J. (2001). The global technology revolution: Bio/nano/materials
trends and their synergies with information technology by 2015. Santa Monica, CA: RAND.
Battelle. (2014). Battelle database. http://www.battelle.org. Last visited on February 3, 2014.
Blei, D., & Lafferty, J. (2007). A correlated topic model of science. Annals of Applied Statistics, 1(1),
17–35.
Campbell, R. S. (1983). Patent trends as a technological forecasting tool. World Patent Information, 5(3),
137–143.
Chao, C.-C., Yang, J.-M., & Jen, W.-Y. (2007). Determining technology trends and forecasts of RFID by a
historical review and bibliometric analysis from 1991 to 2005. Technovation, 27(5), 268–279.
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific
literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377.
Cobo, M. J., Lopez-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting,
quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets
theory field. Journal of Informetrics, 5(1), 146–166.
Corrocher, N., Malerba, F., & Montobbio, F. (2003). The emergence of new technologies in the ICT field:
Main actors, geographical distribution and knowledge sources. TENIA project. http://eco.uninsubria.
it/dipeco/quaderni/files/QF2003_37.pdf. Last visited on February 3, 2014.
Daim, T. U., Rueda, G., Martin, H., & Gerdsri, P. (2006). Forecasting emerging technologies: Use of
bibliometrics and patent analysis. Technological Forecasting and Social Change, 73(8), 981–1012.
Deloitte. (2012). Tech trends 2012: Elevate IT for digital business. http://www.deloitte.com/assets/Dcom-
UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf. Last visited on
February 3, 2014.
Dereli, T., & Durmusoglu, A. (2009). A trend-based patent alert system for technology watch. Journal of
Scientific and Industrial Research, 68(8), 674–679.
Fraunhofer ISI. (2014). Emerging technologies. http://www.isi.fraunhofer.de/isi-en/t/index.php. Last visited
on February 3, 2014.
Gartner. (2013). Top 10 strategic technology trends for 2013. http://www.gartner.com/technology/research/
top-10-technology-trends. Last visited on February 3, 2014.
Griffiths, T., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of
Sciences, 101(1), 5228–5235.
Guo, H., Weingart, S., & Borner, K. (2011). Mixed-indicators model for identifying emerging research
areas. Scientometrics, 89(1), 421–435.
IBM. (2013). IBM five in five. http://www.ibm.com/smarterplanet/us/en/ibm_predictions_for_future/ideas/
index.html. Last visited on February 3, 2014.
IEA. (2013). Tracking clean energy progress 2013. http://www.iea.org/publications/TCEP_web.pdf. Last
visited on February 3, 2014.
iKNOW. (2014). iKNOW database. http://community.iknowfutures.eu. Last visited on February 3, 2014.
ITU. (2014). Technology watch. http://www.itu.int/en/ITU-T/techwatch/Pages/default.aspx. Last visited on
February 3, 2014.
Kajikawa, Y., Yoshikawa, J., Takeda, Y., & Matsushima, K. (2008). Tracking emerging technologies in
energy research: Toward a roadmap for sustainable energy. Technological Forecasting and Social
Change, 75(6), 771–782.
Kim, Y. G., Suh, J. H., & Park, S. C. (2008). Visualization of patent analysis for emerging technology.
Expert Systems with Applications, 34(3), 1804–1812.
Kim, Y., Tian, Y., Jeong, Y., Jihee, R., & Myaeng, S.-H. (2009). Automatic discovery of technology trends
from patent text. In Proceedings of the 2009 ACM symposium on applied computing (pp. 1480–1487).
http://ir.kaist.ac.kr/papers/2008/20081025_2009_SAC_Camera-ready.pdf. Last visited on February 3,
2014.
Kostoff, R. N. (1999). Science and technology innovation. Technovation, 19(10), 593–604.
Kostoff, R. N. (2003). Science and technology text mining: Global technology watch, office of naval
research, technical report. http://www.dtic.mil/get-tr-doc/pdf?AD=ADA415863. Last visited on
February 3, 2014.
Kostoff, R. N., Boylan, R., & Simon, G. R. (2004). Disruptive technology roadmaps. Technological
Forecasting and Social Change, 71(1–2), 141–159.
Kostoff, R. N., Briggs, M. B., Solka, J. L., & Rushenberg, R. L. (2008). Literature-related discovery (LRD):
Methodology. Technological Forecasting and Social Change, 75(2), 186–202.
Kostoff, R. N., Eberhart, H. J., & Toothman, D. R. (1997). Database tomography for information retrieval.
Journal of Information Science, 23(4), 301–311.
Kummamuru, K., Dhawale, A., & Krishnapuram, R. (2003). Fuzzy co-clustering of documents and key-
words. In Proceedings of the 12th IEEE international conference on the fuzzy systems (Vol. 2,
pp. 772–777).
Kuznetsov, S. (2001). Machine learning on the basis of formal concept analysis. Automation and Remote
Control, 62(10), 1543–1564.
Lee, H.-J., Lee, S., & Yoon, B. (2011). Technology clustering based on evolutionary patterns: The case of
information and communications technologies. Technological Forecasting and Social Change, 78(6),
953–967.
Lee, S., Yoon, B., & Park, Y. (2009). An approach to discovering new technology opportunities: Keyword-
based patent map approach. Technovation, 29(6–7), 481–497.
Lux Research. (2014). Lux research database. http://www.luxresearchinc.com. Last visited on February 3,
2014.
Martino, J. P. (2003). A review of selected recent advances in technological forecasting. Technological
Forecasting and Social Change, 70(8), 719–733.
Microsoft-Fujitsu. (2011). Key ICT trends and priorities (Vol. 1). http://download.microsoft.com/
documents/Australia/InsightsQuarterly/IQ_IG%20Full%20Report. Last visited on February 3, 2014.
Morris, S., DeYong, C., Wu, Z., Salman, S., & Yemenu, D. (2002). DIVA: A visualization system for
exploring document databases for technology forecasting. Computers and Industrial Engineering,
43(4), 841–862.
Mrakotsky-Kolm, E., & Soderlind, G. (2009). Final recommendations towards a methodology for tech-
nology watch at EU level. STACCATO Deliverable 2.2.1. http://publications.jrc.ec.europa.eu/
repository/bitstream/111111111/12930/1/reqno_jrc50348_staccato%20tech%20watch.pdf. Last vis-
ited on February 3, 2014.
NISTEP. (2010). The 9th science and technology foresight. National Institute of Science and Technology
Policy, NISTEP report no 140 ‘The 9th Delphi Survey’. March 2010. http://www.nistep.go.jp/achiev/
sum/eng/rep140e/pdf/rep140se.pdf. Last visited on February 3, 2014.
NRC. (2005). Avoiding surprise in an era of global technology advances. Committee on Defense Intelli-
gence Agency Technology Forecasts and Reviews, National Research Council. http://www.nap.edu/
catalog.php?record_id=11286. Last visited on February 3, 2014.
OECD. (2007). Infrastructure to 2030 (volume 2): Mapping policy for electricity, water and transport.
http://www.oecd.org/futures/infrastructureto2030/40953164.pdf. Last visited on February 3, 2014.
ONR. (2014). Office of Naval Research website. http://www.onr.navy.mil. Last visited on February 3, 2014.
Palomino, M. A., Vincenti, A., & Owen, R. (2013). Optimising web-based information retrieval methods for
horizon scanning. Foresight, 15(3), 159–176.
Porter, A. L. (2005). QTIP: Quick technology intelligence processes. Technological Forecasting and Social
Change, 72(9), 1070–1081.
Porter, A. L., & Cunningham, S. W. (2005). Tech mining: Exploiting new technologies for competitive
advantage. New York, NY: Wiley.
Porter, A. L., & Newman, N. C. (2011). Mining external R&D. Technovation, 31(4), 171–176.
Ruotsalainen, L. (2008). Data mining tools for technology and competitive intelligence. Espoo 2008. VTT
Tiedotteita—Research Notes 2451.
Saritas, O. (2013). Systemic foresight methodology. In D. Meissner, L. Gokhberg, & A. Sokolov (Eds.),
Foresight and science, technology and innovation policies: Best practices (pp. 83–117). Berlin:
Springer.
Shell. (2007). The Shell Global Scenarios to 2025. The future business environment: trends, trade-offs and
choices. http://www-static.shell.com/content/dam/shell/static/aboutshell/downloads/our-strategy/shell-
global-scenarios/exsum-23052005.pdf. Last visited on February 3, 2014.
Shibata, N., Kajikawa, Y., & Sakata, I. (2010). Extracting the commercialization gap between science and
technology—Case study of a solar cell. Technological Forecasting and Social Change, 77(7),
1147–1155.
Shibata, N., Kajikawa, Y., & Sakata, I. (2011). Detecting potential technological fronts by comparing
scientific papers and patents. Foresight, 13(5), 51–60.
Shibata, N., Kajikawa, Y., Takeda, Y., & Matsushima, K. (2008). Detecting emerging research fronts based
on topological measures in citation networks of scientific publications. Technovation, 28(11), 758–775.
Silberglitt, R., Antón, P. S., Howell, D. R., & Wong, A. (2006). The global technology revolution 2020,
In-depth analysis: Bio/nano/materials/information trends, drivers, barriers and social applications.
http://www.rand.org/content/dam/rand/pubs/technical_reports/2006/RAND_TR303.pdf. Last visited on
February 3, 2014.
Smalheiser, N. R. (2001). Predicting emerging technologies with the aid of text-based data mining: The
micro approach. Technovation, 21(10), 689–693.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In D. McNamara, S. Dennis, & W. Kintsch
(Eds.), Handbook of latent semantic analysis. Hove: Psychology Press. ISBN 978-0-8058-5418-3.
TechCast. (2014). TechCast database. http://www.techcast.org. Last visited on February 3, 2014.
TrendHunter. (2014). TrendHunter database. http://www.trendhunter.com. Last visited on February 3, 2014.
Tseng, Y. H., Lin, Ch. J., & Lin, Y. I. (2007). Text mining techniques for patent analysis. Information
Processing and Management, 43(5), 1216–1247.
UK Government Office for Science. (2010). Technology and innovation futures: UK growth opportunities
for the 2020s. http://www.northamptonshireobservatory.org.uk/docs/doc10-1252-technology-and-
innovation-futures[1]101105145732.pdf. Last visited on February 3, 2014.
Upham, S. P., & Small, H. (2010). Emerging research fronts in science and technology: Patterns of new
knowledge development. Scientometrics, 83(1), 15–38.
Wang, M. Y., Chang, D. S., & Kao, Ch.-H. (2010). Identifying technology trends for R&D planning using
TRIZ and text mining. R&D Management, 40(5), 491–509.
Wouters, C., Dillon, T., Rahayu, W., & Chang, E. (2002). A practical walkthrough of the ontology
derivation rules. In R. Cicchetti et al. (Eds.), DEXA 2002, LNCS 2453 (pp. 259–268).
Yoon, B., & Park, Y. (2004). A text-mining-based patent network: Analytical tool for high-technology trend.
Journal of High Technology Management Research, 15(1), 37–50.
Zhu, D., & Porter, A. L. (2002). Automated extraction and visualization of information for technological
intelligence and forecasting. Technological Forecasting and Social Change, 69(5), 495–506.
Z-Punkt. (2014). Trend Radar 2020. http://www.z-punkt.de/trend-radar2020.html. Last visited on February
3, 2014.
... The development trend of technology refers to the pattern and path in which technology evolves continuously over a specific period [4]. Identifying the development trend of technology is the basis for policymakers, managers, and developers to understand the technological history and scout technological development direction [5]. ...
... A technology trend can be considered as a continuously growing technology area with a certain pattern, and the pattern should have existed for a certain period [4]. The technological development trend can be well identified by monitoring the changes in technology. ...
Article
High technologies play an important role in social progress and economic growth. Monitoring the development trend and competition status of high technology is crucial for policymakers, managers, and developers. This study proposes a framework that integrating the analysis of technological evolutionary trends and competitive status, the results of which will be helpful for engineers and developers for technology scouting and forecasting, but also for policymakers and managers for research and development (R&D) strategy. The electronic design automation (EDA) technology is selected as a case study. The patent analysis and bibliographic coupling are used to analyze the overview and identify the critical technical areas of EDA technology. This research also identifies the competition status of the main competitors in the EDA field from three aspects, including the market and technology fields, R&D capability, and comprehensive evaluation of patents. Our results show that the United States has an absolute advantage in the competition status of EDA technology. This study contributes to monitoring the development situation and competition status of high technologies, and will be of interest to policymakers and engineers.
... Furthermore, the overall context has to be linked to support the strategic (innovation) process. Depending on the object of investigation, the discipline of trend management uses various methods, such as online research and databases, Delphi studies, expert knowledge, patents, and scientific publications, but also customer complaints and the venture capital market (Ena et al., 2016;Maier et al., 2016). Especially tools such as text and web mining can be used to automatically collect and process data from the Internet, as it became a hub for discoveries and inventions (Johnson & Gupta, 2012). ...
... Due to the comprehensive nature of global value creation, it is extremely difficult to assess future developments. Changes in trends, trend breaks, and the probability of occurrence can significantly complicate its management (Ena et al., 2016). This complexity is composed of the increasing number of trend signals and thus the increase of possible relations among them as well as the degree of divergence (Thede, 2014). ...
Chapter
Disruptive technologies in the context of Industry 4.0, such as the Internet of Things, Big Data Analytics, smart devices, or Artificial Intelligence are developing at a rapid pace with an increasing impact on global value creation. Companies are facing major challenges in maintaining an overview of these developments and linking them to other trends. Thus, the aim of this article is the development of a comprehensive trend management process for the structured collection and processing of Industry 4.0 trend information. The process is divided into the steps identification, analysis and evaluation, processing, and preparation and is supplemented by continuous monitoring and improvement through knowledge management. Special attention is paid to the development of Industry 4.0, which enables an automated collection of information through the application of web and text mining, which can significantly improve the preparation of information.
... To monitor technological trends and forecast emerging technology, researchers have been using bibliometric analysis of many data sources, especially research articles and patents. Most bibliometric analysis works are based on one type of data source, although using multiple sources has been reported and recommended as they are complementary [3,7,[46][47][48]. In comparison, publications and patents capture different stages of the research-to-product pathway. ...
... Moreover, patents reflect the ability of a company to transform scientific results into technological applications. Thus, patents can serve as the sources for economic potential analysis together with legal status and protection [46]. ...
Article
Full-text available
Identifying emerging technology trends from patents helps to understand the status of the technology commercialization or utilization. It could provide research insights leading to advanced technological innovations that stimulate socially responsible research to address human dietary and medical needs. However, few studies have investigated emerging chitosan applications using patents. In this study, we report the application of a patent bibliometric predictive intelligence (PBPI) model to identify emergent topics and technology convergence related to chitosan applications from patents in the International Patent Classification system. Text mining was used to extract patterns from 5001 patents and each term was assigned an emergent score, following which we traced growth patterns, examined relationships between IPCs, emergent topics, and patents using correlation analysis and principal component analysis, and conducted matrix and cluster mapping analysis to understand industrial applications and explore patterns of technological convergence. Five major terms emerged in association with ascending and newly emergent topics over the last 13 years: “shelf life,” “antibacterial,” “good safety,” “absorbing water,” and “auxiliary materials.” These topics were closely linked with research in the biomedical and food production and preservation industries. A network analysis indicated that “antibacterial” terms exhibited the highest degree of convergence, followed by “shelf life.” These findings can inform strategies to determine new directions for chitosan research.
... A technology trend is understood as a steadily growing practical technology area with a specific pattern that has existed for a particular period of time [10]. Numerous approaches have been proposed recently to explore patterns and to analyze and predict technology trends. ...
Article
Full-text available
The development of emerging technologies has not only affected current industrial production but has also generated promising manufacturing opportunities that significantly impact social and economic factors. Exploring upcoming renovation tendencies of technologies at an early stage is essential for governments, research and development institutes, and industrial companies in managing strategies to achieve dominant advantages in business competitiveness. Additionally, the prospective changes, the scientific research directions, and the focus of technologies are crucial factors in predicting promising technologies. On the other hand, Industry 4.0 revolutionizes standards and models by accompanying significant technology developments in numerous sectors, including the Smart energy sector. Moreover, asset performance management is a prominent topic that has gained prevalence over the last decade because numerous challenges force industrial companies to optimize their asset usability. However, to the best of our knowledge, no study has reported an analysis of technology trends of asset performance management in the Smart energy sector using proper data mining methods. Hence, this paper aims to fill this gap and provide an analysis of technology trends of asset performance management in the Smart energy sector by structuring and exploring research subjects, considering problems, and solving methods with numerous experiments on scientific papers and patent data.
... In search of solutions, a wide variety of approaches to technological monitoring have been proposed. They range from simple keyword-based Internet searches and interviews to sophisticated text mining systems that search for hidden patterns [6]. Among those search strategies, Technological Surveillance (TS) has gained prominence. TS seeks to establish a monitoring process that ranges from collecting data to communicating insights about it to decision-makers. ...
Article
Full-text available
For many years, organizations have monitored their technological environments to anticipate changes with potentially positive or negative impacts on their business using a technological surveillance process. However, new Big Data scenarios have rendered the traditional tools and methods insufficient. This paper proposes an automated technological surveillance method that uses a map-reduce model to deal with Big Data scenarios and is divided into five processes: planning, collection, organization, intelligence, and communication. We implemented a system prototype to validate the proposed approach. It was developed in Python and JavaScript, using ontologies for knowledge modeling and a NoSQL database for storing and parallel processing of the publications. The system collected 2,918 publications, identified the monitored technologies, extracted the metadata, analyzed them, and generated charts for the stakeholders. In conclusion, the method proved feasible for automating the technology watch process in Big Data scenarios and, when implemented by a system, dramatically reduced the workload involved, offering a solid approach to automatically identify a set of technologies with increasing popularity in Web portals.
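The map-reduce idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype: the watch list `MONITORED`, the sample publications, and the function names are all hypothetical, and plain substring matching stands in for the ontology-based identification the abstract describes.

```python
from collections import Counter
from functools import reduce

# Hypothetical watch list of monitored technologies (assumption, not from the paper).
MONITORED = ["3d printing", "internet of things", "machine learning"]


def map_publication(text):
    """Map step: record which monitored technologies one publication mentions."""
    lowered = text.lower()
    return Counter(tech for tech in MONITORED if tech in lowered)


def reduce_counts(left, right):
    """Reduce step: merge two partial mention counts."""
    return left + right


def count_mentions(publications):
    """Run the map step over every publication, then fold the partial results."""
    return reduce(reduce_counts, map(map_publication, publications), Counter())


# Invented sample data, for illustration only.
publications = [
    "Machine learning for production planning",
    "Internet of Things sensors and machine learning on the shop floor",
    "A survey of surface coatings",
]
mention_counts = count_mentions(publications)
```

Because each publication is mapped independently and the reduce step is associative, the map calls could be distributed across workers without changing the result, which is the property that makes the model attractive for large collections.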
... The feasibility of technology forecasting for self-driving technology should be discussed first. Some restrictions should be fulfilled so that technology forecasting can be applied to a certain technology [10,11]. In this paper, it is assumed that (a) the data from the past contain all the information necessary for the future and (b) the trend of self-driving technology is a continuously growing technology area with a certain pattern, and this pattern has existed for a certain period of time. ...
Article
Full-text available
Due to the demand for safety and convenience in traveling, self-driving technology has developed very fast in the past decades. In this paper, a novel technology forecasting model is developed. The topic-based text mining and expert judgment approaches are combined to forecast the technology trends efficiently and accurately. To improve the reliability of the results, multidimensional information including scientific papers, patents, and industry data is considered. Then, the model is utilized to forecast the development trends of self-driving technology in China. Data ranging from 2002 to 2019 are adopted with proper data cleaning. Topic clustering for papers and patents is performed, and the hierarchical structures are constructed. On this basis, the results of technology’s evolution based on papers and patents are compared and the development trends are obtained. With these results, it is speculated that technology on “Decision” will be the next hotspot in patents. The research results of this paper will provide reference and guidance for Chinese enterprises and government in decision-making on self-driving technology.
Article
Carbon capture and storage (CCS) is an important strategic technology choice for the future reduction of greenhouse gas emissions, which plays a crucial role as part of an economically sustainable route. Although some studies have discussed CCS demonstration projects and early developments, CCS has not been investigated as an innovative niche for sustainable transition. In an attempt to fill this gap, this study examines CCS as an innovative niche and assesses the niche development of CCS in China by using a strategic niche management (SNM) framework. The evolution of three interlinked niche processes (expectations, social networks and learning process) is analysed with a comprehensive analytical approach that includes policy, social network and bibliometric analysis. Results show that a CCS niche has formed in China, but it is at a relatively low level of niche development in terms of the three internal processes. The expectations for CCS evolved linearly and are constantly being reinforced. The actor network supporting CCS niche development is sparse, and network cohesion is gradually decreasing. CCS learning continues to deepen but remains insufficient; it focuses on technical aspects and pays little attention to policy and social issues. The results indicate that expectations have been well-established at various levels in China, but the network and the learning process still need to be improved. Our findings help to identify problems in the development of technology and provide useful references for technology planning and the further development of CCS.
Chapter
To stay competitive in an environment of rapidly changing science, it is important to monitor the development of existing technology and to discover new and promising technologies. Similarly, it is necessary for a firm to establish a technology development strategy through emerging technology forecasting to gain a competitive edge while utilizing limited resources. Numerous methods of emerging technology trend analysis and forecast (TTAF) have been proposed; however, no study has reviewed the data mining methods of this research area in a systematic and structured manner. Hence, this paper intends to give a review of TTAF data mining methods and their shortcomings by surveying and structuring challenging problems, research, and resolving approaches. Moreover, the study highlights adopted data mining methods and types of data sources. Specifically, 50 documents from SCOPUS over a ten-year timespan between 2010 and 2019 were systematically reviewed, and each step was performed in accordance with the systematic mapping study procedure.
Chapter
Technological Surveillance systems stand out as a structured way to assist organizations in monitoring their internal and external technological environments, in order to anticipate changes. However, since the volume of digital data available keeps growing, it becomes increasingly complex to keep this type of system running without proper automation. This paper proposes an automated MapReduce-based method for Technological Surveillance in Big Data scenarios. A prototype was developed to monitor key technologies in specialized portals in the Furniture and Wood sector, in order to illustrate the proposed method. The proposal was evaluated by industry experts, and the preliminary results obtained are very promising.
Article
Online communities are a rapidly growing knowledge repository that provides scholarly research, technical discussion, and social interactivity. This abundance of online information makes it increasingly difficult for researchers and practitioners to keep up with new developments. Thus, we introduced a novel method that analyses both knowledge and social sentiment within the online community to discover the topical coverage of emerging technology and trace technological trends. The method utilizes the Weibull distribution and Shannon entropy to measure and link social sentiment with technological topics. Based on question-and-answer and social sentiment data from Zhihu, which is an online question and answer (Q&A) community with high-profile entrepreneurs and public intellectuals, we built an undirected weighted network and measured the centrality of nodes for technology identification. An empirical study on artificial intelligence technology trends supported by expert knowledge-based evaluation and cognition provides sufficient evidence of the method's ability to identify technology. We found that the social sentiment of hot technological topics presents a long-tailed statistical distribution. High similarity between the topic popularity and emerging technology development trends appears in the online community. Finally, we discuss the findings in various professional fields that are widely applied to discover and track hot technological topics.
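Shannon entropy, one of the two measures named in this abstract, can be sketched in a few lines of Python. The abstract does not say how the authors apply it, so the example below only assumes a hypothetical setting in which each answer under a topic carries a sentiment label; the labels and the function name `shannon_entropy` are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed label probabilities p.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sentiment labels for answers under two topics.
evenly_split = shannon_entropy(["positive", "negative", "neutral", "mixed"])
unanimous = shannon_entropy(["positive"] * 5)
```

Four equally likely labels give 2 bits of entropy, while a topic where every answer shares one label gives 0, so a higher value indicates more divided sentiment.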
Chapter
Full-text available
Based on the ideas of systems thinking, the Systemic Foresight Methodology (SFM) proposes a framework for designing and implementing Foresight activities. This framework recognises the complexities that emerge due to multifaceted interplays between the Social, Technological, Economic, Ecological, Political and Value (STEEPV) systems. To conduct Foresight systemically, we need to undertake a set of ‘systemic’ thought experiments, in which systems (e.g. human and social systems, industrial/sectoral systems, and innovation systems) are understood and modelled, and hopefully intervened in, for a successful change programme. These experiments are conducted in a series of iterative phases that we label (1) Intelligence (scoping, surveying and scanning phase), (2) Imagination (creative and diverging phase), (3) Integration (ordering and converging phase), (4) Interpretation (strategy phase), (5) Intervention (action phase), and (6) Impact (evaluation phase); (7) an Interaction phase (participation) goes on throughout the activity. The paper describes each of the phases and proposes a set of quantitative and qualitative methods, which can be combined to form research, policy, technology, and innovation paths. The ideas are discussed in the light of two Systemic Foresight cases, dealing with the Higher Education and Renewable Energy sectors. SFM was used to provide a methodological orientation for these Foresight exercises, where a variety of methods were selected and combined in line with the objectives of and available resources for the Foresight exercises.
Article
Full-text available
Approximately 80 % of scientific and technical information can be found from patent documents alone, according to a study carried out by the European Patent Office. Patents are also a unique source of information since they are collected, screened and published according to internationally agreed standards. In addition to being an extremely valuable source of technology intelligence, patent documents offer a business competitive intelligence. Being aware of the state of the art of relevant technology areas is crucial for a company's innovation process. Knowledge of developed techniques and products forestalls overlapping R&D projects and thereby prevents unnecessary investment. Equally important is the recognition of other actors operating in the field. Benchmarking and evaluating a competitor's R&D and market strategies aids in managing one's own processes and locating possible parties for collaboration or cross-licensing. Since the patent system was established, more than 60 million patent applications have been published. It would be impossible to find and analyze relevant documents manually. This publication describes the results and observations obtained in a study testing four sophisticated patent analysis and visualization tools. The tools were tested with two cases, evaluating their ability to offer technology and business intelligence from patent documents for companies' daily business.
Technical Report
As part of the PASR 2006 supporting activity STACCATO, an investigation into a European "Technology Watch" for security was conducted using, amongst other means, an expert workshop. This activity was run in parallel to the competence mapping of the security industry and academia of Europe, which resulted in a database and separate report (STACCATO Deliverable 2.1.1). Primary tasks for the technology watch could be 1: policy support, i.e. identifying security areas/topics in need of greater European focus and feedback on outcome of enacted policy; 2: "technology warning", monitoring emerging technologies for possible side effects detrimental to societal security; 3: technology transfer support, particularly for SMEs since they are generally not capable of fielding their own technology watch efforts. A Technology watch effort should be based on one or more already existing stakeholder networks of established European actors in the security arena, a "club of the willing", containing at least 10 actors from both industry and academia. Such an organisation would need to be relied upon to be neutral and impartial, thus the work process must be fully transparent and open to scrutiny, i.e. with all conclusions traceable to their individual sources. The organisation and supporting processes need to promote speed and flexibility, in particular in order to help accelerate the standardisation process of emerging technologies and solutions.
Article
Database tomography is an information extraction and analysis system which operates on textual databases. Its primary use to date has been to identify pervasive technical thrusts and themes, and the interrelationships among these themes and sub-themes, which are intrinsic to large textual databases. Its two main algorithmic components are multiword phrase frequency analysis and phrase proximity analysis. This paper shows how database tomography can be used to enhance information retrieval from large textual databases through the newly developed process of simulated nucleation. The principles of simulated nucleation are presented, and the advantages for information retrieval are delineated. An application is described of developing, from Science Citation Index and Engineering Compendex, a database of journal articles focused on near-Earth space science and technology.
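The two algorithmic components named in this abstract, multiword phrase frequency analysis and phrase proximity analysis, can be approximated with a short Python sketch. This is an illustration under simplifying assumptions, not the database tomography system itself: the tokenizer keeps only lowercase letter runs, sentence boundaries are ignored, the theme is a single word, and the function names and sample text are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_frequencies(text, n=2):
    """Count every run of n consecutive tokens (multiword phrase frequency)."""
    tokens = tokenize(text)
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def proximity_counts(text, theme, window=3):
    """Count tokens found within `window` positions of each occurrence of `theme`."""
    tokens = tokenize(text)
    nearby = Counter()
    for i, token in enumerate(tokens):
        if token == theme:
            start = max(0, i - window)
            nearby.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return nearby


# Invented sample text, for illustration only.
sample = "Space weather affects satellites. Space weather forecasts protect satellites."
bigrams = phrase_frequencies(sample)
near_weather = proximity_counts(sample, "weather", window=1)
```

High-frequency phrases suggest candidate themes, and the proximity counts then show which terms tend to cluster around a chosen theme; a real system would also filter stop words and handle multiword themes.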
Article
Purpose – Web-based information retrieval offers the potential to exploit a vast, continuously updated and widely available repository of emerging information to support horizon scanning and scenario development. However, the ability to continuously retrieve the most relevant documents from a large, dynamic source of information of varying quality, relevance and credibility is a significant challenge. The purpose of this paper is to describe the initial development of an automated web-based information retrieval system and its application within horizon scanning for risk analysis support. Design/methodology/approach – Using an area of recent interest for the insurance industry, namely, space weather — the changing environmental conditions in near-Earth space — and its potential risks to terrestrial and near-Earth insurable assets, the authors benchmarked the system against current information retrieval practice within the emerging risks group of a leading global insurance company. Findings – The results highlight the potential of web-based horizon scanning to support risk analysis, but also the challenges of undertaking this effectively. The authors addressed these challenges by introducing a process that offers a degree of automation — using an API-based approach — and improvements in retrieval precision — using keyword combinations within automated queries. This appeared to significantly improve the number of highly relevant documents retrieved and presented to risk analysts when benchmarked against current practice in an insurance context. Originality/value – Despite the emergence and increasing use of web-based horizon scanning in recent years as a systematic approach for decision support, the current literature lacks research studies where the approach is benchmarked against current practices in private and public sector organisations. 
This paper therefore makes an original contribution to this field, discussing the way in which web-based horizon scanning may offer significant added value for the risk analysts, for what may be only a modest additional investment in time.
Article
Open Innovation presses the case for timely and thorough intelligence concerning research and development activities conducted outside one’s organization. To take advantage of this wealth of R&D, one needs to establish a systematic “tech mining” process. We propose a 5-stage framework that extends literature review into research profiling and pattern recognition to answer posed technology management questions. Ultimately one can even discover new knowledge by screening research databases. Once one determines the value in mining external R&D, tough issues remain to be overcome. Technology management has developed a culture that relies more on intuition than on evidence. Changing that culture and implementing effective technical intelligence capabilities is worth the effort. P&G's reported gains in innovation call attention to the huge payoff potential.
Article
Technology trend analysis anticipates the direction and rate of technology changes, and thus supports strategic decision-making for innovation. As technological convergence and diversification are regarded as emerging trends, it is important to compare the growth patterns of various technologies in a particular industry to help understand the industry characteristics and analyse the technology innovation process. However, despite the potential value of this approach, conventional approaches have focused on individual technologies and paid little attention to synthesising and comparing multiple technologies. We therefore propose a new approach for clustering technologies based on their growth patterns. After technologies with similar patterns are identified, the underlying factors that lead to the patterns can be analysed. For that purpose, we analysed patent data using a Hidden Markov model, followed by clustering analysis, and tested the validity of the proposed approach by applying it to the ICT industry. Our approach provides insights into the basic nature of technologies in an industry, and facilitates the analysis and forecasting of their evolution.
Article
Empirical technology analyses need not take months; they can be done in minutes. One can thereby take advantage of the wide availability of rich science and technology publication and patent abstract databases to better inform technology management. Doing so requires developing templates of innovation indicators to answer standard questions. Then, one can automate routines to generate composite information representations (“one-pagers”) that address the issues at hand in the way that the target users want.