IDENTIFYING EMERGING TOPICS BY
COMBINING DIRECT CITATION AND CO-CITATION
Henry Small (1), Kevin W. Boyack (2) and Richard Klavans (3)
(1) SciTech Strategies, Inc., Bala Cynwyd, PA 19004 USA
(2) SciTech Strategies, Inc., Albuquerque, NM 87122 USA
(3) SciTech Strategies, Inc., Berwyn, PA 19312 USA
We present a novel approach to identifying emerging topics in science and technology.
An existing co-citation cluster model is combined with a new method for clustering based
on direct citation links. Both methods are run across multiple years of Scopus data, and
emergent co-citation threads in a specific year are matched against the direct citation
clusters to obtain the emergent topics ranked by a difference function. The topics are
classified and characterized in various ways in order to understand the motive forces
behind their emergence, whether scientific discovery, technological innovation, or
exogenous events. Cross-sectional analysis of citation links and paper age is used to
study the process of emergence for discovery-based science topics.
Research Fronts and Emerging Issues (Topic 4); Modeling the Science System, Science
Dynamics and Complex System Science (Topic 11)
Researchers in information science have long pondered how and why scientific
topics emerge. Derek Price famously analyzed the emergence of the topic of N-
rays using a citation network represented as a matrix (1965). Eugene Garfield
studied the development of genetics by constructing a node-and-link citation
network that he called a historiograph (Garfield, Sher, & Torpie, 1964). Later,
co-citation clusters were used to detect emergence (Small, 1977), and more
recently co-authorship networks (Bettencourt, Kaiser, Kaur, Castillo-Chavez, &
Wojick, 2008) and direct citations (Shibata, Kajikawa, Takeda, & Matsushima,
2008) have been used for the same purpose.
Methods differ in the degree of foreknowledge used. Most rely on a case study
approach where a literature search is conducted for a specific topic expected to be
emergent, and then methods are used to verify that, in fact, emergence has
occurred. These might be termed local methods because only a literature local to
the targeted topic is used. More a priori or global approaches, in contrast, make
no assumptions about what new areas might have emerged. Global approaches are
based on a comprehensive analysis of an entire literature database by methods
such as cluster analysis using co-citation, bibliographic coupling (Boyack &
Klavans, 2010), or other methods such as topic modelling (Blei & Lafferty,
2007). An important new methodology based on direct citation links has
recently been developed (Waltman & Van Eck, 2012). It uses a variant of
modularity clustering and takes normalized direct citation links as input. The
method arrives at an assignment of papers to clusters by maximizing a function
that rewards pairs of papers in the same cluster if they are linked and penalizes
pairs in the same cluster if they are not. An optimization algorithm is used to
maximize the function. Interestingly, this new method turns the original local
methods of Price and Garfield into global methods with the ability to
automatically break up huge multiyear citation link databases into what are, in
effect, separate historiographs. In this paper we combine two
global methodologies, direct citation clustering and co-citation clustering, for the
purpose of identifying emerging topics in science and technology.
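The reward-and-penalty idea behind the direct citation clustering can be illustrated with a minimal sketch. This is a simplified stand-in, not the quality function or optimizer of Waltman and Van Eck (2012): the function name, the `resolution` parameter and its default value are assumptions made for illustration only.

```python
from collections import defaultdict

def clustering_quality(links, assignment, resolution=0.1):
    """Score a cluster assignment (simplified, illustrative only).

    links: {(paper_a, paper_b): normalized link weight}
    assignment: {paper: cluster label}
    Every pair of papers sharing a cluster gains its link weight (0 if the
    pair is not linked) and loses a fixed penalty `resolution`, so unlinked
    pairs placed in the same cluster lower the score.
    """
    weight = {frozenset(pair): w for pair, w in links.items()}
    members = defaultdict(list)
    for paper, cluster in assignment.items():
        members[cluster].append(paper)
    score = 0.0
    for papers in members.values():
        for i in range(len(papers)):
            for j in range(i + 1, len(papers)):
                pair = frozenset((papers[i], papers[j]))
                score += weight.get(pair, 0.0) - resolution
    return score

# Hypothetical toy network: A cites B, C cites D, no other links.
links = {("A", "B"): 1.0, ("C", "D"): 1.0}
two_clusters = {"A": 0, "B": 0, "C": 1, "D": 1}
one_cluster = {"A": 0, "B": 0, "C": 0, "D": 0}
print(clustering_quality(links, two_clusters) > clustering_quality(links, one_cluster))
```

In this toy case the split into two clusters scores higher because the single large cluster pays the penalty for four unlinked pairs. An optimizer would search over assignments for the highest score.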
The co-citation method forms clusters of cited papers based on their joint citation
in an annual slice of a citation database, and assigns current papers from that
annual slice to one or more of the clusters based on their referencing patterns.
The resulting clusters tend to be small and narrowly focused at the scientific
problem level. The annual solutions are then merged to form threads which
connect clusters in adjacent year slices based on shared cited papers (Klavans &
Boyack, 2011). This merges the yearly cluster slices into a longitudinal picture.
The resulting threads can be classified by their duration. For example, possibly
emergent threads for a given year are considered to be those that begin in the
previous or current year, that is, are only one or two years old. It is then possible
to identify all papers from a given year that belong to potentially emergent
threads.
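The threading step and the one-or-two-years-old rule can be sketched as follows. This is a minimal illustration under stated assumptions: the linking rule (at least `min_shared` cited papers in common) and all names are hypothetical, and the actual threading criteria of Klavans and Boyack (2011) may differ.

```python
def link_threads(clusters_by_year, min_shared=1):
    """Chain clusters in adjacent years that share cited papers.

    clusters_by_year: {year: {cluster_id: set of cited paper ids}}
    Returns {(year, cluster_id): start year of the thread it belongs to}.
    """
    thread_start = {}
    for year in sorted(clusters_by_year):
        previous = clusters_by_year.get(year - 1, {})
        for cid, cited in clusters_by_year[year].items():
            # Start years of all prior-year clusters this cluster continues.
            starts = [thread_start[(year - 1, pid)]
                      for pid, pcited in previous.items()
                      if len(cited & pcited) >= min_shared]
            thread_start[(year, cid)] = min(starts) if starts else year
    return thread_start

def emergent_clusters(thread_start, year):
    """Clusters in `year` whose thread began in `year` or `year - 1`."""
    return {cid for (y, cid), start in thread_start.items()
            if y == year and start >= year - 1}

# Hypothetical annual co-citation solutions.
clusters = {
    2005: {"a": {1, 2}},
    2006: {"b": {2, 3}, "c": {9}},
    2007: {"d": {3, 4}, "e": {9, 10}, "f": {20}},
}
starts = link_threads(clusters)
print(sorted(emergent_clusters(starts, 2007)))  # ['e', 'f']
```

Cluster "d" continues a thread that began in 2005 and is therefore not emergent in 2007, whereas "e" (thread begun in 2006) and "f" (begun in 2007) are.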
Unlike co-citation, which relies on the joint citation of earlier papers, direct
citation clustering is based simply on the citation of individual papers by each
other and finds local concentrations of citation links by maximizing a modularity
criterion. The process generates clusters that are much larger and more broadly
focused than those of the co-citation model. The resulting direct citation networks, like
the co-citation threads, are of varying duration and involve different numbers of
papers per year.
Once the co-citation threads and direct citation clusters are in hand, the task is to
select those direct citation clusters that are the most emergent in specific years.
The approach used is to count the papers in the direct citation clusters that belong
to emergent threads (one or two years old) in the co-citation model. This is done
on a year-by-year basis, so the direct citation clusters having the highest emergent
counts in a given year can be identified. In addition, the number of papers that the
matching direct citation cluster contains in a set of prior years (more than two years
before the emergent year) is subtracted from the emergent year count, to avoid
selecting areas that already had high publication activity. This ensures that the
emergent topics are increasing in size in addition to containing many papers
belonging to emergent threads. There are of course numerous variations of
selection criteria that could be attempted, but by combining evidence from both
forms of analysis we can take advantage of the high precision of the co-citation
model and the stronger growth characteristics of the direct citation model. The
difference between the emergent year counts and the prior year counts provides a
metric on which to rank the emergent topics in a given year. We call this the
emergence differential.
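The ranking metric described above can be sketched in a few lines. The data layout and function names are hypothetical, and the sketch assumes that the "set of prior years" means every year more than two years before the emergent year.

```python
def emergence_differential(cluster_papers, emergent_thread_papers, year):
    """Matching emergent-year papers minus papers from more than two years before.

    cluster_papers: {paper_id: publication year} for one direct citation cluster
    emergent_thread_papers: ids of papers assigned to emergent co-citation threads
    """
    matching = sum(1 for paper, pub in cluster_papers.items()
                   if pub == year and paper in emergent_thread_papers)
    # Assumption: all years earlier than (year - 2) count as prior years.
    prior = sum(1 for pub in cluster_papers.values() if pub < year - 2)
    return matching - prior

def rank_clusters(clusters, emergent_thread_papers, year, top=25):
    """Return the `top` clusters sorted by decreasing emergence differential."""
    scores = {cid: emergence_differential(papers, emergent_thread_papers, year)
              for cid, papers in clusters.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]

# Hypothetical clusters: one growing, one with a large older literature.
clusters = {
    "growing": {"p1": 2007, "p2": 2007, "p3": 2007, "p4": 2004, "p5": 2006},
    "mature": {"q1": 2007, "q2": 2001, "q3": 2002, "q4": 2003},
}
emergent = {"p1", "p2", "q1"}
print(rank_clusters(clusters, emergent, 2007))  # [('growing', 1), ('mature', -2)]
```

For an emergent year of 2007 the subtraction covers papers published before 2005, so a cluster with a long prior publication record is pushed down the ranking even if it contains emergent-thread papers.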
Figure 1 is an example of how a direct citation cluster is matched with emerging
co-citation threads. The topic is computed tomography angiography and the year
of emergence is 2007. The graph shows the growth in number of citing papers by
year in the direct citation cluster, superimposed on which are matching co-citation
threads which start in 2007 and hence are considered emergent. The numbers of
papers in emergent threads that match the direct citation cluster are given in the
thread boxes. Only some of the matching threads are shown. The sum of the
matching papers minus papers prior to 2005 in the direct citation cluster gives the
emergence differential.
Figure 1. Matching a direct citation cluster and emerging co-citation threads on the
topic of computed tomography angiography. The matching papers in 2007 are given
in the thread boxes. The number of papers in the direct citation cluster is above each
The data set used is a 15-year Scopus database (1996–2010) obtained under a special
arrangement with Elsevier. Direct citation clustering was carried out on this
compilation using CWTS open access software (Waltman & Van Eck, 2012).
Existing co-citation clusters and threads were also used covering the same time
period. The years 2007-2010 were selected for identification of the top 25
emerging topics. The emerging threads (one or two years old) were identified for
each year and their papers matched against the direct citation cluster papers for
the same year. The number of matching papers, minus the number of papers in the
direct citation cluster published more than two years before the emergent year, gave the
emergence differential, which was used to rank the topics in each year. A total of
71 distinct topics were selected across the four years, 50 of which appeared in
only one year, and the remaining 21 in two or more years. Six topics were in the
top 25 for three years, but none appeared in all four years. We will focus here on
the topics for 2010 which are listed in Table 1.
The first column of Table 1 gives the rank number of the direct citation cluster
determined by sorting the emergence differential. A topic name is given in the
second column which is based on a manual analysis of the titles and abstracts of
2010 papers in the intersection of the direct citation and emerging co-citation
clusters. The third column labelled “type” is a categorization of the type of event
mainly responsible for the emergence. We consider three types of events:
discovery, innovation and exogenous. The categorization was made by
examination of the 2010 papers in the topic and the papers they cited.
“Discovery” refers to scientific areas where an unexpected finding is made or
fundamental knowledge is gained. An example is the first topic on the list, iron-
based high temperature superconductivity, which was a discovery of
superconductivity in a new class of materials not previously thought to be a good
candidate for superconductivity.
The “innovation” category refers to areas of technology where existing science or
technology is used to create new devices or capabilities that serve specific
purposes. An example is cognitive radio which takes a new approach to assigning
radio spectrum. The third category “exogenous” refers to factors external to
science and technology, such as natural disasters, health threats, or societal events
with major impacts such as the launch of a new web product or a government
standard. An example is the second topic on the list, the swine flu pandemic of
2009, in which the global spread of a virus mobilized the health care community
to understand and combat the disease. If an innovation or discovery topic also
involves an exogenous event, a combined code is used. For example, the flu
pandemic is considered both a discovery and exogenous because a new virus was
discovered and it was a worldwide health event. Another example is topic 18 on
crystallographic evaluation where a new software service was introduced to
validate crystal structures. It should be clear that discovery topics can also
involve elements of technological innovation and vice versa. What is sought here
is the main catalyst of emergence.
Table 1. 2010 top 25 emerging topics. Abbreviations: r = rank; dis = discovery; inn =
innovation; exo = exogenous; year Ev = year of event; year HC = year of most cited
paper; year Em = year of first emergence; Ev to HC = time lag from event to most
cited paper; Ev to Em = time lag from event to first emergence; H = H index.
swine flu (H1N1) pandemic
spectrum sensing in cognitive
graphene nanosheets and
Horava-Lifshitz quantum gravity
graphene oxide nanosheets
induced pluripotent stem-cells
signal recovery from compressed
graphene transistors and optical
zigzag graphene nanoribbons
cardiovascular events in type 2
spectrum allocation in cognitive
IDH1 and IDH2 mutations in
H1N1 pandemic and seasonal flu
mechanical properties of graphene
online social networking
cognitive radio networks
“Discovery” was the most common category with 12 topics. The combination of
“discovery/exogenous” had four topics, and these were mostly medical such as
the flu virus or a drug trial (topic 12). “Innovation” had only three topics, for
example, a new mathematical approach to signal compression (topic 9). The
combination “innovation/exogenous” had, however, six instances, suggesting that
technology areas often have an exogenous component. Many of these
combinations were computer science oriented involving, for example, a new
programming system (topic 8) or launch of a new web service (topic 21) that
stimulates research. Overall “discovery” applied to about two-thirds of topics,
“innovation” to one-third, and about 40 percent of topics had “exogenous”
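The overall shares quoted above follow directly from the four category counts given in the text. The short check below uses only those counts; the code labels are the abbreviations from the Table 1 caption.

```python
# Category counts as stated in the text for the 2010 top 25.
counts = {"dis": 12, "dis/exo": 4, "inn": 3, "inn/exo": 6}
total = sum(counts.values())

def share(label):
    """Fraction of topics whose type code includes `label`."""
    hits = sum(n for code, n in counts.items() if label in code.split("/"))
    return hits / total

shares = {label: share(label) for label in ("dis", "inn", "exo")}
print(total, shares)  # 25 {'dis': 0.64, 'inn': 0.36, 'exo': 0.4}
```

That is, 16 of 25 topics involve discovery (about two-thirds), 9 of 25 involve innovation (about one-third), and 10 of 25 have an exogenous element (40 percent).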
A more detailed analysis of the causative factors for emergence suggests that in
most cases the publication of a new idea is what sets the stage for the emergence.
Fifteen of the 25 topics follow this pattern. In other cases the causative event was
the launch of a technology such as cloud computing services (topic 23) or a new
data management framework from Google (topic 8). Also government actions
such as DARPA’s architecture for cognitive radio (topic 24), or the failure of a
clinical trial (topic 12) can spark new research.
The fourth column labelled “year Ev” gives the year of the event. In cases where
a specific paper is driving emergence, this is the publication year of the paper.
This year may or may not correspond to the year of the most cited paper given in
the fifth column labelled “year HC”. Citation counts are determined by collecting
all references from the 2010 papers that are in the intersection of the direct
citation cluster and the emerging co-citation threads. Hence, this count is local to
a specific set of 2010 papers and differs from the global citation count found in
Scopus. Local citation counts are used because we want to assess the importance
of the paper to the specific topic. Examples of where the most cited paper differs
from the paper that appears to have directly stimulated the topic are some of the
graphene related areas. The most cited paper for these topics is usually the
original graphene discovery paper (Novoselov et al., 2004), while the
paper most germane to the specific graphene topic often corresponds to a less
cited paper, but usually within the top three or four.
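The distinction between local and global citation counts can be made concrete with a small sketch. The data structure and names are hypothetical; the point is only that citations are tallied from the reference lists of the selected citing papers and from nowhere else.

```python
from collections import Counter

def local_citation_counts(references, citing_set):
    """Count citations received from a chosen set of citing papers only.

    references: {citing paper id: iterable of cited paper ids}
    citing_set: ids of the papers in the intersection being analyzed
    """
    counts = Counter()
    for paper in citing_set:
        # set() guards against a reference listed twice in one paper.
        counts.update(set(references.get(paper, ())))
    return counts

# Hypothetical reference lists; "c" lies outside the topic intersection.
references = {"a": ["x", "y"], "b": ["x"], "c": ["x", "z"]}
local = local_citation_counts(references, {"a", "b"})
print(local.most_common(1))  # [('x', 2)]
```

Paper "x" has three citations in the full toy database but only two local citations, which is the count relevant to the topic.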
The sixth column labelled “year Em” is the year in which the topic was observed
to emerge in the top 25 going back to 2007. Because we have generated top 25
lists for each year from 2007 to 2010, it is possible that a given topic will be in the
top 25 for multiple prior years. This is illustrated in Figure 2 which plots the rank
of topics which have appeared in the top 25 in three consecutive years from 2007
to 2010. For example, the iron-based superconductor topic was ranked first for
three consecutive years from 2008-2010, while induced pluripotent stem-cells
rose from rank 19 in 2008 to rank 7 in 2010, and social tagging fell from rank 1 in
2007 to rank 19 in 2010. Fourteen of the 25 topics in 2010 appeared in the
ranking for the first time in 2010, and it is likely that several of these topics will
fall out of the top 25 ranking in 2011.
The seventh and eighth columns labelled “Ev to HC” and “Ev to Em” give two
time lags of interest: the time lag from the emergence event to publication of the
most cited paper, and the lag from the event to the year of first emergence. In the
former, lags will be positive if the most cited paper is published after the
emergence event and negative if the most cited paper precedes the key event. The
negative time lags occur in graphene topics, where the most cited paper, the
graphene discovery paper, was published prior to the highly cited paper closest
to the topic in content. Positive time lags tend to
be associated with exogenous stimuli, such as a software system, web products, or
government standards that stimulate research and result in highly cited papers at
later dates. Across all topics, the average lag from event to most cited paper is
near zero. The second type of lag shown in the column labelled “Ev to Em” is
more a measure of our system’s ability to detect emergence at an early stage.
Large positive lags indicate a delay in detection, and there are no negative lags.
The average delay in detection across the 25 topics is 2.5 years, and the largest
lags include both discovery and innovation cases where delays may be due to
technical or conceptual problems, as was possibly the case with some of the
graphene topics which were technically difficult.
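The two lags are simple year differences, as the following sketch shows. The years in the example are hypothetical and are not taken from Table 1.

```python
def time_lags(year_ev, year_hc, year_em):
    """Return both lags for one topic, in years.

    year_ev: year of the emergence event
    year_hc: publication year of the most cited paper
    year_em: first year the topic appeared in the top 25
    """
    return {"ev_to_hc": year_hc - year_ev, "ev_to_em": year_em - year_ev}

def mean_detection_lag(topics):
    """Average event-to-emergence lag over (year_ev, year_hc, year_em) tuples."""
    lags = [time_lags(*topic)["ev_to_em"] for topic in topics]
    return sum(lags) / len(lags)

# Hypothetical topics: the first has a most cited paper older than its event.
topics = [(2006, 2004, 2009), (2008, 2008, 2009)]
print(time_lags(*topics[0]))       # {'ev_to_hc': -2, 'ev_to_em': 3}
print(mean_detection_lag(topics))  # 2.0
```

A negative "Ev to HC" value arises whenever the most cited paper predates the event, while "Ev to Em" cannot be negative if a topic is only detected in or after the year of its event.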
Figure 2. Change in rank of topics in top 25 that appear in three or more years 2007
The last column labelled “H” gives the H index, the largest number N such that
N papers are each cited at least N times. This indicates the number and citedness of highly cited papers
in the topic. The data suggest that low H values are associated with topics which
are driven by exogenous events, such as swine flu, cloud computing, and social
tagging. As one would expect, the H indexes are higher for topics associated with
specific discovery or innovation papers. The highest H index is for iron-based
superconductivity (topic 1), clearly a discovery based topic, while the lowest is
online social networking (topic 21) which is focused on analyses of data from
social network services such as Twitter and Facebook.
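For reference, the H index of a topic can be computed from its list of citation counts as in the sketch below. The function name and example counts are illustrative; whether the counts are local or global depends on how they were collected.

```python
def h_index(citation_counts):
    """Largest N such that N papers each have at least N citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

A single very highly cited paper does not raise the index by itself, which is why topics built around one exogenous event can show low values.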
The topics were also coded for indications of any practical applications that
researchers hoped to achieve. Interestingly, all of the topics, with the exception of
quantum gravity (topic 5), foresaw some type of practical application. About half
the topics envisioned specific devices or physical products, while the other half
anticipated improvements in services, for example, health care or software.
In the absence of a definitive list of emerging topics against which to evaluate this
list, we fall back on other types of evidence to corroborate that the topics are of
current importance, such as awards to authors of most cited papers or recognition
in the science press. The awards should be relevant to the topics and post-date the
highly cited work in question. Two Nobel Prizes were related to the topics, one
for graphene awarded to Novoselov and Geim in 2010 (topics 4, 6, 10, 11, 16,
20), and another to Shinya Yamanaka in 2012 for induced pluripotent stem-cells
(topic 7). Graphene was also named a runner-up to “Breakthrough of the Year”
by Science in 2009. Both graphene and induced pluripotent stem-cells have been
the object of recent bibliometric studies (Chen, Hu, Liu, & Tseng, 2012; Shapira,
Youtie, & Arora, 2012; Shibata, Kajikawa, Takeda, Sakata, & Matsushima,
2010).
Other highly cited authors also received recognition. In 2009 Hideo Hosono
received the Bernd T. Matthias Prize for his discovery of iron-based high
temperature superconductivity (topic 1), and in 2008 the topic was named a
runner-up to “Breakthrough of the Year” by Science. Sir John Pendry was
awarded the UNESCO-Niels Bohr gold medal in 2009 and the 2010 Willis E.
Lamb Award for Laser Science and Quantum Optics for his work on
transformation optics and meta-materials (topic 13). In 2008 David Donoho
received the IEEE Information Theory Society Paper Award for his work on
compressed sensing (topic 9), an award he shared with the author of the second
most cited paper in the topic, Emmanuel Candès. In 2010 Anthony Spek received
the Kenneth Trueblood award for his work in chemical crystallography and
crystallographic computing (topic 18). In addition, the swine flu virus (topics 2
and 17) was named “virus of the year” by Science in 2009, and in 2008 IDH1 and
IDH2 mutations in cancer (topic 15) was named a runner-up to “Breakthrough of
the Year” by Science.
While this search for awards is necessarily incomplete, it provides evidence that
at least some of the topics and their highly cited authors have received recent
recognition for work that has topical relevance.
Citations during emergence
To gain a better understanding of the process of emergence, the pattern of
citations was examined during the period of emergence for the first ranked topic –
iron-based superconductivity. The analysis is based on all citation links extracted
from the direct citation cluster for this topic. In this case a specific discovery
paper had appeared in 2008 which was critical to the topic. The procedure was to
make annual time slices into the citation network and compute the most cited
papers in each year.
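The time-slicing procedure can be sketched as follows, assuming the citation network is available as a list of (citing, cited) pairs together with publication years. The names and the tie-breaking rule (alphabetical by paper id) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def top_cited_by_year(links, pub_year, top=10):
    """Most cited papers in each annual slice of a citation network.

    links: iterable of (citing paper id, cited paper id)
    pub_year: {paper id: publication year}
    Returns {citing year: [(cited id, citations, age in years), ...]}.
    """
    per_year = defaultdict(Counter)
    for citing, cited in links:
        per_year[pub_year[citing]][cited] += 1
    result = {}
    for year, counts in per_year.items():
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top]
        # Age is measured relative to the citing year of the slice.
        result[year] = [(paper, n, year - pub_year[paper]) for paper, n in ranked]
    return result

# Hypothetical network: "d" is published and cited in the same year.
pub_year = {"d": 2008, "o": 2001, "c1": 2008, "c2": 2008, "c3": 2007}
links = [("c1", "d"), ("c2", "d"), ("c2", "o"), ("c3", "o")]
print(top_cited_by_year(links, pub_year)[2008])  # [('d', 2, 0), ('o', 1, 7)]
```

An age of 0 means the cited paper was published in the citing year, which is the situation reported for the top papers in the year of emergence.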
Table 2 gives the ten most cited papers for each of three years, 2007-2009, which
spans the year of emergence, 2008. We use letter codes to identify the papers and
also show the age of the cited papers with respect to the citing year. The
discovery paper is indicated by an asterisk, and the letter code for the paper is
underlined if the paper continues from the prior year.
First, we observe a dramatic increase in the H index across the time slices,
coinciding with the appearance of the discovery paper at the top of the ranking in
2008, when H goes from 3 to 30. Of course, this goes hand in hand with a rapid
increase in the number of papers and citations in the direct citation cluster.
Second, we see a decrease in the age of the cited papers. In the year of emergence
the top seven papers have an age of 0, that is, were published in the citing year.
Third, we see a low continuity of cited papers prior to emergence and a high
continuity of cited papers following emergence. Of course, high post-emergence
continuity leads to an aging of the highly cited work, which will continue unless
new papers become highly cited.
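Continuity can be expressed as the fraction of a year's top-cited list that was already on the previous year's list. The sketch below uses hypothetical lists and an assumed definition; the paper itself marks continuity by underlining rather than by a numeric score.

```python
def continuity(top_by_year):
    """Share of each year's top-cited papers that continue from the prior year.

    top_by_year: {year: list of top-cited paper ids}
    Years without a preceding year in the data are skipped.
    """
    result = {}
    for year, papers in top_by_year.items():
        previous = top_by_year.get(year - 1)
        if previous is None or not papers:
            continue
        result[year] = len(set(papers) & set(previous)) / len(papers)
    return result

# Hypothetical lists: turnover at emergence (2008), stability afterwards (2009).
tops = {
    2007: ["a", "b", "c", "d"],
    2008: ["e", "f", "g", "a"],
    2009: ["e", "f", "g", "h"],
}
print(continuity(tops))  # {2008: 0.25, 2009: 0.75}
```

A low value in the year of emergence followed by a high value the year after corresponds to the pattern described above.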
Table 2. Iron-based superconductivity top 10 papers by year during emergence
showing paper age, citations and continuity.
_ underline – continuing from previous year
* discovery paper
This suggests that the discovery event was sufficiently persuasive to immediately
dominate the community, stimulate a new crop of compelling findings and carry
this interest forward in time. We do not know yet whether this pattern holds for
other topics in the list, particularly those that are not so clearly associated with
specific discovery papers. Nevertheless the results suggest a general pattern which
might hold for discovery-based science where the combined factors of citedness,
age, and continuity are important indicators.
Despite the fact that citation data are often regarded as biased toward science, we
are struck by how strongly technology-based topics are represented. These topics
were generally categorized as innovation. Eight of the topics are clearly
technology-based, and a number of other more science-based areas such as
epitaxial graphene, metal-organic frameworks and transformation optics have
important technological components. Five of the technology topics are oriented
toward computer science, and their appearance possibly reflects the strong
representation of this subject in the Scopus database.
Since one factor in our detection methodology is growth in the direct citation
network, we could ask whether the topics identified are prone to bandwagon
effects. Such a tendency could be the result of an availability of a large pool of
researchers with adequate support to be able to rapidly exploit a new finding.
Such might be the case, for example, with the high temperature superconductivity
community within materials science and applied physics. Another way to pose
this question is to ask why we do not see more topics in basic physics, chemistry,
and biology, and whether such topics may have less dramatic growth
characteristics. Perhaps varying the selection parameters for matching direct
citation clusters and co-citation threads would give a stronger representation of
Another feature of the list that requires further research is the repetition of topics
within the top 25, such as the appearance of six graphene related topics and three
on cognitive radio. It is perhaps not surprising that a material of such practical
and theoretical interest as graphene should have such a strong representation. It is
usually possible to draw subtle distinctions between the various subtopics dealing
with graphene, and these distinctions are usually apparent in the citing papers as
well as a different mix of highly cited papers. The most likely explanation for
this repetition is an overly granular setting of the underlying direct citation
clustering parameters, or perhaps also the proneness of citation data to
A more fundamental question regarding the methodology we have used to
identify emerging topics is whether alternative methodologies would perform
equally well, or whether known cases of emergence during the 2007-2010 period
were missed. For example, could either the direct citation clusters or co-citation
threads be used on their own to detect emergence? Direct citation clusters have
measurable growth properties so a slope analysis looking for inflection points
might be possible. Alternatively, emergent co-citation threads could be grouped
using some alternative bibliometric measure independent of the direct citation
clustering and used as an emergence indicator. These possibilities remain to be
explored, but what we can say now is that the two methods, based on different
citation metrics and algorithms, can be used in a complementary manner that
takes advantage of the longitudinal and cross-sectional strengths of the respective
methods. Lacking any definitive list of emerging topics for the period, we cannot
say whether areas have been missed, but a good source of intelligence on this
question can be obtained from the Breakthrough of the Year listings in Science,
where we have seen some confirmation of our selections, but not a one-to-one
It seems clear that specific highly cited papers have played a key role in
emergence in 17 of 25 topics, including technological areas such as cognitive
radio and compressed sensing. It is likely that most of these discoveries and
innovations could not have been anticipated, even though with hindsight we
might be able to identify precursor papers in the direct citation network that might
foretell possible forthcoming breakthroughs. One task for future research will be
to use this list of topics and similar lists from other years to see if common
preconditions to discovery and innovation can be found. It is also of interest to
study the fate of these emerging topics in later years. Did work continue, decline
or disappear? We would not be surprised if some were proved to be errors, dead
ends, or continued under their own inertia until well past their prime. Having a
reasonably certain inventory of emergent topics as a quasi-gold standard opens up
many new research possibilities, for example, studies of changes in sentiment words
during emergence, or of correlated social network or institutional factors.
The role of exogenous events, which was a factor in 40 percent of topics, also
deserves further attention. Previous bibliometric case studies have been carried
out on topics such as the 9/11 and anthrax terrorist attacks (Chen, 2006; Morris,
Yen, Wu, & Asnake, 2003), but perhaps more common exogenous events are
disease or natural disaster-related. We do not know how pervasive such
influences are or in general the role that extra-scientific factors have in
emergence. As we delve more deeply into other topics, we may find further
evidence of exogenous stimuli. For example, in metal-organic frameworks (topic
25) it was not immediately obvious that the DOE had issued new targets for
Regarding our methodology, we do not know whether we can reduce the average
time lag of 2.5 years from the so-called emergence event to our detection of
emergence. This may depend on our ability to identify emergent co-citation
threads earlier perhaps by adjusting our threading threshold, since we know that
the slope of the direct citation cluster growth curve will not be steep at earlier
stages. Perhaps an indicator of network structure can also be devised.
In modelling the emergence process at the paper level we need to further
investigate the factors of citedness, paper age, and continuity of the highly cited
papers. These variables might eventually be part of an emergence index, in
conjunction with the topic growth rate. Obviously the precision of topic paper
identification is critical in such an analysis, and the combination of direct citation
and co-citation methods used here has probably contributed to this accuracy.
Clearly at this stage we are engaged in detection and not prediction of emergence.
Perhaps the most important implication of the present work is that detection by
citation-based methods is broadly feasible using a global approach to data
analysis rather than the local or case study approach that has predominated
up to now. Whether detection can be enhanced by a deeper analysis
of full texts, or application, for example, of word-based methods remains to be
Scopus data from 1996 to 2010 were generously provided by Elsevier under an
agreement with SciTech Strategies, Inc. We would like to thank Ludo Waltman
and Nees Jan van Eck and CWTS for use of the direct citation clustering software.
This research is supported by the Intelligence Advanced Research Projects
Activity (IARPA) via Department of Interior National Business Center
(DoI/NBC) contract number D11PC20152. The U.S. Government is authorized to
reproduce and distribute reprints for Governmental purposes notwithstanding any
copyright annotation thereon. Disclaimer: The views and conclusions contained
herein are those of the authors and should not be interpreted as necessarily
representing the official policies or endorsements, either expressed or implied, of
IARPA, DoI/NBC, or the U.S. Government.
Bettencourt, L. M. A., Kaiser, D. I., Kaur, J., Castillo-Chavez, C., & Wojick, D.
(2008). Population modeling of the emergence and development of
scientific fields. Scientometrics, 75(3), 495-518.
Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of science. Annals
of Applied Statistics, 1(1), 17-35.
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic
coupling, and direct citation: Which citation approach represents the
research front most accurately? Journal of the American Society for
Information Science and Technology, 61(12), 2389-2404.
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and
transient patterns in scientific literature. Journal of the American Society
for Information Science and Technology, 57(3), 359-377.
Chen, C., Hu, Z., Liu, S., & Tseng, H. (2012). Emerging topics in regenerative
medicine: A scientometric analysis in CiteSpace. Expert Opinion on
Biological Therapy, 12(5), 593-608.
Garfield, E., Sher, I. H., & Torpie, R. J. (1964). The use of citation data in writing
the history of science. Philadelphia: Institute for Scientific Information.
Klavans, R., & Boyack, K. W. (2011). Using global mapping to create more
accurate document-level maps of research fields. Journal of the American
Society for Information Science and Technology, 62(1), 1-18.
Morris, S. A., Yen, G., Wu, Z., & Asnake, B. (2003). Time line visualization of
research fronts. Journal of the American Society for Information Science
and Technology, 54(5), 413-422.
Novoselov, K. S., Geim, A. K., Morozov, S. V., Jiang, D., Zhang, Y., Dubonos,
S. V., et al. (2004). Electric field effect in atomically thin carbon films.
Science, 306(5696), 666-669.
Price, D. J. D. (1965). Networks of scientific papers. Science, 149, 510-515.
Shapira, P., Youtie, J., & Arora, S. (2012). Early patterns of commercial activity
in graphene. Journal of Nanoparticle Research, 14(4), art. num. 811.
Shibata, N., Kajikawa, Y., Takeda, Y., & Matsushima, K. (2008). Detecting
emerging research fronts based on topological measures in citation
networks of scientific publications. Technovation, 28, 758-775.
Shibata, N., Kajikawa, Y., Takeda, Y., Sakata, I., & Matsushima, K. (2010).
Detecting emerging research fronts in regenerative medicine by the
citation network analysis of scientific publications. Technological
Forecasting & Social Change, 78(2), 274-282.
Small, H. (1977). A co-citation model of a scientific specialty: A longitudinal
study of collagen research. Social Studies of Science, 7, 139-166.
Waltman, L., & Van Eck, N. J. (2012). A new methodology for constructing a
publication-level classification system of science. Journal of the
American Society for Information Science and Technology, 63(12), 2378-