Article

Using Global Mapping to Create More Accurate Document-Level Maps of Research Fields

Authors:
  • Richard Klavans (SciTech Strategies, Inc., United States)
  • Kevin W. Boyack (SciTech Strategies, Inc., United States)

Abstract

We describe two general approaches to creating document-level maps of science. To create a local map, one defines and directly maps a sample of data, such as all literature published in a set of information science journals. To create a global map of a research field, one maps “all of science” and then locates a literature sample within that full context. We provide a deductive argument that global mapping should create more accurate partitions of a research field than does local mapping, followed by practical reasons why this may not be so. The field of information science is then mapped at the document level using both local and global methods to provide a case illustration of the differences between the methods. Textual coherence is used to assess the accuracies of both maps. We find that document clusters in the global map have significantly higher coherence than do those in the local map, and that the global map provides unique insights into the field of information science that cannot be discerned from the local map. Specifically, we show that information science and computer science have a large interface and that computer science is the more progressive discipline at that interface. We also show that research communities in temporally linked threads have a much higher coherence than do isolated communities, and that this feature can be used to predict which threads will persist into a subsequent year. Methods that could increase the accuracy of both local and global maps in the future also are discussed.
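As a rough illustration of the kind of coherence scoring the abstract refers to, the sketch below computes a generic within-cluster textual coherence as the mean pairwise cosine similarity of TF-IDF vectors. It is an assumed, simplified stand-in rather than the specific coherence measure used in the paper, and the documents and cluster assignments are toy examples.

```python
# Minimal sketch: score the textual coherence of document clusters as the mean
# pairwise cosine similarity of TF-IDF vectors within each cluster. This is a
# generic illustration, not the paper's specific coherence measure; the texts
# and cluster assignments below are toy examples.
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "citation analysis of information science journals",
    "mapping science with document level citation clusters",
    "protein folding simulation with molecular dynamics",
    "molecular dynamics of membrane proteins",
]
clusters = {0: [0, 1], 1: [2, 3]}  # cluster id -> document indices

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
sims = cosine_similarity(tfidf)

for cid, members in clusters.items():
    pairs = list(combinations(members, 2))
    coherence = sum(sims[i, j] for i, j in pairs) / len(pairs)
    print(f"cluster {cid}: coherence = {coherence:.3f}")
```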

... So far, only a small set of studies exists that systematically investigates the validity of the solutions obtained and the difference made by alternative choices (see e.g. Haunschild et al. 2018, Sjögårde 2018, Klavans & Boyack 2017, Velden et al. 2017, Šubelj et al. 2016, Boyack & Klavans 2010, Klavans & Boyack 2011, Shibata et al. 2009). We are concerned that failure to invest in systematic comparisons and careful validation of the interpretation of algorithmically extracted topical structures undermines our ability to provide robust interpretations of their results and sound guidance on the choice and appropriate use of algorithmic topic extraction or field classification approaches. ...
... Such an external perspective captures the embedding of publications in a field into the global network of scientific publications and is expected to highlight interdisciplinary connections to other areas of research (Boyack 2017). Klavans and Boyack (2011) argue that, under certain conditions, a global science map can be expected to produce a more 'accurate' map of a field than local maps can, where accuracy is measured by the textual coherence of the clusters obtained. However, as Haunschild et al. (2018) found in a case study of the topic of 'overall water splitting', global maps may also fail to adequately capture research fields. ...
... But for a few exceptions, the projection clusters constitute only about 5% of a microfield in the CWTS classification. Adopting the terminology used by Klavans and Boyack (2011) when comparing a local and a global mapping of the field of information science, microfield m402 may be considered a 'core' microfield for invasion science (53% of its publications overlap with the invasion data set and constitute projection cluster C1 on 'invasive plants'). Three microfields may be considered 'boundary' microfields, namely m2749 (34% overlap, projection cluster C5 on 'marine aquatic invasion, ballast water, ascidians'), m1774 (17% overlap, projection cluster C2 on 'freshwater aquatic invasion, great lakes'), and m2568 (17% overlap, projection cluster C10 on 'freshwater aquatic invasion, crayfish'). ...
Preprint
Full-text available
In our paper we seek to address a shortcoming in the scientometric literature, namely that, given the proliferation of algorithmic approaches to topic detection from bibliometric data, there is a relative lack of studies that validate and create a deeper understanding of the topical structures these algorithmic approaches generate. To take a closer look at this issue, we investigate the results of the new Leiden algorithm when applied to the direct citation network of a field-level data set. We compare this internal perspective which is constructed from the citation links within a data set of 30,000 publications in invasion biology, with an external perspective onto the topic structures in this research specialty, which is based on a global science map in form of the CWTS microfield classification underlying the Leiden Ranking. We present an initial comparative analysis of the results and lay out our next steps that will involve engaging with domain experts to examine how the algorithmically identified topics relate to understandings of topics and topical perspectives that operate within this research specialty.
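For readers who want a concrete starting point, the following is a minimal sketch of the kind of internal, citation-based clustering this preprint describes: the Leiden algorithm applied to a toy direct citation network. It assumes the python-igraph and leidenalg packages; the edge list, quality function (CPM), and resolution value are illustrative choices, not the authors' exact configuration.

```python
# Minimal sketch (assumes python-igraph and leidenalg): cluster a toy direct
# citation network with the Leiden algorithm using the CPM quality function.
import igraph as ig
import leidenalg as la

# toy edge list: (citing paper, cited paper); real studies use millions of links
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p4", "p5"), ("p5", "p6")]
g = ig.Graph.TupleList(edges, directed=True).as_undirected()  # direction ignored here

partition = la.find_partition(
    g,
    la.CPMVertexPartition,
    resolution_parameter=0.5,   # granularity knob; this value is only illustrative
    seed=42,
)
for cluster_id, members in enumerate(partition):
    print(cluster_id, [g.vs[m]["name"] for m in members])
```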
... Global subject maps of science have been shown to be more accurate and useful than local maps (Klavans & Boyack, 2011; Rafols, Porter, & Leydesdorff, 2010). Similarly, global classifications have some of the same advantages. ...
... Waltman and van Eck mention that "the choice of parameter values should be guided by the purpose for which a classification system is intended to be used" (2012, p. 2383). Boyack and Klavans have focused on which citation relation to use (Boyack et al., 2011; Klavans & Boyack, 2017), rather than on the granularity of the classification. Similarly to Waltman and van Eck, Boyack and Klavans point out that the "proper level of granularity likely depends on the specific question being asked, and is a question that we do not address in this study." ...
... Marshakova-Shaikevich, 1973; Small, 1973), textual similarity (e.g. Ahlgren & Colliander, 2009; Boyack et al., 2011) or combined approaches (e.g. Colliander, 2015; Glänzel & Thijs, 2017). ...
Article
Full-text available
The purpose of this study is to find a theoretically grounded, practically applicable and useful granularity level of an algorithmically constructed publication-level classification of research publications (ACPLC). The level addressed is the level of research topics. The methodology we propose uses synthesis papers and their reference articles to construct a baseline classification. A dataset of about 31 million publications, and their mutual citations relations, is used to obtain several ACPLCs of different granularity. Each ACPLC is compared to the baseline classification and the best performing ACPLC is identified. The results of two case studies show that the topics of the cases are closely associated with different classes of the identified ACPLC, and that these classes tend to treat only one topic. Further, the class size variation is moderate, and only a small proportion of the publications belong to very small classes. For these reasons, we conclude that the proposed methodology is suitable to determine the topic granularity level of an ACPLC and that the ACPLC identified by this methodology is useful for bibliometric analyses.
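A minimal sketch of the calibration idea described above, under simplifying assumptions: for each candidate granularity level, measure how concentrated the references of synthesis (review) papers are within single classes of the classification. The concentration statistic, the toy data, and the names used are illustrative, not the authors' exact baseline construction.

```python
# Minimal sketch of the calibration idea: for each candidate granularity level,
# measure how concentrated the references of synthesis (review) papers are
# within single classes. The statistic and toy data are illustrative only.
from collections import Counter

def concentration(refs, assignment):
    """Share of a synthesis paper's references falling in its dominant class."""
    classes = [assignment[r] for r in refs if r in assignment]
    if not classes:
        return 0.0
    return Counter(classes).most_common(1)[0][1] / len(classes)

# assignments[granularity] maps publication id -> class id (toy data)
assignments = {
    "coarse": {"r1": "A", "r2": "A", "r3": "A", "r4": "A"},
    "fine":   {"r1": "A1", "r2": "A1", "r3": "A2", "r4": "A3"},
}
synthesis_refs = {"review1": ["r1", "r2", "r3", "r4"]}

for level, assignment in assignments.items():
    scores = [concentration(refs, assignment) for refs in synthesis_refs.values()]
    print(level, sum(scores) / len(scores))
```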
... There is no ground truth classification, and different methodological choices result in different, sometimes equally valid, representations of research delineation (Glänzel & Schubert, 2003; Gläser et al., 2017; Klavans & Boyack, 2017; Mai, 2011; Sjögårde & Ahlgren, 2018; Smiraglia & van den Heuvel, 2013; Velden et al., 2017; Waltman & van Eck, 2012). Nevertheless, the results have been compared to a wide range of baselines and many different applications have been evaluated and compared (Ahlgren et al., 2020; Boyack et al., 2011; Boyack & Klavans, 2010; Donner, 2021; Haunschild et al., 2018; Sjögårde & Ahlgren, 2018, 2020; Šubelj et al., 2016; Waltman et al., 2020). ...
... Including citations that are external to the analyzed data set improves the accuracy of a clustering solution (Ahlgren et al., 2020; Donner, 2021; Klavans & Boyack, 2011). This is an advantage of a global approach, compared to a local one in which a clustering is based on a restricted set of publications. ...
Article
Full-text available
Overlay maps of science are global base maps over which subsets of publications can be projected. Such maps can be used to monitor, explore, and study research through its publication output. Most maps of science, including overlay maps, are flat in the sense that they visualize research fields at one single level. Such maps generally fail to provide both overview and detail about the research being analyzed. The aim of this study is to improve overlay maps of science to provide both features in a single visualization. I created a map based on a hierarchical classification of publications, including broad disciplines for overview and more granular levels to incorporate detailed information. The classification was obtained by clustering articles in a citation network of about 17 million publication records in PubMed from 1995 onwards. The map emphasizes the hierarchical structure of the classification by visualizing both disciplines and the underlying specialties. To show how the visualization methodology can help getting both overview of research and detailed information about its topical structure, I studied two cases: (1) coronavirus/Covid-19 research and (2) the university alliance called Stockholm trio. Peer Review https://publons.com/publon/10.1162/qss_a_00216
... They basically use two methods: a bottom-up approach based on grouping sets of articles according to a criterion (bibliographical coupling or co-citation networks), and a top-down approach dependent on existing classifications (Moschini et al., 2020). They usually look for communicating vessels between authors, institutions, canonical cited authors, cited or citing journals, cited or citing documents, and usually illustrate the diversity of research areas represented by these items using maps (Chen, 2017; Klavans & Boyack, 2011; Porter & Rafols, 2009). ...
... This methodological approach of evaluative bibliometrics has been used to compare researchers, research groups, institutions, journals, and even countries. However, this method is not exempt from criticism, based on the idea that the indexation process (the assignment of a journal to a subject category or a subject area, especially through a machine-based process) is heuristic and subjective (Pudovkin & Garfield, 2002), and some journals (especially those involving interdisciplinary topics) are not sufficiently disciplinarily oriented to be used for normalization (Klavans & Boyack, 2011; Leydesdorff & Bornmann, 2016). ...
Article
This paper analyzed the growth and multidisciplinary nature of Artificial Intelligence research during the last 60 years. Web of Science coverage since 1960 was considered, and a descriptive research was performed. A top-down approach using Web of Science subject categories as a proxy to measure multidisciplinarity was developed. Bibliometric indicators based on the core of subject categories involving articles and citing articles related to this area were applied. The data analysis within a historical and epistemological perspective allowed to identify three main evolutionary stages: an emergence period (1960–1979), based on foundational literature from 1950s; a re-emergence and consolidation period (1980–2009), involving a “paradigmatic” phase of development and first industrial approach; and a period of re-configuration of the discipline as a technoscience (2010–2019), where an explosion of solutions for productive systems, wide collaboration networks and multidisciplinary research projects were observed. The multidisciplinary dynamics of the field was analyzed using a Thematic Dispersion Index. This indicator clearly described the transition from the consolidation stage to the re-configuration of the field, finding application in a wide diversity of scientific and technological domains. The results demonstrated that epistemic changes and qualitative leaps in Artificial Intelligence research have been associated to variations in multidisciplinarity patterns.
... The citation differences that exist across knowledge domains have led to these classification schemes being used as a key element for normalizing citation-based indicators (Ruiz-Castillo & Waltman, 2014; Waltman & van Eck, 2012). Some authors have taken advantage of the assignment of multiple subject categories to a journal to study interdisciplinarity or the complex phenomena that become apparent in journal structures (Bordons et al., 2004; Morillo et al., 2001), a principle that underpins several domain-mapping methodologies (Klavans & Boyack, 2011; Moya-Anegón et al., 2004). ...
... Leydesdorff and Bornmann (2016) question the analytical clarity of these schemes in the normalization process. Indeed, when citation indicators are normalized, journals without a clearly defined disciplinary orientation (or that are inherently interdisciplinary) can pose a problem (Klavans & Boyack, 2011; Leydesdorff & Bornmann, 2016). In such cases, their contents will not express the citation differences that exist across the fields they cover but will instead depend on the category to which the journal is assigned. ...
... Considerations concerning the scope of selection have been discussed in the literature in terms of local, global, and hybrid approaches [18]. Global maps of science by definition provide comprehensive coverage of all scientific disciplines [19], whereas local maps typically focus on selected areas of interest. ...
... Global maps of science aim to represent all scientific disciplines [18]. Commonly used units of analysis in global maps of science include journals and, to a much lesser extent, articles. ...
Article
Full-text available
Systematic scientometric reviews, empowered by computational and visual analytic approaches, offer opportunities to improve the timeliness, accessibility, and reproducibility of studies of the literature of a field of research. On the other hand, effectively and adequately identifying the most representative body of scholarly publications as the basis of subsequent analyses remains a common bottleneck in the current practice. What can we do to reduce the risk of missing something potentially significant? How can we compare different search strategies in terms of the relevance and specificity of topical areas covered? In this study, we introduce a flexible and generic methodology based on a significant extension of the general conceptual framework of citation indexing for delineating the literature of a research field. The method, through cascading citation expansion, provides a practical connection between studies of science from local and global perspectives. We demonstrate an application of the methodology to the research of literature-based discovery (LBD) and compare five datasets constructed based on three use scenarios and corresponding retrieval strategies, namely a query-based lexical search (one dataset), forward expansions starting from a groundbreaking article of LBD (two datasets), and backward expansions starting from a recently published review article by a prominent expert in LBD (two datasets). We particularly discuss the relevance of areas captured by expansion processes with reference to the query-based scientometric visualization. The method used in this study for comparing bibliometric datasets is applicable to comparative studies of search strategies.
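The sketch below illustrates the general mechanics of citation expansion from a seed set, roughly in the spirit of the cascading expansion described above. The lookup functions `get_citing` and `get_cited` are hypothetical placeholders for whatever bibliographic database is queried; the toy citation index and the fixed number of rounds are assumptions made only for illustration.

```python
# Minimal sketch of cascading citation expansion: grow a seed set of papers by
# repeatedly adding papers that cite (forward) or are cited by (backward) the
# current set. `get_citing` and `get_cited` are hypothetical lookup functions
# standing in for whatever bibliographic database is used.
def expand(seed_ids, get_citing, get_cited, rounds=2, direction="forward"):
    collected = set(seed_ids)
    frontier = set(seed_ids)
    for _ in range(rounds):
        next_frontier = set()
        for pid in frontier:
            neighbors = get_citing(pid) if direction == "forward" else get_cited(pid)
            next_frontier.update(n for n in neighbors if n not in collected)
        collected.update(next_frontier)
        frontier = next_frontier
    return collected

# toy citation index: paper -> papers citing it
citing_index = {"seed": ["a", "b"], "a": ["c"], "b": [], "c": []}
result = expand({"seed"}, lambda p: citing_index.get(p, []), lambda p: [], rounds=2)
print(sorted(result))  # ['a', 'b', 'c', 'seed']
```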
... Scientometric studies can provide domain-specific insights without direct involvement of domain experts (6). Researchers differentiate science mapping efforts in terms of global, local, and hybrid science maps (12). Global maps of science by definition provide comprehensive coverage of all scientific disciplines (13), whereas local maps typically focus on some areas of research but are not expected to represent all areas. ...
... Global maps of science aim to present a holistic view of the literature of all scientific disciplines (12). Commonly used units of analysis in global maps of science include journals and, to a much lesser extent, articles. ...
Preprint
Full-text available
http://arxiv.org/abs/1906.04800 DOI: 10.1371/journal.pone.0223994 Systematic scientometric reviews, empowered by scientometric and visual analytic techniques, offer opportunities to improve the timeliness, accessibility, and reproducibility of conventional systematic reviews. While increasingly accessible science mapping tools enable end users to visualize the structure and dynamics of a research field, a common bottleneck in the current practice is the construction of a collection of scholarly publications as the input of the subsequent scientometric analysis and visualization. End users often have to face a dilemma in the preparation process: the more they know about a knowledge domain, the easier it is for them to find the relevant data to meet their needs adequately; the little they know, the harder the problem is. What can we do to avoid missing something valuable but beyond our initial description? In this article, we introduce a flexible and generic methodology, cascading citation expansion, to increase the quality of constructing a bibliographic dataset for systematic reviews. Furthermore, the methodology simplifies the conceptualization of globalism and localism in science mapping and unifies them on a consistent and continuous spectrum. We demonstrate an application of the methodology to the research of literature-based discovery and compare five datasets constructed based on three use scenarios, namely a conventional keyword-based search (one dataset), an expansion process starting with a groundbreaking article of the knowledge domain (two datasets), and an expansion process starting with a recently published review article by a prominent expert in the domain (two datasets). The unique coverage of each of the datasets is inspected through network visualization overlays with reference to other datasets in a broad and integrated context.
... The clusters are thus separated using thresholds on the parameters. These thresholds are selected somewhat arbitrarily and described as yielding "good structures" for analysis (Janssens et al., 2008; Klavans and Boyack, 2011). Clusters are generated by comparing the density of edges among a cluster's nodes with that of other clusters. ...
Article
Purpose The study aims to analyze the intellectual structure in sustainable procurement (SP) research to identify the knowledge research clusters and provide potential avenues for future research. Design/methodology/approach The study conducted a bibliometric analysis to analyze the intellectual structure in the area of SP. Overall, 1,294 articles were selected from the Scopus database published between 2000 and 2022. The analysis was conducted using bibliometric R package, Biblioshiny and VOSviewer software. Further, content analysis of research clusters was carried out to set the future research agenda. Findings The study identifies four major knowledge research clusters of SP, namely, (1) green supply chain practices, (2) socially responsible purchasing, (3) environmental purchasing and (4) public procurement and policy. The study suggests a few research directions in the SP field. Moreover, the future research directions are aligned with specific organizational theories applicable in the area of SP research. Research limitations/implications The study is dependent on the Scopus database for the source of research publications on SP. Future studies may consider other research database sources. Practical implications Identifying knowledge research clusters of SP research is of paramount importance for developing policies in the near future. These policy initiatives pave the way for the adoption of SP practices in the business. The findings indicate the issues managers encounter while implementing SP in organizations. Originality/value The study offers valuable insights concerning parameters such as significant publication outlets, influential countries concerning the number of publications, impactful authors, title keywords and identifying major knowledge research clusters of SP to suggest future research directions. Further, the present study highlights emerging areas that require further research, including process governance, supplier diversity, innovation, the role of emerging technologies and the application of organizational theories in SP.
... Who are the leading authors in IIMB Management Review? The leading authors in a discipline are deemed "superstars" (Klavans and Boyack, 2011) or the most prolific and active authors (Anand et al., 2020). The leading author's publication is based on the affiliation or contribution to an article but does not denote the position of the scholar in the author list (e.g. ...
Article
Purpose Advanced bibliometric methods have emerged as key tools in mapping the history and trends of a discipline. This paper aims to demonstrate on applying various bibliometric methods to track a journal’s impact and review its knowledge contribution. In doing so, the authors take the case of IIMB Management Review (IMR) journal focused on management discipline, in consideration of its 10 years of publication presence. Design/methodology/approach Using bibliometric and Scopus metric methods, the authors map and analyze the productivity of IMR Journal and map its knowledge contributions. Findings The authors identify the IMR journal’s impact, its growth, the most prolific authors/affiliations, key research hotspots, cross-country collaboration and emerging trends over the past decade. Originality/value A 10-year longitudinal review helps the target group identify the main themes. It also provides key empirical insights to the journal editorial board and library managers for future planning and growth of the journal.
... Scientometric tools can reveal the invisible colleges in a discipline (Ding, 2011). These tools detect mathematical patterns in metadata to visualize the big picture of a research topic (Klavans and Boyack, 2011). ...
... Such ground truths have not yet been used by bibliometrics. In the context of this strategy, 'validity' is sometimes replaced by 'accuracy' (Ahlgren et al., 2020; Klavans & Boyack, 2011, 2017b). However, this move is merely rhetorical, which becomes obvious when one asks what the authors want to reconstruct with greater accuracy. ...
Article
Full-text available
In this paper we utilize an opportunity to construct ground truths for topics in the field of atomic, molecular and optical physics. Our research questions in this paper focus on (i) how to construct a ground truth for topics and (ii) the suitability of common algorithms applied to bibliometric networks to reconstruct these topics. We use the ground truths to test two data models (direct citation and bibliographic coupling) with two algorithms (the Leiden algorithm and the Infomap algorithm). Our results are discomforting: none of the four combinations leads to a consistent reconstruction of the ground truths. No combination of data model and algorithm simultaneously reconstructs all micro-level topics at any resolution level. Meso-level topics are not reconstructed at all. This suggests (a) that we are currently unable to predict which combination of data model, algorithm and parameter setting will adequately reconstruct which (types of) topics, and (b) that a combination of several data models, algorithms and parameter settings appears to be necessary to reconstruct all or most topics in a set of papers.
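To make the two data models mentioned above concrete, here is a minimal sketch that builds both a direct citation network and a bibliographic coupling network from the same toy reference lists. The raw coupling counts are illustrative; real studies typically normalize these weights.

```python
# Minimal sketch of the two data models compared in such studies, built from
# the same toy reference lists: direct citation links (paper -> cited paper
# inside the set) and bibliographic coupling (paper pairs weighted by shared
# references). Real studies normalize weights; raw counts here are illustrative.
from itertools import combinations

references = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r2", "r3", "p1"},   # p2 also cites p1 directly
    "p3": {"r9"},
}

# direct citation edges: only citations pointing at papers inside the set
direct_citation = [(src, tgt) for src, refs in references.items()
                   for tgt in refs if tgt in references]

# bibliographic coupling edges: number of shared references between paper pairs
coupling = {(a, b): len(references[a] & references[b])
            for a, b in combinations(references, 2)
            if references[a] & references[b]}

print("direct citation:", direct_citation)   # [('p2', 'p1')]
print("bibliographic coupling:", coupling)   # {('p1', 'p2'): 2}
```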
... Bibliometric multidisciplinary, interdisciplinary, or transdisciplinary analyses use two main approaches: a bottom-up approach, based on clustering sets of articles according to a bibliographic criterion, either bibliographical coupling or co-citation networks; and a top-down approach dependent on existing classification schemes (Wagner et al., 2011). These strategies search for communication channels between authors, institutions, cited or citing authors, cited or citing journals, and cited or citing documents, and use mapping techniques to illustrate the diversity of research areas (Chen, 2017; Klavans and Boyack, 2011; Mochini et al., 2020). In this paper, we used the second approach to create a battery of indicators that expose the multidisciplinary nature of pandemics. ...
Article
Full-text available
Objective. We analyzed the scientific output after COVID-19 and contrasted it with studies published in the aftermath of seven epidemics/pandemics: Severe Acute Respiratory Syndrome (SARS), Influenza A virus H5N1 and Influenza A virus H1N1 human infections, Middle East Respiratory Syndrome (MERS), Ebola virus disease, Zika virus disease, and Dengue. Design/Methodology/Approach. We examined bibliometric measures for COVID-19 and the rest of the studied epidemics/pandemics. Data were extracted from Web of Science, using its journal classification scheme as a proxy to quantify the multidisciplinary coverage of scientific output. We proposed a novel Thematic Dispersion Index (TDI) for the analysis of pandemic early stages. Results/Discussion. The literature on the seven epidemics/pandemics before COVID-19 has shown explosive growth of the scientific production and continuous impact during the first three years following each emergence or re-emergence of the specific infectious disease. A subsequent decline was observed with the progressive control of each health emergency. We observed an unprecedented growth in COVID-19 scientific production. TDI measured for COVID-19 (29,4) in just six months, was higher than TDI of the rest (7,5 to 21) during the first three years after epidemic initiation. Conclusions. COVID-19 literature showed the broadest subject coverage, which is clearly a consequence of its social, economic, and political impact. The proposed indicator (TDI), allowed the study of multidisciplinarity, differentiating the thematic complexity of COVID-19 from the previous seven epidemics/pandemics. Originality/Value. The multidisciplinary nature and thematic complexity of COVID-19 research were successfully analyzed through a scientometric perspective.
... Choosing the best data to use to determine the relatedness between two different objects is not always straightforward. Co-occurrence in the feature space associated with the objects (e.g., co-term, co-citation) is the most commonly used basis for relatedness [3,25]. As mentioned above, for study sections, we find that we can break the feature space into three main groups, with data types as follows: ...
Preprint
Full-text available
The National Institutes of Health (NIH) is the largest source of funding for biomedical research in the world. This funding is largely effected through a competitive grants process. Each year the Center for Scientific Review (CSR) at NIH manages the evaluation, by peer review, of more than 55,000 grant applications. A relevant management question is how this scientific evaluation system, supported by finite resources, could be continuously evaluated and improved for maximal benefit to the scientific community and the taxpaying public. Towards this purpose, we have created the first system-level description of peer review at CSR by applying text analysis, bibliometric, and graph visualization techniques to administrative records. We identify otherwise latent relationships across scientific clusters, which in turn suggest opportunities for structural reorganization of the system based on expert evaluation. Such studies support the creation of monitoring tools and provide transparency and knowledge to stakeholders
... Then, using citation expansion, we retrieved all references that cited the initial set for the subsequent analysis (Fig 1). In previous analyses of the evolution and research front of the IS field, several studies adopted a collection of 12 journals (S1 Appendix) as a representative body of the relevant literature [72][73][74][75][76][77][78][79][80][81][82]. However, the selected journals might not necessarily or sufficiently represent the IS field [83]. ...
Article
Full-text available
We propose a method to measure the potential scholarly impact of researchers based on network structural variations they introduced to the underlying author co-citation network of their field. We applied the method to the information science field based on 91,978 papers published between 1979 and 2018 from the Web of Science. We divided the entire period into eight consecutive intervals and measured structural variation change rates (ΔM) of individual authors in corresponding author co-citation networks. Four types of researchers are identified in terms of temporal dynamics of their potential scholarly impact—1) Increasing, 2) Decreasing, 3) Sustained, and 4) Transient. The study contributes to the understanding of how researchers’ scholarly impact might evolve in a broad context of the corresponding research community. Specifically, this study illustrated a crucial role played by structural variation metrics in measuring and explaining the potential scholarly impact of a researcher. This method based on the structural variation analysis offers a theoretical framework and a practical platform to analyze the potential scholarly impact of researchers and their specific contributions.
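As a rough, assumed illustration of a structural-variation-style score (not necessarily the exact ΔM definition used in this study), the sketch below measures how much the modularity of an existing author co-citation network changes when the links introduced by a new paper are added, keeping the baseline partition fixed. It assumes the networkx package and uses toy data.

```python
# Minimal sketch of a structural-variation-style score: how much do the new
# co-citation links introduced by a paper change the modularity of the existing
# author co-citation network? One plausible operationalization for illustration,
# not necessarily the exact delta-M used in the study; toy data throughout.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# toy baseline author co-citation network: two tightly knit groups
G = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F")])
baseline_parts = greedy_modularity_communities(G)
baseline_Q = modularity(G, baseline_parts)

# a new paper co-cites authors C and D, adding a boundary-spanning link
H = G.copy()
H.add_edge("C", "D")
new_Q = modularity(H, baseline_parts)   # same partition, updated network

delta_M = (new_Q - baseline_Q) / abs(baseline_Q)
print(f"baseline Q={baseline_Q:.3f}, new Q={new_Q:.3f}, relative change={delta_M:.3f}")
```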
... The usage of journals as an appropriate level for classification has been problematized even for journals with unique, non-multidisciplinary classification in WoS, given that a journal may publish articles from different disciplines and would not be the right unit to capture interdisciplinary activities (Abramo, D'Angelo, & Zhang, 2018; Klavans & Boyack, 2010). Boyack and Klavans (2011) suggest that "few journals are truly disciplinary" (p. 123). ...
Article
Full-text available
Classification of bibliographic items into subjects and disciplines in large databases is essential for many quantitative science studies. The Web of Science classification of journals into approximately 250 subject categories, which has served as a basis for many studies, is known to have some fundamental problems and several practical limitations that may affect the results from such studies. Here we present an easily reproducible method to perform reclassification of the Web of Science into existing subject categories and into 14 broad areas. Our reclassification is at the level of articles, so it preserves disciplinary differences that may exist among individual articles published in the same journal. Reclassification also eliminates ambiguous (multiple) categories that are found for 50% of items and assigns a discipline/field category to all articles that come from broad-coverage journals such as Nature and Science. The correctness of the assigned subject categories is evaluated manually and is found to be ∼95%.
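One common heuristic for article-level reclassification, sketched below under stated assumptions, is to assign each article the subject category that is most frequent among the journals of its cited references. This illustrates the general idea of reference-based, article-level classification; it is not the paper's exact procedure, and all names and data are toy values.

```python
# Minimal sketch of article-level reclassification by references: assign each
# article the subject category most frequent among the journals of its cited
# references. Illustrative heuristic only, not the paper's exact procedure.
from collections import Counter

journal_category = {"J Neuro": "Neuroscience", "Phys Rev": "Physics", "Nature": None}

article_refs = {
    "art1": ["J Neuro", "J Neuro", "Nature"],      # multidisciplinary source ignored
    "art2": ["Phys Rev", "Phys Rev", "J Neuro"],
}

def reclassify(refs):
    counts = Counter(journal_category[j] for j in refs
                     if journal_category.get(j))    # skip unmapped/ambiguous journals
    return counts.most_common(1)[0][0] if counts else "unclassified"

for art, refs in article_refs.items():
    print(art, "->", reclassify(refs))
```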
... Domain analyses of information science have consistently demonstrated two key poles in the domain: bibliometrics and information retrieval (most recently and concisely summarized by Klavans and Boyack 2011). Knowledge organization (sometimes also called information organization) is a key sub-domain of information science, which is devoted to the conceptual order of knowledge. ...
Article
Domain analysis is the study of the evolution of discourse within research communities. Domain analytical studies of knowledge organization are here drawn together for meta-analysis to demonstrate coherence of theoretical poles within the domain. Despite geopolitical and cultural diversity, the domain shows theoretical coherence.
... Author co-citation analysis was first put forward by White and Griffith [20]. White and McCain [11] studied the knowledge structure of IS through an author co-citation analysis based on 12 IS core journals, and exerted great influence on the follow-up methodology of IS. ...
Article
This study not only analyses the centres of research in Information Science (IS), including the migration of central topics and central countries, but also analyses the relationship between the shifting of centres of research and their transformation. In addition, this study explores the relationship between the formation of the centre of research and the academic influence of the country on IS itself. We collected 25,150 articles, including 313,293 references about citation analysis, from databases SCI-E and SSCI between 1977 and 2016 as our data source. The following findings were obtained through this study: the transfer (transfer time) of central research topics in the IS domain has accelerated, from 12 to 8 years between 1980 and 1990, to 6 to 4 years between 2000 and 2010, and to 3 years between 2011 and 2016. The number of central research topics has also grown, from one between 1997 and 2006, to two from 2006 to 2013 to three from 2013 to 2016. The geographical centres of IS research were the US and Britain between the 1970s and 1980s, but gradually migrated through neighbouring countries, and finally to Asia by 2000. China, which became the centre of research for IS in 2005 for the first time, has been ranked first since 2011. In addition, countries acting as centres of research enjoy not only a high output of literature but also great academic influence. The theoretical and practical implications of our findings are discussed.
... Identifying LIS research topics is a long-term concern among LIS researchers. Several methods have been applied to topic analysis, including expert judgment (Tuomaala et al. 2014; Tveit 2017), keyword analysis (González-Alcaide et al. 2008; Mondal et al. 2017; Xiao et al. 2015), controlled subject term analysis, co-word analysis (Milojević et al. 2011), co-citation analysis (Åström 2007; Klavans and Boyack 2011), and topic modeling (Yan 2014). Despite changes in numbers and names of topics in related studies using these methods, the main research topics of LIS, such as information behavior, information retrieval, and bibliometrics, remain identifiable. ...
Article
This study investigated the external contributors of library and information science (LIS) knowledge who were unaffiliated with LIS-related institutions but published their research results in LIS journals. Differences between the contributors to library science (LS) and contributors to information science (IS) were considered. Articles published in 39 strongly LIS-oriented journals indexed in the Web of Science database between 2005 and 2014 were analyzed. The results demonstrated that 46.5% of the LIS articles were written by at least one non-LIS author; authors’ backgrounds ranged across 29 disciplines. An increasing trend was observed in degrees of interdisciplinarity of LS and IS. An increase in proportion of articles by LIS and non-LIS authors was identified in LS and IS as well. Those with medical backgrounds were the primary non-LIS authors contributing to the LS field and collaborated the most frequently with LIS authors. Those with computer science backgrounds were the most prevalent non-LIS contributors to the IS field and preferred to publish individually. A critical difference was also identified in research topics between LS and IS. The foundations of LIS and scientometrics were the largest research topics in LS and IS, respectively.
... Employing DCA and ACA, Åström (2007) detected the research fronts of the IS domain between 1990 and 2004 in two main aspects: information measurement and IR. Klavans and Boyack (2011) drew local and global maps using DCA and identified the differences between the IS knowledge structures in these two kinds of maps. Strotmann (2008a, b, c, 2014) employed ACA to trace the knowledge structure of the IS domain over more than 30 years. ...
Article
Characterizing the structure of knowledge, the evolution of research topics, and the emergence of topics has always been an important part of information science (IS). Our previous scientometric review of IS provided a snapshot of this fast-growing field up to the end of 2008. This new study aims to identify emerging trends and new developments appearing in the subsequent 7574 articles published in 10 IS journals between 2009 and 2016, including 20,960 references. The results of a document co-citation analysis show great changes in the research topics in the IS domain. The positions of certain core topics found in the previous study, namely, information retrieval, webometrics, and citation behavior, have been replaced by scientometric indicators (H-index), citation analysis (citation performance and bibliometrics), scientific collaboration, and information behavior in the most recent period of 2009–2016. Dual-map overlays of journals show that the knowledge base of IS research has shifted considerably since 2010, with emerging topics including scientific evaluation indicators, altmetrics, science mapping and visualization, bibliometrics, citation analysis, and scientific collaboration.
... Specialties may also split up when increased complexity requires a research community to focus on several particular parts of problems, in its research, its communication channels, and its practical organization. This dynamic nature of science and its research community structures, repeatedly reorganizing itself and its contributions to an ever-increasing volume of scientific knowledge, can be made apparent in bibliometric visualizations of timelines (Chen, Ibekwe-SanJuan and Hou, 2010), of threads and isolates constituting a specialty (Klavans and Boyack, 2011), and of term usage in a domain over time (Milojevic et al., 2011). This dynamic nature may also partly explain why the concepts of specialties, fields, disciplines, domains, ... are often ill-defined in studies, the terms being used sometimes interchangeably and sometimes in hierarchy. ...
Article
Bibliometric methods for the analysis of highly specialized subjects are increasingly investigated and debated. Information and assessments well-focused at the specialty level can help make important decisions in research and innovation policy. This paper presents a novel method to approximate the specialty to which a given publication record belongs. The method partially combines sets of key values for four publication data fields: source, title, authors and references. The approach is founded in concepts defining research disciplines and scholarly communication, and in empirically observed regularities in publication data. The resulting specialty approximation consists of publications associated to the investigated publication record via key values for at least three of the four data fields. This paper describes the method and illustrates it with an application to publication records of individual scientists. The illustration also successfully tests the focus of the specialty approximation in terms of its ability to connect and help identify peers. Potential tracks for further investigation include analyses involving other kinds of specialized publication records, studies for a broader range of specialties, and exploration of the potential for diverse applications in research and research policy context.
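The association rule described in the abstract, requiring shared key values in at least three of the four data fields, can be sketched directly. The key extraction below is heavily simplified, and the field names and toy values are assumptions made only for illustration.

```python
# Minimal sketch of the specialty-approximation rule described above: a
# candidate publication is associated with the investigated record if it shares
# key values in at least three of the four data fields (source, title terms,
# authors, cited references). Key extraction is heavily simplified toy data.
RECORD_KEYS = {
    "source": {"JASIST", "Scientometrics"},
    "title": {"citation", "mapping"},
    "authors": {"Klavans", "Boyack"},
    "references": {"ref1", "ref2", "ref3"},
}

candidates = {
    "c1": {"source": {"Scientometrics"}, "title": {"mapping", "topics"},
           "authors": {"Waltman"}, "references": {"ref2"}},
    "c2": {"source": {"PLOS ONE"}, "title": {"proteomics"},
           "authors": {"Smith"}, "references": {"ref9"}},
}

def matched_fields(candidate):
    return sum(bool(candidate[f] & RECORD_KEYS[f]) for f in RECORD_KEYS)

specialty = [cid for cid, cand in candidates.items() if matched_fields(cand) >= 3]
print(specialty)  # ['c1']: shares the source, a title term, and a reference
```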
... At the end of the 20th century, the lower price of computer equipment, along with an increased capacity of processing and storage components, not to mention big data and new algorithms, set the stage for a grand era in the first decade of this millennium, with the appearance of a wide variety of free software for science mapping analysis (Cobo et al., 2011a). Among many other achievements, science mapping became the keystone of visualizing and analyzing computer graphics (Chen et al., 2001), using ISI categories to represent science (Moya-Anegón et al., 2004), mapping the backbone of science (Boyack et al., 2005), evaluating large maps of disciplines (Klavans and Boyack, 2006), visualizing the citation impact of scientific journals (Leydesdorff, 2007a), mapping interdisciplinarity (Leydesdorff, 2007b), viewing the marrow of science (Moya-Anegón et al., 2007b), creating dynamic animations of journal maps (Leydesdorff and Schank, 2008), mapping the structure and evolution of chemistry research (Boyack et al., 2009), proposing a consensus map of science (Klavans and Boyack, 2009), creating a journal map using Scopus data, mapping the geography of science (Leydesdorff and Persson, 2010), clustering over two million biomedical publications (Boyack et al., 2011), creating more accurate document-level maps of research fields (Klavans and Boyack, 2011), detecting and visualizing the evolution of the fuzzy sets theory field (Cobo et al., 2011b), proposing a new global science map (Leydesdorff et al., 2013a, b; Boyack and Klavans, 2014), analyzing the investigation in integrative and complementary medicine (Moral-Muñoz et al., 2014), analyzing intelligent transportation systems (Cobo et al., 2014), showing the evolution of knowledge-based systems (Cobo et al., 2015), showing the scientific evolution of social work (Martínez et al., 2015), outlining animal science research (Rodriguez-Ledesma et al., 2015), studying the conceptual evolution of marketing research (Murgado-Armenteros et al., 2015), identifying and depicting the intellectual structure and research fronts in nanoscience and nanotechnology in the world (Muñoz-Écija et al., 2017), and exploring the scientific evolution of e-Government (Alcaide-Muñoz et al., in press), among other brave new initiatives. The proposal by Leydesdorff and Rafols (2009) of creating overlay maps can be seen as a powerful contribution integrating visualization, intellectual structure, evolution, and benchmarking for any kind of scientific domain. ...
Article
Full-text available
Since the discovery of the promising properties of graphene, research in the field has attracted numerous grants and sponsors, leading to an exponential rise in the number of papers and applications. This article presents a global map of graphene research and its intellectual structure, drawn using the terms of more than 50,000 documents extracted from the Scopus database, years 1998–2015. The unit of analysis consisted of descriptors (including Author Keywords and Indexed Keywords), with the co-occurrence of descriptors as the unit of measure, using fractional counting. The main research lines identified are: Fundamental Research, Functionalization and Biomedical Applications, Technology and Devices, Materials Science, Energy Storage, Optics and Chemical Properties and Sensors. Using overlay maps, we depict the graphene research efforts of the United States, the European Union (Europe-28), and China, and project their evolution through longitudinal maps to facilitate comparison. The United States was initially at the head of world output in graphene research, but was surpassed by China in 2011 and by Europe in 2014, as a result of their respective scientific policies and financial support. The output of China has since been so intense that it can be said to mark graphene research trends. We believe this information may be valuable for the core community involved in this scientific field, as it offers a large-scale analysis showing how research has changed over time. It is therefore also helpful for policy makers and research planners. The resulting maps are a useful and attractive tool for the graphene research community, as they reveal the main lines of exploration at a glance. The methodology described here could be re-created in any other field of science to uncover and display its intellectual structure and evolution over time.
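For readers unfamiliar with fractional counting of co-occurrences, the sketch below spreads a total weight of one per document over its descriptor pairs. Fractional counting can be defined in several ways; this is one common variant and not necessarily the exact scheme used in the graphene study, and the descriptors are toy examples.

```python
# Minimal sketch of descriptor co-occurrence with fractional counting: each
# document contributes a total weight of 1, spread evenly over its descriptor
# pairs. One common variant among several, shown with toy data.
from collections import defaultdict
from itertools import combinations

documents = [
    ["graphene", "sensors", "energy storage"],
    ["graphene", "energy storage"],
    ["graphene", "biomedical applications"],
]

cooc = defaultdict(float)
for descriptors in documents:
    pairs = list(combinations(sorted(set(descriptors)), 2))
    if not pairs:
        continue
    for a, b in pairs:
        cooc[(a, b)] += 1.0 / len(pairs)   # fractional: each document's weight sums to 1

for pair, weight in sorted(cooc.items(), key=lambda kv: -kv[1]):
    print(pair, round(weight, 3))
```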
... It is possible to identify a number of different foci depending on the sources used, the elements and fields analyzed, and the periods considered. Sometimes, these studies use elaborate theoretical techniques such as science mapping (Klavans and Boyack 2011). ...
Article
This paper offers an overview of the bibliometric study of the domain of library and information science (LIS), with the aim of giving a multidisciplinary perspective of the topical boundaries and the main areas and research tendencies. Based on a retrospective and selective search, we have obtained the bibliographical references (title and abstract) of academic production on LIS in the database LISA in the period 1978–2014, which runs to 92,705 documents. In the context of the statistical technique of topic modeling, we apply latent Dirichlet allocation, in order to identify the main topics and categories in the corpus of documents analyzed. The quantitative results reveal the existence of 19 important topics, which can be grouped together into four main areas: processes, information technology, library and specific areas of information application.
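A minimal sketch of latent Dirichlet allocation of the kind applied in this study, using scikit-learn on a few toy abstracts; the corpus size, number of topics, and preprocessing here are illustrative stand-ins for the 92,705 LISA records and 19 topics reported above.

```python
# Minimal sketch (assumes scikit-learn): latent Dirichlet allocation over a few
# toy abstracts, printing the top terms per topic. Tiny corpus and 2 topics are
# only illustrative; the study fits 19 topics on ~92,000 LISA records.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "library services and user information behavior in public libraries",
    "information retrieval models and search engine evaluation",
    "citation analysis and bibliometric indicators of journals",
    "digital library systems and metadata for information organization",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:5]]
    print(f"topic {k}:", ", ".join(top))
```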
... Some research groups were able to exploit their access to local versions of Thomson Reuters' or Elsevier's citation databases and the opportunity to develop programmes that operate directly on these databases. Combined with increased computing power and the publication of ever more efficient algorithms, this access enabled the clustering of all publications indexed in the Web of Science (respectively Scopus) based on co-citations or direct citations (Boyack et al. 2005; Klavans and Boyack 2011; Waltman and van Eck 2012). ...
Article
Full-text available
Science studies are persistently challenged by the elusive structures of their subject matter, be it scientific knowledge or the various collectivities of researchers engaged with its production. Bibliometrics has responded by developing a strong and growing structural bibliometrics, which is concerned with delineating fields and identifying thematic structures. In the course of these developments, a concern emerged and is steadily growing. Do the sets of publications, authors or institutions we identify and visualise with our methods indeed represent thematic structures? To what extent are results of topic identification exercises determined by properties of knowledge structures, and to what extent are they determined by the approaches we use? Do we produce more than artefacts? These questions triggered the collective process of comparative topic identification reported in this special issue. The introduction traces the history of bibliometric approaches to topic identification, identifies the major challenges involved in these exercises, and introduces the contributions to the special issue.
... It was deemed one of the challenges of bibliometrics by van Raan (1996) and is still considered as such despite the significant progress and a plethora of methods available. Major developments since van Raan's paper include approaches that cluster the whole Web of Science based on journal-to-journal citations (Leydesdorff 2004; Leydesdorff and Rafols 2009; Leydesdorff and Rafols 2012), co-citations, or direct citations (Boyack, Klavans, and Börner 2005; Boyack and Klavans 2010; Klavans and Boyack 2011; Waltman and van Eck 2012), the advance of hybrid approaches that combine citation-based and term-based techniques (Glenisson, Glänzel, Janssens, and Moor 2005; Glänzel and Thijs 2015), and term-based probabilistic methods (topic modelling, cf. Yau, Porter, Newman, and Suominen 2014). ...
Article
Full-text available
In spite of recent advances in field delineation methods, bibliometricians still don't know the extent to which their topic detection algorithms reconstruct 'ground truths', i.e. thematic structures in the scientific literature. In this paper, we demonstrate a new approach to the delineation of thematic structures that attempts to match the algorithm to theoretically derived and empirically observed properties all thematic structures have in common. We cluster citation links rather than publication nodes, use predominantly local information and search for communities of links starting from seed subgraphs in order to allow for pervasive overlaps of topics. We evaluate sets of links with a new cost function and assume that local minima in the cost landscape correspond to link communities. Because this cost landscape has many local minima we define a valid community as the community with the lowest minimum within a certain range. Since finding all valid communities is impossible for large networks, we designed a memetic algorithm that combines probabilistic evolutionary strategies with deterministic local searches. We apply our approach to a network of about 15,000 Astronomy & Astrophysics papers published 2010 and their cited sources, and to a network of about 100,000 Astronomy & Astrophysics papers (published 2003-2010) which are linked through direct citations.
... The number of studies examined is also restricted by the time required for expert evaluation. Visualization methods based on bibliometric data aim to reveal patterns grounded in mathematical relations by making use of the metadata of scientific studies (Boyack & Klavans, 2010; Klavans & Boyack, 2011). Citation metadata is the information most frequently used in the evaluation of scientific relations. ...
Article
Full-text available
Technology integration in education is examined in educational research from different aspects and, especially in recent years, has gained importance worldwide. Technological Pedagogical Content Knowledge (TPACK) studies, frequently encountered in this context, add to the accumulated scientific knowledge on using technology effectively to teach different subject areas and give teacher training a new dimension. In addition to research that helps build the theoretical framework, application-oriented studies also attract attention. By examining the research conducted on TPACK, this study aims to reveal the scholarly communication of researchers, to identify the most influential documents and authors in the field, and to draw broad conclusions at the document and author levels. In this sense, the study is intended to reveal the current state of the subject and to support the planning of future research. The study covers a total of 543 documents (books, reviews, and research articles) on TPACK retrieved from the Web of Science (WoS) and Scopus databases. Using bibliometric methods, the pattern of scholarly communication in the TPACK area is examined at the author and document levels, the prominent authors and documents are identified on a yearly basis, and the results are presented visually through science mapping. The analysis thus yields broad conclusions about the documents on this subject and their authors.
... Document-level metrics represent early-stage social or public engagement indicators of how (and by whom) a work is being shared, used, commented on, and disseminated further. [124,125] Who is reading the new work? Who is tweeting about the new work? ...
Article
Full-text available
Accurate quantification of scholarly productivity continues to pose a significant challenge to academic medical institutions seeking to standardize faculty performance metrics. Numerous approaches have been described in this domain, from subjective measures employed in the past to rapidly evolving objective assessments of today. Metrics based on publication characteristics include a variety of easily categorized, normalized, referenced, and quantifiable data points. In general, such measures can be broadly grouped as being author‑, manuscript‑, and publication/journal‑specific. Commonly employed units of measurement are derived from the number of publications and/or citations, in various combinations and derivations. In aggregate, these metrics are utilized to more objectively assess academic productivity, mainly for the purpose of determining faculty promotion and tenure potential; evaluating grant application/renewal competitiveness; journal/publication, and institutional benchmarking; faculty recruitment, retention, and placement; as well as various departmental and institutional performance assessments. This article provides an overview of different measures of academic productivity and scientific impact, focusing on bibliometric data utilization, including advantages and disadvantages of each respective methodological approach. The following core competencies are addressed in this article: Interpersonal skills and communication, practice‑based learning and improvement, systems‑based practice. Key Words: Academic productivity metrics, bibliometric indices, impact factor, promotion and tenure
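Among the citation-derived measures this overview discusses, the h-index is the canonical example and is simple enough to compute directly; the sketch below uses toy citation counts.

```python
# Minimal sketch: the h-index, a canonical example of the citation-derived
# productivity metrics discussed above. It is the largest h such that h of an
# author's papers have at least h citations each. Citation counts are toy values.
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([25, 8, 5, 3, 3, 1]))  # -> 3
```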
... Following this approach, Chen et al. (2010) introduced a cluster summarization technique to identify clusters of the co-citation network that correspond to scientific communities. In addition to identifying clusters, Klavans and Boyack (2011) compared the local and global maps of Information Science and set up a model for how science evolves, based on data from the 2000-2008 time interval. ...
Article
Our current societies increasingly rely on electronic repositories of collective knowledge. An archetype of these databases is the Web of Science (WoS) that stores scientific publications. In contrast to several other forms of knowledge (e.g., Wikipedia articles), a scientific paper does not change after its "birth". Nonetheless, from the moment a paper is published it exists within the evolving web of other papers; thus, its actual meaning to the reader changes. To track how scientific ideas (represented by groups of scientific papers) appear and evolve, we apply a novel combination of algorithms explicitly allowing for papers to change their groups. We (i) identify the overlapping clusters of the undirected yearly co-citation networks of the WoS (1975-2008) and (ii) match these yearly clusters (groups) to form group timelines. After visualizing the longest lived groups of the entire data set we assign topic labels to the groups. We find that in the entire Web of Science multidisciplinarity is clearly over-represented among cutting edge ideas. In addition, we provide detailed examples for papers that (i) change their topic labels and (ii) move between groups.
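A simplified stand-in for the year-to-year group matching described above: clusters from adjacent years are linked when the Jaccard overlap of their member papers exceeds a threshold. The threshold, the matching rule, and the data are illustrative assumptions, not the authors' exact algorithm.

```python
# Minimal sketch of linking yearly clusters into timelines: clusters from
# adjacent years are matched when the Jaccard overlap of their members exceeds
# a threshold. Simplified stand-in; threshold and data are toy values.
def jaccard(a, b):
    return len(a & b) / len(a | b)

year_t = {"g1": {"p1", "p2", "p3"}, "g2": {"p7", "p8"}}
year_t1 = {"h1": {"p2", "p3", "p4"}, "h2": {"p9"}}

THRESHOLD = 0.3
links = [(g, h, round(jaccard(m, n), 2))
         for g, m in year_t.items()
         for h, n in year_t1.items()
         if jaccard(m, n) >= THRESHOLD]
print(links)  # [('g1', 'h1', 0.5)]
```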
... Following the same method and the journals of White and McCain (1998), Zhao and Strotmann (2008a) enriched the classic ACA such that it employs both orthogonal and oblique rotations in the factor analysis; they then mapped the field of IS for the period 1996-2005. Zhao and Strotmann (2008c) also found a number of differences between all-and first-author-based ACA in IS. Klavans and Boyack (2011) mapped IS at the document level using both local and global methods to provide a case illustration of the differences between the methods. Jeong, Song, and Ding (2014) proposed a new method for measuring the similarity between co-cited authors by considering authors' citation content in IS, and they found that their proposed approach provides more details about the sub-disciplines in the domain than traditional ACA. ...
Article
Full-text available
We introduce the author keyword coupling analysis (AKCA) method to visualize the field of information science (2006-2015). We then compare the AKCA method with the author bibliographic coupling analysis (ABCA) method in terms of first- and all-author citation counts. We obtain the following findings: (1) The AKCA method is a new and feasible method for visualizing a discipline's structure, and the ABCA and AKCA methods have their respective strengths and emphases. The relation within the ABCA method is based on the same references (knowledge base), whereas that within the AKCA method is based on the same keywords (lexical linguistic). The AKCA method appears to provide a less detailed picture, and more uneven sub-areas of a discipline structure. The relationships between authors are narrow and direct and feature multiple levels in AKCA. (2) All-author coupling provides a comprehensive picture; thus, a complete view of a discipline structure may require both first- and all-author coupling analyses. (3) Information science evolved continuously during the second decade of the World Wide Web. The KDA (knowledge domain analysis) camp became remarkably prominent, while the IR camp (information retrieval) experienced a further decline in hard IR research, and became significantly smaller; Patent analysis and Open Access emerged during this period. Mapping of Science and Bibliometric evaluation also experienced substantial growth.
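As a toy illustration of the keyword-coupling idea behind AKCA (not the paper's implementation), the sketch below couples two authors by the number of distinct keywords they share; the author names and keyword sets are made up.

```python
# Minimal sketch of keyword coupling between authors (hypothetical data): two
# authors are coupled by the number of distinct keywords they have both used
# across their publications.

from itertools import combinations

author_keywords = {                      # author -> keywords used in their papers
    "Author A": {"information retrieval", "query expansion", "evaluation"},
    "Author B": {"information retrieval", "bibliometrics"},
    "Author C": {"bibliometrics", "citation analysis", "evaluation"},
}

coupling = {
    (a1, a2): len(author_keywords[a1] & author_keywords[a2])
    for a1, a2 in combinations(sorted(author_keywords), 2)
}
for pair, strength in coupling.items():
    print(pair, strength)
```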
... The resulting clusters tend to be small and narrowly focused at the scientific problem level. The annual solutions are then merged to form threads which connect clusters in adjacent year slices based on shared cited papers (Klavans & Boyack, 2011). This merges the yearly cluster slices into a longitudinal picture. ...
Conference Paper
Full-text available
We present a novel approach to identifying emerging topics in science and technology. An existing co-citation cluster model is combined with a new method for clustering based on direct citation links. Both methods are run across multiple years of Scopus data, and emergent co-citation threads in a specific year are matched against the direct citation clusters to obtain the emergent topics ranked by a difference function. The topics are classified and characterized in various ways in order to understand the motive forces behind their emergence, whether scientific discovery, technological innovation, or exogenous events. Cross-sectional analysis of citation links and paper age are used to study the process of emergence for discovery based science topics.
Thesis
Full-text available
The science system is large, and millions of research publications are published each year. Within the field of scientometrics, the features and characteristics of this system are studied using quantitative methods. Research publications constitute a rich source of information about the science system and a means to model and study science on a large scale. The classification of research publications into fields is essential to answer many questions about the features and characteristics of the science system. Comprehensive, hierarchical, and detailed classifications of large sets of research publications are not easy to obtain. A solution for this problem is to use network-based approaches to cluster research publications based on their citation relations. Clustering approaches have been applied to large sets of publications at the level of individual articles (in contrast to the journal level) for about a decade. Such approaches are addressed in this thesis. I call the resulting classifications “algorithmically constructed, publications-level classifications of research publications” (ACPLCs). The aim of the thesis is to improve interpretability and utility of ACPLCs. I focus on some issues that hitherto have not received much attention in the previous literature: (1) Conceptual framework. Such a framework is elaborated throughout the thesis. Using the social science citation theory, I argue that citations contextualize and position publications in the science system. Citations may therefore be used to identify research fields, defined as focus areas of research at various granularity levels. (2) Granularity levels corresponding to conceptual framework. In Articles I and II, a method is proposed on how to adjust the granularity of ACPLCs in order to obtain clusters corresponding to research fields at two granularity levels: topics and specialties. (3) Cluster labeling. Article III addresses labeling of clusters at different semantic levels, from broad and large to narrow and small, and compares the use of data from various bibliographic fields and different term weighting approaches. (4) Visualization. The methods resulting from Articles I-III are applied in Article IV to obtain a classification of about 19 million biomedical articles. I propose a visualization methodology that provides overview of the classification, using clusters at coarse levels, as well as the possibility to zoom into details, using clusters at a granular level. In conclusion, I have improved interpretability and utility of ACPLCs by providing a conceptual framework, adjusting granularity of clusters, labeling clusters and, finally, by visualizing an ACPLC in a way that provides both overview and detail. I have demonstrated how these methods can be applied to obtain ACPLCs that are useful to, for example, identify and explore focus areas of research.
Article
Full-text available
We propose a new concept for measuring the affinity between fields of academic research. The importance of interdisciplinary research has been increasingly emphasized in recent years. The degree of interdisciplinarity of a research article can be determined using bibliographic information from the cited literature. However, the properties of the affinity of each field to other fields have not yet been discussed. Therefore, we employ our method to quantify the affinity between 27 research fields using academic journal data from the citation and abstract database Scopus. We show that the affinity between fields should be viewed from two perspectives: the affinity of other fields to the field of interest, and the affinity of the field of interest to other fields. We identify the fields of “Arts and Humanities” and “Social Sciences”, and “Earth and Planetary Sciences” and “Environmental Sciences”, as those with the highest bidirectional affinity. We also demonstrate that affinity to “Medicine” is particularly high, with seven fields of interest having the highest affinity to this field: “Biochemistry, Genetics and Molecular Biology”, “Immunology and Microbiology”, “Neuroscience”, “Pharmacology, Toxicology and Pharmaceutics”, “Nursing”, “Dentistry”, and “Health Professions”.
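The directional nature of the affinity measure can be illustrated with a small sketch. This is not necessarily the authors' exact formula; it simply approximates the affinity of field i to field j as the share of field i's cited references that fall in field j, which yields an asymmetric matrix. The field names and counts are invented.

```python
# Illustrative sketch: a directed affinity of field i to field j approximated
# as the share of field i's cited references that fall in field j.

import numpy as np

fields = ["Medicine", "Neuroscience", "Social Sciences"]
# ref_counts[i, j] = number of references from papers in field i to papers in field j
ref_counts = np.array([
    [500, 120,  30],
    [200, 300,  10],
    [ 40,  20, 250],
], dtype=float)

affinity = ref_counts / ref_counts.sum(axis=1, keepdims=True)  # row-normalize
for i, fi in enumerate(fields):
    for j, fj in enumerate(fields):
        if i != j:
            print(f"affinity of {fi} to {fj}: {affinity[i, j]:.2f}")
```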
Article
Full-text available
For the reconstruction of topics in bibliometric networks, one has to use algorithms. Specifically, researchers often apply algorithms from the class of network community detection algorithms (such as the Louvain algorithm), which are general-purpose algorithms not intentionally programmed for a bibliometric task. Each algorithm has specific properties “inscribed”, which distinguishes it from the others. It can thus be assumed that different algorithms are more or less suitable for a given bibliometric task. However, the suitability of a specific algorithm when it is applied for topic reconstruction is rarely reflected upon. Why choose this algorithm and not another? In this study I assess the suitability of four community detection algorithms for topic reconstruction, by first deriving the properties of the phenomenon to be reconstructed – topics – and comparing if these match with the properties of the algorithms. The results suggest that the previous use of these algorithms for bibliometric purposes cannot be justified by their specific suitability for this task. Peer Review https://publons.com/publon/10.1162/qss_a_00217
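For readers unfamiliar with how such general-purpose algorithms are applied in this setting, the sketch below runs the Louvain algorithm (as implemented in networkx 2.8 or later) on a toy citation network treated as undirected; the toy edges and the undirected simplification are assumptions, not a bibliometric recommendation.

```python
# Minimal sketch: applying the Louvain algorithm to a toy citation network
# treated as undirected. This illustrates the general-purpose nature of such
# community detection, not a recipe for topic reconstruction.

import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.Graph()
G.add_edges_from([
    ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),   # one dense group of papers
    ("p4", "p5"), ("p5", "p6"), ("p4", "p6"),   # another dense group
    ("p3", "p4"),                               # a single bridging citation
])

communities = louvain_communities(G, seed=42)
print(communities)   # typically two communities: {p1,p2,p3} and {p4,p5,p6}
```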
Preprint
Full-text available
Overlay maps of science are global base maps over which subsets of publications can be projected. Such maps can be used to monitor, explore, and study research through its publication output. Most maps of science, including overlay maps, are flat in the sense that they visualize research fields at one single level. Such maps generally fail to provide both overview and detail about the research being analyzed. The aim of this study is to improve overlay maps of science to provide both features in a single visualization. I created a map based on a hierarchical classification of publications, including broad disciplines for overview and more granular levels to incorporate detailed information. The classification was obtained by clustering articles in a citation network of about 17 million publication records in PubMed from 1995 onwards. The map emphasizes the hierarchical structure of the classification by visualizing both disciplines and the underlying specialties. To show how the visualization methodology can help getting both overview of research and detailed information about its topical structure, I projected two overlay maps onto the base map: (1) open access publishing and (2) coronavirus/Covid-19 research.
Thesis
Full-text available
In bibliometric analyses, citation relations are frequently used to identify thematic clusters in research fields. Owing to a lack of information, no distinction is made between the different thematic referents of the citations, that is, whether a citation refers to a theory, a method, a research object, or something else. In thematic mappings these referents become mixed and can no longer be distinguished. Whether and how this mixing influences the mapping has not yet been investigated systematically. The present work deals with (1) the analysis of epistemic functions of citation relations and (2) their effects on the bibliometric mapping of thematic structures. To this end, in the first part of the analysis the references of a limited number of publications are coded and evaluated according to their function, and in the second part network maps are constructed and analyzed that take the citation functions into account. Methods of qualitative content analysis and network-analytic procedures are employed.
Article
The fast development of the emerging research topics field has resulted in hundreds of theoretical and empirical publications. To our knowledge, however, no comprehensive and objective literature review of this field exists to date. To this end, a citation network consisting of 1607 papers published between 1965 and early 2019 is explored to discover the knowledge diffusion trajectory of the emerging research topics field using the key-route main path analysis approach, with search path link count as the traversal weight. From the convergence-divergence patterns in the local and global main paths, the development of the emerging research topics field can be divided into three stages: emergence, exploration and development. Several research shifts can also be observed: (1) from citation-based approaches to machine-learning-based ones, (2) from measurement to identification, and (3) from papers to patents. Finally, directions for future research are suggested.
Article
Using research papers from the WoS database as the data source, this paper adopts document co-citation analysis and employs CiteSpace to analyze the themes, evolution and research trends of Library and Information Science (LIS) research over the 30 years from 1989 to 2018. The findings demonstrate that (1) LIS research has developed along a path from the ordering of information to the mining and application of information value, and in this process the factor of "people" and network information resources have received more and more attention; (2) over the past 30 years, LIS research has mainly focused on areas such as information retrieval, social media, information systems, information behavior, bibliometrics and webometrics, scientific evaluation and knowledge management; and (3) in the future, LIS research will mainly focus on six theme areas: metrology research, open government, scientific evaluation, big data, social media and information systems. These six areas are all significantly affected by social media.
Article
It is increasingly important to automatically identify thematic structures in the massive scientific literature. Because of interdisciplinarity, thematic structures lack natural boundaries. In this work, the identification of thematic structures is treated as an overlapping community detection problem on a large-scale citation-link network. A mixed-membership stochastic blockmodel, fitted with a stochastic variational inference algorithm, is used to detect overlapping thematic structures. To enhance readability, each theme is labeled with several topical terms using a method based on soft mutual information. Extensive experimental results on the astro dataset indicate that the mixed-membership stochastic blockmodel primarily uses local information and allows for pervasive overlaps, but it favors similarly sized themes, which disqualifies the approach from being used to extract thematic structures from the scientific literature. In addition, the thematic structures obtained from the bibliographic coupling network are similar to those obtained from the co-citation network.
Article
The successful introduction and application of smart wearable technologies (SWTs) will allow the production of new generations of innovative and high value-added products. To this aim, we have built a unique database of 1313 patents included in the Thomson Innovation database, which were registered between 2001 and 2015 in the SWTs domain. This study shows the development trends in SWTs both in general and in different product classes, identifies leading countries and companies in different technological classes, and recognises basic patents within each year. Further, cluster analysis is used to identify the most relevant technological clusters and their evolution over time. This study offers a complete overview of the state of the art and of the evolution of SWTs over time, thus providing important insights to researchers and managers who, based on these results, will be able to make more informed decisions on research directions and technology strategies.
Article
Stakeholders in the science system need to decide where to place their bets. Example questions include: Which areas of research should get more funding? Who should we hire? Which projects should we abandon and which new projects should we start? Making informed choices requires knowledge about these research options. Unfortunately, to date research portfolio options have not been defined in a consistent, transparent and relevant manner. Furthermore, we don't know how to define demand for these options. In this article, we address the issues of consistency, transparency, relevance and demand by using a model of science consisting of 91,726 topics (or research options) that contain over 58 million documents. We present a new indicator of topic prominence, a measure of visibility, momentum and, ultimately, demand. We assign over $203 billion of project-level funding data from STAR METRICS to individual topics in science, and show that the indicator of topic prominence explains over one-third of the variance in current (or future) funding by topic. We also show that highly prominent topics receive far more funding per researcher than topics that are not prominent. Implications of these results for research planning and portfolio analysis by institutions and researchers are emphasized. Available at https://arxiv.org/abs/1709.03453.
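A prominence-style indicator of this kind can be sketched as a weighted combination of standardized visibility signals per topic. The signals and the equal weights below are placeholders for illustration only; they are not the coefficients or exact inputs of the published indicator.

```python
# Hedged sketch of a prominence-style indicator: combine z-scores of several
# visibility signals per topic. The signals and the equal weights used here are
# placeholders, not the published coefficients.

import numpy as np

# rows = topics; columns = [recent citations, recent views, mean journal impact]
signals = np.array([
    [1200, 9000, 3.1],
    [ 300, 2500, 1.8],
    [2200, 4000, 4.5],
], dtype=float)

z = (signals - signals.mean(axis=0)) / signals.std(axis=0)
weights = np.array([1 / 3, 1 / 3, 1 / 3])        # placeholder weights
prominence = z @ weights
print(prominence)   # higher = more prominent topic
```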
Article
Full-text available
Publications in reputable journals are a crucial condition for successful scientific profiling and the accomplishment of significant academic results. The primary goal of this paper is to conduct an analysis of the market of economic journals that belong to the M20 category and thus have corresponding impact factors. The aim is to emphasize the position of Serbian researchers in this specific market. The empirical results reveal that journals from the most developed countries have a dominant role in the market and that Serbian researchers publish the results of their studies primarily in neighboring countries. Recommendations are to bring eminent journals into the focus of Serbian researchers, but also to encourage further development of domestic journals so that they can be more active in the international market. In addition, the focus of Serbian researchers should be directed toward the hard core of economic science, and the goal of further developing economic disciplines should be more clearly emphasized.
Article
A dataset containing 111,616 documents in astronomy and astrophysics (Astro-set) has been created and is being partitioned by several research groups using different algorithms. For this paper, rather than partitioning the dataset directly, we locate the data in a previously created model of the full Scopus database. This allows comparisons between using local and global data for community detection, which is done in an accompanying paper. We can begin to answer the question of the extent to which the rest of a large database (a global solution) affects the partitioning of a smaller journal-based set of documents (a local solution). We find that the Astro-set, while spread across hundreds of partitions in the Scopus map, is concentrated in only a few regions of the map. From this perspective there seems to be some correspondence between local information and the global cluster solution. However, we also show that the within-Astro-set links are only one-third of the total links that are available to these papers in the full Scopus database. The non-Astro-set links are significant in two ways: (1) in areas where the Astro-set papers are concentrated, related papers from non-astronomy journals are included in clusters with the Astro-set papers, and (2) Astro-set papers that have a very low fraction of within-set links tend to end up in clusters that are not astronomy-based. Overall, this work highlights limitations of the use of journal-based document sets to identify the structure of scientific fields.
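The within-set versus total-link comparison reported here is easy to reproduce in outline. The sketch below, with invented paper identifiers, computes the share of a document set's citation links that stay inside the set.

```python
# Minimal sketch: the fraction of a document set's citation links that stay
# within the set, versus links to the rest of the database. Data are made up.

citation_links = [            # (citing paper, cited paper)
    ("astro1", "astro2"), ("astro1", "physics9"),
    ("astro2", "astro3"), ("astro3", "cs4"), ("astro3", "astro1"),
]
astro_set = {"astro1", "astro2", "astro3"}

set_links = [(a, b) for a, b in citation_links if a in astro_set or b in astro_set]
within = [(a, b) for a, b in set_links if a in astro_set and b in astro_set]
print(f"within-set share: {len(within) / len(set_links):.2f}")   # 3/5 = 0.60
```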
Chapter
Science is a driving force of positive social evolution. And in the course of this evolution, research systems change as a consequence of their complex dynamics. Research systems must be managed very carefully, for they are dissipative, and their evolution takes place on the basis of a series of instabilities that may be constructive (i.e., can lead to states with an increasing level of organization) but may be also destructive (i.e., can lead to states with a decreasing level of organization and even to the destruction of corresponding systems). For a better understanding of relations between science and society, two selected topics are briefly discussed: the Triple Helix model of a knowledge-based economy and scientific competition among nations from the point of view of the academic diamond. The chapter continues with a part presenting the minimum of knowledge necessary for understanding the assessment of research activity and research organizations. This part begins with several remarks on the assessment of research and the role of research publications for that assessment. Next, quality and performance as well as measurement of quality and latent variables by sets of indicators are discussed. Research activity is a kind of social process, and because of this, some differences between statistical characteristics of processes in nature and in society are mentioned further in the text. The importance of the non-Gaussianity of many statistical characteristics of social processes is stressed, because non-Gaussianity is connected to important requirements for study of these processes such as the need for multifactor analysis or probabilistic modeling. There exist entire branches of science, scientometrics, bibliometrics, informetrics, and webometrics, which are devoted to the quantitative perspective of studies on science. The sets of quantities that are used in scientometrics are mentioned, and in addition, we stress the importance of understanding the inequality of scientific achievements and the usefulness of knowledge landscapes for understanding and evaluating research performance. Next, research production and its assessment are discussed in greater detail. Several examples for methods and systems for such assessment are presented. The chapter ends with a description of an example for a combination of qualitative and quantitative tools in the assessment of research: the English–Czerwon method for quantification of scientific performance.
Chapter
Anticipating future pathways of Science, Technologies, and Innovations is a complex task in any R&D field and is even more challenging for the complex landscape of promising R&D directions in multiple fields. As a solution, this study analyzes research papers in Scientometrics and Technology mining. It presents an approach and text mining tools for building maps of science of a special kind which is called the Map of Science Squared. Nodes of maps corresponding to R&D fields and locations (e.g., as centers of excellence) are created, weighted, and coupled whenever possible based on processing full texts or abstracts of research papers. The questions to answer with this are as follows: (1) Do Scientometrics and Technology mining cover the full range of topics both in terms of breadth and depth? (2) Do research papers appear “at the right time,” i.e., just or soon after emergence of a topic? (3) Do researchers link R&D fields in non-traditional ways through their studies? (4) What fields are locally bound? (5) What conclusions on future pathways of Science, Technologies, and Innovations can be drawn on the basis of the analysis of the Scientometrics and Technology mining agenda?
Chapter
In domain analysis, the purpose is to reveal the contours of held knowledge, whether that be in the form of live discourse or recorded documentation, by analyzing the elements of specific communities who share a common ontology, or knowledge base. The objectives of domain analysis are to map and visualize the intellectual parameters of shared knowledge in a given community, such that results can be put to use in knowledge organization systems for the furtherance of the community’s own discourse and for its intellectual contributions at large. The evolution of empirical methods can be observed in the literature of the domain, which is found predominantly in the proceedings of international ISKO conferences and in the journal Knowledge Organization. Nearly 100 studies have been conducted and reported in the domain’s formal literature and summarized here. The majority of analytical studies use empirical methods such as metric or terminological techniques, but large numbers of discourse analyses, genre analyses, and epistemological analyses also are attempted. Fewer critical studies and historical analyses have been generated but some are represented. A wide variety of domains has been analyzed in these studies. Approximately 50 domains have been studied 1–4 times, and 22 studies reported domain analysis of aspects of knowledge organization.
Conference Paper
Full-text available
As data availability and computing resources increase, the ability to create more detailed and accurate global models of science is also increasing. This article reports on two advances in methodology aimed at creating more accurate versions of these highly detailed, dynamic, global models and maps of science. 1) A combined cocitation/bibliographic coupling approach for assigning current papers to co-citation clusters is introduced, and is found to significantly increase the accuracy of the resulting clusters. 2) A sequentially hybrid approach to producing useful visual maps from models is introduced. Two maps and models - one based on linked annual cocitation/bibliographic coupling models, and one based on direct citation - are created from a 16-year (1996-2011) set of Scopus data comprising over 20 million documents. The two models are compared and are found to be very complementary to each other.
Conference Paper
Full-text available
We present a novel method for generating an author-based map of the Information Science literature. This method allows each author to have multiple positions on a map (i.e. multiple identities). This is accomplished by using coco-citation analysis, where the rows and columns in the author matrix are co-cited author pairs rather than single cited authors. We compare our results to those found in White & McCain (1998) and Zhao & Strotmann (2008b) to gain some initial insights into the accuracy of this method. We then illustrate how this approach allows an author to have multiple positions on a map of science.
Article
Full-text available
The visualization of scientific field structures is a classic of scientometric studies. This paper presents a domain analysis of the library and information science discipline based on author co-citation analysis (ACA) and journal cocitation analysis (JCA). The techniques used for map construction are the self-organizing map (SOM) neural algorithm, Ward's clustering method and multidimensional scaling (MDS). The results of this study are compared with similar research developed by Howard White and Katherine McCain [1]. The methodologies used allow us to confirm that the subject domains identified in that work are also present in our study for the corresponding period. The appearance of studies pertaining to library science reveals the relationship of this realm with information science. Especially significant is the presence of management on the journal maps. From a methodological standpoint, meanwhile, we would agree with those authors who consider MDS, the SOM and clustering as complementary methods that provide representations of the same reality from different analytical points of view. Even so, the MDS representation is the one offering greater possibilities for the structural representation of the clusters in a set of variables.
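The MDS step common to this kind of author co-citation analysis can be sketched as follows: a toy co-citation matrix is converted to dissimilarities and embedded in two dimensions with scikit-learn. The count-to-distance conversion and the data are illustrative only; Ward clustering or a SOM could be applied to the same matrix.

```python
# Minimal sketch of the MDS step in author co-citation analysis: turn a toy
# co-citation matrix into dissimilarities and embed authors in two dimensions.

import numpy as np
from sklearn.manifold import MDS

authors = ["A", "B", "C", "D"]
cocitation = np.array([          # symmetric co-citation counts (toy values)
    [ 0, 30,  5,  2],
    [30,  0,  8,  1],
    [ 5,  8,  0, 20],
    [ 2,  1, 20,  0],
], dtype=float)

dissimilarity = cocitation.max() - cocitation   # higher counts -> smaller distance
np.fill_diagonal(dissimilarity, 0)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissimilarity)
for name, (x, y) in zip(authors, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```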
Article
Full-text available
A useful level of analysis for the study of innovation may be what we call “knowledge communities”—intellectually cohesive, organic inter-organizational forms. Formal organizations like firms are excellent at promoting cooperation, but knowledge communities are superior at fostering collaboration—the most important process in innovation. Rather than focusing on what encourages performance in formal organizations, we study what characteristics encourage aggregate superior performance in informal knowledge communities in computer science. Specifically, we explore the way knowledge communities both draw on past knowledge, as seen in citations, and use rhetoric, as found in writing, to seek a basis for differential success. We find that when using knowledge successful knowledge communities draw from a broad range of sources and are extremely flexible in changing and adapting. In marked contrast, when using rhetoric successful knowledge communities tend to use very similar vocabularies and language that does not move or adapt over time and is not unique or esoteric compared to the vocabulary of other communities. A better understanding of how inter-organizational collaborative network structures encourage innovation is important to understanding what drives innovation and how to promote it.
Article
Full-text available
We document an open-source toolbox for drawing large-scale undirected graphs. This toolbox is based on a previously implemented closed-source algorithm known as VxOrd. Our toolbox, which we call OpenOrd, extends the capabilities of VxOrd to large graph layout by incorporating edge-cutting, a multi-level approach, average-link clustering, and a parallel implementation. At each level, vertices are grouped using force-directed layout and average-link clustering. The clustered vertices are then re-drawn and the process is repeated. When a suitable drawing of the coarsened graph is obtained, the algorithm is reversed to obtain a drawing of the original graph. This approach results in layouts of large graphs which incorporate both local and global structure. A detailed description of the algorithm is provided in this paper. Examples using datasets with over 600K nodes are given. Code is available at www.cs.sandia.gov/~smartin.
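OpenOrd itself is a C++ toolbox; as a rough stand-in that omits its edge cutting and multi-level scheme, the sketch below shows the basic idea of a force-directed layout using networkx's spring_layout on a small built-in graph.

```python
# Rough stand-in only: OpenOrd adds edge cutting, average-link clustering, and
# a multi-level scheme; this sketch just shows a basic force-directed layout.

import networkx as nx

G = nx.karate_club_graph()                     # small built-in example graph
pos = nx.spring_layout(G, seed=7)              # force-directed 2D coordinates
for node in list(G.nodes())[:5]:
    x, y = pos[node]
    print(f"node {node}: ({x:.2f}, {y:.2f})")
```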
Article
Full-text available
The European Conference on Case-Based Reasoning (CBR) in 2008 marked 15 years of international and European CBR conferences where almost seven hundred research papers were published. In this report we review the research themes covered in these papers and identify the topics that are active at the moment. The main mechanism for this analysis is a clustering of the research papers based on both co-citation links and text similarity. It is interesting to note that the core set of papers has attracted citations from almost three thousand papers outside the conference collection so it is clear that the CBR conferences are a sub-part of a much larger whole. It is remarkable that the research themes revealed by this analysis do not map directly to the sub-topics of CBR that might appear in a textbook. Instead they reflect the applications-oriented focus of CBR research, and cover the promising application areas and research challenges that are faced.
Article
Full-text available
The claim that co-citation analysis is a useful tool to map subject-matter specialties of scientific research in a given period, is examined. A method has been developed using quantitative analysis of content-words related to publications in order to: (1) study coherence of research topics within sets of publications citing clusters, i.e., (part of) the "current work" of a specialty; (2) to study differences in research topics between sets of publications citing different clusters; and (3) to evaluate recall of "current work" publications concerning the specialties identified by co-citation analysis. Empirical support is found for the claim that co-citation analysis identifies indeed subject-matter specialties. However, different clusters may identify the same specialty, and results are far from complete concerning the identified "current work." These results are in accordance with the opinion of some experts in the fields. Low recall of co-citation analysis concerning the "current work" of specialties is shown to be related to the way in which researchers build their work on earlier publications: the "missed" publications equally build on very recent earlier work, but are less "consensual" and/or less "attentive" in their referencing practice. Evaluation of national research performance using co-citation analysis appears to be biased by this "incompleteness."
Article
Full-text available
In the past several years studies have started to appear comparing the accuracies of various science mapping approaches. These studies primarily compare the cluster solutions resulting from different similarity approaches, and give varying results. In this study we compare the accuracies of cluster solutions of a large corpus of 2,153,769 recent articles from the biomedical literature (2004–2008) using four similarity approaches: co-citation analysis, bibliographic coupling, direct citation, and a bibliographic coupling-based citation-text hybrid approach. Each of the four approaches can be considered a way to represent the research front in biomedicine, and each is able to successfully cluster over 92% of the corpus. Accuracies are compared using two metrics—within-cluster textual coherence as defined by the Jensen-Shannon divergence, and a concentration measure based on the grant-to-article linkages indexed in MEDLINE. Of the three pure citation-based approaches, bibliographic coupling slightly outperforms co-citation analysis using both accuracy measures; direct citation is the least accurate mapping approach by far. The hybrid approach improves upon the bibliographic coupling results in all respects. We consider the results of this study to be robust given the very large size of the corpus, and the specificity of the accuracy measures used. © 2010 Wiley Periodicals, Inc.
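The coherence measure referred to here can be sketched in simplified form: compare each document's word distribution with the cluster's aggregate distribution using the Jensen-Shannon divergence and average the result. Tokenization, smoothing, and weighting details below are assumptions rather than the published procedure.

```python
# Simplified sketch of a Jensen-Shannon-divergence-based coherence: compare
# each document's word distribution to the cluster's aggregate distribution.

import numpy as np
from collections import Counter

def distribution(tokens, vocab):
    counts = Counter(tokens)
    vec = np.array([counts[w] for w in vocab], dtype=float) + 1e-12  # smoothing
    return vec / vec.sum()

def js_divergence(p, q):
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

docs = [["citation", "cluster", "map"], ["citation", "coupling", "map"],
        ["protein", "folding", "map"]]
vocab = sorted({w for d in docs for w in d})
cluster_dist = distribution([w for d in docs for w in d], vocab)

divergences = [js_divergence(distribution(d, vocab), cluster_dist) for d in docs]
coherence = 1 - np.mean(divergences)   # base-2 JSD lies in [0, 1]
print(f"coherence: {coherence:.3f}")
```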
Article
Full-text available
Science mapping is discussed in the general context of information visualization. Attempts to construct maps of science using citation data are reviewed, focusing on the use of co-citation clusters. New work is reported on a dataset of about 36,000 documents using simplified methods for ordination, and nesting maps hierarchically. An overall map of the dataset shows the multidisciplinary breadth of the document sample, and submaps allow drilling down to the document level. An effort to visualize these data using advanced virtual reality software is described, and the creation of document pathways through the map is seen as a realization of Bush's (1945) associative trails.
Article
Full-text available
Research on the effects of collaboration in scientific research has been increasing in recent years. A variety of studies have been done at the institution and country level, many with an eye toward policy implications. However, the question of how to identify the most fruitful targets for future collaboration in high-performing areas of science has not been addressed. This paper presents a method for identifying targets for future collaboration between two institutions. The utility of the method is shown in two different applications: identifying specific potential collaborations at the author level between two institutions, and generating an index that can be used for strategic planning purposes. Identification of these potential collaborations is based on finding authors that belong to the same small paper-level community (or cluster of papers), using a map of science and technology containing nearly 1 million papers organized into 117,435 communities. The map used here is also unique in that it is the first map to combine the ISI Proceedings database with the Science and Social Science Indexes at the paper level.
Article
Full-text available
Cohesive intellectual communities called “schools of thought” can provide powerful benefits to those developing new knowledge, but can also constrain them. We examine how developers of new knowledge position themselves within and between schools of thought, and how this affects their impact. Looking at the micro and macro fields of management publications from 1956 to 2002 with an extensive dataset of 113,000+ articles from 41 top journals, we explore the dynamics of knowledge positioning for management scholars. We find that it is significantly beneficial for new knowledge to be a part of a school of thought, and that within a school of thought new knowledge has more impact if it is in the intellectual semi-periphery of the school.
Article
Full-text available
This article describes recent improvements in mapping the world-wide scientific literature. Existing research is extended in three ways. First, a method for generating maps directly from the data on the relationships between hundreds of thousands of documents is presented. Second, quantitative techniques for evaluating these large maps of science are introduced. Third, these techniques are applied to data in order to evaluate eight different maps. The analyses suggest that accuracy can be increased by using a modified cosine measure of relatedness. Disciplinary bias can be significantly reduced and accuracy can be further increased by using much lower threshold levels. In short, much larger samples of papers can and should be used to generate more accurate maps of science.
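As a baseline for the relatedness measures discussed here, the sketch below computes a plain cosine similarity between two papers' reference-count vectors; the modification proposed in the article is not reproduced, and the vectors are toy data.

```python
# Baseline sketch only: the article's modified cosine differs from this, but
# the starting point is the standard cosine similarity between two papers'
# reference (or co-citation) count vectors.

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy reference-count vectors over a shared list of cited items
paper_a = np.array([1, 0, 2, 1, 0], dtype=float)
paper_b = np.array([1, 1, 1, 0, 0], dtype=float)
print(f"cosine relatedness: {cosine(paper_a, paper_b):.3f}")
```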
Article
Full-text available
Mapping of science and technology can be done at different levels of aggregation, using a variety of methods. In this paper, we propose a method in which title words are used as indicators for the content of a research topic, and cited references are used as the context in which words get their meaning. Research topics are represented by sets of papers that are similar in terms of these word-reference combinations. In this way we use words without neglecting differences and changes in their meanings. The method has several advantages, such as high coverage of publications. As an illustration we apply the method to produce knowledge maps of information science.
Article
Full-text available
We compare a new method for measuring research leadership with the traditional method. Both methods are objective and reliable, utilize standard citation databases, and are easily replicated. The traditional method uses partitions of science based on journal categories, and has been extensively used to measure national leadership patterns in science, including those appearing in the NSF Science & Engineering Indicators Reports and in prominent journals such as Science and Nature. Our new method is based on co-citation techniques at the paper level. It was developed with the specific intent of measuring research leadership at a university, and was then extended to examine national patterns of research leadership. A comparison of these two methods provides compelling evidence that the traditional method grossly underestimates research leadership in most countries. The new method more accurately portrays the actual patterns of research leadership at the national level.
Article
Full-text available
A multiple-perspective co-citation analysis method is introduced for characterizing and interpreting the structure and dynamics of co-citation clusters. The method facilitates analytic and sense making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization. Co-citation networks are decomposed into co-citation clusters. The interpretation of these clusters is augmented by automatic cluster labeling and summarization. The method focuses on the interrelations between a co-citation cluster's members and their citers. The generic method is applied to a three-part analysis of the field of Information Science as defined by 12 journals published between 1996 and 2008: 1) a comparative author co-citation analysis (ACA), 2) a progressive ACA of a time series of co-citation networks, and 3) a progressive document co-citation analysis (DCA). Results show that the multiple-perspective method increases the interpretability and accountability of both ACA and DCA networks.
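The spectral-clustering step of such a pipeline can be sketched on a toy co-citation similarity matrix using scikit-learn's precomputed-affinity mode; the automatic labeling and summarization components described in the abstract are not shown, and the similarity values are invented.

```python
# Minimal sketch of the spectral-clustering step: decompose a toy co-citation
# similarity matrix into clusters using a precomputed affinity.

import numpy as np
from sklearn.cluster import SpectralClustering

# symmetric co-citation similarity between six documents (toy values)
S = np.array([
    [0, 5, 4, 0, 0, 0],
    [5, 0, 6, 0, 1, 0],
    [4, 6, 0, 0, 0, 0],
    [0, 0, 0, 0, 7, 5],
    [0, 1, 0, 7, 0, 6],
    [0, 0, 0, 5, 6, 0],
], dtype=float)

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(S)
print(labels)   # e.g. [0 0 0 1 1 1]: two co-citation clusters
```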
Chapter
This chapter is a comprehensive analysis of one literature-based methodology for ‘modeling’ the intellectual organisation and content of scientific disciplines. The method, called ‘co-citation bibliometric modeling’, provides a detailed description of the international research front. It may describe new inter- or multi-disciplinary developments in science, identify the most rapidly evolving subdisciplinary topic areas, and characterise the research activity of nations and organisations. As a result, it has been or is being explored as an intelligence tool for use in science and technology policy by 6 national governments, and may be of interest to large, high technology-based corporations. The objective here is to provide:
Article
This study presents an extensive domain analysis of a discipline - information science - in terms of its authors. Names of those most frequently cited in 12 key journals from 1972 through 1995 were retrieved from Social Scisearch via DIALOG. The top 120 were submitted to author co-citation analyses, yielding automatic classifications relevant to histories of the field. Tables and graphics reveal: (1) The disciplinary and institutional affiliations of contributors to information science; (2) the specialty structure of the discipline over 24 years; (3) authors' memberships in 1 or more specialties; (4) inertia and change in authors' positions on 2-dimensional subject maps over 3 8-year subperiods, 1972-1979, 1980-1987, 1988-1995; (5) the 2 major subdisciplines of information science and their evolving memberships; (6) "canonical" authors who are in the top 100 in all three subperiods; (7) changes in authors' eminence and influence over the subperiods, as shown by mean co-citation counts; (8) authors with marked changes in their mapped positions over the subperiods; (9) the axes on which authors are mapped, with interpretations; (10) evidence of a paradigm shift in information science in the 1980s; and (11) evidence on the general nature and state of integration of information science. Statistical routines include ALSCAL, INDSCAL, factor analysis, and cluster analysis with SPSS; maps and other graphics were made with DeltaGraph. Theory and methodology are sufficiently detailed to be usable by other researchers.
Conference Paper
This paper deals with document-document similarity approaches, the issue of similarity order, and clustering methods, in the context of science mapping. Using two data sets of bibliographic records, associated with the fields of information retrieval and scientometrics, we investigate how well two document-document similarity approaches, a text-based approach and bibliographic coupling, agree with ground truth classifications (obtained by subject experts), under first-order and second-order similarities, and under four different clustering methods. The clustering methods are average linkage, complete linkage, Ward's method and consensus clustering. The performance of first-order and second-order similarities is compared within the two document-document similarity approaches, and under each clustering method. We also compare the performance of the clustering methods. The results show that the text-based approach consistently outperformed bibliographic coupling with regard to the information retrieval data set, but performed consistently worse than the latter approach regarding the scientometrics data set. For the similarity order issue, second-order similarities performed better than first-order in 12 out of 16 cases. Average linkage had the best overall performance among the clustering methods, followed by consensus clustering. The main conclusion of the study is that second-order similarities seem to be a better choice than first-order in the science mapping context.
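The first-order versus second-order distinction can be made concrete with a small sketch: here the second-order similarity of two documents is taken as the cosine between their rows of a first-order similarity matrix. The matrix values are toy numbers, and other definitions of second-order similarity are possible.

```python
# Minimal sketch: second-order similarity between two documents computed as the
# cosine between their rows in a first-order similarity matrix. Toy values only.

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# first-order document-document similarities (e.g., bibliographic coupling)
S1 = np.array([
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

second_order_01 = cosine(S1[0], S1[1])   # compare similarity profiles of docs 0 and 1
second_order_02 = cosine(S1[0], S1[2])
print(f"{second_order_01:.2f} vs {second_order_02:.2f}")
```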
Article
A citation analysis was applied to articles published in the Journal of the American Society for Information Science. The document set consisted of 209 genuine articles from the 1986–1990 SSCI® CD-ROM. To find the intellectual base of these articles a cocitation analysis was made. A map of the most cocited authors shows considerable resemblance to a map of information science produced by other methods. Citation-based bibliographic coupling was applied to the same set of documents in order to define research fronts, i.e., clusters of articles using similar parts of the intellectual base. It is also shown that the research front map has a close correspondence with the map of the intellectual base. © 1994 John Wiley & Sons, Inc.
Article
Science mapping projects have been revived by the advent of virtual reality software capable of navigating large synthetic three dimensional spaces. Unlike the earlier mapping efforts aimed at creating simple maps at either a global or local level, the focus is now on creating large scale maps displaying many thousands of documents which can be input into the new VR systems. This paper presents a general framework for creating large scale document spaces as well as some new methods which perform some of the individual processing steps. The methods are designed primarily for citation data but could be applied to other types of data, including hypertext links.
Article
Previous attempts to map science using the co-citation clustering methodology are reviewed, and their shortcomings analyzed. Two enhancements of the methodology presented in Part I of the paper, fractional citation counting and variable-level clustering, are briefly described, and a third enhancement, the iterative clustering of clusters, is introduced. When combined, these three techniques improve our ability to generate comprehensive and representative mappings of science across the multidisciplinary Science Citation Index (SCI) database. Results of a four-step analysis of the 1979 SCI are presented, and the resulting map at the fourth iteration is described in detail. The map shows a tightly integrated network of approximate disciplinary regions, unique in that for the first time links between mathematics and biomedical science have brought about a closure of the previously linear arrangement of disciplines. Disciplinary balance between biomedical and physical science has improved, and the appearance of less cited subject areas, such as mathematics and applied science, makes this map the most comprehensive one yet produced by the co-citation methodology. Remaining problems and goals for future work are discussed.
Article
It is shown that the mapping of a particular area of science, in this case information science, can be done using authors as units of analysis and the cocitations of pairs of authors as the variable that indicates their “distances” from each other. The analysis assumes that the more two authors are cited together, the closer the relationship between them. The raw data are cocitation counts drawn online from Social Scisearch (Social Sciences Citation Index) over the period 1972–1979. The resulting map shows (1) identifiable author groups (akin to “schools”) of information science, (2) locations of these groups with respect to each other, (3) the degree of centrality and peripherality of authors within groups, (4) proximities of authors within group and across group boundaries (“border authors” who seem to connect various areas of research), and (5) positions of authors with respect to the map's axes, which were arbitrarily set spanning the most divergent groups in order to aid interpretation. Cocitation analysis of authors offers a new technique that might contribute to the understanding of intellectual structure in the sciences and possibly in other areas to the extent that those areas rely on serial publications. The technique establishes authors, as well as documents, as an effective unit in analyzing subject specialties.
Article
Based on a set of information science papers, this study demonstrates that "all author" citation counts should be preferred when visualizing the structure of research fields. "First author" citation studies distort the picture in terms of the most influential researchers, while the subfield structure tends to be just about the same for both methods.
Article
A hybrid text/citation-based method is used to cluster journals covered by the Web of Science database in the period 2002–2006. The objective is to use this clustering to validate and, if possible, to improve existing journal-based subject-classification schemes. Cross-citation links are determined on an item-by-paper procedure for individual papers assigned to the corresponding journal. Text mining for the textual component is based on the same principle; textual characteristics of individual papers are attributed to the journals in which they have been published. In a first step, the 22-field subject-classification scheme of the Essential Science Indicators (ESI) is evaluated and visualised. In a second step, the hybrid clustering method is applied to classify the about 8300 journals meeting the selection criteria concerning continuity, size and impact. The hybrid method proves superior to its two components when applied separately. The choice of 22 clusters also allows a direct field-to-cluster comparison, and we substantiate that the science areas resulting from cluster analysis form a more coherent structure than the “intellectual” reference scheme, the ESI subject scheme. Moreover, the textual component of the hybrid method allows labelling the clusters using cognitive characteristics, while the citation component allows visualising the cross-citation graph and determining representative journals suggested by the PageRank algorithm. Finally, the analysis of journal ‘migration’ allows the improvement of existing classification schemes on the basis of the concordance between fields and clusters.
Article
In an earlier study by the authors, full-text analysis and traditional bibliometric methods were combined to map research papers published in the journal Scientometrics. The main objective was to develop appropriate techniques of full-text analysis and to improve the efficiency of the individual methods in the mapping of science. The number of papers was, however, rather limited. In the present study, we extend the quantitative linguistic part of the previous studies to a set of five journals representing the field of Library and Information Science (LIS). Almost 1000 articles and notes published in the period 2002–2004 have been selected for this exercise. The optimum solution for clustering LIS is found for six clusters. The combination of different mapping techniques, applied to the full text of scientific publications, results in a characteristic tripod pattern. Besides two clusters in bibliometrics, one cluster in information retrieval and one containing general issues, webometrics and patent studies are identified as small but emerging clusters within LIS. The study is concluded with the analysis of cluster representations by the selected journals.
Article
A co-citation cluster analysis of a three year (1975–1977) cumulation of the Social Sciences Citation Index is described, and clusters of information science documents contained in this data-base are identified using a journal subset concentration measure. The internal structure of the information science clusters is analyzed in terms of co-citations among clusters, and external linkages to fields outside information science are explored. It is shown that clusters identified by the journal concentration method also cohere in a natural way through cluster co-citation. Conclusions are drawn regarding the relationship of information science to the social sciences, and suggestions are made on how these data might be used in planning an agenda for research in the field.
Article
Using an enriched author co-citation analysis (ACA) approach, we map Information Science (IS) for 1996-2005, a decade of explosive development of the World Wide Web, to examine its development during these years compared to previous decades examined by the influential study of IS by W&M98, which demonstrated the power of ACA in visualizing disciplines. The Web, we find, has had a truly profound impact on IS, driving the creation of new disciplines and the revitalization or obsolescence of old ones, but most importantly, beginning to bridge at last the ancient chasm between the “literatures” and “retrieval” IS camps. Simultaneously, the development of the field towards cognitive aspects has intensified. We take the opportunity of this enriched ACA study of IS, which employs both orthogonal and oblique rotations in the Factor Analysis (FA), and which reports both pattern and structure matrices for the latter, to compare the relative merits of these several FA methods in ACA. Each provides interesting information not available from the others, we find, especially when FA results are also visualized in the novel manner we introduce here. We verify these methodological findings in a brief ACA of the XML research field.
Article
Author co-citation analysis (ACA) has frequently been applied over the last two decades for mapping the intellectual structure of a research field as represented by its authors. However, what is mapped in ACA is actually the structure of intellectual influences on a research field as perceived by its active authors. In this exploratory paper, by contrast, we introduce author bibliographic coupling analysis (ABCA) as a method to map the research activities of active authors themselves for a more realistic picture of the current state of research in a field. We choose the Information Science (IS) field and study its intellectual structure both in terms of current research activities as seen from ABCA and in terms of intellectual influences on its research as shown from ACA. We examine how these two aspects of the intellectual structure of the IS field are related, and how they both developed during the “first decade of the Web”, 1996-2005. We find that these two citation-based author mapping methods complement each other, and that, in combination, they provide a more comprehensive view of the intellectual structure of the IS field than either of them can provide on its own.
Article
In their 1998 article "Visualizing a discipline: An author cocitation analysis of information science, 1972-1995," White and McCain used multidimensional scaling, hierarchical clustering, and factor analysis to display the specialty groupings of 120 highly-cited ("paradigmatic") information scientists. These statistical techniques are traditional in author cocitation analysis (ACA). It is shown here that a newer technique, Pathfinder Networks (PFNETs), has considerable advantages for ACA. In PFNETs, nodes represent authors, and explicit links represent weighted paths between nodes, the weights in this case being cocitation counts. The links can be drawn to exclude all but the single highest counts for author pairs, which reduces a network of authors to only the most salient relationships. When these are mapped, dominant authors can be defined as those with relatively many links to other authors (i.e., high degree centrality). Links between authors and dominant authors define specialties, and links between dominant authors connect specialties into a discipline. Maps are made with one rather than several computer routines and in one rather than many computer passes. Also, PFNETs can, and should, be generated from matrices of raw counts rather than Pearson correlations, which removes a computational step associated with traditional ACA. White and McCain's raw data from 1998 are remapped as a PFNET. It is shown that the specialty groupings correspond closely to those seen in the factor analysis of the 1998 article. Because PFNETs are fast to compute, they are used in AuthorLink, a new Web-based system that creates live interfaces for cocited author retrieval on the fly.
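The following is not a full Pathfinder implementation, only a sketch of the simplification described here: from a toy matrix of raw author co-citation counts, keep each author's single strongest link and read off dominant authors by degree centrality. The counts are invented and networkx is assumed available.

```python
import numpy as np
import networkx as nx

authors = ["A", "B", "C", "D", "E"]
# Toy symmetric raw co-citation counts (not Pearson correlations).
counts = np.array([
    [0, 9, 7, 1, 0],
    [9, 0, 3, 2, 1],
    [7, 3, 0, 2, 0],
    [1, 2, 2, 0, 8],
    [0, 1, 0, 8, 0],
])

# Keep, for every author, only the link(s) with that author's highest count.
# (A full PFNET instead removes links that are dominated by stronger indirect
# paths; this is the simplified "single highest count" reading above.)
G = nx.Graph()
G.add_nodes_from(authors)
for i, name in enumerate(authors):
    best = counts[i].max()
    for j in np.flatnonzero(counts[i] == best):
        G.add_edge(name, authors[j], weight=int(best))

# Dominant authors: relatively many links, i.e. high degree centrality.
print(sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1]))
```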
Article
This study presents an extensive domain analysis of a discipline—information science—in terms of its authors. Names of those most frequently cited in 12 key journals from 1972 through 1995 were retrieved from Social Scisearch via DIALOG. The top 120 were submitted to author co-citation analyses, yielding automatic classifications relevant to histories of the field. Tables and graphics reveal: (1) The disciplinary and institutional affiliations of contributors to information science; (2) the specialty structure of the discipline over 24 years; (3) authors' memberships in 1 or more specialties; (4) inertia and change in authors' positions on 2-dimensional subject maps over 3 8-year subperiods, 1972–1979, 1980–1987, 1988–1995; (5) the 2 major subdisciplines of information science and their evolving memberships; (6) “canonical” authors who are in the top 100 in all three subperiods; (7) changes in authors' eminence and influence over the subperiods, as shown by mean co-citation counts; (8) authors with marked changes in their mapped positions over the subperiods; (9) the axes on which authors are mapped, with interpretations; (10) evidence of a paradigm shift in information science in the 1980s; and (11) evidence on the general nature and state of integration of information science. Statistical routines include ALSCAL, INDSCAL, factor analysis, and cluster analysis with SPSS; maps and other graphics were made with DeltaGraph. Theory and methodology are sufficiently detailed to be usable by other researchers. © 1998 John Wiley & Sons, Inc.
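The traditional ACA pipeline used here (raw co-citation counts → Pearson correlations → multidimensional scaling and clustering) can be sketched with scikit-learn and SciPy standing in for ALSCAL and SPSS; the counts are random and no substantive interpretation is implied.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(1)

# Toy raw author co-citation matrix for 6 authors.
C = rng.integers(0, 20, size=(6, 6))
C = np.triu(C, 1)
C = C + C.T

# Pearson correlations of co-citation profiles, turned into distances.
R = np.corrcoef(C)
D = 1.0 - R

# Two-dimensional author map (stand-in for the ALSCAL/INDSCAL output).
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)

# Hierarchical clustering of the same distances into specialty groupings.
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(coords.round(2))
print(labels)
```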
Article
Based on articles published in 1990–2004 in 21 library and information science (LIS) journals, a set of cocitation analyses was performed to study changes in research fronts over the last 15 years, where LIS stands now, and to discuss where it is heading. To study research fronts, here defined as current and influential cocited articles, a citations among documents methodology was applied; and to study changes, the analyses were time-sliced into three 5-year periods. The results show a stable structure of two distinct research fields: informetrics and information seeking and retrieval (ISR). However, experimental retrieval research and user oriented research have merged into one ISR field; and IR and informetrics also show signs of coming closer together, sharing research interests and methodologies, making informetrics research more visible in mainstream LIS research. Furthermore, the focus on the Internet, both in ISR research and in informetrics—where webometrics quickly has become a dominating research area—is an important change. The future is discussed in terms of LIS dependency on technology, how integration of research areas as well as technical systems can be expected to continue to characterize LIS research, and how webometrics will continue to develop and find applications.
Article
We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields. © 2010 Wiley Periodicals, Inc.
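The general idea of kernel-fusion clustering can be sketched as follows: build one kernel from the text view and one from the citation view of the journals, combine them with weights, and cluster the fused kernel. The weights below are fixed by hand rather than derived from the information-based scheme the paper proposes, and all data are toy.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy journal "profiles": concatenated title words (text view) and citation
# profiles over five cited sources (bibliometric view). All values invented.
texts = [
    "information retrieval search query ranking",
    "retrieval evaluation relevance search",
    "citation analysis bibliometrics impact",
    "cocitation mapping bibliometrics science",
]
citations = np.array([
    [5, 3, 0, 0, 1],
    [4, 2, 1, 0, 0],
    [0, 1, 6, 3, 0],
    [0, 0, 5, 4, 1],
], dtype=float)

# One kernel per data source.
K_text = cosine_similarity(TfidfVectorizer().fit_transform(texts))
K_cite = cosine_similarity(citations)

# Weighted kernel fusion; the paper derives such weights from an
# information-based scheme, here they are simply chosen by hand.
w_text, w_cite = 0.6, 0.4
K = w_text * K_text + w_cite * K_cite

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(K)
print(labels)
```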
Article
Although it is generally understood that different citation counting methods can produce quite different author rankings, and although “optimal” author co-citation counting methods have been identified theoretically, studies that compare author co-citation counting methods in author co-citation analysis (ACA) studies are still rare. The present study applies strict all-author-based ACA to the Information Science (IS) field, in that all authors of all cited references in a classic IS dataset are counted, and in that even the diagonal values of the co-citation matrix are computed in their theoretically optimal form. Using Scopus instead of SSCI as the data source, we find that results from a theoretically optimal all-author ACA appear to be excellent in practice, too, although in a field like IS where co-authorship levels are relatively low, its advantages over classic first-author ACA appear considerably smaller than in the more highly collaborative ones targeted before. Nevertheless, we do find some differences between the two approaches, in that first-author ACA appears to favor theorists who presumably tend to work alone, while all-author ACA appears to paint a somewhat more recent picture of the field, and to pick out some collaborative author clusters.
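The difference between first-author and all-author co-citation counting can be sketched as below; the reference lists and author names are invented, and the study's treatment of the diagonal values is not reproduced.

```python
from collections import Counter
from itertools import combinations

# Each citing paper's reference list, with the full author tuple of every
# cited work (toy data; names are invented).
papers = [
    [("Salton", "McGill"), ("Garfield",), ("Small", "Griffith")],
    [("Salton", "McGill"), ("Small", "Griffith"), ("Price",)],
    [("Garfield",), ("Price",), ("Small",)],
]

def cocitation(papers, all_authors=True):
    """Count author co-citations, crediting either all authors of each cited
    reference (all-author ACA) or only the first author (first-author ACA)."""
    counts = Counter()
    for refs in papers:
        cited = set()
        for authors in refs:
            cited.update(authors if all_authors else authors[:1])
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

print(cocitation(papers, all_authors=True).most_common(3))
print(cocitation(papers, all_authors=False).most_common(3))
```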
Article
In this study direct citations are weighted with shared references and co-citations in an attempt to decompose a citation network of articles on the subject of library and information science. The resulting maps have much in common with author co-citation maps that have been previously presented. However, using direct citations yields somewhat more detail in terms of detecting sub-domains. Reducing the network down to the strongest links of each article yielded the best results in terms of a high number of clusters, each with a substantial number of articles similar in content.
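A minimal sketch of the weighting idea, under the assumption that direct citations are simply combined linearly with bibliographic coupling and co-citation counts (the study's actual weighting may differ), followed by the reduction to each article's strongest link:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: direct citations among 6 articles and their reference lists.
direct = (rng.random((6, 6)) < 0.3).astype(float)
np.fill_diagonal(direct, 0)
refs = (rng.random((6, 12)) < 0.4).astype(float)   # article x cited reference

coupling = refs @ refs.T       # shared references between two articles
cocite = direct.T @ direct     # articles in the set citing both articles
np.fill_diagonal(coupling, 0)
np.fill_diagonal(cocite, 0)

# Direct citations weighted with the two indirect signals (symmetrised);
# the relative weights here are illustrative only.
sim = (direct + direct.T) + 0.5 * coupling + 0.5 * cocite

# Reduce the network to each article's strongest link before clustering,
# the variant reported to work best in the study.
strongest = np.zeros_like(sim)
for i in range(sim.shape[0]):
    j = sim[i].argmax()
    strongest[i, j] = strongest[j, i] = sim[i, j]
print(strongest.round(2))
```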
Article
Based on previous findings and theoretical considerations, it was suggested that bibliographic coupling could be combined with a cluster method to provide a method for science mapping, complementary to the prevailing co-citation cluster analytical method. The complete link cluster method was on theoretical grounds assumed to provide a suitable cluster method for this purpose. The objective of the study was to evaluate the proposed method's capability to identify coherent research themes. Applying a large multidisciplinary test bed comprising more than 600,000 articles and 17 million references, the proposed method was tested in accordance with two lines of mapping. In the first line of mapping, all significant (strong) links connecting ‘core documents’ (strongly and frequently coupled documents) in clusters with any other core document were mapped. This resulted in a depiction of all significant artificially broken links between core documents in a cluster and core documents extrinsic to that cluster. The second line of mapping involved the application of links between clusters only. These links were used to successively merge clusters on two subsequent levels of fusion, where the first-generation clusters were treated as objects for a second clustering, and the second-generation clusters gave rise to a final cluster fusion. Changes of cluster composition on the three levels were evaluated with regard to several variables. Findings showed that the proposed method could provide valid depictions of current research, though some severe restrictions apply to its use.
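Two of the ingredients, bibliographic coupling strength between documents and complete-link clustering, can be sketched with SciPy as below; the core-document selection and the two-level cluster fusion of the study are not reproduced, and the data are random.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(3)

# Toy article x cited-reference incidence matrix (random, for illustration).
refs = (rng.random((8, 30)) < 0.3).astype(float)

# Cosine-normalised bibliographic coupling strength between articles.
shared = refs @ refs.T
sizes = refs.sum(axis=1)
strength = shared / np.maximum(np.sqrt(np.outer(sizes, sizes)), 1e-12)
np.fill_diagonal(strength, 1.0)

# Complete-link (furthest-neighbour) clustering of the corresponding distances.
dist = np.clip(1.0 - strength, 0.0, None)
Z = linkage(squareform(dist, checks=False), method="complete")
print(fcluster(Z, t=3, criterion="maxclust"))
```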
Article
The authors describe a large-scale, longitudinal citation analysis of intellectual trading between information studies and cognate disciplines. The results of their investigation reveal the extent to which information studies draws on and, in turn, contributes to the ideational substrates of other academic domains. Their data show that the field has become a more successful exporter of ideas as well as less introverted than was previously the case. In the last decade, information studies has begun to contribute significantly to the literatures of such disciplines as computer science and engineering on the one hand and business and management on the other, while also drawing more heavily on those same literatures.
Article
In an earlier study the authors have shown that bibliographic coupling techniques can be used to identify ‘hot’ research topics. The methodology is based on appropriate thresholds for both the number of related documents and the strength of bibliographic links. Papers that have more than 9 links of at least strength 0.25 according to Salton's measure are called core documents, provided they are articles, notes or reviews. This choice resulted in a selection of nearly one per cent of all papers of the above types recorded in the 1992 annual cumulation of the SCI. Core documents proved important nodes in the network of documented science communication. In the present study, the set of core documents is analysed by journals, subfields and corporate addresses. The latter analysis is conducted at both the national and the regional-institutional level. First, all countries which have published at least 20 core documents in 1992 are investigated in terms of their research profiles, their international collaboration patterns and their citation impact. Finally, those eight members of the European Union which have published at least 20 core documents in 1992 are analysed in respect of the regional and institutional distribution of core documents.
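The selection rule is easy to state concretely: the coupling strength between two papers is Salton's cosine measure, the number of shared references divided by the geometric mean of the two reference-list lengths, and a paper counts as a core document when it has more than 9 links of strength at least 0.25. A sketch on random data (the resulting counts mean nothing; only the rule is illustrated):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy article x cited-reference incidence matrix (random data).
refs = (rng.random((50, 150)) < 0.15).astype(float)

# Salton's measure: shared references / sqrt(|R_i| * |R_j|).
shared = refs @ refs.T
sizes = refs.sum(axis=1)
strength = shared / np.maximum(np.sqrt(np.outer(sizes, sizes)), 1e-12)
np.fill_diagonal(strength, 0.0)

# Core documents: more than 9 links of at least strength 0.25.
n_strong_links = (strength >= 0.25).sum(axis=1)
core = np.flatnonzero(n_strong_links > 9)
print(f"{core.size} core documents out of {refs.shape[0]}")
```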