Article

Missing author address information in Web of Science—An explorative study


... Studies also propose methods, including using publication languages and the affiliation history of the authors, to predict the affiliation country/territory information in these records (Mryglod & Nazarovets, 2023; Savchenko & Kosyakov, 2022). However, it is generally accepted that the null affiliation country/territory field in these established bibliographic databases is indicative of the absence of author affiliation address information, which has been probed by Liu et al. (2018) focusing on Web of Science Core Collection. For instance, records published anonymously typically lack author affiliation address information (Li & Zhang, 2024; Shamsi et al., 2022). ...
... A typical author affiliation address is formatted as "School name, University name, City, State, Country". To ascertain whether the value of a designated field in Scopus and Web of Science Core Collection is null, a search string comprising the alphanumeric characters A to Z and 0 to 9, combined with the wildcard "*", is employed (Liu et al., 2018). Records with non-null values in the author address field in Web of Science Core Collection (field tag: AD) and the affiliation field in Scopus (field tag: AFFIL) are called "affiliated records". ...
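The null check described in the excerpt above can be sketched in Python. The exact search string is not reproduced in the excerpt, so the `TAG=(A* OR B* ... OR 9*)` layout below is an assumption, and the helper names `wildcard_terms`, `non_null_query`, and `is_affiliated` are hypothetical. Query syntax differs between databases, so the layout would need adapting to each one.

```python
import string

# Characters A-Z and 0-9, as described in the excerpt.
ALPHANUMERIC = string.ascii_uppercase + string.digits


def wildcard_terms() -> str:
    """Join 'A*', 'B*', ..., '9*' with OR (36 terms in total)."""
    return " OR ".join(f"{ch}*" for ch in ALPHANUMERIC)


def non_null_query(field_tag: str) -> str:
    # Assumed 'TAG=(...)' layout; adapt to the target database's syntax.
    return f"{field_tag}=({wildcard_terms()})"


def is_affiliated(field_value) -> bool:
    """Local equivalent for exported records: the field counts as non-null
    if any whitespace-separated token starts with A-Z or 0-9."""
    if not field_value:
        return False
    return any(tok[0].upper() in ALPHANUMERIC for tok in field_value.split())


print(non_null_query("AD"))
```

The local `is_affiliated` check mirrors the query logic for records that have already been downloaded, which avoids depending on any particular search interface.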
... A significant contributing factor is the tendency of some authors to provide only part of the author affiliation address information in their publications (e.g., the affiliation organization and city information), often omitting the affiliation country/territory information. However, to address these kinds of publications, Clarivate's Web of Science Core Collection has employed a method of inference to supplement the affiliation country/territory information based on the existing incomplete affiliation addresses (Liu et al., 2018). This positive and responsible action explains the scarcity of affiliated but country-undefined records within the Web of Science Core Collection. ...
Article
Full-text available
Scopus is increasingly regarded as a high‐quality and reliable data source for research and evaluation of scientific and scholarly activity. However, a puzzling phenomenon has been discovered occasionally: millions of records with author affiliation information collected in Scopus are oddly labeled as “country‐undefined” by Scopus, which is rarely detected in its counterpart Web of Science. This huge number of “homeless” records in Scopus will challenge the reliability of various Scopus‐based literature retrieval, analysis and evaluation and therefore is unacceptable for a widely used high‐quality bibliographic database. By using data from the past 124 years, this article tries to probe these affiliated but country‐undefined records in Scopus. Our analysis identifies four primary causes for these “homeless” records: incomplete author affiliation addresses, Scopus' inability to recognize different variants of country/territory names, misspelled country/territory names in author affiliation addresses, and Scopus' insufficiency in correctly splitting and identifying the clean affiliation addresses. To address this pressing issue, we put forward several recommendations to relevant stakeholders, with the aim of resettling millions of “homeless” records in Scopus and reducing its potential impact on Scopus‐based literature retrieval, analysis, and evaluation.
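Three of the four causes named in this abstract (incomplete addresses, unrecognized name variants, and misspellings) can be illustrated with a small Python sketch. This is not the procedure used by Scopus or by the authors; the `CANONICAL` list, the `VARIANTS` table, the function `infer_country`, and the 0.8 similarity cutoff are all hypothetical choices made for illustration.

```python
import difflib

# Hypothetical, deliberately tiny reference data.
CANONICAL = ["United States", "United Kingdom", "China", "Germany", "France"]
VARIANTS = {
    "usa": "United States",
    "u.s.a.": "United States",
    "uk": "United Kingdom",
    "peoples r china": "China",
}


def infer_country(address: str):
    """Guess the country from the last comma-separated address segment."""
    last = address.split(",")[-1].strip()
    if not last:
        return None
    key = last.lower()
    if key in VARIANTS:  # known name variant
        return VARIANTS[key]
    lookup = {name.lower(): name for name in CANONICAL}
    if key in lookup:  # exact canonical name
        return lookup[key]
    # Misspelling: accept the closest canonical name above the cutoff.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=0.8)
    return lookup[close[0]] if close else None


print(infer_country("Dept Phys, Univ X, Boston, USA"))
print(infer_country("Inst Y, Berlin, Germnay"))
print(infer_country("Univ X, Boston"))
```

An address that ends in a city rather than a country falls through every branch and returns `None`, which corresponds to the "affiliated but country-undefined" case. The fourth cause, incorrect splitting of affiliation strings, is not covered by this sketch.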
... Few studies have applied bibliometrics to analyze missing academic author metadata. Liu et al. (2018) explored publications indexed in WoS without authors' addresses. They stated that the address information was the foundation of various bibliometric analyses to investigate collaborations across organizations, countries, and regions. ...
... They stated that the address information was the foundation of various bibliometric analyses to investigate collaborations across organizations, countries, and regions. Ignoring the missing information could lead to inaccurate findings and confusion (Liu et al., 2018). Authors' names are more critical than addresses. ...
... In this sense, our findings corroborate the points made by Liu et al. (2018), who state that ignoring missing data in bibliometric analysis can lead to inaccurate findings. Jacsó (2009), for example, states that missing information, such as the absence of country classification, can result in scale distortions. ...
Article
Full-text available
Publications without authorship information have been indexed as anonymous in the Web of Science database over the years. However, this subject has not been sufficiently discussed in the scholarly literature. Since bibliometric studies are widely used by bibliometricians, scientific disciplines, and science policy and management, missing significant data such as authorship metadata creates a gray zone that directly affects these three components and, by extension, bibliometrics and scientometrics. Data collection in the Web of Science Core Collection (WoSCC) retrieved 1,420,842 documents under "anonymous" authorship from 1900 to 2021, which accounted for 1.5% of the total documents indexed in the WoSCC. Publication data such as yearly growth of research publications, document type, language, productive research areas, and other bibliometric indicators were analyzed. The findings showed considerable growth in the absolute number of anonymous publications between 1996 and 2009, followed by a downward trend. However, this increase was not proportional to the growth in the total number of publications indexed in the WoSCC. Articles, editorial materials, and news items were the top three document types among the WoSCC publications indexed as anonymous. This study also finds two main scenarios of indexing publications as anonymous. The first is associated with the historical context of scholarly communication and practices that persist. The second is characterized by persistent indexing problems. This study suggests minimizing errors in databases to enable an error-free indexing system and accurate bibliometric studies. Full-text access to a view-only version via SharedIt link: https://rdcu.be/cTzDl
... Because publication and citation data have become increasingly important for assessing the performance of individual authors and institutions, several studies have focused on the accuracy and applicability of author [70,71] and institution [72,73] information provided in WoS and Scopus. Meanwhile, the coverage and accuracy of funding information in WoS [74][75][76][77][78] and/or Scopus [79,80] has also attracted considerable attention. ...
... The study showed that more than one-fifth of the publications in WoS lacked address information, while full-text analyses revealed that about 40% of the articles actually listed at least partial address information, and some of the investigated publications had full address information that, for some reason, was not indexed by WoS. Meanwhile, for the remaining publications with missing address information in WoS, the information was also not declared in the publications themselves, which was probably due to the different editorial policies of journals [73]. ...
... In this case, the omission of the country name is of the highest importance, as this part of the address is often exploited in the extraction of data for bibliometric analyses or evaluations. The issue was investigated among publications that were (co)authored by USA researchers, where a significant number of publications contained only state information and omitted the name of the country [73]. ...
Article
Full-text available
Nowadays, the importance of bibliographic databases (DBs) has increased enormously, as they are the main providers of publication metadata and bibliometric indicators universally used both for research assessment practices and for performing daily tasks. Because the reliability of these tasks firstly depends on the data source, all users of the DBs should be able to choose the most suitable one. Web of Science (WoS) and Scopus are the two main bibliographic DBs. The comprehensive evaluation of the DBs’ coverage is practically impossible without extensive bibliometric analyses or literature reviews, but most DBs users do not have bibliometric competence and/or are not willing to invest additional time for such evaluations. Apart from that, the convenience of the DB’s interface, performance, provided impact indicators and additional tools may also influence the users’ choice. The main goal of this work is to provide all of the potential users with an all-inclusive description of the two main bibliographic DBs by gathering the findings that are presented in the most recent literature and information provided by the owners of the DBs at one place. This overview should aid all stakeholders employing publication and citation data in selecting the most suitable DB.
... Scholars, including the authors' research team, have carried out a series of studies on the characteristics and defects of the widely-used bibliographic databases, including the accuracy and completeness of funding information (Álvarez-Bornstein et al., 2017; Liu, 2020; Tang et al., 2017), the accuracy of Digital Object Identifiers (Franceschini et al., 2015; Zhu et al., 2019), missing author addresses (Liu et al., 2018), and missing records (Krauskopf, 2019; Liu et al., 2021). Therefore, many scholars repeatedly warn that high caution is needed when using bibliometric data, even for the well-known Scopus and Web of Science Core Collection (Krauskopf, 2017). ...
... Scopus needs to strike a new balance between recall and precision. Besides, similar action has already been taken by Web of Science Core Collection even if it still has room for improvement (Liu et al., 2018). ...
Article
Full-text available
Identification of national research output is very common in bibliometric analyses and research evaluation. Searching appropriate country/territory names and using the country/territory refinement option are two methods provided by Scopus/Web of Science Core Collection to directly identify national research output. Using multi-source data, a recent but representative paper published in Scientometrics mapped computer science research in Africa, which was an important supplement to the existing literature. Forty-eight sub-Saharan African country/territory names were adopted to search in the country/territory field in Scopus and Web of Science Core Collection respectively to identify research output of sub-Saharan Africa. By repeating the above study, we find that the identification method is not always used accurately in practice, which may have some effect on the reliability of research. In addition, we further find that the above two national research output identification methods obtain different results for some countries/territories when using the Scopus database. Therefore, we probe this anomaly and try to provide explanations and give follow-up suggestions. We hope that the information conveyed in this brief communication will promote scholars to use the bibliographic databases more carefully in identifying national research output and provide more detailed data retrieval information to readers. At the same time, the providers of bibliographic databases should improve their products constantly based on recommendations from academics.
... In 2018, Chinese scientists published a large-scale study (Liu et al., 2018) on the absence of information about the addresses of authors. The authors used three main WoS citation indexes for their analysis: SCIE, SSCI, and A&HCI. ...
... When considering WoS, he used two time periods, 1970-2009 and 1980-2009; the assessment for them did not differ (14%), which is in good agreement with our data. In (Liu et al., 2018), for the three main WoS indices, an estimate of 20% of unaffiliated publications is given, but the period 1900-2015 is considered, which also affects the result. These results also confirm the found overall trend towards an increase in the numbers of correctly affiliated publications over time. ...
Article
Correct attribution of scientific publications with the countries of authors is vital for many scientometric tasks. Despite this, the Scopus and Web of Science databases contain a significant number of publications without affiliation with any country, which leads to their “loss” in the analysis results. This includes “scientific” types of documents—articles, reviews, conference proceedings, etc. It has been shown that the practice of excluding such publications from consideration in scientometric studies can lead to a significant distortion of their results. We propose a method that allows predicting, with reasonable accuracy, the connection between a publication and a country based on the affiliation history of the authors. This article analyses the significance of losses due to unaffiliated articles for the top 20 countries in terms of the number of publications indexed in Scopus over the past 20 years. The importance of this problem has been declining in recent years, but is still noticeable especially for developed countries.
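The idea of predicting a publication's country from the affiliation history of its authors can be sketched as a simple vote. The abstract does not describe the actual prediction rule, so the majority vote below is an assumption, and `predict_country` and the sample data are hypothetical.

```python
from collections import Counter


def predict_country(author_histories):
    """Each author votes with the country that appears most often in their
    earlier affiliations; the publication gets the country with most votes.

    Returns (country, share_of_votes), or None if no author has a history.
    """
    votes = Counter()
    for countries in author_histories.values():
        if countries:
            votes[Counter(countries).most_common(1)[0][0]] += 1
    if not votes:
        return None
    country, count = votes.most_common(1)[0]
    return country, count / sum(votes.values())


# Hypothetical unaffiliated publication with three authors.
histories = {
    "Author A": ["Poland", "Poland", "Germany"],
    "Author B": ["Poland"],
    "Author C": ["Germany"],
}
print(predict_country(histories))
```

The vote share gives a rough confidence signal: a publication whose authors disagree could be left unassigned rather than forced into one country.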
... However, note that fractional counting also has limitations because usually the contribution of each co-author is not equal. Another limitation from these databases, and very common for old information, is that in some articles the author affiliation address and country information are missing, and even the author's name in some cases [56][57][58]. Therefore, when this limitation appears, it is difficult to provide accurate results of the information. ...
Article
Full-text available
Founded in 1824, the Annals of the New York Academy of Sciences (ANYAS) is a distinguished international journal that embraces various scientific disciplines. In 2024, the journal marks its 200th anniversary. To honor this remarkable milestone, this article provides a thorough bibliometric analysis of the journal's publications. The aim is to identify the main trends in the journal, particularly over the past few decades. Bibliographic data have been gathered from the Web of Science Core Collection and Scopus databases. The study also uses VOSviewer software to create and visualize bibliometric maps. This analysis reveals that researchers affiliated with American institutions are the most productive authors, surpassing their peers from other countries, with notable contributions also coming from France and Israel. The United States of America emerges as the leading nation in the total number of publications and citations, followed by the United Kingdom and Germany. Additionally, an in-depth examination of keywords and topics illustrates that ANYAS encompasses a diverse range of subjects, prominently featuring chemistry, hematology, and psychology research. This breadth of exploration underscores the journal's role as a significant platform for advancing scientific knowledge across multiple domains.
... As can be observed, some of the databases have not indexed it while, among the databases that have indexed the paper, the variety in the number of citations is large. Furthermore, as acknowledged in the scientific literature, WoS comes with a series of other limitations related to the absence of the authors in the studies included in the dataset [7,103,104] or the regional bias [105]. Even though we acknowledge these limitations, it shall be stated that in the case of the dataset included in our paper, the absence of the authors was not signalled by Biblioshiny in the initial step in which the dataset was scanned. ...
Article
Full-text available
Grey systems theory, through the special mathematics and methods it offers, such as seeing numbers as intervals rather than fixed values, provides a bridge between the two extreme cases in which a system under investigation might be found, namely, a white system, easy to read and understand, and a black system, completely unknown to the investigator. Since its appearance in 1982, the theory has contributed to solving various challenges traditionally addressed through complex means. The paper provides a comprehensive perspective on the evolution of the grey systems domain over the 42-year period analysed, spanning from 1982 to 2024. Utilizing a dataset extracted from the Clarivate Analytics’ Web of Science Core Collection database, the paper conducts a bibliometric analysis that includes the identification of key journals, affiliations, authors, and countries, as well as the collaboration networks among authors and countries. It also analyses the most frequently used keywords and authors’ keywords. The annual growth rate of 12.99% indicates a sustained interest among researchers. Using the Biblioshiny 4.2.3 library in R version 4.4.1, a variety of visualisations have been created, including thematic maps and WordClouds. A detailed review of the most cited papers has been performed to highlight the role of grey systems in advancing intelligent decision-making techniques. In terms of results, it has been observed that the university with the highest contribution to the field is the Nanjing University of Aeronautics and Astronautics while the most influential figure in the area of grey systems in terms of the number of published papers is Sifeng Liu. As expected, China, the home of grey systems theory, is the country with the most notable contribution in terms of published papers and international collaboration networks.
... For WoS, we found some data on missing affiliation entries: Liu et al. (2017, p. 361) reported that 5% of the Science Citation Index Expanded (SCIE) records, 9% of the Social Science Citation Index, and 42% of the Arts & Humanities Citation Index records did not contain an institution name. There are two reasons for the absence of affiliation information: the articles themselves do not report it (about 60% of all cases examined by Liu et al., 2018) or they are not indexed in the WoS (40%). For articles from Spain and SCIE, García-Zorita et al. (2006) describe that 65% of the older literature (from 1985 to 1997) do not have a research address, while this is the case for 99.8% of papers from 1998 to 2004. ...
Article
Full-text available
Describing, analyzing, and evaluating research institutions are among the main tasks of scientometrics and research evaluation. But how can we optimally search for an institution's research output? Possible search arguments include institution names, affiliations, addresses, and affiliated authors' names. Prerequisites of these search tasks are complete lists (or at least good approximations) of the institutions' publications, and - in later steps - their citations, and topics. When searching for the publications of research institutions in an information service, there are two options, namely (1) searching directly for the name of the institution and (2) searching for all authors affiliated with the institution in a defined time interval. Which strategy is more effective? More specifically, do informetric indicators such as recall and precision, search recall and search precision, and relative visibility change depending on the search strategy? What are the reasons for differences? To illustrate our approach, we conducted an illustrative study on two information science institutions and identified all staff members. The search was performed using the Web of Science Core Collection (WoS CC). As a performance indicator, applying fractional counting and considering co-affiliations of authors, we used the institution's relative visibility in an information service. We also calculated two variants of recall and precision at the institution level, namely search recall and search precision as informetric measures of performance differences between different search strategies (here: author search versus institution search) on the same information service (here: WoS CC) and recall and precision in relation to the complete set of an institution's publications. For all our calculations, there is a clear result: Searches for affiliated authors outperform searches for institutions in WoS. 
However, especially for large institutions it is difficult to determine all the staff members in the time interval of research. Additionally, information services (including WoS) are incomplete and there are variants for the names of institutions in the services. Therefore, searching for institutions and the publication-based quantitative evaluation of institutions are very critical issues.
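The comparison between author search and institution search can be made concrete with the textbook definitions of recall and precision. The abstract does not give its exact formulas for search recall, search precision, or relative visibility, so the sketch below uses only the standard set-based measures; `recall_precision` and the publication identifiers are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Standard recall and precision over sets of publication identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical institution with five known publications.
institution_pubs = {"p1", "p2", "p3", "p4", "p5"}
author_search = {"p1", "p2", "p3", "p4", "x1"}  # one false hit
institution_search = {"p1", "p2"}               # misses name variants

print(recall_precision(author_search, institution_pubs))
print(recall_precision(institution_search, institution_pubs))
```

In this made-up case the author search recovers more of the institution's output at the cost of one false hit, while the institution search is precise but incomplete, which is the kind of trade-off the abstract describes.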
... The Web of Science database was used as a search platform for academic research, considered by researchers as a robust collection where works of great relevance and international impact can be found (Liu et al., 2018; Liu et al., 2021). The time horizon guiding this research was set between 1945 and 2021. ...
Article
Full-text available
The COVID-19 pandemic boosted online shopping, requiring organizations to improve the service quality of their websites. Given the lack of prior knowledge about the factors that satisfy consumers, this research conducts a systematic review to identify critical dimensions and attributes that influence website service quality and consumer satisfaction in a post-pandemic context. Using the Theory of the Consolidated Meta-Analytical Approach (TEMAC), the review follows three structured phases: research preparation, data presentation and interrelation, and model detailing and integration. This methodology applies bibliometric principles to synthesize the findings of 88 selected articles. The analysis identifies eleven essential dimensions of website service quality: ease of use, website design/appearance, online search, security/privacy, personalization, efficiency, system availability, reliability, information quality, responsiveness/assurance/trust, and delivery/fulfillment. The results offer valuable insights for organizations seeking to improve their online services. By prioritizing these dimensions, companies can formulate strategies to increase consumer satisfaction, strengthen trust, and provide a superior online shopping experience. This study contributes to the literature by presenting a comprehensive framework for website service quality, with academic and practical implications.
... Although our bibliometric research provides some implications in the HCI healthcare area, several limitations should be addressed in the future. First, although the WoS is a representative and widely used source for bibliometric analysis, WoS has some problems such as DOI errors, missing author addresses, regional bias, and under-representation of non-English publications [13], [57][58][59]. Thus, future research can employ other sources such as Scopus, IEEE Xplore, or PubMed to provide a better understanding of the HCI healthcare area. ...
Article
Full-text available
Background/Objectives: Studies on the application and exploration of human–computer interaction (HCI) technologies within the healthcare sector have rapidly expanded, showcasing the immense potential of HCI to enhance medical services, elevate patient experiences, and advance health management. Despite this proliferating interest, there is a notable shortage of comprehensive bibliometric analyses dedicated to the application of HCI in healthcare, which limits a thorough comprehension of the growth trends and future trajectories in this area. Methods: To bridge this gap, we employed bibliometric methods using the CiteSpace tool to systematically review and analyze the current state and trends of HCI research in healthcare. A meticulous topic search of Web of Science yielded 3598 papers published between 2004 and 2023. Results: Through literature analysis, the most productive researchers, institutes, and countries/territories and the collaboration networks among authors and countries within the field were analyzed. Additionally, by conducting a co-citation analysis, journals and literature with high citation rates and influence within the academic community in this field were revealed. Through a cluster analysis based on literature co-citations and keyword burst analyses, we further explored the main research themes and hot topics within the fields of healthcare and HCI. Conclusions: In summary, through a comprehensive and systematic bibliometric analysis, this study provides a solid knowledge foundation for HCI in the healthcare research community, thereby fostering the development of innovative research and the optimization of practical applications in the field.
... Both Scopus and Web of Science are treated as reliable databases and are widely used by researchers [4,5]. Meanwhile, to facilitate the wise use of the famous Web of Science database, many researchers have also been trying to reveal the flaws, shortcomings and features of this widely used and classical database to users from different perspectives [6][7][8][9][10]. At the same time, this kind of quality or feature inspection research can allow users to make more informed use of these databases and their derivatives and also help the databases themselves improve the data quality of their products. ...
Article
Full-text available
Similar to the Web of Science, Scopus is also a widely used abstract and citation database. Researchers typically employ the Year of Publication or Date of Publication field in Scopus to retrieve, filter and analyse indexed records. However, the inconsistent retrieval results obtained by these two fields in Scopus, which was occasionally observed in this study, may cause confusion among users. In this brief research article, we seek to elucidate this phenomenon by utilising indexed records in Scopus from the past 50 years. Empirical evidence indicates that inconsistent retrieval results retrieved by these two search fields are attributable to discrepancies in the publication year information provided in the Year of Publication and Date of Publication fields in Scopus. Specifically, missing year information in the Date of Publication field, incorrect year information in the Date of Publication field or in the Year of Publication field, and inconsistent use of different versions of publication dates in these two fields are four representative causes for the observed inconsistencies in retrieval results in Scopus. This article concludes by outlining the potential consequences of these issues and suggesting ways to effectively address them.
... Herrmannova and Knoth (2016) tested the reliability of the publication date in Microsoft Academic Graph, finding that 88% of cases showed a correct date. Liu et al. (2018) detected that approximately 20% of WoS publications have no information from the address field. Basson et al. (2022) showed databases, to test their suitability for bibliometric studies or for bibliographic searches only. ...
Article
Full-text available
The main objective of this study is to compare the amount of metadata and the completeness degree of research publications in new academic databases. Using a quantitative approach, we selected a random Crossref sample of more than 115k records, which was then searched in seven databases (Dimensions, Google Scholar, Microsoft Academic, OpenAlex, Scilit, Semantic Scholar, and The Lens). Seven characteristics were analyzed (abstract, access, bibliographic info, document type, publication date, language, and identifiers), to observe fields that describe this information, the completeness rate of these fields, and the agreement among databases. The results show that academic search engines (Google Scholar, Microsoft Academic, and Semantic Scholar) gather less information and have a low degree of completeness. Conversely, third-party databases (Dimensions, OpenAlex, Scilit, and The Lens) have more metadata quality and a higher completeness rate. We conclude that academic search engines lack the ability to retrieve reliable descriptive data by crawling the Web, while the main problem of third-party databases is the loss of information derived from integrating different sources. Peer Review https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00286
... Hence, the general frequencies and types of errors that occur in both databases fall within the evaluation of the study presented by the researcher, to determine the seriousness of his commitment to the imposed methodology after extensively presenting the research to databases [61,62,63,64], citation information and links [40,65,66], testing of incorrect or missing DOI numbers [53], [67][68][69], accuracy of duplicate entries [30], and inconsistency testing of publication dates in references [53]. To ensure scientific competencies regarding the performance of authors and those assigned to conduct research from individual institutions, the LCAI focuses on the accuracy and applicability of the author's information in the global context [70,71,72,73], according to the exact information obtained from WoS [74][75][76][77][78] and Scopus [79,80]. Thus, even scientific journals can be classified in WoS [81]. ...
Article
Full-text available
LCAI decided to supervise the scientific publishing of Libyan researchers in Scopus journals, which requires the researcher to carefully prepare the scientific content to be of high quality, commensurate with the bibliographic databases (DBs) as the main destination for descriptive data for publications in Libya and the world, which underpins our commitment to bibliometric indicators used globally, whether for research evaluation practices or for performing tasks efficiently. This study deals with the importance of databases, provided that researchers are able to choose the most appropriate from the Web of Science (WoS) and Scopus as bibliographic databases, and to determine the quality standards that must be met in the methodology of scientific research and articles for them to be approved by the Center for support and publication. Keywords: scientific research, research methodology, scientific articles, qualitative research, quantitative research, the nature and steps of research preparation
... While there are several aspects related to the inaccuracy in bibliometric databases, in this work we only focus on affiliation information. The study of Weishu Liu and colleagues [13] pointed out that the lack of author address information in WoS is a significant problem. This problem was also presented in Krauskopf's research [14,15]. ...
Article
Today, bibliometric databases are indispensable sources for researchers and research institutions. The main role of these databases is to find research articles and estimate the performance of researchers and institutions. Regarding the evaluation of the research performance of an organization, the accuracy in determining institutions of authors of articles is decisive. However, current popular bibliometric databases such as Scopus and Web of Science have not addressed this point efficiently. To this end, we propose an approach to revise the authors’ affiliation information of articles in bibliometric databases. We build a model to classify articles to institutions with high accuracy by assembling the bag of words and n-grams techniques for extracting features of affiliation strings. After that, these features are weighted to determine their importance to each institution. Affiliation strings of articles are transformed into the new feature space by integrating weights of features and local characteristics of words and phrases contributing to the sequences. Finally, on the feature space, the support vector classifier method is applied to learn a predictive model. Our experimental result shows that the proposed model’s accuracy is about 99.1%.
Keywords: Affiliation, Disambiguation, Data cleaning, Classification, Supervised learning, if-iif, Support vector machine, Support vector classifier
References
[1] B. Shereen Hanafi, Discover the data behind the times higher education world university rankings, Elsevier Connect.
[2] Dobrota, M. Bulajic, L. Bornmann, V. Jeremic, A new approach to the qs university ranking using the composite i-distance indicator: Uncertainty and sensitivity analyses, JASIST 67 (2016) 200-211.
[3] -P. Pavel, Global university rankings - a comparative analysis, Procedia Economics and Finance 26 (2015) 54-63. https://doi.org/10.1016/S2212-5671(15)00838-2.
[4] Web of science databases, Clarivate Analytics.
[5] F. Burnham, Scopus database: a review, Biomedical Digital Libraries 3. http://doi.org/10.1186/1742-5581-3-1.
[6] Franceschini, D. Maisano, L. Mastrogiacomo, A novel approach for estimating the omitted-citation rate of bibliometric databases with an application to the field of bibliometrics, Journal of the american society for information science and technology 64 (2013) 2149-2156. https://doi.org/10.1002/asi.22898.
[7] Franceschini, D. Maisano, L. Mastrogiacomo, Scientific journal publishers and omitted citations in bibliometric databases: Any relationship?, Journal of Informetrics 8(3) (2014) 751-765. https://doi.org/10.1016/j.joi.2014.07.003.
[8] Buchanan, Accuracy of cited references: The role of citation databases, College Research Libraries 67. http://doi.org/10.5860/crl.67.4.292.
[9] Valderrama-Zurián, R. Aguilar-Moya, D. Melero-Fuentes, R. Aleixandre-Benavent, A systematic analysis of duplicate records in scopus, Journal of Informetrics 9 (2015) 570-576. http://doi.org/10.1016/j.joi.2015.05.002.
[10] Zhu, G. Hu, W. Liu, Doi errors and possible solutions for web of science, Scientometrics 118(2) (2019) 709-718. http://doi.org/10.1007/s11192-018-2980-7.
[11] Xu, L. Hao, X. An, D. Zhai, H. Pang, Types of doi errors of cited references in web of science with a cleaning method, Scientometrics 120(3) (2019) 1427-1437. http://doi.org/10.1007/s11192-019-03162-4.
[12] Krauskopf, Missing documents in scopus: the case of the journal enfermeria nefrologica, Scientometrics 119(1) (2019) 543-547. https://doi.org/10.1007/s11192-019-03040-z.
[13] Liu, G. Hu, L. Tang, Missing author address information in web of science-an explorative study, Journal of Informetrics 12(3) (2018) 985-997. https://doi.org/10.1016/j.joi.2018.07.008.
[14] Krauskopf, Standardization of the institutional address, Scientometrics 94(3) (2013) 1313-1315. http://doi.org/10.1007/s11192-012-0852-0.
[15] Krauskopf, Call for caution in the use of bibliometric data, J. Assoc. Inf. Sci. Technol. 68(8) (2017) 2029-2032. http://doi.org/10.1002/asi.23809.
[16] Awad, R. Khanna, Support Vector Machines for Classification, Apress, Berkeley, CA, 2015, pp. 39-66. http://doi:10.1007/978-1-4302-5990-9-3.
[17] Breiman, Random forests, Machine Learning 45(1) (2001) 5-32. https://doi.org/10.1023/A:1010933404324.
[18] Cover, P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theor. 13(1) (2006) 21-27. http://doi.org/10.1109/TIT.1967.1053964.
[19] J.-C.B. Cuxac, P., Efficient supervised and semi-supervised approaches for affiliations disambiguation, Scientometrics 97(1) (2013) 47-58.
... 1 Completeness of the metadata is critical from the perspective of scientometric studies. Several papers have analysed the WoS data quality on several issues: accuracy of the document type assignment (Donner, 2015, 2017), accuracy of citation (Buchanan, 2006; Franceschini et al., 2014, 2016; van Eck & Waltman, 2019), completeness of the DOI information (Gorraiz et al., 2016; Xu et al., 2019), missing author address information (Liu et al., 2018). ...
Preprint
The author-affiliation links are the essential elements used for multiple purposes, such as the disambiguation of authors, the attribution of credits of a publication and fractional counting, the analysis of scientific networks, etc. In this article we analyzed the author-affiliation link quality in the Web of Science (WoS) database between 2000 and 2021. We analyzed the link completeness for 32,676,914 scientific publications under different angles: WoS index, document type and the number of authors per publication. The analysis showed that the author-affiliation link begins to be well informed from 2008. The share of publications for which all addresses and all authors are linked is close to 100% from 2016. The results show a strong variability according to the WoS index, the document type and the number of authors per publication. AHCI is the index with the highest completeness rate, unlike the SCI. For the document type, these are the Conference proceedings where the completeness rate is better and/or can be completed. Regarding the number of authors, statistics show that the higher the number, the more addresses and unlinked authors there are. Finally, the analysis of a random sample of 100 publications showed that in more than 50% of the cases, the author-address links do not exist in the original publication, and the WoS reproduced only the available information provided by the editor.
... 2 Completeness of the metadata is critical from the perspective of scientometric studies. Several papers have analysed the WoS data quality on several issues: accuracy of the document type assignment (Donner 2015, 2017), accuracy of citation (Buchanan 2006; Franceschini et al. 2014, 2016; van Eck and Waltman 2019), completeness of the DOI information (Gorraiz et al. 2016; Xu et al. 2019), missing author address information (Liu et al. 2018). ...
Article
Full-text available
The author-affiliation links are the essential elements used for multiple purposes, such as the disambiguation of authors, the attribution of credits of a publication and fractional counting, the analysis of scientific networks, etc. In this article we analyzed the author-affiliation link quality in the Web of Science (WoS) database between 2000 and 2021. We analyzed the link completeness for 32,676,914 scientific publications under different angles: WoS index, document type and the number of authors per publication. The analysis showed that the author-affiliation link begins to be well informed from 2008. The share of publications for which all addresses and all authors are linked is close to 100% from 2016. The results show a strong variability according to the WoS index, the document type and the number of authors per publication. AHCI is the index with the highest completeness rate, unlike the SCI. For the document type, these are the Conference proceedings where the completeness rate is better and/or can be completed. Regarding the number of authors, statistics show that the higher the number, the more addresses and unlinked authors there are. Finally, the analysis of a random sample of 100 publications showed that in more than 50% of the cases, the author-address links do not exist in the original publication, and the WoS reproduced only the available information provided by the editor.
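The completeness measure described in this abstract (the share of publications for which all addresses and all authors are linked) can be illustrated with a minimal sketch. The record layout below (`authors`, `addresses`, `links`) and the sample data are hypothetical and are not taken from the study or from any Web of Science export format.

```python
def fully_linked(record):
    """Return True if every author and every address appears in at least one link.

    `record` is a hypothetical dict with three keys:
      - "authors":   list of author names
      - "addresses": list of address strings
      - "links":     set of (author, address) pairs
    """
    linked_authors = {author for author, _ in record["links"]}
    linked_addresses = {address for _, address in record["links"]}
    return (set(record["authors"]) <= linked_authors
            and set(record["addresses"]) <= linked_addresses)


def completeness_rate(records):
    """Share of records in which all authors and all addresses are linked."""
    if not records:
        return 0.0
    return sum(fully_linked(r) for r in records) / len(records)


# Hypothetical sample: the first record is fully linked, the second has an
# author ("C") and an address ("Univ Z") that appear in no link.
sample = [
    {"authors": ["A", "B"], "addresses": ["Univ X", "Univ Y"],
     "links": {("A", "Univ X"), ("B", "Univ Y")}},
    {"authors": ["A", "C"], "addresses": ["Univ X", "Univ Z"],
     "links": {("A", "Univ X")}},
]
print(completeness_rate(sample))
```

Grouping the records by index, document type, or number of authors before calling `completeness_rate` would give the kind of breakdown the abstract reports, under the same assumed record layout.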
... [26] Bibliometric techniques are used to discern trends of data output and provide quantitative measures for the contribution of countries, authors, institutions, funding agencies, collaborations, and more. [23,24,27,28] Ethical approval was not applicable in this research since the study had no direct involvement of human subjects. ...
Article
Full-text available
BACKGROUND: Tuberculosis (TB) is a persistent public health issue requiring consistent global effort for its eradication and control. Research on the subject plays a vital role in combatting the disease, giving future directions, and meeting the sustainable development goals (SDGs). This study aimed to evaluate the global TB research trends and performance from 2011 to 2020. MATERIALS AND METHODS: All the data for TB-related research publications from 2011 to 2020 were extracted from the Web of Science database and a comprehensive analysis was performed on the R-bibliometrix package. RESULTS: An increasing number of publications with an annual growth rate of 6.32% and a plateau in production from 2015 to 2018 was observed. Of 145 countries, the United States of America (USA), China, India, the United Kingdom, and South Africa led and made up half of the global contribution. Out of 91,862 authors, Zhang Y was the most productive with 205 articles and Barry CE had the highest H-index of 45. Only seven of the top 20 authors were from high-burden countries. The University of Cape Town was the leading institutional affiliation, followed by Stellenbosch University and the London School of Hygiene and Tropical Medicine. The most frequent international collaboration was between the USA and South Africa, occurring on 1203 instances. Only five of the top 30 high-burden countries were present in the top 30 collaborations. PLOS ONE, disseminating 2271 articles, was the most productive out of 3500 sources. CONCLUSION: The past decade has seen a steady increase in global TB research. Prominent authors, affiliations, and countries showed collaborative trends, but publications were found to be mostly from developed, low-burden countries except China, India, and South Africa. To meet the goals set by the SDGs and the WHO End TB Strategy, high-burden countries need to explore feasible opportunities and global support to enhance their expected TB-related research contributions.
... As in-house citation indexes were developed in the 1990s, care was taken to construct them in a manner that facilitated affiliation disambiguation. The problem of missing author affiliations has been largely addressed but in 2015 the Web of Science indexes still contained sizeable quantities of publications without any affiliations whatsoever (SCIE: 7.6%, SSCI: 6%, and A&HCI: 35%) (Liu, Hu, & Tang, 2018). ...
Article
Full-text available
Research managers benchmarking universities against international peers face the problem of affiliation disambiguation. Different databases have taken separate approaches to this problem and discrepancies exist between them. Bibliometric data sources typically conduct a disambiguation process that unifies variant institutional names and those of its sub-units so that researchers can then search all records from that institution using a single unified name. This study examined affiliation discrepancies between Scopus, Web of Science, Dimensions, and Microsoft Academic for 18 Arab universities over a five-year period. We confirmed that digital object identifiers (DOIs) are suitable for extracting comparable scholarly material across databases and quantified the affiliation discrepancies between them. A substantial share of records assigned to the selected universities in any one database were not assigned to the same university in another. The share of discrepancy was higher in the larger databases, Dimensions and Microsoft Academic. The smaller, more selective databases, Scopus and especially Web of Science tended to agree to a greater degree with affiliations in the other databases. Manual examination of affiliation discrepancies showed they were caused by a mixture of missing affiliations, unification differences, and assignation of records to the wrong institution. Peer Review https://publons.com/publon/10.1162/qss_a_00175
... Apart from the abstract/author keywords/keywords plus fields, some other features of WoSCC will also influence old literature retrieval and historical bibliometric analysis. If we don't fully grasp the characteristics of author's affiliation (Jacsó 2009;Liu et al. 2018) and funding acknowledgment Furthermore, the changing coverage of regional journals especially non-English journals in WoSCC may also influence the interpretation of related studies (Liu 2017;Vera-Baceta et al. 2019). And finally, since different institutions may subscribe to customized version of WoSCC, it is also important to detail the sub-datasets and corresponding coverage timespans of the used sub-datasets, especially for old literature retrieval and historical bibliometric analysis (Calver et al. 2017;Liu 2019). ...
Preprint
Full-text available
By using publications from Web of Science Core Collection (WoSCC), Fosso Wamba and his colleagues published an interesting and comprehensive paper in Technological Forecasting and Social Change to explore the structure and dynamics of artificial intelligence (AI) scholarship. Data demonstrated in Fosso Wamba's study implied that the year 1991 seemed to be a "watershed" of AI research. This research note tried to uncover the 1991 phenomenon from the perspective of database limitation by probing the limitations of search in abstract/author keywords/keywords plus fields of WoSCC empirically. The low availability rates of abstract/author keywords/keywords plus information in WoSCC found in this study can explain the "watershed" phenomenon of AI scholarship in 1991 to a large extent. Some other caveats for the use of WoSCC in old literature retrieval and historical bibliometric analysis were also mentioned in the discussion section. This research note complements Fosso Wamba and his colleagues' study and also helps avoid improper interpretation in the use of WoSCC in old literature retrieval and historical bibliometric analysis.
... Apart from the abstract/author keywords/keywords plus fields, some other features of WoSCC will also influence old literature retrieval and historical bibliometric analysis. If we don't fully grasp the characteristics of author's affiliation (Jacsó, 2009; Liu et al., 2018) and funding acknowledgment information (Paul-Hus et al., 2016; Tang et al., 2017) in WoSCC, corresponding literature retrieval and analysis may be biased. Besides, full author names were captured in WoSCC since June 2006 (Clarivate, 2018), serious name ambiguity problem exists for records published before 2006, especially for authors from East Asia (Harzing, 2015; Tang and Walsh, 2010). ...
Article
By using publications from Web of Science Core Collection (WoSCC), Fosso Wamba and his colleagues published an interesting and comprehensive paper in Technological Forecasting and Social Change to explore the structure and dynamics of artificial intelligence (AI) scholarship. Data demonstrated in Fosso Wamba's study implied that the year 1991 seemed to be a “watershed” of AI research. This research note tried to uncover the 1991 phenomenon from the perspective of database limitation by probing the limitations of search in abstract/author keywords/keywords plus fields of WoSCC empirically. The low availability rates of abstract/author keywords/keywords plus information in WoSCC found in this study can explain the “watershed” phenomenon of AI scholarship in 1991 to a large extent. The findings in this study are also applicable to other WoSCC-based studies focusing on fields from natural science, social sciences, to arts and humanities. Besides, some other caveats for the use of WoSCC in old literature retrieval and historical bibliometric analysis were also mentioned in the discussion section. This research note complements Fosso Wamba and his colleagues’ study and also helps avoid improper interpretation in the use of WoSCC in old literature retrieval and historical bibliometric analysis.
... As in-house citation indexes were developed in the 1990s, care was taken to construct them in a manner that facilitated affiliation disambiguation. The problem of missing author affiliations has been largely addressed but in 2015 the Web of Science indexes still contained sizeable quantities of publications without any affiliations whatsoever (SCIE: 7.6%, SSCI: 6%, and A&HCI: 35%) (Liu, Hu, & Tang, 2018). The owner of Web of Science tried to overcome the affiliation disambiguation problem by introducing its organisation enhanced feature. ...
Preprint
Full-text available
Research managers benchmarking universities against international peers face the problem of affiliation disambiguation. Different databases have taken separate approaches to this problem and discrepancies exist between them. Bibliometric data sources typically conduct a disambiguation process that unifies variant institutional names and those of its sub-units so that researchers can then search all records from that institution using a single unified name. This study examined affiliation discrepancies between Scopus, Web of Science, Dimensions, and Microsoft Academic for 18 Arab universities over a five-year period. We confirmed that digital object identifiers (DOIs) are suitable for extracting comparable scholarly material across databases and quantified the affiliation discrepancies between them. A substantial share of records assigned to the selected universities in any one database were not assigned to the same university in another. The share of discrepancy was higher in the larger databases, Dimensions and Microsoft Academic. The smaller, more selective databases, Scopus and especially Web of Science tended to agree to a greater degree with affiliations in the other databases. Manual examination of affiliation discrepancies showed they were caused by a mixture of missing affiliations, unification differences, and assignation of records to the wrong institution.
... For example, many of the 1,206 Polish-language articles were published from an "undefined" country. Such limitations have also been reported in WOS (Liu et al., 2018), but this seems to be much more considerable in Scopus, especially for older documents. The omission of a country is much more critical than omission of language (Jacsó, 2009). ...
Article
Full-text available
In research articles, cities usually occur as topics (e.g., subjects or actors) or places of studies (e.g., sites, destinations, locations, or spaces). Investigation of more general patterns is rare because research usually focuses on individual cities. We use science mapping, based on Scopus data and Vosviewer visualization software, to examine city-related research across journals and disciplines (subject areas), and to assess how multiple city functions are reflected in journals. Comparable European Union capital cities (Berlin, Madrid, Rome, and Warsaw) serve as models. The patterns are remarkably similar regardless of the city. National and regional journals are the most common publication venues. Research takes place within three major disciplinary clusters: 1) the social sciences, and arts and humanities, 2) medicine, and 3) natural/technical sciences (environmental, earth and planetary, agricultural, and biological sciences). Medicine shows an early prevalence, and recently the social sciences have been strongly represented in these studies. Although the relationships are based on different journals, they are comparable for all cities and can be used to assess cities of similar size. This study was conducted just before the Covid-19 pandemic, and it can serve as a reference to identify research patterns before and after because the outbreak may bring about changes in future city-related research.
... This data did not systematically exist in some publications because of their type. Indeed, several researchers have indicated that ignoring missing data in the WOS database may lead to inaccurate results (Franceschini et al., 2016;Liu et al., 2018;Zhu et al., 2019). To this end, two free software programs, Microsoft Excel and VOSviewer (version 1.6.11) ...
Article
Full-text available
Venture capital is a cross-cutting discipline and research field. This study used bibliometric indicators of publications indexed in Web of Science (WOS) Core Collection over the last two decades to provide an overview of the main characteristics of its publications. A total of 1,840 papers by 2,607 authors in 518 journals were reviewed. The publications were examined in terms of temporal trend, geographical and institutional distribution, references, authors, and citations. The results indicate that the importance of venture capital research is increasing. In terms of impact, a small group of productive countries (e.g., USA and UK) and authors (e.g., Cumming and Wright) contributed a significant share of these publications, whereas China is expected to attract more attention to this topic in the future. Harvard University is the most productive institution and the Journal of Business Venturing is the most active journal. By topic, publications that address the contribution of venture capital to entrepreneurship are the most cited. New areas of research have focused especially on the implication of signal theory in venture capital networks, support for decision-makers and crowdfunding as a new investment strategy. Today, the micro-level is the dominant level (compared to the macro-level) in venture capital research. The results of this study aim to contribute to supporting decision making in a venture capital research management context and to serve as a guide for future researchers or evaluators.
... Both Scopus and WoSCC advertise themselves as authoritative and high-quality data providers (Baas et al. 2020; Birkle et al. 2020). However, many previous studies including those conducted by the authors' team have expressed concerns about data quality and potential limitations of these two databases (Franceschini et al. 2016a, b; Krauskopf, 2017, 2019; Liu, 2020a, b; Liu et al. 2018, 2020; Okagbue et al. 2020 The bibliographic databases may be updated frequently, so the data at different time points may vary to some extent. JBIRR: Journal Basic Information-Related Record; DOI: Digital Object Identifier; EID: document unique identifier in Scopus ...
Article
Full-text available
Scopus and Web of Science Core Collection (WoSCC) are the two most authoritative and widely-used bibliographic databases. However, for the double-indexed open access mega journal IEEE Access, we found large discrepancies regarding the numbers of published records based on journal publisher, Scopus, and WoSCC. Considering that Segal’s law will confuse many users and affect the reliability of various metrics derived from these two authoritative bibliographic databases, this study conducted a large scale and thorough comparison between the data collected from the journal publisher’s website and these two bibliographic databases. Apart from different policies towards the index of different document types, this study identified four main representative causes for the discrepancies through case study (including the serious record omission and duplicate entry problems). Possible consequences and solutions were also provided.
... With the continuous update of Web of Science Core Collection, the database provider should provide more technical details about its update for users timely. Besides, users should also be aware of the features and defects of this authoritative bibliographic database which have been widely investigated in previous studies (Birkle et al. 2020; Franceschini et al. 2016; Huang et al. 2017; Liu et al. 2018b, 2020; Martín-Martín et al. 2018; Mongeon & Paul-Hus 2016; Tang et al. 2017). ...
Preprint
Full-text available
Web of Science Core Collection, one of the most authoritative bibliographic databases, is widely used in academia to track high-quality research. This database has begun to index online-first articles since December 2017. This new practice has introduced two different publication dates (online and final publication dates) into the database for more and more early access publications. It may confuse many users who want to search or analyze literature by using the publication-year related tools provided by Web of Science Core Collection. By developing custom retrieval strategies and checking manually, this study finds that the "year published" field in search page searches in both online and final publication date fields of indexed records. Each indexed record is allocated to only one "publication year" on the left of the search results page which will inherit first from online publication date field even when the online publication date is later than the final publication date. The "publication year" field in the results analysis page and the timespan "custom year range" field in the search page have the same function as that of the filter "publication year" in search results page. The potential impact of the availability of two different publication dates in calculating bibliometric indicators is also discussed at the end of the article.
... there is a 34% omission rate of country metadata in Scopus, 14% in Thomson-Reuters). Regarding missing address information, Liu, Hu, and Tang (2018) concluded that more than one-fifth of publications in Thomson-Reuters' Web of Science (from 1900 to 2015) are completely missing any kind of author address metadata. ...
Article
Full-text available
Purpose Our work seeks to overcome data quality issues related to incomplete author affiliation data in bibliographic records in order to support accurate and reliable measurement of international research collaboration (IRC). Design/methodology/approach We propose, implement, and evaluate a method that leverages the Web-based knowledge graph Wikidata to resolve publication affiliation data to particular countries. The method is tested with general and domain-specific data sets. Findings Our evaluation covers the magnitude of improvement, accuracy, and consistency. Results suggest the method is beneficial, reliable, and consistent, and thus a viable and improved approach to measuring IRC. Research limitations Though our evaluation suggests the method works with both general and domain-specific bibliographic data sets, it may perform differently with data sets not tested here. Further limitations stem from the use of the R programming language and R libraries for country identification as well as imbalanced data coverage and quality in Wikidata that may also change over time. Practical implications The new method helps to increase the accuracy in IRC studies and provides a basis for further development into a general tool that enriches bibliographic data using the Wikidata knowledge graph. Originality This is the first attempt to enrich bibliographic data using a peer-produced, Web-based knowledge graph like Wikidata.
... The authors' team has conducted a series of studies focusing on the features and limitations of Scopus (Huang and Liu 2019; Liu 2020) and Web of Science (Liu et al. 2018; Tang et al. 2017; Zhu et al. 2019). Authoritative bibliographic databases should improve the quality of their own data, and remind users of the deficiencies and characteristics of their products. ...
Article
Full-text available
Self-citation is attracting wide attention in citation analysis research and research evaluation practice. However, the academic community’s views on self-citation are not uniform. If the number of self-citations should be calculated, it is critical to calculate it accurately and unambiguously. However, based on a case study of thirty papers published during 2014 and 2020 by the corresponding author of the study, we find that the numbers of self-citations identified through the automatic identification tools provided by Scopus and Web of Science Core Collection are confusing and inconsistent. We also put forward corresponding improvement suggestions to the stakeholders including these two authoritative bibliographic database providers at the end of this article.
... Lower presence rates also exist for many other document types. However, we should note the serious author address missing problems in Web of Science for some document types such as news item, correction, and biographical item (Liu et al. 2018). Scenario 1: all document types are taken into account Figure 1 shows the annual production volume of SCI publications of the United States and China during the past two decades when all document types are taken into account. ...
Article
Full-text available
China’s rising in scientific research output is impressive. The academic community is curious about the time when the cross-over in the number of annual scientific publication production between China and the USA can happen. By using Web of Science Core Collection’s Science Citation Index Expanded database, this study finds that China still ranks the second in the production of SCI-indexed publications in 2019 but may leapfrog the USA to be the first in 2020 or 2021, if all document types are considered. Comparatively, China has already overtaken the USA and been the largest SCI-indexed original research article producer since 2018. However, China still lags behind the USA regarding the number of review paper production. In general, quantitative advantage does not equal quality or impact advantage. We think that the USA will continue to be the global scientific leader for a long time.
... Lower presence rates also exist for many other document types. However, we should note the serious author address missing problems in Web of Science for some document types such as news item, correction, and biographical item (Liu, Hu, & Tang 2018). ...
Preprint
Full-text available
China's rising in scientific research output is impressive. The academic community is curious about the time when the cross-over in the number of annual scientific publication production between China and the USA can happen. By using Web of Science Core Collection's Science Citation Index Expanded database, this study finds that China still ranks the second in the production of SCI-indexed publications in 2019 but may leapfrog the USA to be the first in 2020 or 2021, if all document types are considered. Comparatively, China has already overtaken the USA and been the largest SCI-indexed original research article producer since 2018. However, China still lags behind the USA regarding the number of review paper production. In general, quantitative advantage does not equal quality or impact advantage. We think that the USA will continue to be the global scientific leader for a long time.
... The data quality is the basis of bibliometric data-based research evaluation. Fortunately, increasing attention from academia is paid to this point (Franceschini et al. 2016;Krauskopf 2019;Liu et al. 2018;Xu et al. 2019;Zhu et al. 2019a, b). Apart from the author's two successive studies on funding analysis in Web of Science, the characteristic, completeness, accuracy, and limitation of funding data in Web of Science have also been explored (Álvarez-Bornstein et al. 2017;Grassano et al. 2017;Morillo and Álvarez-Bornstein 2018;Paul-Hus et al. 2016). ...
Article
Full-text available
As an emerging bibliographic database, Scopus is increasingly used in academic research and evaluation practice. Compared with Web of Science, its data quality/reliability is still relatively underexplored. By using the author’s twenty-six English papers published during 2014 and 2019, this case study probes the accuracy of funding information in Scopus and shows that the accuracy of funding information collected by Web of Science is better than that of Scopus. Some obvious errors in funding acknowledgement text and funding agency fields still exist in Scopus. Therefore, Scopus needs to optimize the funding acknowledgement text identification method and improve the funding agency extraction and standardization strategy.
... Similarly, for other database-dependent metrics, this phenomenon also exists. Given the increasing use of various bibliographic databases, the features and also limitations of each database should be expressed explicitly (Falagas et al. 2008; Liu 2017; Liu et al. 2018; Tang et al. 2017; Zhu et al. 2019). We write this paper, partly in memory of Dr. Judit Bar-Ilan; while at the same time, we suggest that researchers and evaluation practitioners should pay attention to the details of data sources especially when using the WoS (Dallas et al. 2018; Liu 2019). ...
Article
Full-text available
The h-index has attracted wide attention from both scientometricians and science policy makers since it was proposed in 2005. Advocates champion the h-index for its simplicity in embracing both quantity and quality, while also expressing concern about its abuse in research evaluation practices and its database dependence. We argue that it is increasingly important to calculate and interpret the h-index precisely along with the rapid evolution of bibliographic databases. In memory of Dr. Judit Bar-Ilan, we join the h-index discussion in Scientometrics by further probing a similar “which h-index” question via comparing different versions of the h-index within the Web of Science. In this article we put forward the reasons for different WoS h-indices from two perspectives, which are often neglected by bibliometric studies. We suggest that users should specify the details of the data sources of h-index calculation for research promotion and evaluation practices.
... Echoing the finding of Liu et al. (2017, 2018), a small percentage of country/region information omission is also identified. This study merges England, Scotland, Wales and Northern Ireland into the UK. Two records published in 2004 are related to Scopus; however, one of them is a news item and the other is editorial material, and both are excluded from this study. ...
Article
Full-text available
Web of Science and Scopus are two world-leading and competing citation databases. By using the Science Citation Index Expanded and Social Sciences Citation Index, this paper conducts a comparative, dynamic, and empirical study focusing on the use of Web of Science (WoS) and Scopus in academic papers published between 2004 and 2018. This brief communication reveals that although both Web of Science and Scopus are increasingly used in academic papers, Scopus as a newcomer is really challenging the dominant role of WoS. Researchers from more and more countries/regions and knowledge domains are involved in the use of these two databases. Even though the main producers of related papers are developed economies, some developing economies such as China, Brazil and Iran also play important roles but with different patterns in the use of these two databases. Both databases are widely used in meta-analysis related studies, especially by researchers in China. Health/medical science related domains and the traditional Information Science and Library Science field stand out in the use of citation databases.
Article
Hybrid renewable energy systems (HRESs) play a key role in the decarbonization of many sectors of the economy and, thus, in achieving ambitious climate goals. Due to the complexity of the issues and the impact of many factors on the efficiency of these systems, it is necessary to ensure that they are properly designed, managed, and optimized. Many techniques and methods are used to achieve an optimal multi-source energy system. In recent years, there has been a growing interest in HRESs. Taking this into account, a comprehensive review of scientific literature was carried out, based on bibliometric analysis. Professional software was used for the research: Bibliometrix and VOSviewer. The bibliographic database was created using the international scientific platform Web of Science. The evolution of research trends and the dynamic development of research on the management and optimization of HRESs in the years 2010–2024 were presented. The results of the analysis confirmed the growing importance of integrated energy management systems and optimization strategies in the context of the global energy transformation. The analysis also indicated that, despite the growing interest in this topic, further development of advanced energy management strategies and optimization methods is necessary to effectively use renewable energy sources and enhance the stability of HRESs.
Article
University rankings released by famous ranking agencies provide valuable information for non-professionals seeking to select universities for study, employment, and research collaboration. Compared to the overall ranking of universities such as the Academic Ranking of World Universities by ShanghaiRanking, the flaws of the subject ranking of universities such as the Global Ranking of Academic Subjects by the same agency have not been sufficiently addressed. In this paper, we present evidence of the over-representation of US universities among the top 50 universities in the field of Law, as published by ShanghaiRanking. Moreover, we identify the anomalies and flaws of the top journal paper indicator used by ShanghaiRanking's Global Ranking of Academic Subjects for the field of Law, which is characterized by some distinctive research and publication practices. Through an in-depth analysis of these anomalies, flaws, and potential impacts, we put forward suggestions for improvement.
Article
One of the main factors in assigning a peer reviewer is the reviewer's expertise on the manuscript topic (the existence of relevant publications). Decision-making support based on mining scientometric database records of scientific publications speeds up the evaluation of reviewer expertise and makes it less labor-intensive. However, the critical point in this case is the correctness of the publication data being mined. At present, researchers are actively working on how to define the correctness of scientometric database data and how to ensure it, carrying out various cleaning procedures as part of data preparation. Yet existing works do not take into account the specifics of the task for which the publication data are gathered. To address this problem, the paper proposes a method of preparing data on scientific publications for intelligent decision-making support in evaluating the expertise of peer reviewers, which takes into account the need to determine the semantic similarity of publication text data. The method was successfully tested when preparing data on the scientific publications of the editorial board members of the journal “Systems Engineering and Information Technologies”, drawing on the content of their profiles in the scientometric databases “RISC” and “Google Scholar”.
Article
Background Neurosurgery is a rapidly advancing surgical specialty. Social media has significantly impacted the landscape of advancements in the field of neurosurgery. Research on the subject of neurosurgery and social media plays a vital role in combating disability and mortality due to neurological diseases, especially in trauma-affected individuals by increasing cooperation and sharing of clinical experiences between neurosurgeons via social media. This study aimed to evaluate the global neurosurgery and social media research performance from 2004-2023. Materials and Methods All the data for neurosurgery and social media-related research publications from 2004 to 2023 were extracted from the Web of Science database and a comprehensive analysis was performed on the R-bibliometrix package. Results An increasing number of publications with an annual growth rate of 22.04% was observed, with >91% of total articles published in the last decade. The United States of America (USA), the United Kingdom (UK), Italy, France, Canada and India made up of more than 67% of the global contribution. Out of 1449 authors, Chaurasia B was the most productive with 14 publications and the most globally cited document was JEAN WC, 2020 with 117 citations. The University of Cambridge was the leading institutional affiliation. World Neurosurgery was the most productive with >60 articles. Conclusions Exploring neurosurgery on social media enhances global collaboration, utilizing dynamic platforms for real-time knowledge exchange and holds immense potential for the field's global advancement.
Article
Authorship is at the core of the reward system of academic research. However, over 1.4 million anonymous publications over the past hundred years uncovered in a pioneer study by Shamsi et al. (Scientometrics 127(10):5989–6009, 2022) may threaten the various authorship-based research evaluation and scholarly communication systems. In this brief communication, we continue Shamsi et al.'s exploration by focusing only on anonymous articles and reviews (so-called citable items as defined by Clarivate), which are highly valued in research evaluation and scholarly communication, to decipher the characteristics of anonymous citable items. Our data show that although the absolute number and relative proportion of anonymous citable items in Web of Science Core Collection kept decreasing in recent decades and remained at low levels in recent years, anonymous citable items in some fields, such as Law, were still non-negligible. Anonymous publishing of academic works, an old tradition from hundreds of years ago, can still be found in the field of Law in recent years, especially in the famous student-edited journal Harvard Law Review. We are not requesting journals such as Harvard Law Review to change their ancient traditions in the name of transparency and accountability. However, the unusual and persistent phenomenon of anonymous publishing of citable items and its impact on authorship-based research evaluation and scholarly communication deserve more of our attention.
Conference Paper
Full-text available
This paper focuses on systematic errors and proposes how these can be measured and reported when informing and offering bibliometric data for policy purposes. We analysed differences in the calculation of impact indicators when 23 different classification schemes derived from Clarivate's InCites suite are used, specifically on five indicators: the Category Normalized Citation Indicator, the total number of citations, the H-index, the 1% of most cited articles, and the 10% of most cited articles. Findings show that citation counts are quite stable, with a difference of 13%. However, proportional indicators, such as top 1% and top 10% most cited articles, tend to vary more between universities, although on average they are lower than in the previous case. The results of this study can be used in estimating the error level in the research assessment of countries, institutions, and individuals.
Article
Full-text available
This bibliometric study examines the characteristics of the overall research trends, patterns of productivity, and publications on “assessment in second language pronunciation”. Bibliometric data were retrieved from Web of Science (WoS) on 1 September 2021, and the results of the study reveal that the first publication appeared in 1993 and that 118 publications appeared in total during the 28 years between 1993 and 2021. It was found that studies in this field have increased in recent years. The publications include articles and proceeding papers written by 2.31 authors per publication. The most cited document received 139 citations. It was also discovered that the most frequently used word is intelligibility and the trending topic is pronunciation. As for the affiliations, the most productive university is Concordia University in Canada. Detailed information is discussed under the following headings. Keywords: Bibliometric analysis, biblioshiny, second language education, assessing pronunciation
Article
A recent study published in Science of the Total Environment conducted a systematic review of persistent, bioaccumulative, and toxic chemicals (PBTs) in insects using Web of Science Core Collection. Interestingly, a remarkable increase of human, animal, and vertebrate publications related to PBTs appeared in the early 1990s. Despite the authors' attempts to illustrate the anomalies from different perspectives, no rational explanation has been found yet. Quite interested in this abnormal phenomenon, we intend to join the academic discussion by pointing out some problems in the data retrieval and processing process in this review study and giving a more reasonable explanation for the surge of research publications in the early 1990s. Our new interpretations based on large-scale empirical data will help scholars make better use of this well-known and widely used database.
Article
Full-text available
In scientific articles, cities usually appear as topics (subjects or actors) or as places (areas, destinations, locations, and spaces) of research. The study of more general patterns is rarer, since research usually focuses on individual cities. Using science mapping based on data from the Scopus bibliographic database and the visualization software VOSviewer, the authors examine publications related to research on cities in scientific journals and across different subject areas in order to determine how the different functions of cities are reflected in scientific journals. As models they use comparable capital cities of European Union member states (Berlin, Madrid, Rome, and Warsaw). The findings show very similar patterns for all cities, with publications in national and regional journals being the most common. Most research takes place in three main scientific fields: (1) social sciences and humanities, (2) medicine, and (3) natural sciences (environmental sciences, Earth and planetary sciences, and biotechnical and biological sciences). Research in medicine originally predominated, whereas social science studies have been the most frequent recently. Although the identified ratios between scientific fields are based on different journals, they are comparable across all cities and can serve as a basis for assessing cities of similar size. The study was carried out just before the outbreak of the coronavirus disease (COVID-19) pandemic, and its findings could be used to compare research patterns before and after the pandemic, since city-related research may change in the future because of the pandemic.
Article
Author-level scientometric indicators are an important tool in individual and institutional-based research assessment and require high-quality author-publication profiles. To address this need, our study developed a robust supervised machine learning approach in combination with graph community detection methods to disambiguate author names in the Web of Science publication database. We used the unique author identifier Researcher ID to retrieve true authorship data of 1,904 scientists and trained a random forest and a logistic regression classifier on 1.2 million corresponding publication pairs with authors that share the same last name and first name initial. To do this, we reviewed a vast set of paper and author characteristics and randomly included missing data to make our machine learning robust to quality changes of new publication data. In the application on an unseen test set, we achieved F1 scores of 0.82 in the random forest and 0.75 in the logistic regression model. Subsequently, we evaluate feature performance and apply the infomap graph community detection algorithm to identify all publications belonging to an author. The community detection results in reasonable cluster metrics (Mean K-Metric in logistic regression-based model = 0.78 and = 0.81 in random forest-based model). Finally, we test our algorithm on a large surname-initial block (“Muller, M.”) and demonstrate speed and predictive performance.
Article
Full-text available
Research collaborations, especially long-distance and international collaborations, have become increasingly prevalent worldwide. Recent studies highlighted the significant role of research leadership in collaborations. However, existing measures of research leadership do not take into account the intensity of leadership in the co-authorship network. More importantly, the spatial features, which influence the collaboration patterns and research outcomes, have not been incorporated in measuring research leadership. To fill the gap, we construct an institution-level weighted co-authorship network that integrates two types of weight on the edges: the intensity of collaborations and the spatial score (the geographical distance adjusted by the cross-linguistic-border nature). Based on this network, we propose a novel metric, namely the spatial research leadership rank, to identify the leading institutions while considering both the collaboration intensity and the spatial features. The leadership of an institution is measured by the following three criteria: (a) the institution frequently plays the corresponding role in papers with other institutions; (b) the institution frequently plays the corresponding role in longer distance and even cross-linguistic-border collaborations; (c) the participating institutions led by the institution have high leadership status themselves. Harnessing a dataset of 323,146 journal publications in pharmaceutical sciences during 2010-2018, we perform a comprehensive analysis of the geographical distribution and dynamic patterns of research leadership flows at the institution level. The results demonstrate that the SpatialLeaderRank outperforms baseline metrics in predicting the scholarly impact of institutions. The result remains robust in the field of Information Science and Library Science. Supplementary information: The online version contains supplementary material available at 10.1007/s11192-021-03943-w.
Article
Bibliometric analysis is effective for evaluating the merits of a given discipline. This study provides an analysis of collaboration evolution in analytic hierarchy process (AHP) research from 1982 to 2018. As an important developed approach of AHP, analytic network process (ANP) is also considered in this review. 9859 publications are harvested from Web of Science to conduct this bibliometric analysis. Country and institution are the two primary objectives to investigate the collaboration pattern of the 9859 publications. The most prolific countries and institutions are identified based on bibliometric indicators, and the collaboration relationships between connected countries or institutions are explored based on science mapping techniques. Further, a dynamic analysis is provided to investigate the collaboration evolution of AHP publications at the levels of country and institution. This study offers a new topic on the overview research of AHP publications, and could help in developing the collaboration evolution analysis in the AHP field.
Article
Full-text available
Web of Science Core Collection, one of the most authoritative bibliographic databases, is widely used in academia to track high-quality research. This database has begun to index online-first articles since December 2017. This new practice has introduced two different publication dates (online and final publication dates) into the database for more and more early access publications. It may confuse many users who want to search or analyze literature by using the publication-year related tools provided by Web of Science Core Collection. By developing custom retrieval strategies and checking manually, this study finds that the “year published” field in search page searches in both online and final publication date fields of indexed records. Each indexed record is allocated to only one “publication year” on the left of the search results page which will inherit first from online publication date field even when the online publication date is later than the final publication date. The “publication year” field in the results analysis page and the timespan “custom year range” field in the search page have the same function as that of the filter “publication year” in search results page. The potential impact of the availability of two different publication dates in calculating bibliometric indicators is also discussed at the end of the article.
Article
Full-text available
Purpose: The purpose of this study was to identify the organizations and countries with the highest number of retracted works, to determine whether the production of this type of work is trending upward or downward globally, to compare such organizations in Iran and the world in terms of the number of retracted works, and to review the pattern of collaboration among the organizations and countries that have published the most retracted articles. Methodology: This research was carried out with a scientometrics approach and data collection from the Web of Science database. Excel, HistCite, VOSviewer, and NodeXL software were used to analyze the data. Findings: The results of this study showed that the number of retracted works has been increasing in recent years and that Iran is not in a good position given its number of discredited scientific works (7th rank in the world). Also, some organizations such as Islamic Azad University are ranked first in terms of this type of work. Although Harvard is ranked second in terms of the total number of articles and the total number of retracted articles, it is ranked 8th among the top organizations in terms of the proportion of retracted articles to all papers, and among organizations with the largest number of retracted articles, its rank is 10th. Conclusion: In order to measure and compare organizations, counting only the number of articles and scientific works is not a sufficient indicator; calculating and comparing the ratio of retracted papers to all papers can also change their position relative to each other.
Considering the relatively unfavorable situation of Iran in terms of the number of retraction, it is recommended that researchers be familiarized with exemptions from the validity of research works and that the responsible units such as the research deputy of the organizations have penalties for eliminating the credit quality of defective and of poor quality so that the name of the country as the highest ranked country does not count as the number of denied credits.
Article
Bibliometrics refers to the statistical analysis of publications, which mainly include journal papers, books, and conference proceedings. It is an effective method for organizing and analyzing available information on a given research topic and has been commonly used in various disciplines. This paper provides a comprehensive overview of all publications about the researches on energy efficiency based on data envelopment analysis (DEA) retrieved from the Web of Science database. A total of 1206 documents in this field, published until 2018, are retrieved from the Web of Science. This study pays special attention to several key issues such as the general citation structure, the most cited publications, the productive journals, institutions and countries/ territories in the area. The cooperation model and cooperative network between countries and research institutes are presented. The key nodes documents in this field are analyzed through the study of literature co-citation. The evolution of research hot spots is explored by analyzing the keywords based on text mining techniques. Three different knowledge diffusion paths such as forward local main path, global main path and key-route main path are presented to identify the knowledge diffusion path of this field. The main advantage of this study is it provides a general picture of this domain. The achievements of this study will undoubtedly be valuable for future research in energy efficiency and will have great reference value to other disciplines.
Article
Full-text available
In tandem with the rapid globalisation of science, spatial scientometrics has become an important research sub-field in scientometric studies. Recently, numerous spatial scientometric contributions have focused on the examination of cities' scientific output by using various scientometric indicators. In this paper, I analyse cities' scientific output worldwide in terms of the number of journal articles indexed by the Scopus database, in the period from 1986 to 2015. Furthermore, I examine which countries are the most important collaborators of cities. Finally, I identify the most productive disciplines in each city. I use GPS Visualizer to illustrate the scientometric data of nearly 2200 cities on maps. Results show that cities with the highest scientific output are mostly located in developed countries and China. Between 1986 and 2015, the greatest number of scientific articles were created in Beijing. The international hegemony of the United States in science has been described by many studies, and is also reinforced by the fact that the United States is the most important collaborator to more than 75% of all cities. Medicine is the most productive discipline in two-thirds of cities. Furthermore, cities having the highest scientific output in specific disciplines show well-defined geographical patterns.
Article
Full-text available
Collaborations between China and the European Union (EU) member states involve not only connections between China and individual countries, but also interactions between the different EU member states, the latter of which is due also to the influence exerted by the EU’s integration strategy. The complex linkages between China and the EU28, as well as among the 28 EU member states, are of great importance for studying knowledge flows. Using co-authorship analysis, this study explores the changes of the network structure between 2000 and 2014. Our results show that EU member states with middle- or low- scientific capacities, in particular those who joined the EU after 2000, have been actively reshaping the network of scientific collaborations with China. The linkages between middle- and low- scientific capacity countries have been tremendously strengthened in the later years. The network positional advantage (measured by the degree of betweenness centrality) has shifted from a few dominant nations to a wider range of countries. We also find that countries like Belgium, Sweden and Denmark are in important positions connecting the relatively low-capacity ‘new’ EU member states with China. The ‘new’ EU member states—that have relatively low scientific capacity—intend to cooperate with China jointly with ‘old’ EU member(s).
Article
Full-text available
Non-English languages are widely used, but their roles in scholarly communication are relatively under-explored. By using Web of Science's Science Citation Index Expanded (SCIE, 1900–2015), Social Sciences Citation Index (SSCI, 1900–2015), and Arts and Humanities Citation Index (A&HCI, 1975–2015), this study probes the patterns and dynamics of non-English papers by year, citation index, and discipline using bibliometric analysis. The analyses show that English is increasingly being used as the dominating language from natural sciences and social sciences to arts and humanities. Around 97% of the papers in SCIE, 95% of the papers in SSCI, and 73% of the papers in A&HCI during the past decade were in English. However, other languages such as German and French were also used as important academic languages in sciences and social sciences during the first half of the 20th century, 1970s, and 1980s. Unlike natural science and social science disciplines, non-English papers have consistently played important role in arts and humanities disciplines from the beginning of 1975. Although the shares of non-English papers in SCIE and SSCI databases have been limited during the past decade, a large number of non-English papers can be found in some applied disciplines of sciences and social sciences.
Article
Full-text available
Book reviews play important roles in scholarly communication especially in arts and humanities disciplines. By using Web of Science’s Science Citation Index Expanded, Social Sciences Citation Index, and Arts & Humanities Citation Index, this study probed the patterns and dynamics of book reviews within these three indexes empirically during the past decade (2006–2015). We found that the absolute numbers of book reviews among all the three indexes were relatively stable but the relative shares were decreasing. Book reviews were very common in arts and humanities, common in social sciences, but rare in natural sciences. Book reviews are mainly contributed by authors from developed economies such as the USA and the UK. Oppositely, scholars from China and Japan are unlikely to contribute to book reviews.
Article
Full-text available
The analysis of bibliometric networks, such as co-authorship, bibliographic coupling, and co-citation networks, has received a considerable amount of attention. Much less attention has been paid to the construction of these networks. We point out that different approaches can be taken to construct a bibliometric network. Normally the full counting approach is used, but we propose an alternative fractional counting approach. The basic idea of the fractional counting approach is that each action, such as co-authoring or citing a publication, should have equal weight, regardless of for instance the number of authors, citations, or references of a publication. We present two empirical analyses in which the full and fractional counting approaches yield very different results. These analyses deal with co-authorship networks of universities and bibliographic coupling networks of journals. Based on theoretical considerations and on the empirical analyses, we conclude that for many purposes the fractional counting approach is preferable over the full counting one.
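The difference between the two counting approaches described in this abstract can be illustrated with a minimal Python sketch. It rests on an assumption that the abstract does not state: under fractional counting, a publication co-authored by n universities contributes a weight of 1/(n - 1) to each pair of universities, so that every university receives a total link strength of 1 from that publication. The function name `coauthorship_links` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_links(publications, fractional=False):
    """Build weighted co-authorship links from lists of co-authoring units.

    Full counting: every co-authoring pair gets weight 1 per publication.
    Fractional counting (assumed weighting): every pair gets 1 / (n - 1),
    where n is the number of distinct units on the publication.
    """
    links = defaultdict(float)
    for units in publications:
        unique = sorted(set(units))
        n = len(unique)
        if n < 2:
            continue  # a single-unit publication creates no link
        weight = 1.0 / (n - 1) if fractional else 1.0
        for a, b in combinations(unique, 2):
            links[(a, b)] += weight
    return dict(links)


# Hypothetical sample: one two-university paper and one three-university paper.
pubs = [["Univ A", "Univ B"], ["Univ A", "Univ B", "Univ C"]]
full = coauthorship_links(pubs)
frac = coauthorship_links(pubs, fractional=True)
```

With this sample, the three-university paper adds 1 to every pair under full counting but only 0.5 under the assumed fractional weighting, so publications with many co-authoring units no longer dominate the network.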
Article
Thomson Reuters' Web of Science (WoS) began systematically collecting acknowledgment information in August 2008. Since then, bibliometric analysis of funding acknowledgment (FA) has been growing and has aroused intense interest and attention from both academia and policy makers. Examining the distribution of FA by citation index database, by language, and by acknowledgment type, we noted coverage limitations and potential biases in each analysis. We argue that in spite of its great value, bibliometric analysis of FA should be used with caution.
Article
The state is still the significant unit for innovative studies during the age of R&D globalization and innovation regionalization. Using the bibliometric method, this paper attempts to provide a comprehensive picture of national innovation studies based on data derived from the Web of Knowledge. In particular, we identify the most significant countries and institutions, major journals, seminal contributions and contributors, and clusters in the network of citations in the field of national innovation studies. The results are useful for understanding and promoting the field of national innovation.
Article
In 2014 Thomson Reuters (TR, provider of the Web of Science, WoS) published a list of highly-cited researchers worldwide. This includes those scientists who have published the most papers in their discipline which belong to the 1 % of the most-cited papers. Bornmann and Bauer (J Assoc Inf Sci Technol, in press) have presented a first evaluation in which the scientists are evaluated on the basis of their affiliations. In this short communication we would like to indicate how the TR data can be used to perform a meaningful country-specific evaluation. Germany serves as the example for the analysis.
Article
This paper examined the coauthorship patterns of China’s humanities and social sciences (HSS), based on articles and reviews covered by the Social Sciences Citation Index and the Arts and Humanities Citation Index of the Web of Science. We defined four types of coauthorship: no collaboration (NOC), national collaboration (NAC), bilateral international collaboration (BIC) and multilateral international collaboration (MIC), and proposed the development phases of China’s HSS as 1978-1991, 1992-2000 and 2001-present. Accordingly, we explored the evolution of coauthorship patterns using a number of metrics. Findings include: (1) the coauthorship patterns of China’s HSS evolved significantly from NOC to NAC, BIC and MIC; (2) China’s major collaborators had not varied significantly over the past decade, with the USA always in the lead (one in every four of China’s HSS articles was coauthored with the USA); (3) pic (percentage of internationally coauthored articles) was negatively correlated with pnc (percentage of not cited articles); (4) the CPP (citations per publication) of MIC is 1.5 times that of BIC, 3 times that of NAC and 4 times that of NOC. The Chinese government has been eagerly promoting economic development through science and technology. However, after more than 30 years of miraculous growth, the Chinese government realized that China’s HSS had been overshadowed, and then initiated plans for its prosperity.
Article
There is increasing evidence that citations to Chinese research publications are rising sharply. A series of reasons have been highlighted in previous studies. This research explores another possibility—whether there is a “clubbing” effect in China's surge in research citations, in which a higher rate of internal citing takes place among influential Chinese researchers. Focusing on the most highly cited research articles in nanotechnology, we find that a larger proportion of Chinese nanotechnology research citations are localized within individual, institutional, and national networks within China. Both descriptive and statistical tests suggest that highly cited Chinese papers are more likely than similar U.S. papers to receive internal and localized citations. Tentative explanations and policy implications are discussed.
Article
We evaluated earthquake research performance based on a bibliometric analysis of 84,051 documents published in journals and other outlets contained in the Science Citation Index (SCI) and Social Sciences Citation Index (SSCI) bibliographic databases for the period of 1900–2010. We summarized significant publication indicators in earthquake research, evaluated national and institutional research performance, and presented earthquake research development from a supplementary perspective. Research output descriptors suggested a solid development in earthquake research, in terms of increasing scientific production and research collaboration. We identified leading authors, institutions, and nations in earthquake research, and there was an uneven distribution of publications at authorial, institutional, and national levels. The keywords most commonly used in the articles were evolution, California, deformation, model, inversion, seismicity, tectonics, crustal structure, fault, zone, lithosphere, and attenuation.
Article
It has already been pointed out that the foreign language barrier is probably the greatest impediment to the free flow and transfer of information. This barrier is even growing as scientists of more and more countries publish in their own languages. Almost all studies addressing the language barrier problem were conducted from an Anglo-Saxon perspective, limiting their scope to English-language sources or English speakers. Little research has been devoted to studying and measuring language preference among non-English-speaking scholars. This article reviews measures proposed in former studies such as the “relative own-language preference” indicator, and the “straight odds ratio”, pointing out their advantages and drawbacks. Two new refined measures (in both “raw” and normalised versions) are offered, claiming to be free of these drawbacks, and thus enabling a better and more reliable comparison between journals of different languages. Practical use of the proposed measures is illustrated by applying them to findings of a former language-citation study done on nine sociology journals.
Article
Authorship identity has long been an Achilles’ heel in bibliometric analyses at the individual level. This problem appears in studies of scientists’ productivity, inventor mobility and scientific collaboration. Using the concepts of cognitive maps from psychology and approximate structural equivalence from network analysis, we develop a novel algorithm for name disambiguation based on knowledge homogeneity scores. We test it on two cases, and the results show that this approach outperforms other common authorship identification methods with the ASE method providing a relatively simple algorithm that yields higher levels of accuracy with reasonable time demands.
Article
Based on data from the Science Citation Index Expanded (SCIE) and using scientometric methods, we conducted a systematic analysis of Chinese regional contributions and international collaboration in terms of scientific publications, publication activity, and citation impact. We found that regional contributions are highly skewed. The top positions measured by number of publications or citations, share of publications or citations are taken by almost the same set of regions. But this is not the case when indicators for relative citation impact are used. Comparison between regional scientific output and R&D expenditure shows that Spearman’s rank correlation coefficient between the two indicators is rather low among the leading publication regions.
Article
There is increasing interest in assessing how sponsored research funding influences the development and trajectory of science and technology. Traditionally, linkages between research funding and subsequent results are hard to track, often requiring access to separate funding or performance reports released by researchers or sponsors. Tracing research sponsorship and output linkages is even more challenging when researchers receive multiple funding awards and collaborate with a variety of differentially-sponsored research colleagues. This article presents a novel bibliometric approach to undertaking funding acknowledgement analysis which links research outputs with their funding sources. Using this approach in the context of nanotechnology research, the article probes the funding patterns of leading countries and agencies including patterns of cross-border research sponsorship. We identify more than 91,500 nanotechnology articles published worldwide during a 12-month period in 2008–2009. About 67% of these publications include funding acknowledgement information. We compare articles reporting funding with those that do not (for reasons that may include reliance on internal core-funding rather than external awards as well as omissions in reporting). While we find some country and field differences, we judge that the level of reporting of funding sources is sufficiently high to provide a basis for analysis. The funding acknowledgement data is used to compare nanotechnology funding policies and programs in selected countries and to examine their impacts on scientific output. We also examine the internationalization of research funding through the interplay of various funding sources at national and organizational levels. We find that while most nanotechnology funding is nationally-oriented, internationalization and knowledge exchange do occur as researchers collaborate across borders. Our method offers a new approach not only in identifying the funding sources of publications but also in feasibly undertaking large-scale analyses across scientific fields, institutions and countries.
Article
A method has been devised for the electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets. The method results in quantitative transfer of ribosomal proteins from gels containing urea. For sodium dodecyl sulfate gels, the original band pattern was obtained with no loss of resolution, but the transfer was not quantitative. The method allows detection of proteins by autoradiography and is simpler than conventional procedures. The immobilized proteins were detectable by immunological procedures. All additional binding capacity on the nitrocellulose was blocked with excess protein; then a specific antibody was bound and, finally, a second antibody directed against the first antibody. The second antibody was either radioactively labeled or conjugated to fluorescein or to peroxidase. The specific protein was then detected by either autoradiography, under UV light, or by the peroxidase reaction product, respectively. In the latter case, as little as 100 pg of protein was clearly detectable. It is anticipated that the procedure will be applicable to analysis of a wide variety of proteins with specific reactions or ligands.
Article
In tandem with the rapid globalisation of science, spatial scientometrics has become an important research sub-field in scientometric studies. Recently, numerous spatial scientometric contributions have focused on the examination of cities’ scientific output by using various scientometric indicators. In this paper, I analyse cities’ scientific output worldwide in terms of the number of journal articles indexed by the Scopus database, in the period from 1986 to 2015. Furthermore, I examine which countries are the most important collaborators of cities. Finally, I identify the most productive disciplines in each city. I use GPS Visualizer to illustrate the scientometric data of nearly 2,200 cities on maps. Results show that cities with the highest scientific output are mostly located in developed countries and China. Between 1986 and 2015, the greatest number of scientific articles were created in Beijing. The international hegemony of the United States in science has been described by many studies, and is also reinforced by the fact that the United States is the most important collaborator to more than 75 percent of all cities. Medicine is the most productive discipline in two-thirds of cities. Furthermore, cities having the highest scientific output in specific disciplines show well-defined geographical patterns.
Article
In the last decade, a growing number of studies focused on the qualitative/quantitative analysis of bibliometric-database errors. Most of these studies relied on the identification and (manual) examination of relatively limited samples of errors. Using an automated procedure, we collected a large corpus of more than 10,000 errors in the two multidisciplinary databases Scopus and Web of Science (WoS), mainly including articles in the Engineering-Manufacturing field. Based on the manual examination of a portion (of about 10%) of these errors, this paper provides a preliminary analysis and classification, identifying similarities and differences between Scopus and WoS. The analysis reveals interesting results, such as: (i) although Scopus seems more accurate than WoS, it tends to forget to index more papers, causing the loss of the relevant citations given/obtained, (ii) both databases have relatively serious problems in managing the so-called Online-First articles, and (iii) lack of correlation between databases, regarding the distribution of the errors in several error categories. The description is supported by practical examples concerning a variety of errors in the Scopus and WoS databases.
Article
Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all combinations of possible controls as well as specified functional form issues, variable definitions, standard error calculations, and estimation commands. This allows analysts to present their core, preferred estimate in the context of a distribution of plausible estimates. Second, we develop a model influence analysis showing how each model ingredient affects the coefficient of interest. This shows which model assumptions, if any, are critical to obtaining an empirical result. We demonstrate the architecture and interpretation of multimodel analysis using data on the union wage premium, gender dynamics in mortgage lending, and tax flight migration among U.S. states. These illustrate how initial results can be strongly robust to alternative model specifications or remarkably dependent on a knife-edge specification.
Article
As one of the most important tools for information fusion, the aggregation operator has been successfully applied in decision making, combination forecasting, military operations research and so on. The focus of this paper is therefore to present a scientometric review of the development of the aggregation operator. The records used in this paper were downloaded from Web of Science. The information visualization software CiteSpace II was used to analyze and visualize the development of the discipline of aggregation operators. The results of this study reveal the main research clusters of this area and their corresponding key elements. The close relationships between the different clusters, main journals, and important authors are identified and shown in a visual and quantitative way. This paper is intended as a reference source for theoretical researchers and practitioners working in the areas of information fusion, decision making and operations research.
Article
The aim of this study is to examine how scientific collaborative features influence scientific collaboration networks and then affect scientific output. In order to explore the influence of scientific collaboration, we define three collaborative features: inertia, diversity and strength. The data are collected from the Scopus and Web of Science databases. Using the technique for order preference by similarity to an ideal solution (TOPSIS) method, we first combine h-index, impact factor and SCImago journal rank to rank journals in the field of wind power. Then we construct the collaboration network of institutions and use structural equation modelling with partial least squares to examine the relationships among collaborative features, network structure, and scientific output. The results show that collaborative diversity and strength have positive effects on scientific output, while collaborative inertia has a negative effect. Both centrality and structural holes fully account for (mediate) the relationships between collaborative features and outputs. The findings have some important policy implications for scientific collaboration: (1) research institutions should actively participate in diverse collaborations; (2) rather than only collaborating with previous partners, they should seek more new partners; and (3) collaborative features are important antecedents of scientific networks.
Article
The examination of three samples of geological scientific publications: (A) 9 journals from Western Europe and USA; (B) 10 up-to-date review books, and (C) 3 sections of Volume 127 (1990–1991) of the Zoological Record, shows that the statement that English is now the lingua franca in geological sciences is only in part true, but reflects a desire by many people in the scientific community, a desire which may not yet have been fulfilled.
Article
We performed a bibliometric analysis of published biodiversity research for the period of 1900–2009, based on the Science Citation Index (SCI) database. Our analysis reveals the authorial, institutional, spatiotemporal, and categorical patterns in biodiversity research and provides an alternative demonstration of research advancements, which may serve as a potential guide for future research. The growth of article outputs has exploded since the 1990s, along with an increasing collaboration index, references, and citations. Ecology, environmental sciences, biodiversity conservation, and plant science were the most frequently used subject categories in biodiversity studies, and Biological Conservation, Journal of Soil and Water Conservation, Conservation Biology and Biodiversity and Conservation were the most active journals in this field. The United States was the largest contributor in global biodiversity research, as the U.S. produced the most single-country and collaborative articles, had the greatest number of top research institutions, and had a central position in collaboration networks. We perceived an increasing number of both internationally collaborative and inter-institutionally collaborative articles, with the latter form of collaboration being more prevalent than the former. A keyword analysis found several interesting terminology preferences, confirmed conservation’s central position as a topic in biodiversity research, revealed the adoption of advanced technologies, and demonstrated keen interest in both the patterns and underlying processes of ecosystems. Our study reveals patterns in scientific outputs and academic collaborations and serves as an alternative and innovative way of revealing global research trends in biodiversity. Keywords: Bibliometrics; Biodiversity; Conservation; Research trends; Scientific outputs
Article
Classifying journals or publications into research areas is an essential element of many bibliometric analyses. Classification usually takes place at the level of journals, where the Web of Science subject categories are the most popular classification system. However, journal-level classification systems have two important limitations: They offer only a limited amount of detail, and they have difficulties with multidisciplinary journals. To avoid these limitations, we introduce a new methodology for constructing classification systems at the level of individual publications. In the proposed methodology, publications are clustered into research areas based on citation relations. The methodology is able to deal with very large numbers of publications. We present an application in which a classification system is produced that includes almost ten million publications. Based on an extensive analysis of this classification system, we discuss the strengths and the limitations of the proposed methodology. Important strengths are the transparency and relative simplicity of the methodology and its fairly modest computing and memory requirements. The main limitation of the methodology is its exclusive reliance on direct citation relations between publications. The accuracy of the methodology can probably be increased by also taking into account other types of relations, for instance based on bibliographic coupling.
Article
This paper deals with the specific features of historical papers relevant for information retrieval and bibliometrics. The analysis is based mainly on the citation indexes accessible under the Web of Science (WoS) but also on field-specific databases: the Chemical Abstracts Service (CAS) literature database and the INSPEC database. First, the journal coverage of the WoS (in particular of the WoS Century of Science archive), the limitations of specific search fields as well as several database errors are discussed. Then, the problem of misspelled citations and their “mutations” is demonstrated by a few typical examples. Complex author names, complicated journal names, and other sources of errors that result from prior citation practice are further issues. Finally, some basic phenomena limiting the meaning of citation counts of historical papers are presented and explained.
Article
Nanoscience and technology (NST) is a young scientific and technological field that has generated great worldwide interest in the past two decades. Previous bibliometric analyses have unmistakably demonstrated the remarkable growth of the global NST literature. While almost all published research articles in NST are in English, increasingly a larger share of NST publications is published in the Chinese language. Perplexingly, Chinese is the only language — apart from English — that displays an ascendant trend in the NST literature. In this brief note, we explore and evaluate three arguments that could explain this phenomenon: coverage bias, language preference, and community formation.
Article
Purpose – The purpose of the paper is to explore the extent of the absence of data elements that are critical from the perspective of scientometric evaluation of the scientific productivity and impact of countries in terms of the most common indicators – such as the number of publications, the number of citations and the impact factor (the ratio of citations received to papers published), and the effect these may have on the h-index of countries – in two of the most widely used citation-enhanced databases. Design/methodology/approach – The author uses the Scopus database and Thomson-Reuters' (earlier known as ISI) three citation databases (Science, Social Sciences and Arts & Humanities), both as implemented on the Dialog Information Services (Thomson ISI databases) and on the Web of Knowledge platform, known as Web of Science (WoS). The databases were searched to discover how many records they have for each year, how many of those have cited references for each year, and what percentage of the records have other essential or often used data elements for bibliometric/scientometric evaluation. Findings – There is no difference between the databases in the presence of publication year data – all of them include this element for all the records. The presence of the language field is comparable between the Thomson and Scopus databases, but it should be noted that a 2 per cent difference for mega-databases of such size is not entirely negligible. The rate of presence of the subject category field is better in Scopus, even though it has far fewer subject categories (27) than the Thomson databases (well over 200). The rate of absence of country identification is the most critical and disappointing. It is caused primarily by the fact that journals have not had consistent policies for including the country affiliation of the authors. The huge 34 per cent omission rate of country identification in Scopus also hurts its impressive author identification feature. Unfortunately, the country information is not available in more than 12 million records. Originality/value – Irrespective of the reasons for the very high rate of omission of country names or codes, it should be realised and prominently mentioned in any scientometric country reports. The author has never seen this mentioned in published papers, nor in the manuscripts that he has peer reviewed. Many can live with the low omission rates of the language, document type and subject category elements, and many can just avoid using these filters. The two factors that define the level of distortion in the assessment and ranking of the research achievements of countries are the rate of cited reference enhanced records and the rate of presence of country affiliation data.
Article
The ongoing globalisation of science has undisputedly a major impact on how and where scientific research is being conducted nowadays. Yet, the big picture remains blurred. It is largely unknown where this process is heading, and at which rate. Which countries are leading or lagging? Many of its key features are difficult if not impossible to capture in measurements and comparative statistics. Our empirical study measures the extent and growth of scientific globalisation in terms of physical distances between co-authoring researchers. Our analysis, drawing on 21 million research publications across all countries and fields of science, reveals that contemporary science has globalised at a fairly steady rate during recent decades. The average collaboration distance per publication has increased from 334 kilometres in 1980 to 1553 in 2009. Despite significant differences in globalisation rates across countries and fields of science, we observe a pervasive process in motion, moving towards a truly interconnected global science system.
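As an illustration of the kind of indicator described above, the following minimal sketch computes a collaboration distance for one publication from the coordinates of its author addresses. The abstract does not state how distances were calculated, so the haversine great-circle formula, the averaging over all address pairs, and the names `haversine_km` and `mean_collaboration_distance` are assumptions made for illustration only.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_collaboration_distance(addresses):
    """Mean pairwise distance over all author addresses of one publication.

    Returns 0.0 when fewer than two addresses are available (assumed convention).
    """
    pairs = list(combinations(addresses, 2))
    if not pairs:
        return 0.0
    return sum(haversine_km(p, q) for p, q in pairs) / len(pairs)
```

Averaging this per-publication value over all publications of a given year would give a yearly figure comparable in spirit to the kilometre values quoted above, although the study's own aggregation may differ.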
Thomson Reuters. (2009). Web of science backfiles. (Accessed 4 October 2017).