Article

Institutional addresses in the Web of Science: the effects on scientific evaluation


Abstract

The effectiveness of the analytical tools used for the evaluation of scientific activity has been enhanced by the availability of bibliographic databases, in particular the standard-setting Institute for Scientific Information databases, whose operating rules are widely accepted by the scientific community. One of these rules is the availability in a single field of the institutional affiliations of all the authors of a paper. In practice this rule has been replaced by another, resulting from the inclusion of a new option, whereby records can be retrieved by the author’s reprint address (Reprint Address field). The outcome is diversity in the information on affiliation that may generate some degree of uncertainty in connection with institutional attribution when discrepancies arise between the information contained in the two fields, mainly when the only option available is the reprint address. The present study found a high degree of uncertainty, however, essentially for the period prior to the Web of Science, in particular for scientific evaluation in peripheral countries such as Spain.
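The discrepancy described in the abstract, between the address field that lists all authors' affiliations and the Reprint Address field that lists only the corresponding author's institution, can be made concrete with a short sketch. The Python fragment below is a hedged illustration only, not the authors' procedure: it assumes Web of Science records already parsed into dictionaries keyed by the plain-text export tags C1 (author addresses) and RP (reprint address), and flags the cases that create attribution uncertainty.

    # Illustrative sketch only (not the paper's method). Records are assumed
    # to be dictionaries built from a plain-text WoS export, with C1 holding
    # the author addresses and RP the reprint (corresponding) address.
    def classify_affiliation(record, institution_keyword):
        """Classify how a record can be attributed to an institution."""
        c1 = " ".join(record.get("C1", [])).upper()
        rp = record.get("RP", "").upper()
        kw = institution_keyword.upper()
        in_c1, in_rp = kw in c1, kw in rp
        if not c1 and in_rp:
            return "reprint-address only"   # the main source of uncertainty
        if in_c1 and in_rp:
            return "consistent"
        if in_c1 != in_rp:
            return "discrepant"
        return "not attributed"

    # Hypothetical records for illustration.
    records = [
        {"C1": ["UNIV CARLOS III MADRID, DEPT LIB & INFORMAT SCI, GETAFE, SPAIN"],
         "RP": "UNIV CARLOS III MADRID, GETAFE, SPAIN"},
        {"C1": [], "RP": "CSIC, INST HIST, MADRID, SPAIN"},
    ]
    for r in records:
        print(classify_affiliation(r, "UNIV CARLOS III MADRID"))

Records that fall into the "reprint-address only" or "discrepant" categories are the ones whose institutional attribution the study identifies as uncertain.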


... There are two reasons for the absence of affiliation information: the articles themselves do not report it (about 60% of all cases examined by Liu et al., 2018) or they are not indexed in the WoS (40%). For articles from Spain in the SCIE, García-Zorita et al. (2006) describe that 65% of the older literature (from 1985 to 1997) does not have a research address, while this is the case for 99.8% of papers from 1998 to 2004. Older studies identified problems with searching for institutions in bibliographic information services, including the translation of foreign names (Stefaniak, 1987) and name changes when an institution merges with another or splits into new entities (Hood & Wilson, 2003). ...
Article
Full-text available
Describing, analyzing, and evaluating research institutions are among the main tasks of scientometrics and research evaluation. But how can we optimally search for an institution's research output? Possible search arguments include institution names, affiliations, addresses, and affiliated authors' names. Prerequisites of these search tasks are complete lists (or at least good approximations) of the institutions' publications, and - in later steps - their citations, and topics. When searching for the publications of research institutions in an information service, there are two options, namely (1) searching directly for the name of the institution and (2) searching for all authors affiliated with the institution in a defined time interval. Which strategy is more effective? More specifically, do informetric indicators such as recall and precision, search recall and search precision, and relative visibility change depending on the search strategy? What are the reasons for differences? To illustrate our approach, we conducted a case study on two information science institutions and identified all staff members. The search was performed using the Web of Science Core Collection (WoS CC). As a performance indicator, applying fractional counting and considering co-affiliations of authors, we used the institution's relative visibility in an information service. We also calculated two variants of recall and precision at the institution level, namely search recall and search precision as informetric measures of performance differences between different search strategies (here: author search versus institution search) on the same information service (here: WoS CC) and recall and precision in relation to the complete set of an institution's publications. For all our calculations, there is a clear result: Searches for affiliated authors outperform searches for institutions in WoS. However, especially for large institutions it is difficult to determine all the staff members in the time interval of research. Additionally, information services (including WoS) are incomplete and there are variants for the names of institutions in the services. Therefore, searching for institutions and the publication-based quantitative evaluation of institutions are very critical issues.
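As a rough companion to the measures named in this abstract, the sketch below shows one way recall, precision, and relative visibility under fractional counting could be computed. It is a simplified illustration with toy institution names and paper sets, not the study's actual code.

    # Hedged sketch of the informetric measures mentioned above. Each
    # publication is modelled as the set of its affiliated institutions,
    # and fractional counting splits one unit of credit equally among them.
    def recall_precision(retrieved, relevant):
        retrieved, relevant = set(retrieved), set(relevant)
        hits = retrieved & relevant
        recall = len(hits) / len(relevant) if relevant else 0.0
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        return recall, precision

    def relative_visibility(publications, institution):
        """Fractional share of one institution in a set of publications."""
        credit = sum(1 / len(affils) for affils in publications if institution in affils)
        return credit / len(publications) if publications else 0.0

    # Toy data: three papers with their affiliation sets.
    pubs = [{"Inst A", "Inst B"}, {"Inst A"}, {"Inst B", "Inst C"}]
    print(relative_visibility(pubs, "Inst A"))                       # (0.5 + 1) / 3
    print(recall_precision({"p1", "p2", "p3"}, {"p2", "p3", "p4"}))  # (0.67, 0.67)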
... This has further intensified the development of bibliometrics, scientometrics and informetrics in recent years (Hood and Wilson 2003). Such databases are playing an increasing role in research policies in many countries, where they are used as tools to evaluate the research conducted by authors and institutions for different types of analysis, such as scientific productivity at an institutional or national level (García-Zorita et al. 2006). The evolution of the digital age has accelerated the development of numerous databases, such as PubMed, Scopus, Web of Science (WoS), and Google Scholar. ...
Article
Full-text available
With rapid advances and diversifications in new fields of science and technology, new journals are emerging as a location for the exchange of research methods and findings in these burgeoning communities. These new journals are large in number and, in their early years, it is unclear how central these journals will be in the fields of science and technology. On one hand, these new journals offer valuable data sources for bibliometric scholars to understand and analyze emerging fields; on the other hand, how to identify important peer-reviewed journals remains a challenge—and one that is essential for funders, key opinion leaders, and evaluators to overcome. To fulfill growing demand, the Web of Science platform, as the world’s most trusted research publication and citation index, launched the Emerging Sources Citation Index (ESCI) in November 2015 to extend the universe of journals already included in the Science Citation Index Expanded, the Social Sciences Citation Index, and the Arts & Humanities Citation Index. This paper profiles ESCI, drawing some comparisons against these three established indexes in terms of two questions: (1) Does ESCI cover more regional journals of significant importance and provide a more balanced distribution of journals? (2) Does ESCI offer earlier visibility of emerging fields and trends through upgraded science overlay maps? The results show that the ESCI has a positive effect on research assessment and accelerates communication in the scientific community. However, ESCI does little to improve the disadvantaged position of non-English-speaking countries and regions. In addition, medical science, education research, social sciences, and humanities are emerging fields in recent research, reflected by the lower proportion of traditional fundamental disciplines and applied science journals included in ESCI. Furthermore, balancing the selection of journals across different research domains to facilitate cross-disciplinary research still needs further effort.
... It is for this reason that the high or low accuracy with which institutions are normalized can have a major impact on the visibility of institutions in this particular product, and therefore on the potential institution-level studies ‒ with evaluative purposes or otherwise ‒ that may be carried out using these data, as happens with other bibliographic products (García-Zorita et al., 2006; Venets, 2014). The identification and unification of institutions is a difficult and intricate task, full of complex and unexpected cases, mainly due to the wide variety of forms researchers use to enter their institutional affiliations (uncommon variations, spelling errors or even complete absence of information). ...
Article
Full-text available
Purpose: Google Scholar Citations (GSC) provides an institutional affiliation link which groups together authors who belong to the same institution. The purpose of this work is to ascertain whether this feature is able to identify and normalize all the institutions entered by the authors, and whether it is able to assign all researchers to their own institution correctly.
Design/methodology/approach: Systematic queries to Google Scholar Citations’ internal search box were performed under two different forms (institution name and institutional email web domain) in September 2015. The whole Spanish academic system (82 institutions) was used as a test. Additionally, specific searches to companies (Google) and world-class universities were performed to identify and classify potential errors in the functioning of the feature.
Findings: Although the affiliation tool works well for most institutions, it is unable to detect all existing institutions in the database, and it is not always able to create a unique standardized entry for each institution. Additionally, it also fails to group all the authors who belong to the same institution. A wide variety of errors have been identified and classified.
Research limitations/implications: Even though the analyzed sample is good enough to empirically answer the research questions initially proposed, a more comprehensive study should be performed to calibrate the real volume of the errors.
Practical implications: The discovered affiliation link errors prevent institutions from being able to access the profiles of all their respective authors using the institutions lists offered by Google Scholar Citations. Additionally, it introduces a shortcoming in the navigation features of Google Scholar which may impair web user experience.
Originality/value: This work proves inconsistencies in the affiliation feature provided by Google Scholar Citations. A whole national university system is systematically analyzed and several queries have been used to reveal errors in its functioning. The completeness of the errors identified and the empirical data examined are the most exhaustive to date regarding this topic. Lastly, some recommendations about how to correctly fill in the affiliation data (both for authors and institutions) and how to improve this feature are provided as well.
... For example, in WoS, the acknowledgement information is systematically missing in publications produced in languages other than English (Díaz-Faes and Bordons 2014). Moreover, mistakes in the affiliation data and/or funding acknowledgements are a quite common problem in bibliometrics (García-Zorita et al. 2006; Jiang et al. 2011; Sirtes 2013). ...
Article
Full-text available
The emergence of new networking research organisations is explained by the need to promote excellence in research and to facilitate the resolution of specific problems. This study focuses on a Spanish case, the Biomedical Research Networking Centres (CIBER), created through a partnership of research groups, without physical proximity, who work on common health related issues. These structures are a great challenge for bibliometricians due to their heterogeneous composition and virtual nature. Therefore, the main objective of this paper is to assess different approaches based on addresses, funding acknowledgements and authors to explore which search strategy or combination is more effective to identify CIBER publications. To this end, we downloaded all the Spanish publications from the Web of Science databases, in the subject categories of Gastroenterology/Hepatology and Psychiatry during the period 2008-2011. Our results showed that, taken alone, the dataset based on addresses identified more than 60 % of all potential CIBER publications. However, the best outcome was obtained by combining it with additional datasets based on funding acknowledgements and on authors, recovering more than 80 % of all possible CIBER publications without losing accuracy. In terms of bibliometric performance, all the CIBER sets showed scores above the country average, thus proving the relevance of these virtual organisations. Finally, given the increasing importance of these structures and the fact that authors do not always mention their connection to CIBER, some recommendations are offered to develop clear policies on how, when and where to specify this relationship.
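The combination strategy evaluated in this abstract, merging publication sets retrieved via addresses, funding acknowledgements, and author names, amounts to a set union whose coverage can be checked against a gold-standard list. The fragment below is a hypothetical sketch with invented WoS identifiers, meant only to make that comparison concrete.

    # Hypothetical sketch of the combined search strategy (invented IDs).
    by_address = {"WOS:1", "WOS:2", "WOS:3", "WOS:5"}   # address-based retrieval
    by_funding = {"WOS:2", "WOS:4", "WOS:6"}            # funding-acknowledgement retrieval
    by_authors = {"WOS:3", "WOS:6", "WOS:7"}            # author-based retrieval

    combined = by_address | by_funding | by_authors
    all_ciber = {f"WOS:{i}" for i in range(1, 9)}       # assumed gold-standard set

    print("address-only coverage:", len(by_address & all_ciber) / len(all_ciber))
    print("combined coverage:   ", len(combined & all_ciber) / len(all_ciber))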
Article
Full-text available
A naive user seeking introductory information on a topic may perceive a domain as it is shown by the search results in a database; however, inconsistencies in indexing can misrepresent the full picture of the domain by including irrelevant documents or omitting relevant ones, sometimes inexplicably. A bibliometric analysis was conducted on the domain of ethics in knowledge organization in the Web of Science (WoS) and Library, Information Science & Technology Abstracts (LISTA) databases to discern how it is being presented by search results in those databases and to attempt to determine why inconsistencies occurred.
Article
Full-text available
The evaluation of scientific activity is an essential element of all research, technology and development programmes implemented in a society. Scientometrics has contributed to the development of indicators that constitute key tools in the management of scientific and technological policies and in strategic decision-making. The present paper analyses the different approaches to research evaluation from a scientometric perspective. It deals with topics related to the determination of research quality, the qualitative value of citation analysis, the use of scientometric indicators in evaluative assessments, collaboration networks as patterns of scientific development, and domain analysis as theoretical support for scientometric studies of science evaluation. In the twenty-first century, strategies have been directed towards alternatives that capture the qualitative dimension inherent in science communication processes, through the use of relative indicators and information visualization techniques, and starting from the implicit recognition of the social and economic conditions in which scientific activity develops. A revision of the scientometric indicators in use is necessary, as is the creation of robust information systems to register and process national scientific production, with the aim of developing evaluative tools that accelerate its growth and improve its visibility and position in the context of world scientific activity.
Article
Bibliographic databases such as Thomson Reuters' Web of Science (WoS) or Elsevier's Scopus support search filtering by country or institution. However, the study of scientific production at internal levels of organizations (universities), such as departments or faculties, is error prone. In this paper, we show common errors made when retrieving papers in WoS at the departmental or faculty level. We propose a method to support the information retrieval process at the internal level of universities. The method comprises an exhaustive search strategy and a Bayesian model to estimate the attribution of a university's papers to a given department or faculty. The method was validated on two real cases with promising results. This work is research in progress; comparison with other methods and further evaluation cases are proposed as future work. Nevertheless, it could open new opportunities for scientometric studies and research policy.
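The abstract does not spell out the Bayesian model, so the following is only a minimal naive-Bayes-style sketch of the general idea, under the assumption that a small hand-checked sample of address token lists labelled with their department is available. Department names, tokens, and the smoothing constant are all illustrative.

    # Minimal naive-Bayes-style sketch (not the authors' actual model).
    # Estimates P(department | address tokens) from a labelled sample,
    # assuming token independence and applying Laplace smoothing.
    from collections import Counter, defaultdict
    import math

    def train(samples):
        """samples: list of (tokens, department) pairs from hand-checked papers."""
        prior, token_counts, totals = Counter(), defaultdict(Counter), Counter()
        for tokens, dept in samples:
            prior[dept] += 1
            for t in tokens:
                token_counts[dept][t] += 1
                totals[dept] += 1
        return prior, token_counts, totals

    def posterior(tokens, prior, token_counts, totals, vocab_size=1000):
        scores = {}
        for dept, n in prior.items():
            logp = math.log(n / sum(prior.values()))
            for t in tokens:
                logp += math.log((token_counts[dept][t] + 1) / (totals[dept] + vocab_size))
            scores[dept] = logp
        top = max(scores.values())
        weights = {d: math.exp(s - top) for d, s in scores.items()}
        return {d: w / sum(weights.values()) for d, w in weights.items()}

    # Hypothetical training data and query.
    model = train([(["dept", "comp", "sci"], "Computer Science"),
                   (["fac", "mech", "engn"], "Mechanical Engineering")])
    print(posterior(["dept", "comp", "sci", "univ"], *model))

The output is a probability distribution over departments for a new address, which is the kind of attribution estimate the abstract describes.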
Article
The International Journal of LCA recently published two articles that highlight life cycle assessment (LCA) research worldwide: Chen et al. (Int J Life Cycle Assess 19(10):1674–1685, 2014) and Hou et al. (Int J Life Cycle Assess 20(4):541–555, 2015). Both research papers conducted a bibliometric analysis using the Web of Science (WoS) database and ranked scientific journals and research institutions according to publication performance based on article counts and citation analysis. The expected outcome of this work is to contribute to a better map of global scientific LCA research, a vital tool in the field. However, the methodology used to retrieve the publications showed that many key LCA research articles were not taken into account in the surveyed literature and revealed inconsistencies related to author affiliation. By studying their research outputs, significant insights have been gained into ways to improve the indexing of scientific papers and, in turn, help researchers better disseminate and communicate their findings and overcome ranking bias.
Article
Full-text available
Engineering as a branch of science has a crucial role in the growth of the economy. The growth and development of engineering is therefore highly relevant. One way to understand this is to examine the characteristics of the scientific knowledge produced in the field of engineering. Drawing on the publications in engineering from the ISI Web of Science over the last three decades, this paper looks at the visibility and importance of engineering research in South Africa. The visibility of research publications is studied in terms of the number of citations a publication receives. The analysis shows that the visibility of South African engineering research is determined by the number of authors involved in the production of a paper, the presence of international collaboration, the degree of collaboration, and the journals in which the papers are published. Engineering research in South Africa, compared with that of all subjects, is clearly growing. But the visibility of South African engineering publications, in comparison with all other subjects, has been diminishing in recent years.
Article
Full-text available
This paper presents a bibliometric analysis of the journal articles published on two rare diseases that cause mental and behavioural disorders, CADASIL and Rett syndrome, over the period 2000-2009. Although Rett syndrome mainly affects females and the causes of the two diseases are very different, they have in common a genetic origin. The analysis drew on two databases, Medline and SCI, and on a multidimensional methodology designed to determine the relationship between the research output on these rare diseases and advances in the field of genetics. The results show clearly differentiated research patterns for the two diseases, although they converge on a common factor, the genetic influence, which is more pronounced in Rett syndrome. It is concluded that achievements in the field of genetics, both specific (mutations in the responsible genes) and general (the sequencing of the human genome), significantly affect the scientific activity surrounding these rare diseases.
Article
A new semi-automatic method is presented to standardize or codify addresses in order to produce bibliometric indicators from bibliographic databases. The hypothesis is that the new method normalizes authors’ addresses reliably and is easy and quick to apply. To test the method, a set of previously hand-coded data was chosen to verify its reliability: 136,821 Spanish documents (2006–2008) downloaded from the Web of Science database. Unique addresses from this set were selected to produce a list of keywords representing the various institutional sectors. Once the list of terms was obtained, addresses were standardized with this information and the result was compared with the previous hand-coded data. Several tests were run to analyse the association between the two systems (automatic and hand-coding), calculating recall and precision as well as a number of directional and symmetric statistical measures. The outcome shows good agreement between the two methods. Although these results are quite general, this overview of institutional sectors is a sound basis for a second stage addressing the selection of particular centres. The system is novel in that it does not rely on pre-existing master lists or tables and it automates part of the task. The validity of the hypothesis was confirmed not only by the statistical measures but also by the fact that obtaining general and detailed scientific output figures is less time-consuming, and will become even less so as the master tables are fed back and reused for the same kind of data. The same method could be used with any country and/or database, creating a new master list that takes their specific characteristics into account.
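A rough idea of the keyword-based sector coding summarized above can be given in a few lines. The sector labels, keywords, and addresses below are invented for illustration; the real method derives its master list from the data itself rather than from a fixed dictionary.

    # Hedged sketch of keyword-based sector coding (invented keywords/labels).
    SECTOR_KEYWORDS = {
        "University": ["UNIV", "FAC ", "ESCUELA"],
        "Hospital":   ["HOSP", "CLIN "],
        "CSIC":       ["CSIC"],
    }

    def code_address(address):
        """Return the sector labels whose keywords match the raw address."""
        addr = address.upper()
        hits = [sector for sector, kws in SECTOR_KEYWORDS.items()
                if any(kw in addr for kw in kws)]
        return hits or ["Unclassified"]

    def agreement(addresses, hand_codes):
        """Fraction of addresses whose automatic code matches the manual one."""
        ok = sum(hand_codes[a] in code_address(a) for a in addresses)
        return ok / len(addresses)

    # Toy hand-coded data keyed by address.
    addrs = {"UNIV COMPLUTENSE MADRID, MADRID, SPAIN": "University",
             "HOSP LA PAZ, MADRID, SPAIN": "Hospital"}
    print(agreement(list(addrs), addrs))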
Article
The correct attribution of scientific publications to their true owners is extremely important, considering the detailed evaluation processes and the future investments based upon them. This attribution is a hard job for bibliometricians because of the increasing number of documents and the rise of collaboration. Nevertheless, no published work offers a comprehensive solution to the problem. This article introduces a procedure for the detailed identification and normalisation of addresses to facilitate the correct allocation of the scientific production included in databases. Thanks to our long experience in the manual normalisation of addresses, we have created and maintained various master lists. We have already developed an application to detect institutional sectors (presented in a previous paper) and now we analyse the details of particular institutions, taking advantage of our master tables. To test our methodology we implemented it on a Spanish data set already manually codified (95,314 unique addresses included in the year 2008 in the Web of Science databases). These data were analysed with a full-text search against our master lists, giving optional codes for each address and choosing which ones could be automatically encoded and which ones should be reviewed manually. The results of the implementation, comparing the automatic versus manual codes, showed that 87% of records were automatically codified with an error rate of 1.9%; only 13% needed manual review. Finally, we applied the Wilcoxon non-parametric test to show the validity of the methodology, comparing the detailed codes of centres already encoded with the automatically encoded ones, and concluding that their distributions were similar with a significance of 0.078.
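The final validation step mentioned in this abstract, a Wilcoxon non-parametric comparison of manually and automatically assigned centre codes, might look roughly like the snippet below, which applies scipy's signed-rank test to invented per-centre counts.

    # Hedged sketch of the Wilcoxon comparison (invented counts, requires scipy).
    from scipy.stats import wilcoxon

    manual_counts    = [120, 45, 300, 12, 78, 51, 9, 200]   # addresses per centre, hand-coded
    automatic_counts = [118, 47, 295, 12, 80, 49, 10, 198]  # same centres, automatic coding

    stat, p_value = wilcoxon(manual_counts, automatic_counts)
    print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.3f}")
    # A p-value above the chosen significance level would indicate no
    # significant difference between the two coding distributions,
    # supporting the automatic method.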
Article
Full-text available
South Africa's record in the production of scientific knowledge in medicine is remarkable, but attempts have yet to be made to examine its distinctive characteristics. This is critical to the understanding of its nature, trends and the directions which it is taking today. Using the publication records extracted from the Science Citation Index (SCI) of the ISI Web of Science for a 3-decade period from 1975 to 2005, with 5-year windows, I have examined the salient characteristics of medical research in South Africa in terms of (1) the number of publications, (2) type of publications (sole/co-authored), (3) collaboration (domestic/international), (4) affiliation sector of authors and collaborators, (5) regional origin of collaborators, (6) publication outlets and (7) citations, in comparison with 'all subjects' covered in the database concerned. This analysis shows that the contribution of medical publications to the total output of South African scholars is shrinking (25% in 1980 to 8% in 2000). Papers produced in collaboration are growing in number (increased by 17% during 1975-2005). While domestic collaboration declined by 24%, international collaboration grew from 4% of total papers in 1975 to 48% in 2005. South African medical researchers now publish more in foreign-originated journals (from 20% in 1975 to 75% in 2005) than in local journals and work mostly in universities, hospitals and research institutes; they collaborate with overseas partners from as many as 56 countries. Significantly, collaboration with Western European partners has increased 45-fold from 1975 to 2005. This study showed that a marked degree of internationalisation (measured in terms of international collaboration, publications in foreign journals and the number of citations) of South African medical research is taking place and that this trend is likely to continue in the future.
Article
Detecting trends on the use of Information and Communication Technologies (ICT) in the domain of environmental sciences is important to foresee new frontiers of modelling and software research. Here we analyze the impact of ICT in scientific papers published from 1990 to 2007 in all ISI journals and in those belonging to the Environmental Sciences category only. We thus determine some expanding, fluctuating or unexploited research directions. Quite unexpectedly, we find that the frequency of occurrence of many, yet not all, information technologies increased at significantly higher rates in the environmental studies than in the general sciences. We also contrast trends in the EMS journal to those of the entire domain and discuss differences in some details. Retrieving pertinent information for the comparison is certainly made difficult by some inconsistency in the use of keywords and the current lack of semantic oriented tools. Nevertheless, we find that this journal is leading the development on some research topics, like decision support systems or information systems.
Article
Full-text available
This paper addresses the validity, from the educational standpoint, of a methodological proposal for teaching bibliometrics as a part of the library and information science curriculum. The emphasis in this approach is on the use of readily available software, modularly integrated to facilitate full automation of the data gathering and analysis processes, as well as to obtain indicators. This methodology has been in place since 1996 as part of the curricula for the associate degree in Library and Information Science and the bachelor's degree in Documentation awarded by the Carlos III University of Madrid.
Article
Full-text available
This paper reports on an evaluation of Spanish educational research journals using the modality of reputation inferred from survey data. Univariate and multivariate patterns are offered. Specifically, cluster analysis and non-parametric multidimensional scaling prove to be useful methods for exploring the complexity of this scientometric question, namely the evaluation of periodical series.
Article
Full-text available
An overview is given of the studies published in the international journal Scientometrics during 1978-2000 on cross-national, national and institutional scientometric assessment.
Article
Full-text available
Empirical evidence presented in this paper shows that the utmost care must be taken in interpreting bibliometric data in a comparative evaluation of national research systems. From the results of recent studies, the authors conclude that the value of impact indicators of research activities at the level of an institution or a country strongly depends upon whether one includes or excludes research publications in SCI-covered journals written in languages other than English. Additional material was gathered to show the distribution of SCI papers among publication languages. Finally, the authors make suggestions for further research on how to deal with this type of problem in future national research performance studies.
Article
Full-text available
This is an author post-print (i.e. the final draft post-refereeing) of the ‘scientific correspondence’ paper published in Nature, 397 (7 January 1999): 14. The original publication is available at www.nature.com [http://www.nature.com/nature/journal/v397/n6714/pdf/397014b0.pdf]
Article
The idea of a unified citation index to the literature of science was first outlined by Eugene Garfield [1] in 1955 in the journal Science. Science Citation Index has since established itself as the gold standard for scientific information retrieval. It has also become the database of choice for citation analysts and evaluative bibliometricians worldwide. As scientific publication moves to the web, and novel approaches to scholarly communication and peer review establish themselves, new methods of citation and link analysis will emerge to capture often liminal expressions of peer esteem, influence and approbation. The web thus affords bibliometricians rich opportunities to apply and adapt their techniques to new contexts and content: the age of ‘bibliometric spectroscopy’ [2] is dawning.
Article
Can the journal impact factors regularly published in the Journal Citation Reports (JCR) be shaped by a self-fulfilling prophecy? This question was investigated by reference to a journal for which incorrect impact factors had been published in the JCR for almost 20 years: Educational Research. In order to investigate whether the propagation of exaggerated impact factors had resulted in an increase in the actual impact of the journal, the correct impact factors were calculated. A self-fulfilling prophecy effect was not observed. However, the analysis shows that the impact factors for Educational Research published in the JCR were based on calculations that erroneously included citations of a journal with a similar title, Educational Researcher, which is not included in the JCR. The paper concludes that published impact factors should be used with caution.
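For context, the quantity recalculated in this study is the standard two-year JCR impact factor. The toy computation below simply restates that definition with invented numbers, not the journal's actual counts.

    # Two-year JCR impact factor: citations received in year Y by items
    # published in years Y-1 and Y-2, divided by the citable items
    # published in those two years. Numbers below are invented.
    def impact_factor(cites_to_prev_two_years, citable_items_prev_two_years):
        return cites_to_prev_two_years / citable_items_prev_two_years

    # e.g. 40 citations in 2000 to articles from 1998-1999, which together
    # contained 80 citable items:
    print(impact_factor(40, 80))   # 0.5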
Article
The existence of hierarchies based on reputation in modern science is indisputable. A set of common scientific journals is often assumed to be instrumental in the formation of these hierarchies. However, the character of the hierarchies, how monolithic/pluralistic they are and the functions of this differentiation have been discussed and caused controversy. The article brings together results from a survey of 788 Danish researchers, mainly from the social sciences, concerning their assessments of the most influential researchers and most important journals. The rankings indicate a pluralistic picture and only a moderate degree of consensus among researchers. Comparisons with (the few) other surveys and with citation data do not suggest this to be a peculiarity of Danish social scientists, however.
Article
Citation analysis is performed in order to evaluate authors and scientific collections, such as journals and conference proceedings. Currently, two major systems exist that perform citation analysis: Science Citation Index (SCI) by the Institute for Scientific Information (ISI) and CiteSeer by the NEC Research Institute. The SCI, mostly a manual system up until recently, is based on the notion of the ISI Impact Factor, which has been used extensively for citation analysis purposes. On the other hand the CiteSeer system is an automatically built digital library using agents technology, also based on the notion of ISI Impact Factor. In this paper, we investigate new alternative notions besides the ISI impact factor, in order to provide a novel approach aiming at ranking scientific collections. Furthermore, we present a web-based system that has been built by extracting data from the Databases and Logic Programming (DBLP) website of the University of Trier. Our system, by using the new citation metrics, emerges as a useful tool for ranking scientific collections. In this respect, some first remarks are presented, e.g. on ranking conferences related to databases.
Article
Since their arrival in the 1960s, electronic databases have been an invaluable tool for informetricians. Databases and their delivery mechanism have provided both the source of raw data, as well as the analytical tools for many informetric studies. In particular, the citation databases produced by the Institute for Scientific Information have been the key source of data for a whole range of citation-based research. However, there are also many problems and challenges associated with the use of online databases. Most of the problems arise because databases are designed primarily for information retrieval purposes, and informetric studies represent only a secondary use of the systems. The sorts of problems encountered by informetricians include: errors or inconsistency in the data itself; problems with the coverage, overlap and changeability of the databases; as well as problems and limitations in the tools provided by the database hosts such as DIALOG. For some informetric studies, the only viable solution to these problems is to download the data and perform offline correction and data analysis.