Journal of Informetrics

Print ISSN: 1751-1577
Publications
The public sharing of primary research datasets potentially benefits the research community but is not yet common practice. In this pilot study, we analyzed whether data sharing frequency was associated with funder and publisher requirements, journal impact factor, or investigator experience and impact. Across 397 recent biomedical microarray studies, we found investigators were more likely to publicly share their raw dataset when their study was published in a high-impact journal and when the first or last authors had high levels of career experience and impact. We estimate that the US National Institutes of Health (NIH) data sharing policy applied to 19% of the studies in our cohort; being subject to the NIH data sharing plan requirement was not found to correlate with increased data sharing behavior in a multivariate logistic regression analysis. Studies published in journals that required a database submission accession number as a condition of publication were more likely to share their data, but this trend was not statistically significant. These early results will inform our ongoing larger analysis and, we hope, contribute to the development of more effective data sharing initiatives.

Scientific collaboration and endorsement are well-established research topics which utilize three kinds of methods: survey/questionnaire, bibliometrics, and complex network analysis. This paper combines topic modeling and path-finding algorithms to determine whether productive authors tend to collaborate with or cite researchers with the same or different interests, and whether highly cited authors tend to collaborate with or cite each other. Taking information retrieval as a test field, the results show that productive authors tend to directly coauthor with and closely cite colleagues sharing the same research interests; they do not generally collaborate directly with colleagues having different research topics, but instead directly or indirectly cite them; and highly cited authors do not generally coauthor with each other, but closely cite each other.

Spatial scientometrics has attracted considerable attention in recent years. The visualization methods (density maps) presented in this paper allow for an analysis revealing regions of excellence around the world, using computer programs that are freely available. Based on Scopus and Web of Science data, field-specific and field-overlapping scientific excellence can be identified in broader regions (worldwide or for a specific continent) where high-quality papers (highly cited papers or papers published in Nature or Science) were published. We used a geographic information system to produce our density maps. We also briefly discuss the use of Google Earth.

In this note some new fields of application of Hirsch-related statistics are presented. Furthermore, so far unrevealed properties of the h-index are analysed in the context of rank-frequency and extreme-value statistics.

A bibliometric analysis was applied in this work to evaluate the global scientific production of geographic information system (GIS) papers from 1997 to 2006 in all journals across the subject categories of the Science Citation Index compiled by the Institute for Scientific Information (ISI), Philadelphia, USA. ‘GIS’ and ‘geographic information system’ were used as keywords to search parts of titles, abstracts, or keywords. The published output analysis showed that GIS research increased steadily over the past 10 years, with annual paper production in 2006 about three times that of 1997. There are clear distinctions among author keywords used in publications from the five most productive countries (USA, UK, Canada, Germany and China) in GIS research. Bibliometric methods can quantitatively characterize the development of global scientific production in a specific research field, and the analysis yields several key findings.

Combining different data sets with information on grant and fellowship applications submitted to two renowned funding agencies, we are able to compare their funding decisions (award and rejection) with scientometric performance indicators across two fields of science (life sciences and social sciences). The data sets involve 671 applications in social sciences and 668 applications in life sciences. In both fields, awarded applicants perform on average better than all rejected applicants. If only the most preeminent rejected applicants are considered in both fields, they score better than the awardees on citation impact. With regard to productivity we find differences between the fields. While the awardees in life sciences outperform on average the most preeminent rejected applicants, the situation is reversed in social sciences.

The new excellence indicator in the World Report of the SCImago Institutions Rankings (SIR) makes it possible to test differences in the rankings for statistical significance. For example, at the 17th position of these rankings, UCLA has an output of 37,994 papers with an excellence indicator of 28.9. Stanford University follows at the 19th position with 37,885 papers and 29.1 excellence, giving z = −0.607. The difference between these two institutions is thus not statistically significant. We provide a calculator at http://www.leydesdorff.net/scimago11/scimago11.xls with which one can carry out this test for any two institutions, and also test whether each institution's score is significantly above or below expectation (assuming that, by chance alone, 10% of an institution's papers would fall in the top-10% set).
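The z-value quoted above is consistent with a standard two-proportion z-test on the shares of top-10% papers; a minimal sketch, assuming that test (the abstract does not spell out the formula), reproduces the UCLA/Stanford comparison:

```python
import math

def excellence_z(n1, pct1, n2, pct2):
    """Two-proportion z-test comparing the shares of top-10% papers
    at two institutions (pct given as percentages)."""
    p1, p2 = pct1 / 100.0, pct2 / 100.0
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)            # pooled proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# UCLA (37,994 papers, 28.9% excellence) vs. Stanford (37,885 papers, 29.1%)
z = excellence_z(37994, 28.9, 37885, 29.1)
print(round(z, 3))  # -0.607: |z| < 1.96, so the difference is not significant
```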

Field normalized citation rates are well-established indicators for research performance from the broadest aggregation levels such as countries, down to institutes and research teams. When applied to still more specialized publication sets at the level of individual scientists, also a more accurate delimitation is required of the reference domain that provides the expectations to which a performance is compared. This necessity for sharper accuracy challenges standard methodology based on predefined subject categories. This paper proposes a way to define a reference domain that is more strongly delimited than in standard methodology, by building it up out of cells of the partition created by the pre-defined subject categories and their intersections. This partition approach can be applied to different existing field normalization variants. The resulting reference domain lies between those generated by standard field normalization and journal normalization. Examples based on fictive and real publication records illustrate how the potential impact on results can exceed or be smaller than the effect of other currently debated normalization variants, depending on the case studied. The proposed Partition-based Field Normalization is expected to offer advantages in particular at the level of individual scientists and other very specific publication records, such as publication output from interdisciplinary research.

This paper reviews developments in informetrics between 2000 and 2006. At the beginning of the 21st century we witness considerable growth in webometrics, mapping and visualization, and open access. A new topic is the comparison between citation databases, prompted by the introduction of two new citation databases, Scopus and Google Scholar. There is renewed interest in indicators as a result of the introduction of the h-index. Traditional topics like citation analysis and informetric theory also continue to develop. The impact factor debate, especially outside the informetric literature, continues to thrive. Ranked lists (of journals, highly cited papers, or educational institutions) are of great public interest.

Three variations on the power law model proposed by Egghe are fitted to four groups of h-index time series: publication-citation data for authors, journals and universities; and patent citation data for firms. It is shown that none of the power law models yields an adequate description of total career h-index sequences.

International collaboration as measured by co-authorship relations on refereed papers grew linearly from 1990 to 2005 in terms of the number of papers, but exponentially in terms of the number of international addresses. This confirms Persson et al.'s [Persson, O., Glänzel, W., & Danell, R. (2004). Inflationary bibliometrics values: The role of scientific collaboration and the need for relative indicators in evaluative studies. Scientometrics, 60(3), 421–432] hypothesis of an inflation in international collaboration. Patterns in international collaboration in science can be considered as network effects, since there is no political institution mediating relationships at that level except for the initiatives of the European Commission. Science at the international level shares features with other complex adaptive systems whose order arises from the interactions of hundreds of agents pursuing self-interested strategies. During the period 2000–2005, the network of global collaborations appears to have reinforced the formation of a core group of fourteen most cooperative countries. This core group can be expected to use knowledge from the global network with great efficiency, since these countries have strong national systems. Countries at the periphery may be disadvantaged by the increased strength of the core.

Rankings of journals and rankings of scientists are usually discussed separately. We argue that a consistent approach to both rankings is desirable, because both the quality of a journal and the quality of a scientist depend on the papers they publish. We present a pair of consistent rankings (impact factor for journals and total number of citations for authors) and provide an axiomatic characterization thereof.

Digital libraries (DLs) are complex information systems which can present changes in their structure, content, and services. These complexities and dynamics make system maintenance a non-trivial task, since it requires periodical evaluation of the different DL components. Generally, these evaluations are customized per system and are performed only when problems occur and administrator intervention is required. This work aims to change the situation. We present 5SQual, a tool which provides ways to perform automatic and configurable evaluations of some of the most important DL components, among them, digital objects, metadata, and services. The tool implements diverse numeric indicators that are associated with eight quality dimensions described in the 5S quality model. Its generic architecture was developed to be applicable to various DLs and scenarios. In sum, the main contributions of this work include: (i) the design and implementation of 5SQual, a tool that validates a theoretical DL quality model; (ii) the demonstration of the applicability of the tool in several usage scenarios; and (iii) the evaluation (with usability specialists) of its graphical interface specially designed to guide the configuration of 5SQual evaluations. We also present the results of interviews conducted with administrators of real DLs regarding their expectations and opinions about 5SQual.

Dyads of journals related by citations can agglomerate into specialties through the mechanism of triadic closure. Using the Journal Citation Reports 2011, 2012, and 2013, we analyze triad formation as indicators of integration (specialty growth) and disintegration (restructuring). The strongest integration is found among the large journals that report on studies in different scientific specialties, such as PLoS ONE, Nature Communications, Nature, and Science. This tendency towards large-scale integration has not yet stabilized. Using the Islands algorithm, we also distinguish 51 local maxima of integration. We zoom into the cited articles that carry the integration for: (i) a new development within high-energy physics and (ii) an emerging interface between the journals Applied Mathematical Modeling and the International Journal of Advanced Manufacturing Technology. In the first case, integration is brought about by a specific communication reaching across specialty boundaries, whereas in the second, the dyad of journals indicates an emerging interface between specialties. These results suggest that integration picks up substantive developments at the specialty level. An advantage of the bottom-up method is that no ex ante classification of journals is assumed in the dynamic analysis.

The general aim of this paper is to show the results of a study in which we combined bibliometric mapping and citation network analysis to investigate the process of creation and transfer of knowledge through scientific publications. The novelty of this approach is the combination of both methods. In this case we analyzed the citations to a very influential paper published in 1990 that contains, for the first time, the term Absorptive Capacity. A bibliometric map identified the terms and the theories associated with the term while two techniques from the citation network analysis recognized the main papers during 15 years. As a result we identified the articles that influenced the research for some time and linked them into a research tradition that can be considered the backbone of the “Absorptive Capacity Field”.

Examining a comprehensive set of papers (n = 1837) that were accepted for publication by the journal Angewandte Chemie International Edition (one of the prime chemistry journals in the world) or rejected by the journal but then published elsewhere, this study tested the extent to which the use of the freely available database Google Scholar (GS) can be expected to yield valid citation counts in the field of chemistry. Analyses of citations for the set of papers returned by three fee-based databases – Science Citation Index, Scopus, and Chemical Abstracts – were compared to the analysis of citations found using GS data. Whereas the analyses using citations returned by the three fee-based databases show very similar results, the results of the analysis using GS citation data differed greatly from the findings using citations from the fee-based databases. Our study therefore supports, on the one hand, the convergent validity of citation analyses based on data from the fee-based databases and, on the other hand, the lack of convergent validity of the citation analysis based on the GS data.

The use of scholarly publications that have not been formally published in, e.g., journals is widespread in some fields. In the past they have been disseminated through various channels of informal communication. However, the Internet has enabled dissemination of these unpublished and often unrefereed publications to a much wider audience. This is particularly interesting in relation to the highly disputed open access advantage, as the potential advantage for low-visibility publications has not been given much attention in the literature. The present study examines the role of working papers in economics during a 10-year period (1996–2005). It shows that working papers are becoming increasingly visible in the field-specific databases. The impact of working papers is relatively low; however, high-impact working paper series have citation rates similar to those of the low-impact journals in the field. There is no tendency towards an increase in impact during the 10 years, as is the case for the high-impact journals. Consequently, this study does not provide evidence of an open access advantage for working papers in economics.

Field delimitation for citation analysis, the process of collecting a set of bibliographic records with cited-reference information of research articles that represent a research field, is the first step in any citation analysis study of a research field. Due to a number of limitations, the commercial citation indexes have long made it difficult to obtain a comprehensive dataset in this step. This paper discusses some of the limitations imposed by these databases, and reports on a method to overcome some of these limitations that was used with great success to delimit an emerging and highly interdisciplinary biomedical research field, stem cell research. The resulting field delimitation and the citation network it induces are both excellent. This multi-database method relies on using PubMed for the actual field delimitation, and on mapping between Scopus and PubMed records for obtaining comprehensive information about cited-references contained in the resulting literature. This method provides high-quality field delimitations for citation studies that can be used as benchmarks for studies of the impact of data collection biases on citation metrics, and may help improve confidence in results of scientometric studies for an increased impact of scientometrics on research policy.

In order to take multiple co-authorship appropriately into account, a straightforward modification of the Hirsch index was recently proposed. Fractionalised counting of the papers yields an appropriate measure which is called the hm-index. The effect of this procedure is compared in the present work with other variants of the h-index and found to be superior to the fractionalised counting of citations and to the normalization of the h-index with the average number of authors in the h-core. Three fictitious examples for model cases and one empirical case are analysed.
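A minimal sketch of fractionalised paper counting for the hm-index, assuming the common formulation in which a paper with a authors advances the rank by 1/a (the data are invented for illustration):

```python
def hm_index(papers):
    """hm-index: papers are ranked by citations; each paper with
    n_authors authors contributes 1/n_authors to the effective rank,
    and hm is the largest effective rank still covered by citations."""
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)  # by citations
    r_eff = 0.0   # fractionalised rank
    hm = 0.0
    for citations, n_authors in ranked:
        r_eff += 1.0 / n_authors
        if citations >= r_eff:
            hm = r_eff            # paper still belongs to the hm-core
        else:
            break
    return hm

# (citations, n_authors): single-author profiles reduce to the plain h-index
print(hm_index([(5, 1), (5, 1), (5, 1), (5, 1), (5, 1)]))  # 5.0
print(hm_index([(10, 2), (6, 3), (1, 1)]))                 # 1/2 + 1/3
```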

We present a simple generalization of Hirsch's h-index, Z = sqrt(h^2 + C)/sqrt(5), where C is the total number of citations. Z is aimed at correcting the potentially excessive penalty imposed by h on a scientist's highly cited papers: for the majority of scientists analyzed, we find the excess citation fraction (C − h^2)/C to be distributed closely around the value 0.75, meaning that 75 percent of the author's impact is neglected. Additionally, Z is less sensitive to local changes in a scientist's citation profile, namely perturbations which increase h while only marginally affecting C. Using real career data for 476 physicist careers and 488 biologist careers, we analyze both the distribution of Z and the rank stability of Z with respect to the Hirsch index h and the Egghe index g. We analyze careers distributed across a wide range of total impact, including top-cited physicists and biologists for benchmark comparison. In practice, the Z-index requires the same information needed to calculate h and could be effortlessly incorporated within career profile databases, such as Google Scholar and ResearcherID. Because Z incorporates information from the entire publication profile while being more robust than h and g to local perturbations, we argue that Z is better suited for ranking comparisons in academic decision-making scenarios comprising a large number of scientists.
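Both quantities defined above are straightforward to compute; a small sketch with invented values:

```python
import math

def z_index(h, c_total):
    """Z = sqrt(h^2 + C) / sqrt(5), with C the total citation count."""
    return math.sqrt(h ** 2 + c_total) / math.sqrt(5)

def excess_citation_fraction(h, c_total):
    """Fraction of citations ignored by h: (C - h^2) / C."""
    return (c_total - h ** 2) / c_total

# a profile with h = 20 and C = 1600 sits exactly at the typical 0.75:
print(excess_citation_fraction(20, 1600))  # 0.75
print(z_index(20, 1600))                   # sqrt(2000)/sqrt(5) ≈ 20.0
```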

There are a number of solutions that perform unsupervised name disambiguation based on the similarity of bibliographic records or common co-authorship patterns. Whether the use of these advanced methods, which are often difficult to implement, is warranted depends on whether the accuracy of the most basic disambiguation methods, which only use the author's last name and initials, is sufficient for a particular purpose. We derive realistic estimates for the accuracy of simple, initials-based methods using simulated bibliographic datasets in which the true identities of authors are known. Based on the simulations in five diverse disciplines we find that the first-initial method already correctly identifies 97% of authors. An alternative simple method, which takes all initials into account, typically misidentifies about twice as many authors, except in certain datasets that can be identified by applying a simple criterion. Finally, we introduce a new name-based method that combines the features of the first-initial and all-initials methods by implicitly taking into account the last name frequency and the size of the dataset. This hybrid method reduces the fraction of incorrectly identified authors by 10–30% over the first-initial method.
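The two baseline methods differ only in the key used to group author mentions; a toy sketch with invented records:

```python
from collections import defaultdict

def group_authors(records, key):
    """Group author mentions under a disambiguation key."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return groups

# invented (last_name, initials) mentions
records = [("smith", "JA"), ("smith", "JB"), ("smith", "J"), ("lee", "K")]

# first-initial method: last name + first initial only
by_first = group_authors(records, lambda r: (r[0], r[1][:1]))
# all-initials method: last name + the full initial string
by_all = group_authors(records, lambda r: (r[0], r[1]))

print(len(by_first))  # 2: all the Smiths collapse into ('smith', 'J')
print(len(by_all))    # 4: J/JA/JB are treated as three different Smiths
```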

This paper introduces a new approach to describe the spread of research topics across disciplines using epidemic models. The approach is based on applying individual-based models from mathematical epidemiology to the diffusion of a research topic over a contact network that represents knowledge flows over the map of science—as obtained from citations between ISI Subject Categories. Using research publications on the protein class kinesin as a case study, we report a better fit between model and empirical data when using the citation-based contact network. Incubation periods on the order of 4–15.5 years support the view that, whilst research topics may grow very quickly, they face difficulties to overcome disciplinary boundaries.

Citation numbers are extensively used for assessing the quality of scientific research. The use of raw citation counts is generally misleading, especially when applied to cross-disciplinary comparisons, since the average number of citations received is strongly dependent on the scientific discipline of reference of the paper. Measuring and eliminating biases in citation patterns is crucial for a fair use of citation numbers. Several numerical indicators have been introduced with this aim, but so far a specific statistical test for estimating the fairness of these numerical indicators has not been developed. Here we present a statistical method aimed at estimating the effectiveness of numerical indicators in the suppression of citation biases. The method is simple to implement and can be easily generalized to various scenarios. As a practical example we test, in a controlled case, the fairness of fractional citation count, which has been recently proposed as a tool for cross-discipline comparison. We show that this indicator is not able to remove biases in citation patterns and performs much worse than the rescaling of citation counts with average values.

Slovenia’s Current Research Information System (SICRIS) currently hosts 86,443 publications with citation data from 8359 researchers working across the whole plethora of social and natural sciences from 1970 to the present. Using these data, we show that the citation distributions derived from individual publications have Zipfian properties in that they can be fitted by a power law P(x) ∼ x^(−α), with α between 2.4 and 3.1 depending on the institution and field of research. Distributions of indexes that quantify the success of researchers rather than individual publications, on the other hand, cannot be associated with a power law. We find that for Egghe’s g-index and Hirsch’s h-index the log-normal form P(x) ∼ exp[−a ln x − b (ln x)^2] applies best, with a and b depending moderately on the underlying set of researchers. In special cases, particularly for institutions with a strongly hierarchical constitution and research fields with high self-citation rates, exponential distributions can be observed as well. Both indexes yield distributions with equivalent statistical properties, which is a strong indicator of their consistency and logical connectedness. At the same time, differences in the assessment of citation histories of individual researchers strengthen their importance for properly evaluating the quality and impact of scientific output.
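The fitted log-normal form is an unnormalised density in ln x; a small sketch of the functional shape (the parameter values here are purely illustrative, not the fitted ones):

```python
import math

def lognormal_form(x, a, b):
    """Unnormalised P(x) ~ exp[-a*ln(x) - b*(ln x)^2]."""
    lx = math.log(x)
    return math.exp(-a * lx - b * lx * lx)

# with a, b > 0 the curve decays monotonically for x >= 1:
print(lognormal_form(1.0, 1.2, 0.4))                                   # 1.0
print(lognormal_form(10.0, 1.2, 0.4) < lognormal_form(2.0, 1.2, 0.4))  # True
```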

Evaluative bibliometrics is concerned with comparing research units by using statistical procedures. According to Williams (2012) an empirical study should be concerned with the substantive and practical significance of the findings as well as the sign and statistical significance of effects. In this study we will explain what adjusted predictions and marginal effects are and how useful they are for institutional evaluative bibliometrics. As an illustration, we will calculate a regression model using publications (and citation data) produced by four universities in German-speaking countries from 1980 to 2010. We will show how these predictions and effects can be estimated and plotted, and how this makes it far easier to get a practical feel for the substantive meaning of results in evaluative bibliometric studies. We will focus particularly on Average Adjusted Predictions (AAPs), Average Marginal Effects (AMEs), Adjusted Predictions at Representative Values (APRVs) and Marginal Effects at Representative Values (MERVs).

The data of F1000 provide us with the unique opportunity to investigate the relationship between peers' ratings and bibliometric metrics on a broad and comprehensive data set with high-quality ratings. F1000 is a post-publication peer review system for the biomedical literature. The comparison of metrics with peer evaluation has been widely acknowledged as a way of validating metrics. Based on the seven indicators offered by InCites, we analyzed the validity of raw citation counts (Times Cited, 2nd Generation Citations, and 2nd Generation Citations per Citing Document), normalized indicators (Journal Actual/Expected Citations, Category Actual/Expected Citations, and Percentile in Subject Area), and a journal-based indicator (Journal Impact Factor). The data set consists of 125 papers published in 2008 belonging to the subject category cell biology or immunology. As the results show, Percentile in Subject Area achieves the highest correlation with F1000 ratings; for three further indicators (Times Cited, 2nd Generation Citations, and Category Actual/Expected Citations) the 'true' correlation with the ratings reaches at least a medium effect size.

The definition of the g-index is as arbitrary as that of the h-index, because the threshold number g^2 of citations to the g most cited papers can be modified by a prefactor at one's discretion, thus taking into account more or less of the highly cited publications within a dataset. In a case study I investigate the citation records of 26 physicists and show that the prefactor influences the ranking less for the generalized g-index than for the generalized h-index. I propose specifically a prefactor of 2 for the g-index, because the resulting values are then of the same order of magnitude as for the common h-index. In this way one can avoid the disadvantage of the original g-index, namely that its values are usually substantially larger than those of the h-index, so that the precision problem is substantially larger, while the advantages of the g-index over the h-index are kept. As for the generalized h-index, different prefactors might be more useful for the generalized g-index in investigations that concentrate only on top scientists with high citation frequencies or on junior researchers with small numbers of citations.
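A sketch of the generalised g-index described above, assuming the usual cumulative formulation (the g most cited papers must together collect at least prefactor · g^2 citations; prefactor = 1 recovers the ordinary g-index):

```python
def generalized_g(citations, prefactor=1):
    """Largest g such that the g most cited papers together
    collect at least prefactor * g^2 citations."""
    ranked = sorted(citations, reverse=True)
    cumulative, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        cumulative += c
        if cumulative >= prefactor * rank * rank:
            g = rank
    return g

profile = [20, 10, 5, 3, 1]          # invented citation record
print(generalized_g(profile))        # 5 (ordinary g-index)
print(generalized_g(profile, 2))     # 4: closer to this profile's h-index of 3
```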

The relationship of the h-index with other bibliometric indicators at the micro level is analysed for Spanish CSIC scientists in Natural Resources, using publications downloaded from the Web of Science (1994–2004). Different activity and impact indicators were obtained to describe the research performance of scientists in different dimensions; factor analysis locates the h-index in a quantitative dimension highly correlated with the absolute number of publications and citations. The need to include the remaining dimensions in the analysis of the research performance of scientists, and the risks of relying only on the h-index, are stressed. The hypothesis that the achievement of some highly visible but intermediately productive authors might be underestimated when compared with other scientists by means of the h-index is tested.

When carrying out a research project, some materials may not be available in-house. Thus, investigators resort to external providers for conducting their research. To that end, the exchange may be formalised through material transfer agreements. In this context, industry, government and academia have their own specific expectations regarding compensation for the help they provide when transferring the research material. This paper assesses whether these contracts might have had an impact on visibility of researchers. Visibility is thereby operationalised on the basis of a bibliometric approach. In the sample utilised, researchers that availed themselves of these contracts were more visible compared to those who did not use them, controlling for seniority and co-authorship. Nonetheless, providers and receivers could not be differentiated in terms of visibility but by research sector and co-authorship. Being a user of these contracts might, to some extent, be the reflection of systematic differences in the stratification of science based on visibility.

Many, if not most network analysis algorithms have been designed specifically for single-relational networks; that is, networks in which all edges are of the same type. For example, edges may either represent "friendship," "kinship," or "collaboration," but not all of them together. In contrast, a multi-relational network is a network with a heterogeneous set of edge labels which can represent relationships of various types in a single data structure. While multi-relational networks are more expressive in terms of the variety of relationships they can capture, there is a need for a general framework for transferring the many single-relational network analysis algorithms to the multi-relational domain. It is not sufficient to execute a single-relational network analysis algorithm on a multi-relational network by simply ignoring edge labels. This article presents an algebra for mapping multi-relational networks to single-relational networks, thereby exposing them to single-relational network analysis algorithms.
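The simplest such mapping is projecting out the edges of one label; a toy sketch (the edge data and helper name are invented, and the full algebra composes richer mappings, e.g. unions and paths over several labels):

```python
# a multi-relational network as labelled edges (source, label, target)
multi = {
    ("a", "friendship", "b"),
    ("b", "collaboration", "c"),
    ("a", "collaboration", "c"),
    ("c", "kinship", "a"),
}

def project(edges, label):
    """Map to a single-relational network by keeping only the edges
    carrying the given label and dropping the label itself."""
    return {(src, dst) for (src, lbl, dst) in edges if lbl == label}

# the resulting unlabelled network can feed any single-relational algorithm
print(sorted(project(multi, "collaboration")))  # [('a', 'c'), ('b', 'c')]
```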

There are different ways in which the authors of a scientific publication can determine the order in which their names are listed. Sometimes author names are simply listed alphabetically. In other cases, authorship order is determined based on the contribution authors have made to a publication. Contribution-based authorship can facilitate proper credit assignment, for instance by giving most credits to the first author. In the case of alphabetical authorship, nothing can be inferred about the relative contribution made by the different authors of a publication. In this paper, we present an empirical analysis of the use of alphabetical authorship in scientific publishing. Our analysis covers all fields of science. We find that the use of alphabetical authorship is declining over time. In 2011, the authors of less than 4% of all publications intentionally chose to list their names alphabetically. The use of alphabetical authorship is most common in mathematics, economics (including finance), and high energy physics. Also, the use of alphabetical authorship is relatively more common in the case of publications with either a small or a large number of authors.

Can altmetric data be validly used for the measurement of societal impact? The current study seeks to answer this question with a comprehensive dataset (about 100,000 records) from very disparate sources (F1000, Altmetric, and an in-house database based on Web of Science). In the F1000 peer review system, experts attach particular tags to scientific papers which indicate whether a paper could be of interest for science or rather for other segments of society. The results show that papers with the tag "good for teaching" do achieve higher altmetric counts than papers without this tag, if the quality of the papers is controlled for. At the same time, a higher citation count is shown especially by papers with a tag that is specifically scientifically oriented ("new finding"). The findings indicate that papers tailored for a readership outside the area of research should lead to societal impact. If altmetric data is to be used for the measurement of societal impact, the question of its normalization arises. In bibliometrics, citations are normalized for the papers' subject area and publication year. This study has taken a second analytic step involving a possible normalization of altmetric data. As the results show, there are particular scientific topics which are of special interest to a wide audience. Since these more or less interesting topics are not completely reflected in Thomson Reuters' journal sets, a normalization of altmetric data should not be based on the level of subject categories, but on the level of topics.

Is more always better? We address this question in the context of bibliometric indices that aim to assess the scientific impact of individual researchers by counting their number of highly cited publications. We propose a simple model in which the number of citations of a publication correlates with the scientific impact of the publication but also depends on other 'random' factors. Our model indicates that more need not always be better. It turns out that the most influential researchers may have a systematically lower performance, in terms of highly cited publications, than some of their less influential colleagues. The model also suggests an improved way of counting highly cited publications.
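The core point can be illustrated with a stylized, entirely hypothetical example: under a fixed highly-cited threshold, a researcher whose papers are consistently strong but just below the threshold scores worse than a colleague with one lucky paper, even though the first researcher's overall impact is higher.

```python
HIGHLY_CITED = 10  # illustrative threshold, not from the paper

def highly_cited_count(citations, threshold=HIGHLY_CITED):
    """Number of publications with at least `threshold` citations."""
    return sum(c >= threshold for c in citations)

# Researcher A: consistently high impact, but 'random' factors keep
# every paper just under the threshold.
a = [9, 9, 9, 9]
# Researcher B: lower overall impact, but one lucky paper.
b = [12, 1, 1, 1]

print(sum(a), highly_cited_count(a))  # 36 0
print(sum(b), highly_cited_count(b))  # 15 1
```

A has more than twice B's total citations yet zero highly cited publications, which is the "more need not always be better" effect in miniature.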

This paper investigates the mechanism of the Journal Impact Factor (JIF). Although created as a journal selection tool, the indicator is now probably the central quantitative indicator for measuring journal quality. The focus is on journal self-citations, as their treatment in analyses and evaluations is highly disputed. The role of self-citations (both the self-citing rate and the self-cited rate) is investigated on a larger scale in this analysis in order to obtain statistically reliable material that can further qualify that discussion. Some of the hypotheses concerning journal self-citations are supported by the results and some are not.

This paper introduces the Hirsch spectrum (h-spectrum) for analyzing the academic reputation of a scientific journal. The h-spectrum is a novel tool based on the Hirsch (h) index. It is easy to construct: considering a specific journal in a specific interval of time, the h-spectrum is defined as the distribution of the h-indices of the authors of the journal's articles. This tool allows defining a reference profile of the typical author of a journal, comparing different journals within the same scientific field, and providing a rough indication of the prestige/reputation of a journal in the scientific community. An h-spectrum can be associated with every journal. Ten journals in the Quality Engineering/Quality Management field are analyzed so as to preliminarily investigate the characteristics of the h-spectrum.
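A minimal sketch of how an h-spectrum could be computed, using hypothetical per-author citation data; `h_index` and `h_spectrum` are illustrative names, not the paper's code:

```python
from collections import Counter

def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def h_spectrum(author_citations):
    """Distribution of h-indices over the authors of a journal's articles.

    author_citations maps each author to the citation counts of that
    author's publications.
    """
    return Counter(h_index(cs) for cs in author_citations.values())

authors = {  # hypothetical data
    "A": [10, 8, 5, 4, 3],
    "B": [3, 2, 1],
    "C": [50, 2],
}
print(h_spectrum(authors))
```

The resulting distribution (here, two authors with h = 2 and one with h = 4) is the journal's h-spectrum, which can then be compared across journals in the same field.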

We present CitNetExplorer, a new software tool for analyzing and visualizing citation networks of scientific publications. CitNetExplorer can for instance be used to study the development of a research field, to delineate the literature on a research topic, and to support literature reviewing. We first introduce the main concepts that need to be understood when working with CitNetExplorer. We then demonstrate CitNetExplorer by using the tool to analyze the scientometric literature and the literature on community detection in networks. Finally, we discuss some technical details on the construction, visualization, and analysis of citation networks in CitNetExplorer.

We address the question how citation-based bibliometric indicators can best be normalized to ensure fair comparisons between publications from different scientific fields and different years. In a systematic large-scale empirical analysis, we compare a normalization approach based on a field classification system with three source normalization approaches. We pay special attention to the selection of the publications included in the analysis. Publications in national scientific journals, popular scientific magazines, and trade magazines are not included. Unlike earlier studies, we use algorithmically constructed classification systems to evaluate the different normalization approaches. Our analysis shows that a source normalization approach based on the recently introduced idea of fractional citation counting does not perform well. Two other source normalization approaches generally outperform the classification-system-based normalization approach that we study. Our analysis therefore offers considerable support for the use of source-normalized bibliometric indicators.

Percentile-based approaches have been proposed as a non-parametric alternative to parametric central-tendency statistics to normalize observed citation counts. Percentiles are based on an ordered set of citation counts in a reference set, whereby the fraction of papers at or below the citation counts of a focal paper is used as an indicator for its relative citation impact in the set. In this study, we pursue two related objectives: (1) although different percentile-based approaches have been developed, an approach is hitherto missing that satisfies a number of criteria such as scaling of the percentile ranks from zero (all other papers perform better) to 100 (all other papers perform worse), and solving the problem with tied citation ranks unambiguously. We introduce a new citation-rank approach having these properties, namely P100. (2) We compare the reliability of P100 empirically with other percentile-based approaches, such as the approaches developed by the SCImago group, the Centre for Science and Technology Studies (CWTS), and Thomson Reuters (InCites), using all papers published in 1980 in Thomson Reuters Web of Science (WoS). How accurately can the different approaches predict the long-term citation impact in 2010 (in year 31) using citation impact measured in previous time windows (years 1 to 30)? The comparison of the approaches shows that the method used by InCites overestimates citation impact (because of using the highest percentile rank when papers are assigned to more than a single subject category) whereas the SCImago indicator shows higher power in predicting the long-term citation impact on the basis of citation rates in early years. Since the results show a disadvantage in this predictive ability for P100 against the other approaches, there is still room for further improvements.
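The generic percentile definition quoted above (the fraction of papers in the reference set at or below the focal paper's citation count, scaled from 0 to 100) can be sketched as follows. This is deliberately not the exact P100, SCImago, CWTS, or InCites variant, each of which handles ties and multiple subject categories differently:

```python
def percentile_rank(focal, reference):
    """Share (0-100) of papers in the reference set whose citation
    count is at or below that of the focal paper."""
    at_or_below = sum(1 for c in reference if c <= focal)
    return 100.0 * at_or_below / len(reference)

reference = [0, 1, 3, 3, 7, 12]  # hypothetical reference set
print(percentile_rank(3, reference))
```

Note how the two papers tied at 3 citations receive the same rank, which is exactly the ambiguity the different tie-handling schemes try to resolve.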

Bibliometric studies often rely on field-normalized citation impact indicators in order to make comparisons between scientific fields. We discuss the connection between field normalization and the choice of a counting method for handling publications with multiple co-authors. Our focus is on the choice between full counting and fractional counting. Based on an extensive theoretical and empirical analysis, we argue that properly field-normalized results cannot be obtained when full counting is used. Fractional counting does provide results that are properly field normalized. We therefore recommend the use of fractional counting in bibliometric studies that require field normalization, especially in studies at the level of countries and research organizations. We also compare different variants of fractional counting. In general, it seems best to use either the author-level or the address-level variant of fractional counting.
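Author-level fractional counting can be sketched as follows, with hypothetical author lists. Unlike full counting, the fractional counts sum to the number of publications, which is what makes proper field normalization possible:

```python
from collections import defaultdict

def fractional_counts(publications):
    """Author-level fractional counting: a paper with n authors
    contributes 1/n to each author's publication count."""
    counts = defaultdict(float)
    for author_list in publications:
        for author in author_list:
            counts[author] += 1.0 / len(author_list)
    return dict(counts)

pubs = [["A", "B"], ["A", "B", "C"], ["C"]]  # hypothetical author lists
print(fractional_counts(pubs))
```

Under full counting the same data would credit five author-publications for three papers; fractional counting conserves the total of three.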

In this paper a machine learning approach for classifying Arabic text documents is presented. To handle the high dimensionality of text documents, each document (instance) is mapped to a real-valued vector representing its character tri-gram frequency profile. Classification is achieved by computing a dissimilarity measure, the Manhattan distance, between the profile of the instance to be classified and the profiles of all the instances in the training set. The class (category) assigned to an instance (document) is the one with the smallest Manhattan distance. The Dice similarity measure is used to compare the performance of the method. Results show that tri-gram text classification using the Dice measure outperforms classification using the Manhattan measure.
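A minimal sketch of the described scheme, using toy English strings in place of Arabic training documents; the profile construction and category names are illustrative:

```python
from collections import Counter

def trigram_profile(text):
    """Relative frequencies of character tri-grams in a document."""
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(grams.values())
    return {g: n / total for g, n in grams.items()}

def manhattan(p, q):
    """Manhattan (city-block) distance between two profiles."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def classify(document, training):
    """Assign the class of the nearest training profile.

    training maps each category to a representative document.
    """
    profile = trigram_profile(document)
    return min(training,
               key=lambda c: manhattan(profile, trigram_profile(training[c])))

training = {  # toy stand-ins for labelled training documents
    "sport": "the player kicked the ball across the field",
    "finance": "the bank raised the interest rate on the loan",
}
print(classify("the striker kicked the ball past the keeper", training))
```

In practice one profile per training instance (rather than per category) is used, as the abstract describes; the nearest-profile rule is the same.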

The arbitrariness of the h-index becomes evident when one requires q*h instead of h citations as the threshold for the definition of the index, thus changing the size of the core of the most influential publications of a dataset. I analyze the citation records of 26 physicists in order to determine how much the prefactor q influences the ranking. Likewise, the arbitrariness of the highly-cited-publications indicator is due to its threshold value, given either as an absolute number of citations or as a percentage of highly cited papers. The analysis of the 26 citation records shows that the changes in the rankings as a function of these thresholds are rather large and comparable with the respective changes for the h-index.
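The generalized index can be sketched directly from the definition above; the citation record is hypothetical:

```python
def h_q(citations, q=1.0):
    """Generalised h-index: the largest h such that at least h papers
    have at least q*h citations each (q=1 gives the ordinary h-index)."""
    ranked = sorted(citations, reverse=True)
    h = 0
    while h < len(ranked) and ranked[h] >= q * (h + 1):
        h += 1
    return h

record = [10, 8, 5, 4, 3]  # hypothetical citation record
print(h_q(record, q=1))  # 4
print(h_q(record, q=2))  # 2
```

Doubling q halves the core here from four to two publications, which is the kind of threshold sensitivity the paper examines across rankings.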

In the scientometric evaluation of multi- or interdisciplinary units, one risks comparing apples with oranges: each paper has to be assessed against an appropriate reference set. We suggest that the set of citing papers can be considered the relevant representation of the field of impact. In order to normalize for differences in citation behavior among fields, citations can be counted fractionally, weighting each citation by the inverse of the length of the reference list in the citing paper. This new method enables us to compare units with different disciplinary affiliations at the paper level and also to assess the statistical significance of differences among sets. Twenty-seven departments of Tsinghua University in Beijing are compared in this way. Among them, the Department of Chinese Language and Linguistics moves up from 19th to second position in the ranking. The overall impact of 19 of the 27 departments is not significantly different at the 5% level when thus normalized for different citation potentials.
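Citing-side fractional counting, as described above, weights each citation by the inverse length of the citing paper's reference list, so a citation from a field with long reference lists counts for less. A minimal sketch with hypothetical reference-list lengths:

```python
def fractional_impact(citing_reference_lengths):
    """Fractionally counted citation impact of a paper: each citation
    is weighted by 1/(length of the citing paper's reference list)."""
    return sum(1.0 / n for n in citing_reference_lengths)

# A paper cited by three papers whose reference lists contain
# 10, 25 and 50 sources respectively (hypothetical numbers):
print(fractional_impact([10, 25, 50]))
```

Full counting would score this paper at 3; fractionally it scores 0.16, reflecting that its citations came mostly from papers with long reference lists.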

In December 2003, seventeen years after the first UK research assessment exercise, Italy started its first-ever national research evaluation, with the aim of evaluating, by peer review, the excellence of the national research production. The evaluation involved 20 disciplinary areas, 102 research structures, 18,500 research products and 6,661 peer reviewers (1,465 from abroad); it had a direct cost of 3.55 million Euros and spanned 18 months. The introduction of ratings based on ex post quality of output, rather than on ex ante respect for parameters and compliance, is an important leap forward for the national research evaluation system toward meritocracy. From the bibliometric perspective, the national assessment offered the unprecedented opportunity to perform a large-scale comparison of peer review and bibliometric indicators for an important share of the Italian research production. The present investigation takes full advantage of this opportunity to test whether peer review judgements and (article and journal) bibliometric indicators are independent variables and, if not, to measure the sign and strength of the association. The outcomes allow us to advocate the use of bibliometric evaluation, suitably integrated with expert review, for the forthcoming national assessment exercises, with the goal of shifting from the assessment of research excellence to the evaluation of average research performance without a significant increase in expenses.

This paper raises concerns about the advantages of using statistical significance tests in research assessments, as has recently been suggested in the debate about proper normalization procedures for citation indicators. Statistical significance tests are highly controversial and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with such tests. The issues specifically discussed are the ritual practice of such tests, their dichotomous application in decision making, the difference between statistical and substantive significance, the implausibility of most null hypotheses, the crucial assumption of randomness, as well as the utility of standard errors and confidence intervals for inferential purposes. We argue that applying statistical significance tests and mechanically adhering to their results is highly problematic and detrimental to critical thinking. We claim that the use of such tests does not provide any advantages in relation to citation indicators, interpretations of them, or the decision-making processes based upon them. On the contrary, their use may be harmful. Like many other critics, we generally believe that statistical significance tests are over- and misused in the social sciences, including scientometrics, and we encourage a reform on these matters.

Aim: The scientific norm of universalism prescribes that external reviewers recommend the allocation of awards to young scientists solely on the basis of their scientific achievement. Since the evaluation of grants utilizes scientists with different personal attributes, it is natural to ask whether the norm of universalism reflects the actual evaluation practice. Subjects and methods: We investigated the influence of three attributes of external reviewers on their ratings in the selection procedure followed by the Boehringer Ingelheim Fonds (B.I.F.) for awarding long-term fellowships to doctoral and post-doctoral researchers in biomedicine: (i) the number of applications assessed in the past for the B.I.F. (reviewers' evaluation experience), (ii) the reviewers' country of residence and (iii) the reviewers' gender. To analyze the reviewers' ratings (1: award; 2: maybe award; 3: no award) in an ordinal regression model (ORM), the following were considered in addition to the three attributes: (i) the scientific achievements of the fellowship applicants, (ii) interaction effects between reviewers' and applicants' attributes and (iii) judgmental tendencies of reviewers. Results: The model estimations show no significant effect of the reviewers' attributes on the evaluation of B.I.F. fellowship applications. The ratings of the external reviewers are mainly determined by the applicants' scientific achievement prior to application. Conclusions: The results suggest that the external reviewers of the B.I.F. indeed achieved the foundation's goal of recommending applicants with higher scientific achievement for fellowships and recommending those with lower scientific achievement for rejection.

The paper reviews the literature on disciplinary credit assignment practices, and presents the results of a longitudinal study of credit assignment practices in the fields of economics, high energy physics, and information science. The practice of alphabetization of authorship is demonstrated to vary significantly between the fields. A slight increase is found to have taken place in economics during the last 30 years (1978–2007). A substantial decrease is found to have taken place in information science during the same period. High energy physics is found to be characterised by a high and stable share of alphabetized multi-authorships during the investigated period (1990–2007). It is important to be aware of such disciplinary differences when conducting bibliometric analyses.

In this paper we present a number of metrics for usage of the SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the entire astronomical community, these are indicative of how the astronomical literature is used. We show how the use of the ADS has changed both quantitatively and qualitatively. We also show that different types of users access the system in different ways. Finally, we show how use of the ADS has evolved over the years in various regions of the world. The ADS is funded by NASA Grant NNG06GG68G.

We analyze whether preferential attachment in scientific coauthorship networks is different for authors with different forms of centrality. Using a complete database for the scientific specialty of research about "steel structures," we show that betweenness centrality of an existing node is a significantly better predictor of preferential attachment by new entrants than degree or closeness centrality. During the growth of a network, preferential attachment shifts from (local) degree centrality to betweenness centrality as a global measure. An interpretation is that supervisors of PhD projects and postdocs broker between new entrants and the already existing network, and thus become focal to preferential attachment. Because of this mediation, scholarly networks can be expected to develop differently from networks which are predicated on preferential attachment to nodes with high degree centrality.
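The contrast between degree and betweenness centrality can be illustrated on a toy "broker" graph: two tightly knit groups joined through a single node, much like a supervisor brokering between new entrants and the existing network. The graph and the Brandes-style computation below are illustrative, not the paper's dataset or code:

```python
from collections import deque, defaultdict

def betweenness(graph):
    """Betweenness centrality of an unweighted, undirected graph
    (adjacency dict), via Brandes' shortest-path accumulation."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted in both directions.
    return {v: c / 2.0 for v, c in bc.items()}

# Two triangles joined through a single "broker" node b:
g = {
    "a1": ["a2", "a3"], "a2": ["a1", "a3"], "a3": ["a1", "a2", "b"],
    "b": ["a3", "c1"],
    "c1": ["b", "c2", "c3"], "c2": ["c1", "c3"], "c3": ["c1", "c2"],
}
bc = betweenness(g)
deg = {v: len(nbrs) for v, nbrs in g.items()}
# b has only degree 2, yet the highest betweenness of all nodes.
print(deg["b"], bc["b"])
```

Every shortest path between the two triangles passes through b, so b dominates on betweenness while being unremarkable on degree; this is the kind of node the study finds new entrants preferentially attach to.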

The structure of different types of time series in citation analysis is revealed, using an adapted form of the Frandsen–Rousseau notation. Special cases where this approach can be used include time series of impact factors and time series of h-indices, or h-type indices. This leads to a tool describing dynamic aspects of citation analysis. Time series of h-indices are calculated in some specific models.

As part of its program of 'Excellence in Research for Australia' (ERA), the Australian Research Council ranked journals into four categories (A*, A, B, C) in preparation for its performance evaluation of Australian universities. The ranking is important because it is likely to have a major impact on publication choices and research dissemination in Australia. The ranking is problematic because it is evident that some disciplines have been treated very differently from others. This paper reveals weaknesses in the ERA journal ranking and highlights the poor correlation between ERA rankings and other acknowledged metrics of journal standing. It highlights the need for a reasonable representation of journals ranked A* in each scientific discipline.
