Table 3 - uploaded by Marek Gagolewski


Source publication

Hirsch's h-index is perhaps the most popular citation-based measure of scientific excellence. In 2013, G. Ionescu and B. Chopard proposed an agent-based model for this index that describes the publication and citation generation process in an abstract scientific community. With such an approach one can simulate a single scientist's activity, an...
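For reference, the h-index discussed above is straightforward to compute from a citation vector. A minimal sketch in Python (the function name is ours, chosen for illustration):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Inspect papers from most to least cited; rank i contributes while c >= i.
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations
```

The same quantity is often written as the number of ranks i (in the non-increasing ordering) with at least i citations.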

## Context in source publication

**Context 1**

... In order to assess the quality of the proposed approximation we choose vectors of length greater than or equal to 20 (69 in total) and, from the vectors of length smaller than 20, we randomly choose 31 with uniform distribution. Basic sample statistics of the selected sample are presented in Table 3. Fig. 3: Comparison of the h-index and its approximations on a Scopus data set. ...

## Similar publications

Background and aim:
COVID-19 has affected the world population, with a higher impact among at-risk groups, such as diabetic patients. This has led to an exponential increase in the number of studies related to the subject, although their bibliometric characteristics are unknown. This article aims to characterize the world scientific production on...

Background
Despite its impact on female health worldwide, no efforts have been made to depict the global architecture of ovarian cancer research and to understand the trends in the related literature. Hence, it was the objective of this study to assess the global scientific performance chronologically, geographically and with regard to economic benc...

## Citations

... In the recent paper (Siudem et al., 2020) we have introduced the so-called 3DSI model (3 dimensions of scientific impact). It is an agent-based model inspired by (Ionescu and Chopard, 2013; Żogała-Siudem et al., 2016) that captures the evolution of an author's citation record, which we represent as a non-increasing vector $(X_1, X_2, \ldots, X_N)$, where $X_k$ denotes the number of citations received by the k-th most referenced paper. The 3DSI model has the following intuitive underlying assumption: in each time step one new paper is added to the author's track record. ...
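The generation process sketched in this excerpt (one new paper per time step, with citations split between a rich-get-richer component and a purely accidental one) can be illustrated with a toy simulation. The parameter names, the number of citations allocated per step, and the mixture weight `rho` below are our illustrative assumptions, not the actual 3DSI parametrisation from Siudem et al. (2020):

```python
import random

def simulate_citations(n_papers=100, cites_per_step=5, rho=0.6, seed=42):
    """Toy citation process: each time step adds one new paper; each new
    citation is allocated preferentially (with probability rho, proportional
    to current citation counts) or accidentally (uniformly at random)."""
    rng = random.Random(seed)
    counts = [0]  # the author's first paper, not yet cited
    for _ in range(1, n_papers):
        for _ in range(cites_per_step):
            if rng.random() < rho and sum(counts) > 0:
                # rich-get-richer: pick a paper proportionally to its citations
                target = rng.choices(range(len(counts)), weights=counts)[0]
            else:
                # accidental: uniform choice among existing papers
                target = rng.randrange(len(counts))
            counts[target] += 1
        counts.append(0)  # the new paper enters with no citations
    # Return the record as a non-increasing vector X_1 >= X_2 >= ... >= X_N
    return sorted(counts, reverse=True)
```

Sorting the final counts yields exactly the rank-ordered vector $(X_1, \ldots, X_N)$ used in the excerpt.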

We demonstrate that by using a triple of simple numerical summaries: an author’s productivity, their overall impact, and a single other bibliometric index that aims to capture the shape of the citation distribution, we can reconstruct other popular metrics of bibliometric impact with a sufficient degree of precision. We thus conclude that the use of many indices may be unnecessary – entities should not be multiplied beyond necessity. Such a study was possible thanks to our new agent-based model (Siudem et al. in Proc Natl Acad Sci 117:13896–13900, 2020, 10.1073/pnas.2001064117 ), which not only assumes that citations are distributed according to a mixture of the rich-get-richer rule and sheer chance, but also fits real bibliometric data quite well. We investigate which bibliometric indices have good discriminative power, which measures can be easily predicted as functions of other ones, and what implications to the research evaluation practice our findings have.

... One such mechanism is known as rich-get-richer, success-breeds-success [3], the Matthew effect [4], or the preferential attachment rule [5]. It assumes that highly cited papers are most likely to receive even more citations (for the possible bibliometric applications of this rule see [6,7]). Yet, recent research [8,9] into the origins of success in science and beyond highlights the role of other factors, such as chance. ...

We consider a version of D. J. Price's model for the growth of a bibliographic network, where in each iteration a constant number of citations is randomly allocated according to a weighted combination of accidental (uniformly distributed) and preferential (rich-get-richer) rules. Instead of relying on the typical master equation approach, we formulate and solve this problem in terms of the rank-size distribution. We show that, asymptotically, such a process leads to a Pareto-type 2 distribution with an appealingly interpretable parametrisation. We prove that the solution to the Price model expressed in terms of the rank-size distribution coincides with the expected values of order statistics in an independent Paretian sample. We study the bias and the mean squared error of three well-behaving estimators of the underlying model parameters. An empirical analysis of a large repository of academic papers yields a good fit not only in the tail of the distribution (as is usually the case in the power law-like framework), but also across the whole domain. Interestingly, the estimated models indicate a higher degree of preferentially attached citations and a smaller share of randomness than previous studies.
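The rank-size view of a Pareto-type-2 (Lomax) sample mentioned above can be sketched numerically. The quantile plug-in below (evaluating the quantile function at the mean uniform order statistic) is a standard crude approximation to the expected order statistics, not the closed-form solution derived in the cited paper; `shape` and `scale` are generic parameter names of ours:

```python
def lomax_quantile(u, shape, scale):
    """Inverse CDF of the Pareto type 2 (Lomax) distribution,
    S(x) = (1 + x/scale)^(-shape)."""
    return scale * ((1.0 - u) ** (-1.0 / shape) - 1.0)

def lomax_cdf(x, shape, scale):
    """CDF of the Lomax distribution."""
    return 1.0 - (1.0 + x / scale) ** (-shape)

def expected_rank_size(n, shape, scale):
    """Crude rank-size curve: approximate E[X_(k)], the k-th largest of n
    iid Lomax draws, by plugging the mean uniform order statistic
    (n - k + 1)/(n + 1) into the quantile function."""
    return [lomax_quantile((n - k + 1) / (n + 1), shape, scale)
            for k in range(1, n + 1)]
```

By construction the resulting curve is non-increasing in the rank k, matching the shape of an empirical rank-size plot.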

... Only occasionally does an S-shape occur, while in our simulations a concave increase is very rare" (p. 410). Ionescu and Chopard (2013) published two agent-based models which refer to performance measurements of single scientists and a group of scientists (see also Żogała-Siudem, Siudem, Cena, & Gagolewski, 2016). They studied, for example, what happens when low h index researchers are removed from a community. ...

Recently, Hirsch (2019a) proposed a new variant of the h index called the $h_\alpha$ index. The $h_\alpha$ index was criticized by Leydesdorff, Bornmann, and Opthof (2019). One of their most important points is that the index reinforces the Matthew effect in science. The Matthew effect was defined by Merton (1968) as follows: “the Matthew effect consists in the accruing of greater increments of recognition for particular scientific contributions to scientists of considerable repute and the withholding of such recognition from scientists who have not yet made their mark” (p. 58). We follow up on the point about the Matthew effect in the current study by using a recently developed Stata command (h_index) and R package (hindex), which can be used to simulate h index and $h_\alpha$ index applications in research evaluation. The user can investigate under which conditions $h_\alpha$ reinforces the Matthew effect. The results of our study confirm what Leydesdorff et al. (2019) expected: the $h_\alpha$ index reinforces the Matthew effect. This effect can be intensified if strategic behavior of the publishing scientists and cumulative advantage effects are additionally considered in the simulation.

... Only occasionally does an S-shape occur, while in our simulations a concave increase is very rare" (p. 410). Ionescu and Chopard (2013) published two agent-based models which refer to performance measurements of single scientists and a group of scientists (see also Żogała-Siudem, Siudem, Cena, & Gagolewski, 2016). They studied, for example, what happens when low h-index researchers are removed from a community. ...

Recently, Hirsch (2019a) proposed a new variant of the h index called the $h_\alpha$ index. He formulated it as follows: "we define the $h_\alpha$ index of a scientist as the number of papers in the h-core of the scientist (i.e. the set of papers that contribute to the h-index of the scientist) where this scientist is the $\alpha$-author" (p. 673). The $h_\alpha$ index was criticized by Leydesdorff, Bornmann, and Opthof (2019). One of their most important points is that the index reinforces the Matthew effect in science. We address this point in the current study using a recently developed Stata command (h_index) and R package (hindex), which can be used to simulate h index and $h_\alpha$ index applications in research evaluation. The user can investigate under which conditions $h_\alpha$ reinforces the Matthew effect. The results of our study confirm what Leydesdorff et al. (2019) expected: the $h_\alpha$ index reinforces the Matthew effect. This effect can be intensified if strategic behavior of the publishing scientists and cumulative advantage effects are additionally considered in the simulation.
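Hirsch's quoted definition of $h_\alpha$ translates directly into code once we know, for each paper, whether the scientist is its $\alpha$-author. In the sketch below that information is an input flag (determining $\alpha$-authorship is outside the scope of this illustration), and the tie-breaking convention for the h-core is our own assumption:

```python
def h_index(citations):
    """h = number of ranks i (1-based, citations sorted non-increasingly)
    whose citation count is at least i."""
    return sum(1 for i, c in enumerate(sorted(citations, reverse=True), 1)
               if c >= i)

def h_alpha(papers):
    """papers: list of (citations, is_alpha) pairs, where is_alpha says
    whether the scientist is the alpha-author of that paper.
    Convention (ours): with ties, the h-core is the top-h papers after
    sorting by citation count."""
    papers = sorted(papers, key=lambda p: p[0], reverse=True)
    h = h_index([c for c, _ in papers])
    return sum(1 for _, is_alpha in papers[:h] if is_alpha)
```

By construction $0 \le h_\alpha \le h$, which is why Leydesdorff et al. view it as concentrating credit on already-dominant authors.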

We study an agent-based model for generating citation distributions in complex networks of scientific papers, where a fraction of citations is allotted according to the preferential attachment rule (rich get richer) and the remainder is allocated accidentally (purely at random, uniformly). Previously, we derived and analysed such a process in the context of describing individual authors, but now we apply it to scientific journals in computer and information sciences. Based on the large DBLP dataset as well as the CORE (Computing Research and Education Association of Australasia) journal ranking, we find that the impact of journals is correlated with the degree of accidentality of their citation distribution. Citations to impactful journals tend to be more preferential, while citations to lower-ranked journals are distributed in a more accidental manner. Further, applied fields of research such as artificial intelligence seem to be driven by a stronger preferential component – and hence have a higher degree of inequality – than the more theoretical ones, e.g., mathematics and computation theory.

We consider a version of D. Price’s model for the growth of a bibliographic network, where in each iteration a constant number of citations is randomly allocated according to a weighted combination of the accidental (uniformly distributed) and the preferential (rich-get-richer) rule. Instead of relying on the typical master equation approach, we formulate and solve this problem in terms of the rank–size distribution. We show that, asymptotically, such a process leads to a Pareto-type 2 distribution with a new, appealingly interpretable parametrisation. We prove that the solution to the Price model expressed in terms of the rank–size distribution coincides with the expected values of order statistics in an independent Paretian sample. An empirical analysis of a large repository of academic papers yields a good fit not only in the tail of the distribution (as is usually the case in the power law-like framework), but also across a significantly larger fraction of the data domain.

There are many approaches to the modelling of citation vectors of individual authors. Models may serve different purposes, but usually they are evaluated with regard to how well they align with citation distributions in large networks of papers. Here we compare a few leading models in terms of their ability to correctly reproduce the values of selected bibliometric indices of individual authors. Our recently-proposed three-dimensional model of scientific impact serves this purpose equally well as the discrete generalised beta distribution and the log-normal models, but has fewer parameters, all of which are easy to interpret. We also indicate which indices can be predicted with high accuracy and which are more difficult to model.

The growing popularity of bibliometric indexes (whose most famous example is the h index by J. E. Hirsch [J. E. Hirsch, Proc. Natl. Acad. Sci. U.S.A. 102, 16569–16572 (2005)]) is opposed by those claiming that one’s scientific impact cannot be reduced to a single number. Some even believe that our complex reality fails to submit to any quantitative description. We argue that neither of the two controversial extremes is true. By assuming that some citations are distributed according to the rich get richer rule (success breeds success, preferential attachment) while some others are assigned totally at random (all in all, a paper needs a bibliography), we have crafted a model that accurately summarizes citation records with merely three easily interpretable parameters: productivity, total impact, and how lucky an author has been so far.

There is a mutual resemblance between the behavior of users of the Stack Exchange and the dynamics of the citation accumulation process in the scientific community, which enabled us to tackle the outwardly intractable problem of assessing the impact of introducing “negative” citations. Although the most frequent reason to cite an article is to highlight the connection between the two publications, researchers sometimes mention an earlier work to cast a negative light. While computing citation‐based scores, for instance the h‐index, information about the reason why an article was mentioned is neglected. Therefore, it can be questioned whether these indices describe scientific achievements accurately. In this article we shed light on the problem of “negative” citations, analyzing data from Stack Exchange and, to draw more universal conclusions, we derive an approximation of citation scores. Here we show that the quantified influence of introducing negative citations is minor and that they could be used as an indicator of where the attention of the scientific community is allocated.
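As a toy contrast (our own discounting scheme, not the approximation derived in the article), one can subtract each paper's negative mentions from its citation count before computing the h-index and compare the two scores:

```python
def h_index(citations):
    """Standard h-index: number of ranks i with at least i citations."""
    return sum(1 for i, c in enumerate(sorted(citations, reverse=True), 1)
               if c >= i)

def h_with_negatives(positive, negative):
    """Hypothetical illustration: discount each paper's citation count by
    its 'negative' citations, floored at zero, then compute the h-index
    as usual."""
    return h_index([max(p - n, 0) for p, n in zip(positive, negative)])

# Example: a few negative mentions shave one point off the plain h-index.
print(h_index([10, 8, 5, 4, 3]),
      h_with_negatives([10, 8, 5, 4, 3], [1, 0, 2, 1, 0]))  # 4 3
```

Small, paper-level discounts tend to move the index little, which is consistent in spirit with the article's conclusion that the quantified influence of negative citations is minor.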