All content in this area was uploaded by Andre E. Vellwock on Nov 07, 2020
On the Benfordness of academic citations
Andre E. Vellwock and Anran Wei
Benford’s Law is a tool to assess the validity of datasets. Citation numbers are a method to quantify a
researcher's academic achievement, but their legitimacy can be questioned. Benford’s Law can point out data
manipulation in accounting and financial markets; however, its effectiveness in the academic field is unknown.
Our results showed that Benford’s Law is valid for analyzing a list of cited publications, both for individual
researchers and average university values. The Benfordness increases with the quantity of cited
publications. Nevertheless, for 1000+ publications, a non-traditional distribution appears. This goes against the
Law, and here we cannot explain the reason for its appearance. This study also shows that the world rank of
a university is not directly correlated to the Benfordness of its publications.
Keywords: Benford’s Law, academic citation, publications, data analysis
Introduction
Researchers are evaluated based on their academic
citations, resulting in factors such as h-index and i10-
index. The validity of these numbers is thus essential to
guarantee a fair performance assessment. Benford’s Law
is a number distribution also applied as a statistical tool
to indicate manipulation in datasets. It is broadly used in
finance (1), accounting (2), and politics (3, 4). Recently,
its applicability has been validated in identifying possible
manipulation in COVID-19 numbers (5). Here we tested
citation numbers against Benford’s Law distribution, for
individual researchers and universities. The aim is
not to locate possible citation manipulations, but to
evaluate whether citation numbers follow the Law.
Results and discussions
The Benfordness of the data is measured by the d*-factor:
the lower the value, the better the fit to Benford’s
distribution. In Figure 1a, the d*-factor of individual
researchers (blue square) and university average values
(dark grey circle) are plotted against the quantity of cited
publications (QCT). A clear correlation between the
variables is evident: the d*-factor decreases as QCT
grows, following a second-order polynomial fit in QCT
(R² of 0.92).
The graph shows this holds while the publication numbers
are below 1000. Above that, a more random distribution is
present (dashed square). This result goes against the
fundamentals of Benford’s Law: a larger dataset should
essentially reduce variation and increase Benfordness.
We cannot explain this behavior statistically.
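As a rough illustration of the fitted trend, the sketch below evaluates the second-order polynomial d* = 0.39 - 4.04e-4 QCT + 1.16e-7 QCT² at a few values of QCT below 1000, the range where the fit is reported to hold. The minus signs and negative exponents are an assumption, inferred from the decreasing trend of the d*-factor because they are not legible in the extracted equation. The function name `d_star_fit` is hypothetical.

```python
def d_star_fit(qct):
    """Evaluate the assumed fit d* = 0.39 - 4.04e-4*QCT + 1.16e-7*QCT**2.

    Signs and exponents are an assumption (see the text above); the fit is
    only reported to hold for fewer than 1000 cited publications.
    """
    return 0.39 - 4.04e-4 * qct + 1.16e-7 * qct ** 2


# Print the fitted d*-factor for a few publication counts below 1000.
for qct in (100, 500, 1000):
    print(qct, round(d_star_fit(qct), 3))
```

Under these assumed coefficients, the fitted d*-factor drops steadily as QCT approaches 1000, in line with the trend described for Figure 1a.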
The graph in Figure 1b evaluates whether the university
world ranking affects the institution's average d*-factor.
High-ranking universities showed the lowest values of d*,
thus presenting a high Benfordness. Universities around
the 100th position show an increase in d*. This tendency
does not hold for the lower-ranking institutions
(501-510th), where the deviation from Benford’s Law
decreases. For example, East China Normal
University has a d*-factor in the same range as the
highest-ranked institutions. On the other hand, Hitotsubashi
University has a higher d*. This large variation for
lower-ranking universities does not let us trace a tendency
between ranking and d*-factor. The result shows that
QCT is more important than the position in a rank,
regarding Benford’s Law.
Figure 1. (a) The influence of the quantity of cited
publications on the d*-factor of individual researchers
and universities. (b) The absence of correlation between
university world ranking and average d*-factor.
Methods
Data selection and assessment
Nine universities were chosen from three ranking levels
according to the QS World University Rankings 2021
(Appendix A). The top ten most cited researchers,
according to Google Scholar, were selected for each
university. The total quantity of studied individual
researchers is 90. With the aid of the software Publish or
Perish 7, a list of publications and their citation numbers
was acquired; the methodology is illustrated in Figure 2.
The software has a limit of 1000 publications; thus,
if a researcher had 1000+ cited publications, the ones not
given by the software were manually obtained from Google
Scholar. The data acquisition was made from 30 October
2020 to 01 November 2020.
Second-order polynomial fit of the d*-factor against QCT (Figure 1a):
$$d^{*}\text{-factor} = 0.39 - 4.04 \times 10^{-4}\,\mathrm{QCT} + 1.16 \times 10^{-7}\,\mathrm{QCT}^{2}$$
Figure 2. Schematic representing the data acquisition
method for a hypothetical author with hypothetical
publications.
Benford’s Law
In a dataset, isolating the first digit of each number gives
a list of digits from 1 to 9. Dividing the number of times a
digit appears by the total quantity of numbers gives a
fraction. Doing this for 1, 2, 3, …, 9 yields a frequency
distribution. Benford’s Law states that the fraction of
each first digit d follows the logarithmic distribution P(d),
the base-10 logarithm of 1 + 1/d, for d from 1 to 9.
The resulting fractions are listed as
d          1     2     3     4     5     6     7     8     9
P(d) [%]   30.1  17.6  12.5  9.7   7.9   6.7   5.8   5.1   4.6
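The counting procedure described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the sample citation counts are invented, and dropping zero-citation entries (which have no first digit from 1 to 9) is an assumption.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit fractions P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_fractions(values):
    """Observed fraction of each first digit among positive integers.

    Zero entries are skipped (assumption), since they have no first
    digit between 1 and 9.
    """
    digits = [int(str(v)[0]) for v in values if v > 0]
    if not digits:
        raise ValueError("no positive values to analyze")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Invented citation counts for one hypothetical author (not study data).
citations = [152, 98, 17, 1043, 12, 233, 31, 19, 105, 8]
observed = first_digit_fractions(citations)
expected = benford_expected()

# Print digit, Benford percentage, observed percentage.
for d in range(1, 10):
    print(d, f"{100 * expected[d]:.1f}", f"{100 * observed[d]:.1f}")
```

The Benford column printed by this sketch reproduces the percentages listed above, while the observed column depends entirely on the input list.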
The Benfordness, that is, the deviation from Benford’s Law,
was evaluated by quantifying the d*-factor (5), expressed
as the Euclidean distance between P̃(d) and P(d) divided by
the normalizing constant 1.03606,
where d is the first digit from 1 to 9, and P̃(d) stands for
the observed distribution of each first digit in the dataset.
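A minimal sketch of the d*-factor calculation follows. It assumes the d*-factor is the Euclidean distance between the observed fractions P̃(d) and Benford’s P(d), divided by 1.03606, which is how we read the definition in (5). The function name `d_star` and the two test distributions are hypothetical.

```python
import math

NORMALIZER = 1.03606  # normalizing constant quoted in the definition


def d_star(observed):
    """Euclidean distance between observed and Benford first-digit
    fractions, divided by 1.03606 (assumed form of the d*-factor).

    `observed` maps each first digit 1..9 to its fraction in the dataset;
    missing digits count as a fraction of 0.
    """
    squared = sum(
        (observed.get(d, 0.0) - math.log10(1 + 1 / d)) ** 2
        for d in range(1, 10)
    )
    return math.sqrt(squared) / NORMALIZER


# A dataset that matches Benford's Law exactly has no deviation.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
print(d_star(benford))

# Extreme case: every number starts with 9, giving a value close to 1.
all_nines = {9: 1.0}
print(round(d_star(all_nines), 3))
```

Lower values indicate a better fit to Benford’s distribution, so the first case marks the ideal and the second the approximate upper end of the scale.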
Conclusions
Citation numbers do follow Benford’s Law, extending
the applicability of the distribution. The deviation from the
Law is correlated to the quantity of cited publications of
the specific researcher. The correlation obeys a
second-order polynomial fit with an R² of 0.92. For
highly cited individuals, oscillations occur but
fundamentally the d*-factor values are low. We also
showed that the university ranking does not clearly
influence the Benfordness. A study taking into
consideration a larger number of institutions is needed for
a more detailed assessment.
References
1. Cho W, Gaines B. Breaking the (Benford) Law:
Statistical Fraud Detection in Campaign Finance. The
American Statistician. 2007;61:218-23.
2. Durtschi C, Hillison WA, Pacini C, editors. The
effective use of Benford's Law to assist in detecting fraud
in accounting data. 2004.
3. Deckert J, Myagkov M, Ordeshook PC.
Benford's Law and the Detection of Election Fraud.
Political Analysis. 2011;19(3):245-68.
4. Beber B, Scacco A. What the numbers say: A
digit-based test for election fraud. Political Analysis.
2012;20(2):211-34.
5. Wei A, Vellwock AE. Is COVID-19 data
reliable? A statistical analysis with Benford's Law. 2020.
Appendix A
Table 1. Selected universities and their world ranking
Ranking    University
1          Massachusetts Institute of Technology
2          Stanford University
3          Harvard University
101        Pennsylvania State University
102        Trinity College Dublin
103        Technical University of Denmark
501-510    Christian-Albrechts-University zu Kiel
501-510    East China Normal University
501-510    Hitotsubashi University
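Benford’s Law distribution P(d) of the first digit d:
$$P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad \text{for } d = 1, \ldots, 9$$
Definition of the d*-factor:
$$d^{*}\text{-factor} = \frac{\sqrt{\sum_{d=1}^{9} \left(\tilde{P}(d) - P(d)\right)^{2}}}{1.03606}$$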