Breaking Benford's law: A statistical analysis of Covid-19 data using the Euclidean distance statistic

An extended version of this paper has been accepted for publication in Statistics in Transition. Main results unchanged.
Leonardo Campanelli
All Saints University School of Medicine, 5145 Steeles Ave., Toronto (ON), Canada
(Dated: October 26, 2022)
Using the Euclidean distance statistical test of Benford’s law, we analyze the Covid-19 weekly
case counts by country. While 62% of the 100 countries and territories considered in the present
study conform to Benford’s law at a significance level α = 0.05 and 17% at a significance level
0.01 ≤ α < 0.05, the remaining 21% show a deviation from it (p-values smaller than 0.01). In
particular, 5% of countries “break” Benford’s law with a p-value smaller than 0.001.
I. INTRODUCTION
At the end of the 19th century, Newcomb [1] noticed that the first-digit distribution of logarithms was not uniform,
as one would expect, but rather followed the rule
$$P_B(d) = \log_{10}\left(1 + \frac{1}{d}\right), \tag{1}$$
where $P_B(d)$ is the probability of the first significant digit d. About 60 years later, Benford [2] rediscovered Newcomb’s
rule (hereafter Benford’s law), extended the law to arbitrary logarithmic bases and to multiple digits, and successfully
tested the law against 20 very different data sets, such as physical constants, death rates, populations of cities, lengths of
rivers, etc.
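As an illustration of Eq. (1), the nine first-digit probabilities can be tabulated with a few lines of Python. This is only a sketch: the function name `benford_probability` is an illustrative choice, and the base-10 logarithm is assumed, as appropriate for decimal first digits.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the first significant digit equals d, following Eq. (1)."""
    if not 1 <= d <= 9:
        raise ValueError("the first significant digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Tabulate the expected frequencies for d = 1, ..., 9.
benford = {d: benford_probability(d) for d in range(1, 10)}
for d, prob in benford.items():
    print(d, round(prob, 4))
```

The nine probabilities sum to one, and digit 1 is expected in about 30% of cases.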
Although it is now known that some distributions satisfy Benford’s law (see [3] and references therein) and that
particular principles lead to the emergence of the Benford phenomenon in data [4], no general criterion has been found
that fully explains when and why Benford’s law holds for a generic set of data. Compliance with Benford’s law has
recently been tested on very disparate data sets, from the natural sciences [5] to the general framework of fraud detection, as
in payment of taxes [6] and campaign finance [7] (for theoretical insights and general applications of Benford’s law,
see [8]). However, the rejection of such tests on data whose underlying distribution is not known to follow Benford’s law should
not be used as a tool to uncover error or, more importantly, fraud. This is particularly true for Covid-19 data since
there is no theoretical basis or sufficient empirical evidence that these data follow a Benford distribution.
The first application of Benford’s law to the study of Covid-19 data, in particular to daily and cumulative case and
death counts, is due to Sambridge and Jackson [9], while the most recent work on the “Benfordness” of Covid-19 data
is by Farhadi [10]. Using different statistical tests, the authors of both studies conclude that, in general, Covid-19
data conform to a Benford distribution and also indicate “anomalies” in the data of some countries. The results
of these and similar analyses, however, cannot be completely trusted for reasons discussed in Sec. II. Here, we will
describe the statistical approach used to test the compliance of Covid-19 data with Benford’s law and will also present
our results. These are not in complete disagreement with previous results in the literature and clearly show that, in
some countries, Covid-19 “breaks” Benford’s law.
II. METHOD AND RESULTS
It is well known that the compliance of data sets with Benford’s law improves as the range of the data increases.
Daily confirmed cases and daily death counts are therefore not appropriate when checking for the compliance of Covid-19
first-digit distributions with Benford’s law because they typically extend over very few orders of magnitude. Another
possibility would be the use of cumulative data. The disadvantage of using this type of data is that, as cumulative
case numbers begin to flatten (especially after a Covid-19 “wave” has passed), first digits tend to become all the
same, thus distorting relative digit frequencies. In order to overcome the above problems for Covid-19 data, we will
only analyze the data on weekly confirmed cases by country: they extend, for about 45% of countries, over
4 or more orders of magnitude, and do not flatten.
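The two quantities this selection relies on, the first significant digit of a count and the number of orders of magnitude a series spans, can be sketched in Python as follows. The text does not spell out the exact selection rule, so `decades_spanned` is an assumption: it takes the difference between the decimal exponents of the largest and smallest counts, a reading that is consistent with the ranges listed in Tab. I. Both function names are illustrative, and weeks with zero cases are assumed to be excluded because they have no first digit.

```python
def first_digit(n: int) -> int:
    """First significant digit of a positive integer count."""
    if n <= 0:
        raise ValueError("a first digit is defined only for positive counts")
    return int(str(n)[0])


def decades_spanned(n_min: int, n_max: int) -> int:
    """Difference between the decimal exponents of n_max and n_min (assumed rule)."""
    if n_min <= 0 or n_max < n_min:
        raise ValueError("expected 0 < n_min <= n_max")
    return (len(str(n_max)) - 1) - (len(str(n_min)) - 1)


# Ranges [N_min, N_max] taken from Tab. I.
print(decades_spanned(5, 10524))     # Algeria
print(decades_spanned(1, 2738957))   # India
```

Under this reading, Algeria sits exactly at the threshold of 4, while India spans 6.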
The most common tests in use for testing whether an observed sample of size N satisfies Benford’s law are
Pearson’s χ², Kolmogorov-Smirnov, and Kuiper tests. However, such tests are based on the null hypothesis of a
continuous distribution, and are generally conservative for testing discrete distributions such as Benford’s [11].
This problem can be overcome if one uses the results by Morrow [3], who has recently found asymptotically valid test
values for these statistics under the specific null hypothesis that Benford’s law holds.
Other tests have been recently proposed, based on new statistics such as the “max” statistic, m, introduced by
Leemis et al. [12], and the “normalized Euclidean distance” statistic, d, introduced by Cho and Gaines [7]. At the
moment of their introduction, however, the properties of the corresponding estimators were not well understood and
no test values were reported. These problems were solved by Morrow [3], who provided asymptotically valid test values for
those statistics too.
FIG. 1: Percentages of countries in a given range of p-values of the Euclidean distance statistic for the first-digit distribution of
Covid-19 weekly case counts by country: from top and clockwise, p ≥ 0.05 (green, 62%), 0.01 ≤ p < 0.05 (yellow, 17%),
0.001 ≤ p < 0.01 (red, 16%), and p < 0.001 (purple, 5%).
Recently [13], we have found, by means of Monte Carlo simulations, the (empirical) cumulative distribution
function (CDF) of the “Euclidean distance” statistic, $d_N$, which is based on the statistic d and was introduced by
Morrow. It is defined as [3]
$$d_N = \sqrt{N \sum_{d=1}^{9} \left[P(d) - P_B(d)\right]^2}, \tag{2}$$
where P(d) is the observed first-digit frequency distribution (see footnote 1).
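A minimal Python sketch of Eq. (2) is given below. The function name `euclidean_distance_statistic` and the sample counts are hypothetical, and dropping non-positive counts before extracting first digits is an assumption not stated in the text.

```python
import math
from collections import Counter

# Benford probabilities P_B(d) for d = 1, ..., 9, Eq. (1).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def euclidean_distance_statistic(counts):
    """Euclidean distance statistic of Eq. (2) for a sequence of integer counts."""
    # Assumption: non-positive counts carry no first digit and are dropped.
    digits = [int(str(c)[0]) for c in counts if c > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("at least one positive count is required")
    freq = Counter(digits)
    squared = sum((freq.get(d, 0) / n - BENFORD[d - 1]) ** 2 for d in range(1, 10))
    return math.sqrt(n * squared)


# Hypothetical weekly counts, for illustration only.
sample = [3, 17, 120, 145, 210, 1900, 2600, 31000, 470, 88]
print(euclidean_distance_statistic(sample))
```

Note that N here is the number of counts that contribute a first digit, which for a country in Tab. I is the number of weeks.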
In the following, we will use this statistic to study the first-digit distribution of Covid-19 weekly case counts
by country since this is the only statistic, among the ones discussed before and analyzed by Morrow, with known
distribution. In particular, we will use its CDF to evaluate p-values as $p = 1 - \mathrm{CDF}(d_N)$.
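The p-values in this paper come from the empirical CDF tabulated in [13], which is not reproduced here. As a rough stand-in, the sketch below estimates $p = 1 - \mathrm{CDF}(d_N)$ by brute-force simulation of samples drawn from Benford’s law. The function names, the number of trials, and the fixed seed are all illustrative assumptions, and the estimates will agree with the tabulated values only to within Monte Carlo error.

```python
import math
import random

# Benford probabilities for digits 1..9, Eq. (1).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
DIGITS = list(range(1, 10))


def distance_from_digits(digits):
    """Euclidean distance statistic of Eq. (2) for a list of first digits."""
    n = len(digits)
    counts = [0] * 9
    for d in digits:
        counts[d - 1] += 1
    squared = sum((c / n - p) ** 2 for c, p in zip(counts, BENFORD))
    return math.sqrt(n * squared)


def monte_carlo_p_value(d_observed, n, trials=5000, seed=0):
    """Estimate p = 1 - CDF(d_observed) from simulated Benford samples of size n."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        simulated = rng.choices(DIGITS, weights=BENFORD, k=n)
        if distance_from_digits(simulated) >= d_observed:
            exceed += 1
    return exceed / trials


# Values of d_N and N for Thailand and Honduras, taken from Tab. I.
print(monte_carlo_p_value(0.4247, 102))
print(monte_carlo_p_value(2.6172, 94))
```

With only a few thousand trials, p-values below about 0.001 cannot be resolved; that regime needs far more simulations or the tabulated CDF.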
Data are from the World Health Organization (WHO) [16] and updated to December 20, 2021 (two years from the
start of the pandemic). Of the 222 countries and territories affected by Covid-19, only 100 have Covid-19 weekly case
counts with a range spanning 4 or more orders of magnitude. These countries and territories are shown in Tab. I and
grouped in six different regions [16]. Also shown are the range of weekly cases, [N_min, N_max], the number of weeks, N,
the Euclidean distance, $d_N$, and the corresponding p-value. Notice that the CDF of $d_N$, and hence the p-values, are
reliable up to the second decimal place if 0.28 < $d_N$ < 1.85 and up to the third decimal place otherwise [13]. In the first
case, the uncertainty on p is ±0.001, while in the second case it is ±0.0001. In Tab. I, the last digits in parentheses refer to
these errors. For example, p = 0.27(4) stands for p = 0.274 ± 0.001, while p = 0.000(2) stands for p = 0.0002 ± 0.0001.
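The parenthesis notation can be decoded mechanically. The following Python sketch (the helper name `parse_p_value` is hypothetical) turns an entry of Tab. I into a central value and its uncertainty, following the two examples given in the text.

```python
import re


def parse_p_value(text: str):
    """Convert an entry such as '0.27(4)' into (value, uncertainty)."""
    match = re.fullmatch(r"(\d+\.\d+)\((\d)\)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized p-value format: {text!r}")
    leading, last_digit = match.groups()
    value = float(leading + last_digit)
    # The uncertainty is one unit of the digit written in parentheses.
    decimals = len(leading.split(".")[1]) + 1
    return value, 10.0 ** (-decimals)


print(parse_p_value("0.27(4)"))
print(parse_p_value("0.000(2)"))
```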
As shown in Fig. 1, while the great majority of countries (79%) conform to Benford’s law (p ≥ 0.01), 5% of them
show a large deviation from it, having p-values smaller than 0.001 (see footnote 2).
In Fig. 2, we show the observed first-digit frequency distributions of weekly case counts for 15 selected countries
superimposed on Benford’s law. Represented countries are China (where the pandemic started), the United States of
America (with the largest total number of cases), India (with the largest range of weekly case counts, N_max/N_min),
Tanzania (with the smallest sample size N), Mauritius (with the smallest total number of cases), Algeria (with the
smallest range of weekly case counts), Vietnam, Thailand, and Poland (the outliers in the first box plot of Fig. 4 with
the world’s largest p-values), Honduras, Qatar, Belarus, Cuba, and Egypt (with the smallest p-values, p < 0.001), and
Canada (with the smallest p-value in the interval 0.001 ≤ p < 0.01). It is worth noticing that, although the first six
countries in Fig. 2 have very disparate statistical properties (such as sample size, total number of cases, and range of
weekly cases), they all conform to Benford’s law at a significance level of 0.01 (excluding Mauritius and Algeria, the
other four countries conform to Benford’s law at a significance level of 0.05).
Footnote 1: The d statistic is defined as $d = \sqrt{\sum_{d=1}^{9} [P(d) - P_B(d)]^2} / D$, where
$D = \sqrt{\sum_{d=1}^{8} P_B^2(d) + [P_B(9) - 1]^2} \simeq 1.03631$ is a normalization
factor that ensures that the normalized Euclidean distance is bounded by 0 and 1. A measure of fit to check concordance with Benford’s
law has been proposed by Goodman [14]. His “rule of thumb”, which has been used in the literature (see, e.g., [15]), but whose statistical
validity has been criticized in [13], is that compliance with Benford’s law occurs when d ≤ 0.25.
Footnote 2: It is worth observing that the use of the Cho-Gaines normalized Euclidean distance d together with Goodman’s rule of thumb for
compliance with Benford’s law would give a highly questionable compliance with Benford’s law for all countries except Honduras, for which
d = 0.260, and Tanzania, with d = 0.251.
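The normalized distance d of footnote 1 follows from the tabulated $d_N$ by dividing by $\sqrt{N}$ and by the normalization factor D. The Python sketch below (function and variable names are illustrative) recomputes D from Eq. (1) and applies Goodman’s rule of thumb, d ≤ 0.25, to three entries of Tab. I.

```python
import math

# Benford probabilities for digits 1..9, Eq. (1).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Normalization factor D of footnote 1.
D_NORM = math.sqrt(sum(p * p for p in BENFORD[:8]) + (BENFORD[8] - 1) ** 2)


def normalized_distance(d_n: float, n: int) -> float:
    """Cho-Gaines normalized distance d obtained from d_N and the sample size N."""
    return d_n / (math.sqrt(n) * D_NORM)


# (country, d_N, N) taken from Tab. I.
for country, d_n, n in [("Honduras", 2.6172, 94), ("Tanzania", 1.2457, 23), ("Poland", 0.4811, 95)]:
    d = normalized_distance(d_n, n)
    verdict = "complies" if d <= 0.25 else "does not comply"
    print(country, round(d, 3), verdict)
```

Because d scales as $d_N/\sqrt{N}$, a small sample such as Tanzania’s (N = 23) can exceed the 0.25 threshold even though its p-value is unremarkable.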
In Fig. 3, we show the percentages of countries in a given range of p-values, as in Fig. 1, this time grouped in six
different regions of the world [16]: Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, and Western
Pacific. As is clear from the pie charts, Africa conforms very well to Benford’s law, all countries in this region having
p-values larger than 0.01. Also, South-East Asian and Western Pacific countries conform well to Benford’s law, the
only countries with a p-value less than 0.01 being Maldives (for South-East Asia), and Philippines and Australia (for
Western Pacific). Countries in the Americas and the Eastern Mediterranean, instead, show the largest deviation from Benford’s
law: only about 41% and 53% of them, respectively, have p-values bigger than 0.05, while about 12% and 13% have p-values below 0.001.
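The regional percentages can be reproduced by sorting the p-values of Tab. I into the four ranges used in the pie charts. A short Python sketch for Africa follows; the function name `p_value_shares` and the bin labels are illustrative.

```python
def p_value_shares(p_values):
    """Percentage of p-values falling in each of the four ranges used in the pie charts."""
    bins = {"p >= 0.05": 0, "0.01 <= p < 0.05": 0, "0.001 <= p < 0.01": 0, "p < 0.001": 0}
    for p in p_values:
        if p >= 0.05:
            bins["p >= 0.05"] += 1
        elif p >= 0.01:
            bins["0.01 <= p < 0.05"] += 1
        elif p >= 0.001:
            bins["0.001 <= p < 0.01"] += 1
        else:
            bins["p < 0.001"] += 1
    return {label: 100 * count / len(p_values) for label, count in bins.items()}


# p-values of the 13 African countries in Tab. I (central values).
africa = [0.029, 0.243, 0.438, 0.067, 0.021, 0.175, 0.122,
          0.849, 0.012, 0.080, 0.708, 0.057, 0.395]
for label, share in p_value_shares(africa).items():
    print(label, round(share, 1))
```

For Africa this gives 10 of 13 countries with p ≥ 0.05 and 3 of 13 with 0.01 ≤ p < 0.05, with none below 0.01.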
In Fig. 4, we present box-and-whisker plots for the p-values of all 100 countries and territories analyzed in this
study and of the countries in the six different regions of the world. All distributions are positively skewed, with medians well
below 0.5. This indicates that the first-digit distribution of Covid-19 weekly case counts by country deviates somewhat
from Benford’s law on a “global” scale (see footnote 3). Such a deviation is, however, to be expected for the reasons explained
in [13]. Indeed, Benford’s law does not represent a true law of numbers: some distributions can be “close” to but not
exactly Benford’s, regardless of data quality; also, Benford’s law emerges in the limit of infinite range of the
underlying distribution, a condition which is never realized in practice.
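The check described in footnote 3 compares the p-values of a group of countries with the uniform distribution, CDF(p) = p, which is what one expects if Benford’s law holds exactly. A minimal Python sketch of the one-sample Kolmogorov-Smirnov statistic is given below, applied to the nine Western Pacific p-values of Tab. I. The function name is illustrative, and since the tabulated p-values are rounded to three decimals, the result only approximately matches the quoted 0.3639.

```python
def ks_statistic_uniform(p_values):
    """One-sample KS statistic of p_values against the uniform CDF, CDF(p) = p."""
    xs = sorted(p_values)
    n = len(xs)
    # Largest gap above and below the empirical CDF steps.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


# p-values of the nine Western Pacific countries in Tab. I (central values).
western_pacific = [0.008, 0.814, 0.119, 0.572, 0.191, 0.002, 0.549, 0.077, 0.984]
print(ks_statistic_uniform(western_pacific))
```

The statistic is then compared with critical values for the given sample size, such as those of [17].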
Our conclusion is twofold. First, since conformity to Benford’s law cannot be rejected at a significance level of 0.01 for most
of the countries (79%), the first-digit distribution of Covid-19 weekly case counts by country follows Benford’s law
and can then be used to detect possible “anomalies” in Covid-19 count data. Second, in our case, data from Canada, Jordan,
Puerto Rico, Greece, Philippines, Belgium, Tunisia, Latvia, Paraguay, Sweden, Guatemala, Pakistan, Kazakhstan,
Maldives, Australia, and Russia show a possible anomalous behaviour (0.001 ≤ p < 0.01), while anomalies are
certainly present in the data of Honduras, Qatar, Belarus, Cuba, and Egypt (p < 0.001).
III. CONCLUSIONS
We have analyzed the Covid-19 weekly case counts by country, as provided by the World Health Organization,
updated to December 20, 2021. We worked under the null hypothesis that the first-digit distribution of those counts
follows a Benford distribution. The choice of weekly confirmed cases instead of daily ones came from the requirement
of having counts that extend over many orders of magnitude so as to improve the compliance of the data sets with
Benford’s law. For the same reason we did not consider daily and weekly death counts. Also, cumulative cases were
not considered as their numbers flatten (especially at the end of a “wave”), thus distorting relative digit frequencies.
Out of the 222 countries affected by Covid-19, we considered only those with weekly counts spanning at least 4
orders of magnitude. This choice reduced the study to the analysis of the data from 100 countries and territories. In
order to test the null hypothesis, we used the Euclidean distance test introduced in [3] and developed in [13], which
avoids the specific problems introduced by other statistical tests.
Our analysis shows that the majority of countries (62%) conform to Benford’s law at a significance level of 0.05.
However, 5% of countries (Honduras, Qatar, Belarus, Cuba, and Egypt) “break” Benford’s law with p-values smaller
than 0.001.
Electronic address: leonardo.s.campanelli@gmail.com
Footnote 3: Such a deviation can be quantified by a Kolmogorov-Smirnov (KS) statistical test for the distribution of p-values, whose CDF is
CDF(p) = p. The values (degrees of freedom) of the KS statistic for all countries and the ones in the six regions are 0.4295 (100),
0.4487 (13), 0.6165 (17), 0.6567 (15), 0.3379 (38), 0.5208 (8), and 0.3639 (9), respectively. Accordingly, conformance to Benford’s law is
rejected at a significance level of 0.001 [17] in the case of all countries, and countries in the Americas, Eastern Mediterranean, and Europe.
It is rejected at a significance level of 0.01 for African countries. It is not rejected at a significance level of 0.01 for South-East Asian
countries, and it is not rejected at a significance level larger than 0.20 for the case of Western Pacific countries.
[1] S. Newcomb, “Note on the frequency of use of different digits in natural numbers,” Am. J. Math. 4, 39 (1881).
[2] F. Benford, “The Law of Anomalous Numbers,” Proceedings of the American Philosophical Society 78, 551 (1938).
[3] J. Morrow, “Benford’s Law, Families of Distributions and a Test Basis” (Centre for Economic Performance, London, 2014).
[4] T. P. Hill, “The significant-digit phenomenon,” Am. Math. Mon. 102, 322 (1995); “Base-invariance implies Benford’s law,”
Proc. Am. Math. Soc. 123, 887 (1995); “A statistical derivation of the significant-digit law”, Stat. Sci. 10, 354 (1995).
[5] M. Sambridge, H. Tkalčić, and A. Jackson, “Benford’s law in the natural sciences,” Geophys. Res. Lett. 37, L22301 (2010).
[6] M. Nigrini, “A taxpayer compliance application of Benford’s law,” Journal of the American Taxation Association 18, 72
(1996).
[7] W. K. T. Cho and B. J. Gaines, “Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance,” Am.
Stat. 61, 218 (2007).
[8] S. J. Miller (ed.), “Benford’s Law: Theory and Applications,” (Princeton University Press, Princeton, 2015).
[9] M. Sambridge and A. Jackson, “National COVID numbers - Benford’s law looks for errors,” Nature 581, 384 (2020).
[10] N. Farhadi, “Can we rely on COVID-19 data? An assessment of data from over 200 countries worldwide,” Sci. Prog. 104,
1 (2021).
[11] G. E. Noether, “Note on the Kolmogorov statistic in the discrete case,” Metrika 7, 115 (1963).
[12] L. M. Leemis, B. W. Schmeiser, and D. L. Evans, “Survival Distributions Satisfying Benford’s Law,” Am. Stat.
54, 236 (2000).
[13] L. Campanelli, “On the Euclidean Distance Statistic of Benford’s Law,” submitted to Communications in Statistics -
Theory and Methods.
[14] W. Goodman, “The promises and pitfalls of Benford’s law,” Significance 13, 38 (2016).
[15] A. Wei and A. E. Vellwock, “Is COVID-19 data reliable? A statistical analysis with Benford’s law.”
[16] www.covid19.who.int
[17] S. Facchinetti, “A procedure to find exact critical values of Kolmogorov-Smirnov test,” Ital. J. Appl. Stat. 21, 337 (2009).
TABLE I: The Euclidean distance $d_N$ in Eq. (2) and its corresponding p-value for the first-digit distribution of Covid-19 weekly
case counts for 100 countries. Also indicated are the range of cases, [N_min, N_max], and the number of weeks, N. Counts are from
WHO [16] and are updated to December 20, 2021. (Digits in parentheses indicate a statistical error on those digits of ±1.)
Country Range N d_N p
Africa
Algeria [5,10524] 96 1.4079 0.02(9)
Botswana [1,15884] 87 1.0374 0.24(3)
Ethiopia [3,19940] 94 0.8937 0.43(8)
Kenya [3,19023] 94 1.2771 0.06(7)
Mauritius [1,10258] 80 1.4535 0.02(1)
Mozambique [2,13268] 92 1.1051 0.17(5)
Namibia [1,12944] 89 1.1731 0.12(2)
Nigeria [1,12531] 95 0.6236 0.84(9)
South Africa [7,162987] 95 1.5317 0.01(2)
Tanzania [4,24307] 23 1.2457 0.08(0)
Uganda [1,22511] 90 0.7271 0.70(8)
Zambia [1,19058] 93 1.3057 0.05(7)
Zimbabwe [1,26671] 93 0.9221 0.39(5)
Americas
Argentina [16,219910] 95 1.5532 0.01(1)
Brazil [6,533024] 96 1.2301 0.08(9)
Bolivia [7,19834] 94 1.0026 0.28(4)
Canada [2,60784] 100 1.8364 0.00(1)
Colombia [5,204556] 95 1.3949 0.03(2)
Costa Rica [9,17469] 95 1.3167 0.05(3)
Cuba [8,64196] 94 2.0674 0.000(1)
Dominican Republic [4,11168] 95 1.3509 0.04(3)
Ecuador [5,14597] 95 1.3332 0.04(8)
Guatemala [5,26678] 94 1.6470 0.00(5)
Honduras [6,10595] 94 2.6172 0.000(0)
Mexico [5,128779] 96 1.1353 0.15(0)
Paraguay [5,20955] 95 1.6844 0.00(4)
Peru [9,60739] 95 1.0690 0.20(9)
Puerto Rico [7,32162] 93 1.7721 0.00(2)
Uruguay [6,26378] 94 0.8801 0.46(0)
U.S.A. [12,1745361] 101 0.7242 0.71(2)
Eastern Mediterranean
Afghanistan [3,12314] 96 1.2214 0.09(3)
Egypt [5,10778] 96 1.9710 0.000(4)
Iran [47,269975] 97 1.3850 0.03(4)
Iraq [2,83098] 96 1.2867 0.06(4)
Jordan [5,57666] 95 1.7989 0.00(1)
Lebanon [5,33605] 97 1.1526 0.13(7)
Libya [1,19510] 92 1.4154 0.02(8)
Morocco [6,64784] 95 1.1256 0.15(8)
Oman [6,17783] 96 1.0093 0.27(6)
Pakistan [2,40287] 95 1.6157 0.00(6)
Palestine [8,17509] 96 1.0512 0.22(8)
Qatar [7,13049] 96 2.4137 0.000(0)
Saudi Arabia [5,30925] 95 1.2266 0.09(1)
Tunisia [5,52076] 95 1.7322 0.00(2)
U.A.E [2,26285] 100 0.9135 0.40(8)
Europe
Armenia [1,14417] 95 0.7368 0.69(3)
Austria [8,96094] 96 0.8381 0.52(8)
Azerbaijan [2,29155] 96 0.6744 0.78(5)
Belarus [1,14213] 96 2.2927 0.000(0)
Belgium [1,125246] 96 1.7387 0.00(2)
Bosnia and Herzegovina [2,11122] 95 0.6642 0.79(9)
Bulgaria [2,32962] 95 1.4023 0.03(0)
Croatia [1,37433] 96 0.6771 0.78(1)
Czechia [27,127489] 95 0.9291 0.38(5)
Denmark [3,78981] 96 1.4868 0.01(7)
Estonia [1,11930] 96 1.4258 0.02(6)
Finland [1,16510] 98 0.7175 0.72(3)
France [1,504469] 100 0.8111 0.57(2)
Georgia [3,33665] 96 0.9460 0.36(0)
Germany [2,406754] 99 0.8982 0.43(1)
Greece [7,47411] 96 1.7715 0.00(2)
Hungary [7,70400] 95 1.1028 0.17(7)
Ireland [1,53846] 96 1.0170 0.26(7)
Israel [1,65917] 97 0.7419 0.68(5)
Italy [3,257579] 98 1.3064 0.05(6)
Kazakhstan [6,56120] 94 1.5923 0.00(8)
Latvia [3,16957] 95 1.6877 0.00(4)
Lithuania [1,20730] 96 0.7973 0.59(5)
Moldova [1,11680] 95 1.4101 0.02(9)
Netherlands [2,156007] 96 1.0607 0.21(8)
Norway [1,33281] 97 1.0084 0.27(7)
Poland [6,192441] 95 0.4811 0.96(3)
Portugal [2,86549] 95 1.4460 0.02(3)
Romania [3,104668] 96 1.1152 0.16(6)
Russia [5,281305] 95 1.5750 0.00(9)
Serbia [1,49995] 95 0.9512 0.35(3)
Slovakia [1,61514] 95 1.2145 0.09(7)
Slovenia [2,22657] 95 1.0019 0.28(5)
Spain [1,245818] 99 0.6382 0.83(2)
Sweden [1,46511] 97 1.6545 0.00(5)
Turkey [6,414312] 94 1.4798 0.01(8)
U.K. [1,683874] 100 1.0711 0.20(7)
Ukraine [1,153131] 95 1.4855 0.01(7)
South-East Asia
Bangladesh [7,99693] 94 1.3715 0.03(7)
India [1,2738957] 97 1.2935 0.06(1)
Indonesia [10,350273] 95 1.2031 0.10(4)
Maldives [1,11401] 94 1.5900 0.00(8)
Myanmar [4,40004] 92 0.8451 0.51(6)
Nepal [4,61814] 91 0.9980 0.29(0)
Sri Lanka [5,41519] 95 1.2816 0.06(6)
Thailand [1,150652] 102 0.4247 0.98(2)
Western Pacific
Australia [3,45560] 100 1.5845 0.00(8)
China [1,31333] 104 0.6523 0.81(4)
Japan [1,156931] 101 1.1777 0.11(9)
Malaysia [3,150933] 100 0.8115 0.57(2)
Mongolia [1,36698] 91 1.0876 0.19(1)
Philippines [1,144991] 97 1.7711 0.00(2)
Singapore [4,25950] 101 0.8253 0.54(9)
South Korea [3,47825] 101 1.2551 0.07(7)
Vietnam [1,125955] 97 0.4202 0.98(4)
[Figure 2: fifteen panels, one per country (China, U.S.A., India, Tanzania, Mauritius, Algeria, Vietnam, Thailand, Poland,
Honduras, Qatar, Belarus, Cuba, Egypt, Canada), each showing the observed first-digit frequency f(d) (vertical axis, 0.1 to 0.5)
against the first digit d = 1, ..., 9 (horizontal axis).]
FIG. 2: Observed first-digit frequencies of the Covid-19 weekly case counts for 15 selected countries: China (with the largest
sample size N), USA (with the largest total number of cases), India (with the largest range of weekly case counts), Tanzania
(with the smallest sample size N), Mauritius (with the smallest total number of cases), Algeria (with the smallest range of
weekly case counts), Vietnam, Thailand, and Poland (the outliers in the first box plot of Fig. 4 with the world’s largest p-values),
Honduras, Qatar, Belarus, Cuba, and Egypt (with the smallest p-values, p < 0.001), and Canada (with the smallest p-value in
the interval 0.001 ≤ p < 0.01). The (blue) continuous lines represent Benford’s law.
[Figure 3: six pie charts, one per region, with the percentages as printed in each chart.
Africa: 76.9%, 23.1%
Americas: 41.2%, 23.5%, 23.5%, 11.8%
Eastern Mediterranean: 53.3%, 13.4%, 20.0%, 13.3%
Europe: 63.2%, 18.4%, 15.8%, 2.6%
South-East Asia: 75.0%, 12.5%, 12.5%
Western Pacific: 77.8%, 22.2%]
FIG. 3: Percentages of countries in different regions of the world in a given range of p-values of the Euclidean distance statistic
for the first-digit distribution of Covid-19 weekly case counts by country. Ranges of p-values in each pie chart are as follows:
from top and clockwise, p ≥ 0.05 (green), 0.01 ≤ p < 0.05 (yellow), 0.001 ≤ p < 0.01 (red), and p < 0.001 (purple).
[Figure 4: box-and-whisker plots of p (vertical axis, 0.0 to 1.0) for World, Africa, Americas, E. Mediterranean, Europe,
S.-E. Asia, and W. Pacific.]
FIG. 4: Box-and-whisker plots for the p-values of the Euclidean distance statistic for the first-digit distribution of Covid-19
weekly case counts of all countries and countries in different regions of the world.