Monkeypox Obeys the (Benford’s) Law: A Dynamic Analysis
of Daily Case Counts in the United States of America
Leonardo Campanelli¹
leonardo.s.campanelli@gmail.com
¹All Saints University School of Medicine, 5145 Steeles Ave., M9L 1R5, Toronto (ON), Canada
(Dated: September 25, 2022)
Abstract. We analyze the first-digit distribution of the Monkeypox daily case counts in the United States of America from May 17 to September 21, 2022. As expected for the spread of infectious diseases, the overall data follow Benford’s law. Moreover, we find that the temporal series of daily cases conforms to Benford’s distribution at an exceptionally high confidence level.
1. Introduction
Monkeypox is a viral zoonotic infectious disease caused by a virus in the genus Orthopoxvirus. An ongoing outbreak started on May 6, 2022 in London, United Kingdom. From May 18 onwards, cases were reported worldwide, in more than 100 countries. This is the first time Monkeypox has spread widely outside Central Africa (Congo Basin Clade) and West Africa (West African Clade), where the disease is endemic (WHO, 2022).
There is evidence that the spread of infectious diseases conforms to Benford’s law. Indeed, Sambridge et al. (2010) found that the total numbers of cases of 18 infectious diseases reported to the World Health Organization (WHO) by 193 countries worldwide in 2007 follow Benford’s distribution. Recently, Benford’s law has been applied to the study of Covid-19 data, in particular to daily, weekly, and cumulative case and death counts of various countries [see, e.g., Sambridge and Jackson (2020), Farhadi (2021), and Campanelli (2022a)]. The general finding is that Benford’s distribution describes the first-digit distributions of Covid-19 data well for most countries, so it can be used to flag “anomalies” in the data of specific countries.
Benford’s law (Benford, 1938) is an empirical statistical law according to which the probability $P_B(d)$ of occurrence of the first significant digit $d$ in “particular” data sets is

$$P_B(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9. \tag{1}$$
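As a quick numerical check of Eq. (1), the short Python sketch below (an illustration, not code from the original analysis) evaluates $P_B(d)$ for the nine possible first digits and confirms that the probabilities sum to one, since the terms $\log_{10}[(d+1)/d]$ telescope to $\log_{10} 10 = 1$.

```python
import math

def benford_probability(d: int) -> float:
    """Benford first-digit probability P_B(d) = log10(1 + 1/d), Eq. (1)."""
    if not 1 <= d <= 9:
        raise ValueError("first significant digit must be in 1..9")
    return math.log10(1 + 1 / d)

# Probabilities for all nine possible first digits.
BENFORD = {d: benford_probability(d) for d in range(1, 10)}

if __name__ == "__main__":
    for d, p in BENFORD.items():
        print(f"P_B({d}) = {p:.4f}")
    # The nine terms log10((d + 1) / d) telescope to log10(10) = 1.
    print(f"sum = {sum(BENFORD.values()):.6f}")
```

Digit 1 is expected to lead about 30.1% of the time and digit 9 only about 4.6%, which is the skew the tests below look for.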
Although it is now known that some distributions satisfy Benford’s law [see, e.g., Morrow (2014) and references therein] and that particular principles lead to the emergence of the Benford phenomenon in data (Hill, 1995a, 1995b, and 1995c), no general criterion has been found that fully explains when and why Benford’s law holds for a “generic” set of data.
Although much work is still needed to understand the theoretical basis of the law, the number of its applications has grown in the last few decades [for theoretical insights and general applications of Benford’s law, see Miller (2015)]. Probably the best-known applications are to the detection of tax (Nigrini, 1996), campaign finance (Cho and Gaines, 2007), and election (Roukema, 2013) fraud. Other interesting applications are in image processing (Pérez-González et al., 2007), where Benford’s law can be used to test whether an image has been compressed; in the natural sciences, where the law has been shown to hold for geophysical observables such as the depths of earthquakes (Sambridge et al., 2010); and in cryptology, where it can be used to examine the truthfulness of undeciphered numerical codes (Wase, 2021; Campanelli, 2022b).
The aim of this paper is to assess whether the Monkeypox daily case counts in the United States of America (USA) comply with Benford’s law.
2. Analysis
It is well known that the conformance of data sets to Benford’s law improves as the range of the data increases. Daily and cumulative death counts by country are therefore not suitable for checking the conformance of Monkeypox first-digit distributions to Benford’s law, because there have been only a few tens of deaths worldwide since the start of the outbreak (WHO, 2022). Another possibility would be to use cumulative confirmed case counts. The disadvantage of this type of data is that, as cumulative case numbers begin to flatten (e.g., after a Monkeypox “wave” has passed), first digits tend to become all the same, thus distorting the relative digit frequencies. To overcome this problem, we analyze only the daily confirmed case counts by country. However, the only country whose daily case numbers extend over a statistically appreciable range is the USA: there, the data cover about three orders of magnitude, while in all the other countries affected by Monkeypox they extend over at most two (WHO, 2022). Accordingly, we focus our analysis on the daily case counts from the USA.
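The range argument above can be made concrete with a small Python sketch (illustrative only; the helper names `first_digit` and `decades_spanned` are hypothetical). For the USA range [1, 916] reported in Tab. I, $\log_{10}(916/1) \approx 2.96$, i.e. about three orders of magnitude. The flattened cumulative totals at the end are invented numbers, used only to show how first digits collapse onto a single value once a curve levels off.

```python
import math

def first_digit(x: int) -> int:
    """First significant digit of a positive integer count."""
    if x <= 0:
        raise ValueError("zero or negative counts have no first significant digit")
    return int(str(x)[0])

def decades_spanned(values) -> float:
    """Orders of magnitude covered by the positive values: log10(max / min)."""
    positive = [v for v in values if v > 0]
    return math.log10(max(positive) / min(positive))

# Range of USA daily counts from Tab. I.
print(f"{decades_spanned([1, 916]):.2f}")  # prints 2.96

# Hypothetical cumulative totals after a wave has passed: the curve is flat,
# so every value shares the same first digit.
flat_cumulative = [21000, 21400, 21900, 22300, 22600]
print([first_digit(c) for c in flat_cumulative])  # prints [2, 2, 2, 2, 2]
```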
Overall analysis. The most common tests for checking whether a numerical sample satisfies Benford’s law are Pearson’s $\chi^2$ test and the Kolmogorov-Smirnov test. Although these are general tests, in that they can quantify the conformance of a data set to any theoretical distribution, they are generally conservative when applied to Benford’s law [see, e.g., Morrow (2014)]. The most appropriate test for checking Benford’s law in data is the “Euclidean distance test”, recently proposed by the author specifically to quantify the goodness of fit of a data sample to Benford’s law (Campanelli, 2022c).

TABLE I: The Euclidean distance $d^*_N$ in Eq. (2) and its corresponding $p$ value for the first-digit distribution of the Monkeypox daily case counts in the USA. Also indicated are the range of cases, [min, max], and the number of days, $N$. Counts are from the CDC (2022) and are updated to September 21, 2022. The last three columns show the reduced $\chi^2$ score, $\chi^2_{\rm red} = \chi^2/\nu$, the number $\nu$ of degrees of freedom, and the $p$ value, $p(\chi^2)$, of the $\chi^2$ statistic defined in Eq. (3).

| Range | $N$ | $d^*_N$ | $p$ | $\chi^2_{\rm red}$ | $\nu$ | $p(\chi^2)$ |
|---|---|---|---|---|---|---|
| [1, 916] | 125 | 1.0031 | 0.284 | 0.5462 | 76 | 0.9996 |

[Figure 1: left panel, first-digit frequency $f(d)$ versus $d$; middle panel, $p$ versus $n$; right panel, $d^*_n$ versus $n$.]

FIG. 1: Left panel. Observed first-digit frequencies of the Monkeypox daily case counts in the USA. The (blue) continuous line represents Benford’s law. Middle panel. $p$ values of the Euclidean distance statistic $d^*_n$ as a function of the number of data points $n$ (number of days). The (blue) dashed line is $p = 0.10$. Right panel. The Euclidean distance statistic as a function of $n$. The (blue) continuous line represents the expected value of $d^*_n$ for Benford’s distribution, while the (blue) dashed lines show the corresponding one-sigma interval.
The Euclidean distance test is based on the Euclidean distance statistic $d^*_N$, first introduced by Cho and Gaines (2007) and then analyzed by Morrow (2014),

$$d^*_N = \sqrt{N \sum_{d=1}^{9} \left[P(d) - P_B(d)\right]^2}, \tag{2}$$

where $P(d)$ is the observed first-digit frequency distribution of a sample of size $N$. The (empirical) cumulative distribution function (CDF) of the Euclidean distance statistic found by the author (Campanelli, 2022c) allows us to evaluate $p$ values as $p = 1 - \mathrm{CDF}(d^*_N)$.
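The Python sketch below computes Eq. (2) from a list of first digits. The paper evaluates $p$ values with the empirical CDF tabulated in Campanelli (2022c); since that table is not reproduced here, the sketch swaps in a plain Monte Carlo estimate, assuming exact Benford sampling under the null: it draws many samples of $N$ first digits from $P_B(d)$, and the $p$ value is the fraction of simulated $d^*_N$ at least as large as the observed one. The function names, seed, and number of trials are illustrative choices.

```python
import math
import random
from collections import Counter

# Benford probabilities P_B(d) for d = 1..9, Eq. (1).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
DIGITS = list(range(1, 10))

def euclidean_distance(first_digits) -> float:
    """Eq. (2): d*_N = sqrt(N * sum_d [P(d) - P_B(d)]^2) for a list of first digits."""
    n = len(first_digits)
    counts = Counter(first_digits)  # missing digits count as 0
    squared = sum((counts[d] / n - BENFORD[d - 1]) ** 2 for d in DIGITS)
    return math.sqrt(n * squared)

def simulate_null(n: int, trials: int = 20_000, seed: int = 0) -> list:
    """Monte Carlo draws of d*_n for samples of n first digits drawn from Benford's law.

    Assumption: this stands in for the empirical CDF of Campanelli (2022c).
    """
    rng = random.Random(seed)
    return [euclidean_distance(rng.choices(DIGITS, weights=BENFORD, k=n))
            for _ in range(trials)]

def p_value(observed: float, null_draws) -> float:
    """p = 1 - CDF(d*), estimated as the share of null draws >= the observed value."""
    return sum(d >= observed for d in null_draws) / len(null_draws)
```

With $N = 125$ and the observed $d^*_N = 1.0031$ from Tab. I, `p_value(1.0031, simulate_null(125))` can be compared with the reported $p = 0.284$; the Monte Carlo estimate carries a sampling error of order $1/\sqrt{\text{trials}}$ and need not coincide exactly with the tabulated CDF.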
Data on the 2022 USA Monkeypox outbreak are from the Centers for Disease Control and Prevention (CDC, 2022) and are updated to September 21, 2022. They are the confirmed daily cases reported to the CDC since May 17, 2022, the start of the response to the current outbreak. Each case is dated by the positive laboratory test report date, the CDC call center reporting date, or the date the case data were entered into CDC’s emergency response common operating platform.
In Tab. I, we show the range of daily cases, [min, max], the number of days, $N$, the Euclidean distance, $d^*_N$, and the corresponding $p$ value. In the left panel of Fig. 1, we show the observed first-digit frequency distribution of the daily case counts superimposed on Benford’s law. As is clear from the table and figure, the data comply with Benford’s law: with $p = 0.284$, the null hypothesis of conformance cannot be rejected at any conventional significance level.
Dynamic analysis. Since the daily counts for Monkeypox, as for any other infectious disease, evolve in time, it is both interesting and statistically appropriate to quantify how the timeline of those counts deviates from Benford’s law. Indeed, a dynamic analysis of the chronology of the counts better captures the statistical properties of the spread of a disease.
Such a dynamic analysis can be performed by considering the following $\chi^2$ statistic:

$$\chi^2 = \sum_{n=N_0}^{N} \left( \frac{d^*_n - \bar{d}^*_n}{\sigma_n} \right)^2. \tag{3}$$

Here, $\bar{d}^*_n$ and $\sigma_n$ are the expected value and standard deviation of the Euclidean distance statistic for Benford’s distribution (Campanelli, 2022c), while $d^*_n$ is the observed Euclidean distance statistic computed on the first $n$ data points ($n$ being the ordinal number of days in our case).
As already noted, conformance to Benford’s law improves as the range of the data increases. For this reason, we start the sum in Eq. (3) at $N_0$, the day from which the data span at least two orders of magnitude (in the case at hand, $N_0 = 50$). The number $\nu$ of degrees of freedom of the $\chi^2$ statistic is then the number of terms in the sum, $\nu = N - N_0 + 1$ (with $N = 125$, this gives the $\nu = 76$ reported in Tab. I).
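A minimal Python sketch of the dynamic statistic in Eq. (3), under stated assumptions: the expected value and standard deviation of $d^*_n$ are estimated by Monte Carlo rather than taken from Campanelli (2022c), days with zero cases are skipped because they have no first significant digit, and $\nu$ is counted as the number of terms in the sum (which gives 76 for $N = 125$ and $N_0 = 50$, as in Tab. I). The names `dynamic_chi2` and `null_moments` are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]  # P_B(d), Eq. (1)
DIGITS = list(range(1, 10))

def euclidean_distance(first_digits) -> float:
    """Eq. (2) evaluated on a list of first digits."""
    n = len(first_digits)
    counts = Counter(first_digits)
    return math.sqrt(n * sum((counts[d] / n - BENFORD[d - 1]) ** 2 for d in DIGITS))

def null_moments(n: int, trials: int, rng: random.Random) -> tuple:
    """Monte Carlo mean and standard deviation of d*_n under Benford's law (assumption)."""
    draws = [euclidean_distance(rng.choices(DIGITS, weights=BENFORD, k=n))
             for _ in range(trials)]
    return statistics.fmean(draws), statistics.stdev(draws)

def dynamic_chi2(daily_counts, n0: int, trials: int = 2000, seed: int = 0) -> tuple:
    """Eq. (3): sum over n = n0..N of ((d*_n - mean_n) / sigma_n)^2.

    d*_n uses the first n (nonzero) days; zero-count days are skipped (assumption).
    Returns (chi2, nu) with nu = number of terms, N - n0 + 1 (assumption).
    """
    digits = [int(str(c)[0]) for c in daily_counts if c > 0]
    rng = random.Random(seed)
    chi2 = 0.0
    for n in range(n0, len(digits) + 1):
        mean_n, sigma_n = null_moments(n, trials, rng)
        chi2 += ((euclidean_distance(digits[:n]) - mean_n) / sigma_n) ** 2
    return chi2, len(digits) - n0 + 1
```

Calling `dynamic_chi2(daily_counts, n0=50)` on a list of daily counts returns $(\chi^2, \nu)$. Re-simulating the null for every $n$ is slow, so the `trials` parameter trades accuracy of $\bar{d}^*_n$ and $\sigma_n$ for speed.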
In the middle and right panels of Fig. 1 we show, respectively, the $p$ values and the values of the Euclidean distance statistic $d^*_n$ as a function of $n$. The (blue) continuous line in the right panel represents the expected value of $d^*_n$ for Benford’s distribution, while the (blue) dashed lines show the corresponding one-sigma interval.
As is clear from the figure, the null hypothesis of conformance to Benford’s law cannot be rejected at the 10% significance level for any of the values of $n$ shown. Moreover, the observed values of $d^*_n$ are relatively close to those expected for Benford’s distribution. This closeness can be quantified with Eq. (3), which gives $\chi^2 = 41.51$ for 76 degrees of freedom. This corresponds to a reduced $\chi^2$ score as low as $\chi^2_{\rm red} = \chi^2/\nu = 0.5462$ and to a $p$ value as large as $p(\chi^2) = 0.9996$. These values of the reduced $\chi^2$ and $p$ value, reported in Tab. I for convenience, show that the temporal series of the Monkeypox daily case counts in the USA conforms to Benford’s distribution at a very high confidence level.
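The reported numbers can be checked with a few lines of Python. Because $\nu = 76$ is even, the upper-tail probability of a $\chi^2$ variable has the closed form $e^{-x/2}\sum_{i=0}^{\nu/2-1}(x/2)^i/i!$, so no statistics library is needed; for odd $\nu$ one would instead use a general routine such as `scipy.stats.chi2.sf`. The sketch takes $\chi^2 = 41.51$ and $\nu = 76$ from the text; the function name `chi2_sf_even` is illustrative.

```python
import math

def chi2_sf_even(x: float, nu: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with even nu.

    For nu = 2k, P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("closed form requires an even, positive nu")
    half = x / 2.0
    term = math.exp(-half)  # i = 0 term of the sum
    total = term
    for i in range(1, nu // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return total

# Values reported in the text and in Tab. I.
CHI2_OBS = 41.51
NU = 76

reduced = CHI2_OBS / NU
p_chi2 = chi2_sf_even(CHI2_OBS, NU)
print(f"reduced chi2 = {reduced:.4f}")
print(f"p(chi2)      = {p_chi2:.4f}")
```

Running it gives a reduced $\chi^2$ of 0.5462 and a $p$ value that rounds to 0.9996, in agreement with Tab. I.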
3. Conclusions
We analyzed the first-digit distribution of the daily case counts for the 2022 Monkeypox outbreak in the USA. To test the null hypothesis (conformance to Benford’s distribution), we used the “Euclidean distance test”, which the author has proposed elsewhere specifically to quantify the goodness of fit of a data sample to Benford’s law.
Our results are consistent with earlier findings that the first-digit distribution of case counts for other infectious diseases, such as Covid-19, follows Benford’s law. In particular, the temporal series of the Monkeypox daily cases in the USA conforms to Benford’s distribution remarkably well, with a $p$ value of about 0.9996 for the $\chi^2$ statistic of the dynamic analysis.
References
Benford F. (1938). The Law of Anomalous Numbers. Proceedings of the American Philosophical Society 78: 551-572.
Campanelli L. (2022a). Breaking Benford’s law: A statistical analysis of Covid-19 data using the Euclidean distance statistic. To appear in Statistics in Transition new series.
Campanelli L. (2022b). A Statistical Cryptanalysis of the Beale Ciphers. To appear in Cryptologia.
Campanelli L. (2022c). On the Euclidean Distance Statistic of Benford’s Law. Communications in Statistics - Theory and Methods. DOI: 10.1080/03610926.2022.2082480.
CDC (2022). U.S. Monkeypox Case Trends Reported to CDC. https://www.cdc.gov/poxvirus/monkeypox/response/2022/ (accessed on 2022-09-24).
Cho W. K. T., Gaines B. J. (2007). Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance. Am. Stat. 61: 218-223.
Farhadi N. (2021). Can we rely on COVID-19 data? An assessment of data from over 200 countries worldwide. Sci. Prog. 104: 1-19.
Hill T. P. (1995a). The significant-digit phenomenon. Am. Math. Mon. 102: 322-327.
Hill T. P. (1995b). Base-invariance implies Benford’s law. Proc. Am. Math. Soc. 123: 887-895.
Hill T. P. (1995c). A statistical derivation of the significant-digit law. Stat. Sci. 10: 354-363.
Miller S. J. (ed.) (2015). Benford’s Law: Theory and Applications. Princeton University Press, Princeton.
Morrow J. (2014). Benford’s Law, Families of Distributions and a Test Basis. Centre for Economic Performance, London.
Nigrini M. (1996). A taxpayer compliance application of Benford’s law. Journal of the American Taxation Association 18: 72-91.
Pérez-González F., Abdallah C. T., Heileman G. L. (2007). Benford’s Law in Image Processing. IEEE International Conference on Image Processing, 405-408.
Roukema B. F. (2013). A first-digit anomaly in the 2009 Iranian presidential election. J. Appl. Stat. 41: 1, 164-199.
Sambridge M., Jackson A. (2020). National COVID numbers - Benford’s law looks for errors. Nature 581: 384.
Sambridge M., Tkalčić H., Jackson A. (2010). Benford’s law in the natural sciences. Geophys. Res. Lett. 37: L22301.
Wase V. (2021). Benford’s law in the Beale ciphers. Cryptologia 45: 3, 282-286.
WHO (2022). https://www.who.int/emergencies/situations/monkeypox-outbreak-2022 (accessed on 2022-09-24).