Preprint

Monkeypox Obeys the (Benford's) Law: A Dynamic Analysis of Daily Case Counts in the United States of America


Abstract

We analyze the first-digit distribution of Monkeypox daily cases in the United States of America from May 17 to September 21, 2022. As expected for the spread of infectious diseases, the overall data follow Benford's law. Moreover, we find that the temporal series of daily cases conforms to Benford's distribution to an exceptionally high confidence level.
Leonardo Campanelli
leonardo.s.campanelli@gmail.com
All Saints University School of Medicine, 5145 Steeles Ave., M9L 1R5, Toronto (ON), Canada
(Dated: September 25, 2022)
1. Introduction
Monkeypox is a viral zoonotic infectious disease caused by a virus of the genus Orthopoxvirus. The ongoing outbreak started on May 6, 2022, in London, United Kingdom. From May 18 onwards, cases have been reported in about 100 countries worldwide. This is the first time Monkeypox has spread widely outside Central Africa (Congo Basin clade) and West Africa (West African clade), where the disease is endemic (WHO, 2022).
There is evidence that the spread of infectious diseases conforms to Benford’s law. Indeed, Sambridge et al. (2010) found that the total numbers of cases of 18 infectious diseases reported to the World Health Organization (WHO) by 193 countries worldwide in 2007 follow Benford’s distribution. More recently, Benford’s law has been applied to Covid-19 data, in particular to the daily, weekly, and cumulative case and death counts of various countries [see, e.g., Sambridge and Jackson (2020), Farhadi (2021), and Campanelli (2022a)]. The general result is that Benford’s distribution describes the first-digit distributions of Covid-19 data well for most countries and can therefore be used to flag “anomalies” in the data of specific countries.
Benford’s law (Benford, 1938) is an empirical statistical law according to which the probability $P_B(d)$ of occurrence of the first significant digit $d$ in “particular” data sets is

$$P_B(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9. \qquad (1)$$
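As a quick numerical check of Eq. (1), the minimal Python sketch below (standard library only; the function name `benford_probabilities` is ours, not from any package) tabulates $P_B(d)$ and confirms that the nine probabilities sum to one, since the sum telescopes: $\sum_{d=1}^{9}[\log_{10}(d+1) - \log_{10} d] = \log_{10} 10 = 1$.

```python
import math


def benford_probabilities():
    """First-digit probabilities P_B(d) = log10(1 + 1/d) for d = 1..9, Eq. (1)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_probabilities()
for d, p in probs.items():
    print(f"P_B({d}) = {p:.4f}")

# The sum telescopes to log10(10) = 1 (up to floating-point rounding).
print(f"sum = {sum(probs.values()):.6f}")
```

Running it gives $P_B(1) \approx 0.301$ and $P_B(9) \approx 0.046$, so a leading 1 is roughly 6.6 times as frequent as a leading 9.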
Although it is now known that some distributions satisfy Benford’s law [see, e.g., Morrow (2014) and references therein] and that particular principles lead to the emergence of the Benford phenomenon in data (Hill, 1995a, 1995b, and 1995c), no general criterion has been found that fully explains when and why Benford’s law holds for a “generic” data set.
Although much work is still needed to understand the theoretical basis of the law, the number of its applications has grown over the last few decades [for theoretical insights and general applications of Benford’s law, see Miller (2015)]. Probably the best-known applications are to detecting fraud in tax returns (Nigrini, 1996), campaign finance (Cho and Gaines, 2007), and elections (Roukema, 2013). Other interesting applications are in image processing (Pérez-González et al., 2007), where Benford’s law can be used to test whether an image has been compressed; in the natural sciences, where the law has been shown to hold for geophysical observables such as the depths of earthquakes (Sambridge et al., 2010); and in cryptology, where it can be used to assess the authenticity of undeciphered numerical codes (Wase, 2021; Campanelli, 2022b).
The aim of this paper is to assess whether the Monkeypox daily case counts in the United States of America (USA) comply with Benford’s law.
2. Analysis
It is well known that the conformance of data sets to Benford’s law improves as the range of the data increases. Daily and cumulative death counts by country are therefore not appropriate for checking whether the Monkeypox first-digit distributions conform to Benford’s law, because there have been only a few tens of deaths worldwide since the start of the outbreak (WHO, 2022). Another possibility would be to use cumulative confirmed case counts. The disadvantage of this type of data is that, as cumulative case numbers begin to flatten (e.g., after a Monkeypox “wave” has passed), the first digits tend to become all the same, thus distorting the relative digit frequencies. To overcome this problem, we analyze only the daily confirmed case counts by country. However, the only country whose daily case numbers span a statistically appreciable range is the USA: there, the data cover about three orders of magnitude, while in all other countries affected by Monkeypox they span at most two (WHO, 2022). Accordingly, we focus our analysis on the daily case counts from the USA.
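To make the two points above concrete, the following Python sketch (our illustration, not code from this paper) measures the range of a series in orders of magnitude, $\log_{10}(\max/\min)$, and shows how a cumulative series on a plateau produces a single repeated first digit while the corresponding daily counts do not. The plateau numbers are invented for illustration only and are not CDC data; the function names are hypothetical.

```python
import math


def orders_of_magnitude(counts):
    """Decades spanned by the positive values: log10(max / min)."""
    positive = [c for c in counts if c > 0]
    return math.log10(max(positive) / min(positive))


def first_digit(x):
    """Leading significant digit of a positive integer."""
    return int(str(x)[0])


# Range of the USA daily counts reported in Table I: [1, 916].
print(f"{orders_of_magnitude([1, 916]):.2f}")  # 2.96, i.e. about three decades

# Invented cumulative counts on a plateau after a "wave" (not CDC data).
cumulative = [24050, 24400, 24830, 25070, 25160, 25225, 25260]
daily = [b - a for a, b in zip(cumulative, cumulative[1:])]

print([first_digit(c) for c in cumulative])  # [2, 2, 2, 2, 2, 2, 2]
print([first_digit(c) for c in daily])       # [3, 4, 2, 9, 6, 3]
```

Differencing the cumulative series recovers daily counts whose first digits still vary, which is the motivation for working with daily data.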
Overall analysis. The most common tests for checking whether a numerical sample satisfies Benford’s law are Pearson’s $\chi^2$ test and the Kolmogorov-Smirnov test. Although they are general, in that they can be used to quantify the conformance of data sets to any theoretical distribution, they are generally conservative when testing Benford’s law [see, e.g., Morrow (2014)]. The most appropriate test for checking Benford’s law in data is the “Euclidean distance test”, recently proposed by the author specifically to quantify the goodness of fit of a data sample to Benford’s law (Campanelli, 2022c).

TABLE I: The Euclidean distance $d^*_N$ in Eq. (2) and its corresponding $p$ value for the first-digit distribution of the Monkeypox daily case counts in the USA. Also indicated are the range of cases, [min, max], and the number of days, $N$. Counts are from the CDC (2022) and are updated to September 21, 2022. The last three columns show the reduced $\chi^2$ score, $\chi^2_{\rm red} = \chi^2/\nu$, the number $\nu$ of degrees of freedom, and the $p$ value, $p(\chi^2)$, of the $\chi^2$ statistic defined in Eq. (3).

Range      N     d*_N     p       χ²_red   ν    p(χ²)
[1, 916]   125   1.0031   0.284   0.5462   76   0.9996

[Figure 1, three panels. Left: observed first-digit frequency f(d) versus digit d. Middle: p value versus number of days n. Right: Euclidean distance statistic d*_n versus n.]

FIG. 1: Left panel. Observed first-digit frequencies of the Monkeypox daily case counts in the USA. The (blue) continuous line represents Benford’s law. Middle panel. $p$ values of the Euclidean distance statistic $d^*_n$ as a function of the number of data points $n$ (number of days). The (blue) dashed line is $p = 0.10$. Right panel. The Euclidean distance statistic as a function of $n$. The (blue) continuous line represents the expected value of $d^*_n$ for Benford’s distribution, while the (blue) dashed lines show the corresponding one-sigma interval.
The Euclidean distance test is based on the Euclidean distance estimator $d^*_N$, first introduced by Cho and Gaines (2007) and then analyzed by Morrow (2014),

$$d^*_N = \sqrt{N \sum_{d=1}^{9} \left[P(d) - P_B(d)\right]^2}, \qquad (2)$$

where $P(d)$ is the observed first-digit frequency distribution of a sample of size $N$. The (empirical) cumulative distribution function (CDF) of the Euclidean distance statistic found by the author (Campanelli, 2022c) allows us to evaluate $p$ values as $p = 1 - \mathrm{CDF}(d^*_N)$.
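The following Python sketch shows how Eq. (2) can be computed from a list of daily counts. It is an illustration under stated assumptions, not the author's code: zero-count days are dropped because they have no first significant digit, and, since the CDF of Campanelli (2022c) is not reproduced here, the $p$ value is instead estimated by Monte Carlo simulation of Benford-distributed samples of the same size, which is a substitute technique. All function names are hypothetical.

```python
import math
import random

# Benford first-digit probabilities P_B(d) = log10(1 + 1/d), Eq. (1).
DIGITS = list(range(1, 10))
P_B = [math.log10(1 + 1 / d) for d in DIGITS]


def first_digit(x):
    """Leading significant digit of a positive integer count."""
    return int(str(x)[0])


def distance_from_digits(digits):
    """Eq. (2): d*_N = sqrt(N * sum_d [P(d) - P_B(d)]^2) for a list of first digits."""
    n = len(digits)
    observed = [digits.count(d) / n for d in DIGITS]  # P(d)
    return math.sqrt(n * sum((p - pb) ** 2 for p, pb in zip(observed, P_B)))


def euclidean_distance(counts):
    """d*_N for raw daily counts; zero-count days have no first digit and are dropped."""
    return distance_from_digits([first_digit(c) for c in counts if c > 0])


def monte_carlo_p_value(d_obs, n, trials=5000, seed=0):
    """Estimate p = P(d*_n >= d_obs) under Benford's law by simulation.

    Substitute for p = 1 - CDF(d*_N); it does NOT use the CDF of Campanelli (2022c).
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        sample = rng.choices(DIGITS, weights=P_B, k=n)
        if distance_from_digits(sample) >= d_obs:
            exceed += 1
    return exceed / trials


# Illustrative call with the Table I values (d*_N = 1.0031, N = 125).
p_est = monte_carlo_p_value(1.0031, 125)
print(f"Monte Carlo estimate of p: {p_est:.3f}")
```

For comparison, Table I reports $p = 0.284$. A simulated estimate carries sampling noise of order $\sqrt{p(1-p)/\text{trials}}$ (about 0.006 for 5000 trials), so it need not match the tabulated CDF exactly.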
Data on the 2022 USA Monkeypox outbreak are from the Centers for Disease Control and Prevention (CDC, 2022) and are updated to September 21, 2022. They are the confirmed daily cases reported to the CDC since May 17, 2022, the start of the response to the current outbreak. Each case is dated by either the positive laboratory test report date, the CDC call center reporting date, or the date of case data entry into CDC’s emergency response common operating platform.
In Tab. I, we show the range of daily cases, [min, max], the number of days, $N$, the Euclidean distance, $d^*_N$, and the corresponding $p$ value. In the left panel of Fig. 1, we show the observed first-digit frequency distribution of the daily case counts superimposed on Benford’s law. As is clear from the table and figure, the data comply with Benford’s law: with $p = 0.284$, the null hypothesis of conformance is not rejected at any conventional significance level.
Dynamic analysis. Since the daily counts of Monkeypox, as of any other infectious disease, evolve in time, it is both interesting and statistically appropriate to quantify how the time series of those counts deviates from Benford’s law as it grows. Indeed, a dynamic analysis of the chronology of the counts better captures the statistical properties of the spread of a disease than a single aggregate test.
Such a dynamic analysis can be performed by considering the following $\chi^2$ statistic:

$$\chi^2 = \sum_{n=N_0}^{N} \frac{\left(d^*_n - \bar{d}^*_n\right)^2}{\sigma_n^2}. \qquad (3)$$

Here, $\bar{d}^*_n$ and $\sigma_n$ are the expected value and standard deviation of the Euclidean distance statistic for Benford’s distribution (Campanelli, 2022c), while $d^*_n$ is the value of the observed Euclidean distance statistic for the first $n$ data points ($n$ being the ordinal number of the day in our case).
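The sketch below (Python, standard library) evaluates Eq. (3) for a chronologically ordered series of first digits, computing $d^*_n$ on the first $n$ days for each $n$ from $N_0$ to $N$. The paper takes $\bar{d}^*_n$ and $\sigma_n$ from Campanelli (2022c); since those values are not reproduced here, the sketch substitutes Monte Carlo estimates drawn from Benford’s law. That substitution is our assumption, not the author's procedure, and the function names are hypothetical.

```python
import math
import random
import statistics

# Benford first-digit probabilities P_B(d), Eq. (1).
DIGITS = list(range(1, 10))
P_B = [math.log10(1 + 1 / d) for d in DIGITS]


def distance_from_digits(digits):
    """Eq. (2) applied to a list of first digits: sqrt(n * sum_d [P(d) - P_B(d)]^2)."""
    n = len(digits)
    return math.sqrt(n * sum((digits.count(d) / n - pb) ** 2 for d, pb in zip(DIGITS, P_B)))


def benford_moments(n, trials=500, seed=0):
    """Monte Carlo mean and standard deviation of d*_n under Benford's law.

    Assumption: stands in for the expected value and sigma_n of Campanelli (2022c).
    """
    rng = random.Random(seed + n)  # reproducible, separate stream for each n
    sims = [distance_from_digits(rng.choices(DIGITS, weights=P_B, k=n)) for _ in range(trials)]
    return statistics.mean(sims), statistics.stdev(sims)


def dynamic_chi2(digits, n0, trials=500):
    """Eq. (3): sum over n = n0..N of (d*_n - mean_n)^2 / sigma_n^2.

    `digits` are first digits in chronological order; d*_n uses the first n of them.
    """
    chi2 = 0.0
    for n in range(n0, len(digits) + 1):
        mean_n, sigma_n = benford_moments(n, trials)
        chi2 += (distance_from_digits(digits[:n]) - mean_n) ** 2 / sigma_n ** 2
    return chi2


# Hypothetical example: a synthetic Benford-distributed series of 60 "days".
rng = random.Random(42)
series = rng.choices(DIGITS, weights=P_B, k=60)
print(f"chi2 = {dynamic_chi2(series, n0=30):.2f} over {60 - 30 + 1} terms")
```

Note that successive $d^*_n$ share most of their data points, so the terms of the sum are not independent draws; the sketch only mirrors the definition in Eq. (3).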
As already noted, conformance to Benford’s law improves as the range of the data increases. For this reason, we let the sum in Eq. (3) begin at $N_0$, the day from which the data range spans at least two orders of magnitude (in the case at hand, $N_0 = 50$). The number $\nu$ of degrees of freedom of the $\chi^2$ statistic equals the number of terms in the sum, $\nu = N - N_0 + 1$, which gives $\nu = 76$ for $N = 125$.
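The choice of $N_0$ and the count of terms can be sketched in Python as follows. This is our illustration, not the author's code: the function names are hypothetical, the input is assumed to be the chronologically ordered positive daily counts (so that $n$ indexes data points), and "at least two orders of magnitude" is read as $\log_{10}(\max/\min) \ge 2$ over the first $n$ values.

```python
import math


def find_n0(counts, decades=2.0):
    """Smallest n (1-based) such that the first n counts span at least
    `decades` orders of magnitude, i.e. log10(max / min) >= decades.

    Assumes `counts` holds the positive daily counts in chronological order.
    Returns None if the series never reaches the requested range.
    """
    lo = hi = None
    for n, c in enumerate(counts, start=1):
        lo = c if lo is None else min(lo, c)
        hi = c if hi is None else max(hi, c)
        if math.log10(hi / lo) >= decades:
            return n
    return None


def degrees_of_freedom(n_total, n0):
    """Number of terms n = n0, ..., n_total in the sum of Eq. (3)."""
    return n_total - n0 + 1


# Toy series (invented numbers): the range first reaches two decades at n = 6.
print(find_n0([3, 5, 40, 120, 80, 500]))  # 6
print(degrees_of_freedom(125, 50))        # 76, the value quoted in the text
```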
In the middle and right panels of Fig. 1 we show, respectively, the $p$ values and the values of the Euclidean distance statistic $d^*_n$ as a function of $n$. The (blue) continuous line in the right panel represents the expected value of $d^*_n$ for Benford’s distribution, while the (blue) dashed lines show the corresponding one-sigma interval.
As is clear from the figure, the null hypothesis of conformance to Benford’s law is never rejected at the 10% level of significance. Moreover, the observed values of $d^*_n$ are relatively close to those expected for Benford’s distribution. This closeness can be quantified using Eq. (3), which gives $\chi^2 = 41.51$ for 76 degrees of freedom. This corresponds to a reduced $\chi^2$ score as low as $\chi^2_{\rm red} = \chi^2/\nu = 0.5462$ and to a $p$ value as large as $p(\chi^2) = 0.9996$. These values, also reported in Tab. I for convenience, show that the temporal series of the Monkeypox daily case counts in the USA conforms very closely to Benford’s distribution.
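As a check on the arithmetic, the following Python sketch recomputes $\chi^2_{\rm red} = \chi^2/\nu$ and the upper-tail probability $p(\chi^2)$ of a $\chi^2$ distribution with $\nu$ degrees of freedom from the values quoted above. To stay within the standard library it evaluates the regularized incomplete gamma function by its power series (the helper name `chi2_sf` is ours); where SciPy is available, `scipy.stats.chi2.sf(x, nu)` gives the same quantity.

```python
import math


def chi2_sf(x, nu):
    """Upper-tail probability P(X >= x) for X ~ chi-square with nu degrees of freedom.

    Uses Q = 1 - P(a, z) with a = nu/2, z = x/2, where the regularized lower
    incomplete gamma P(a, z) is summed from its power series.
    """
    if x <= 0:
        return 1.0
    a, z = nu / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


chi2_value, nu = 41.51, 76  # values reported in the text
print(f"reduced chi2 = {chi2_value / nu:.4f}")  # 0.5462
print(f"p(chi2)      = {chi2_sf(chi2_value, nu):.4f}")
```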
3. Conclusions
We analyzed the first-digit distribution of the daily case counts for the 2022 Monkeypox outbreak in the USA. To test the null hypothesis of conformance to Benford’s distribution, we used the “Euclidean distance test”, which the author has proposed elsewhere specifically to quantify the goodness of fit of a data sample to Benford’s law.
Our results are consistent with those obtained for the first-digit distribution of case counts of other infectious diseases, such as Covid-19, which follows Benford’s law. In particular, the temporal series of the Monkeypox daily cases in the USA conforms remarkably well to Benford’s distribution, with a $p$ value of about 0.9996 (99.96%) for the $\chi^2$ statistic of Eq. (3).
References
Benford F. (1938). The Law of Anomalous Numbers. Proceedings of the American Philosophical Society 78: 551-572.
Campanelli L. (2022a). Breaking Benford’s law: A statistical analysis of Covid-19 data using the Euclidean distance statistic. To appear in Statistics in Transition new series.
Campanelli L. (2022b). A Statistical Cryptanalysis of the Beale Ciphers. To appear in Cryptologia.
Campanelli L. (2022c). On the Euclidean Distance Statistic of Benford’s Law. Communications in Statistics - Theory and Methods. DOI: 10.1080/03610926.2022.2082480.
CDC (2022). U.S. Monkeypox Case Trends Reported to CDC. https://www.cdc.gov/poxvirus/monkeypox/response/2022/ (accessed on 2022-09-24).
Cho W. K. T., Gaines B. J. (2007). Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance. Am. Stat. 61: 218-223.
Farhadi N. (2021). Can we rely on COVID-19 data? An assessment of data from over 200 countries worldwide. Sci. Prog. 104: 1-19.
Hill T. P. (1995a). The significant-digit phenomenon. Am. Math. Mon. 102: 322-327.
Hill T. P. (1995b). Base-invariance implies Benford’s law. Proc. Am. Math. Soc. 123: 887-895.
Hill T. P. (1995c). A statistical derivation of the significant-digit law. Stat. Sci. 10: 354-363.
Miller S. J. (ed.) (2015). Benford’s Law: Theory and Applications. Princeton University Press. Princeton.
Morrow J. (2014). Benford’s Law, Families of Distributions and a Test Basis. Centre for Economic Performance. London.
Nigrini M. (1996). A taxpayer compliance application of Benford’s law. Journal of the American Taxation Association 18: 72-91.
Pérez-González F., Abdallah C. T., Heileman G. L. (2007). Benford’s Law in Image Processing. IEEE International Conference on Image Processing, 405-408.
Roukema B. F. (2013). A first-digit anomaly in the 2009 Iranian presidential election. J. Appl. Stat. 41: 1, 164-199.
Sambridge M., Jackson A. (2020). National COVID numbers - Benford’s law looks for errors. Nature 581: 384.
Sambridge M., Tkalčić H., Jackson A. (2010). Benford’s law in the natural sciences. Geophys. Res. Lett. 37: L22301.
Wase V. (2021). Benford’s law in the Beale ciphers. Cryptologia 45: 3, 282-286.
WHO (2022). https://www.who.int/emergencies/situations/monkeypox-outbreak-2022 (accessed on 2022-09-24).