Monkeypox Obeys the (Benford’s) Law: A Dynamic Analysis of Daily Case Counts in the United States of America

Leonardo Campanelli

leonardo.s.campanelli@gmail.com

All Saints University School of Medicine, 5145 Steeles Ave., M9L 1R5, Toronto (ON), Canada

(Dated: September 25, 2022)

Abstract. We analyze the first-digit distribution of the Monkeypox daily cases in the United States of America, from May 17 to September 21, 2022. As expected for the spread of infectious diseases, the overall data follow Benford’s law. Moreover, we find that the temporal series of daily cases conforms to Benford’s distribution with an exceptionally high $p$ value.

1. Introduction

Monkeypox is a viral zoonotic infectious disease caused by a virus in the genus Orthopoxvirus. An ongoing outbreak started on May 6, 2022 in London, United Kingdom. From May 18 onwards, cases were reported worldwide in about 100 countries. This is the first time Monkeypox has spread widely outside Central Africa (Congo Basin clade) and West Africa (West African clade), where the disease is endemic (WHO, 2022).

There is evidence that the spread of infectious diseases conforms to Benford’s law. Indeed, Sambridge et al. (2010) found that the total numbers of cases of 18 infectious diseases reported to the World Health Organization (WHO) by 193 countries worldwide in 2007 follow Benford’s distribution. Recently, Benford’s law has been applied to the study of Covid-19 data, in particular to daily, weekly, and cumulative case and death counts of various countries [see, e.g., Sambridge and Jackson (2020), Farhadi (2021), and Campanelli (2022a)]. The general result is that Benford’s distribution describes well the first-digit distributions of Covid-19 data for most countries and can therefore be used to flag “anomalies” in the data of specific countries.

Benford’s law (Benford, 1938) is an empirical statistical law according to which the probability $P_B(d)$ of occurrence of the first significant digit $d$ in “particular” data sets is

$$P_B(d) = \log_{10}\left(1 + \frac{1}{d}\right). \tag{1}$$
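
As a quick illustration of Eq. (1), the nine probabilities can be tabulated in a few lines of Python. This is only a sketch: the function name `benford_probability` is ours, and the logarithm is taken in base 10, as is standard for Benford’s law.

```python
import math


def benford_probability(d: int) -> float:
    """First-digit probability P_B(d) of Eq. (1), using a base-10 logarithm."""
    if not 1 <= d <= 9:
        raise ValueError("the first significant digit must be between 1 and 9")
    return math.log10(1.0 + 1.0 / d)


# Probabilities for the digits 1..9; they sum to one because the product
# (2/1)(3/2)...(10/9) telescopes to 10.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, prob in benford.items():
    print(f"P_B({d}) = {prob:.4f}")
```

The digit 1 is expected to lead about 30% of the time and the digit 9 under 5% of the time.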

Although it is now known that some distributions satisfy Benford’s law [see, e.g., Morrow (2014) and references therein] and that particular principles lead to the emergence of the Benford phenomenon in data (Hill, 1995a, 1995b, and 1995c), no general criterion has been found that fully explains when and why Benford’s law holds for a “generic” set of data.

Although much work is still needed to understand the theoretical basis of the law, the number of its applications has grown in the last few decades [for theoretical insights and general applications of Benford’s law, see Miller (2015)]. Probably the most famous applications are to the detection of tax fraud (Nigrini, 1996), campaign finance fraud (Cho and Gaines, 2007), and election fraud (Roukema, 2013). Other interesting applications are in image processing (Pérez-González et al., 2007), where Benford’s law can be used to test whether or not an image has been compressed; in the natural sciences, where the law has been shown to hold for geophysical observables such as the depths of earthquakes (Sambridge et al., 2010); and in cryptology, where it can be used to examine the truthfulness of undeciphered numerical codes (Wase, 2021; Campanelli, 2022b).

The aim of this paper is to assess whether the data on the Monkeypox daily counts in the United States of America (USA) comply with Benford’s law.

2. Analysis

It is well known that the compliance of data sets with Benford’s law improves as the range of the data increases. Daily and cumulative death counts by country are therefore not appropriate for checking the compliance of the Monkeypox first-digit distributions with Benford’s law, because there have been only a few tens of deaths worldwide since the start of the outbreak (WHO, 2022). Another possibility would be to use cumulative confirmed case counts. The disadvantage of this type of data is that, as cumulative case numbers begin to flatten (e.g., after a Monkeypox “wave” has passed), first digits tend to become all the same, thus distorting the relative digit frequencies. To overcome this problem, we analyze only the data on daily confirmed cases by country. However, the only country whose daily case numbers extend over a statistically appreciable range is the USA: there, the data cover about three orders of magnitude, while in all the other countries affected by Monkeypox they extend over at most two (WHO, 2022). Accordingly, we focus our analysis on the daily case counts from the USA.
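
The two points made in this paragraph can be checked with a short Python sketch. The range [1, 916] is the one quoted in Tab. I; the cumulative counts are invented numbers, used only to show how a flattening series repeats the same first digit, and the helper names are ours.

```python
import math


def first_digit(x: int) -> int:
    """Leading significant digit of a positive integer."""
    return int(str(x)[0])


def orders_of_magnitude(low: float, high: float) -> float:
    """Width of the range [low, high], measured in powers of ten."""
    return math.log10(high / low)


# Range of the USA daily case counts, as quoted in Tab. I.
usa_range = orders_of_magnitude(1, 916)

# Hypothetical cumulative counts near a plateau (illustrative values, not CDC data).
toy_cumulative = [24100, 24350, 24520, 24610, 24650]
toy_digits = [first_digit(c) for c in toy_cumulative]

print(f"USA daily counts span {usa_range:.2f} orders of magnitude")
print("first digits of the flattening cumulative series:", toy_digits)
```

Every value in the toy plateau starts with the digit 2, which is the distortion of digit frequencies that motivates the use of daily rather than cumulative counts.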

Overall analysis. The most common tests of whether a numerical sample satisfies Benford’s law are Pearson’s $\chi^2$ and Kolmogorov-Smirnov tests. Although they are general tests, in that they can be used to quantify the conformance of data sets to any theoretical

TABLE I: The Euclidean distance $d^*_N$ in Eq. (2) and its corresponding $p$ value for the first-digit distribution of the Monkeypox daily case counts in the USA. Also indicated are the range of cases, [min, max], and the number of days, $N$. Counts are from the CDC (2022) and are updated to September 21, 2022. The last three columns show the reduced $\chi^2$ score, $\chi^2_{\rm red} = \chi^2/\nu$, the number $\nu$ of degrees of freedom, and the $p$ value, $p(\chi^2)$, of the $\chi^2$ statistic defined in Eq. (3).

Range       $N$    $d^*_N$    $p$      $\chi^2_{\rm red}$    $\nu$    $p(\chi^2)$
[1, 916]    125    1.0031     0.284    0.5462                76       0.9996

[Figure 1: three panels. Left: $f(d)$ versus $d$. Middle: $p$ versus $n$. Right: $d^*_n$ versus $n$.]

FIG. 1: Left panel. Observed first-digit frequencies of the Monkeypox daily case counts in the USA. The (blue) continuous line represents Benford’s law. Middle panel. $p$ values of the Euclidean distance statistic $d^*_n$ as a function of the number of data points $n$ (number of days). The (blue) dashed line is $p = 0.10$. Right panel. The Euclidean distance statistic as a function of $n$. The (blue) continuous line represents the expected value of $d^*_n$ for Benford’s distribution, while the (blue) dashed lines show the corresponding one-sigma interval.

distribution, they are generally conservative for testing Benford’s law [see, e.g., Morrow (2014)]. The most appropriate test for checking Benford’s law in data is the “Euclidean distance test”, which has been recently proposed by the author to specifically quantify the goodness of fit of a data sample to Benford’s law (Campanelli, 2022c).

The Euclidean distance test is based on the Euclidean distance estimator $d^*_N$, first introduced by Cho and Gaines (2007) and then analyzed by Morrow (2014),

$$d^*_N = \sqrt{N \sum_{d=1}^{9} \left[ P(d) - P_B(d) \right]^2}, \tag{2}$$

where $P(d)$ is the observed first-digit frequency distribution of a sample of size $N$. The (empirical) cumulative distribution function (CDF) of the Euclidean distance statistic found by the author (Campanelli, 2022c) allows us to evaluate $p$ values as $p = 1 - {\rm CDF}(d^*_N)$.
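
A minimal Python sketch of Eq. (2) is given below. The empirical CDF of Campanelli (2022c) is not reproduced here, so the $p$ value is instead estimated by a plain Monte Carlo simulation of Benford-distributed first digits; this substitution, the function names, and the toy input (powers of two, not CDC counts) are all assumptions made for illustration.

```python
import math
import random

# Benford probabilities P_B(d) of Eq. (1) for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x: int) -> int:
    """Leading significant digit of a positive integer."""
    return int(str(x)[0])


def euclidean_distance(values) -> float:
    """Statistic d*_N of Eq. (2); zeros are skipped (no first significant digit)."""
    values = [v for v in values if v > 0]
    n = len(values)
    if n == 0:
        raise ValueError("need at least one positive value")
    tallies = [0] * 9
    for v in values:
        tallies[first_digit(v) - 1] += 1
    return math.sqrt(n * sum((t / n - b) ** 2 for t, b in zip(tallies, BENFORD)))


def monte_carlo_p_value(d_obs: float, n: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated Benford samples of size n whose d* is at least d_obs."""
    rng = random.Random(seed)
    digits = list(range(1, 10))
    hits = 0
    for _ in range(trials):
        sample = rng.choices(digits, weights=BENFORD, k=n)
        if euclidean_distance(sample) >= d_obs:
            hits += 1
    return hits / trials


# Toy input only: powers of two, a textbook Benford-like sequence.
toy_counts = [2 ** k for k in range(1, 101)]
d_obs = euclidean_distance(toy_counts)
p_value = monte_carlo_p_value(d_obs, len(toy_counts))
print(f"d* = {d_obs:.4f}, Monte Carlo p = {p_value:.3f}")
```

A large $p$ value means that a distance at least as large as the observed one is common among genuinely Benford samples of the same size, so conformance is not rejected.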

Data on the 2022 USA Monkeypox outbreak are from the Centers for Disease Control and Prevention (CDC, 2022) and are updated to September 21, 2022. They are the confirmed daily cases reported to the CDC since May 17, 2022, the start of the response to the current outbreak. Each case is dated by the positive laboratory test report date, the CDC call center reporting date, or the date of case data entry into CDC’s emergency response common operating platform.

In Tab. I, we show the range of daily cases, [min, max], the number of days, $N$, the Euclidean distance, $d^*_N$, and the corresponding $p$ value. In the left panel of Fig. 1, we show the observed first-digit frequency distribution of daily case counts superimposed on Benford’s law. As is clear from the table and figure, the data are consistent with Benford’s law: with $p = 0.284$, the null hypothesis of conformance cannot be rejected at the 10% level of significance.

Dynamic analysis. Since the daily counts relative to Monkeypox, as well as to any other infectious disease, evolve in time, it is interesting and statistically befitting to quantify the deviation of the timeline of those counts from Benford’s law. Indeed, a dynamic data analysis of the chronology of the counts better captures the statistical properties of the spread of a disease.

Such a dynamic analysis can be performed by considering the following $\chi^2$ statistic:

$$\chi^2 = \sum_{n=N_0}^{N} \left( \frac{d^*_n - \bar{d}^*_n}{\sigma_n} \right)^2. \tag{3}$$

Here, $\bar{d}^*_n$ and $\sigma_n$ are the expected value and standard deviation of the Euclidean distance statistic for Benford’s distribution (Campanelli, 2022c), while $d^*_n$ is the observed value of the Euclidean distance statistic for the first $n$ data points (the first $n$ days in our case).

As already noted, the compliance with Benford’s law improves as the range of the data increases. For this reason, we let the sum in Eq. (3) begin at $N_0$, the first day on which the data range extends over at least two orders of magnitude (in the case at hand, $N_0 = 50$). The number $\nu$ of degrees of freedom for the $\chi^2$ statistic is then the number of terms in the sum, $\nu = N - N_0 + 1$ (that is, $\nu = 76$ for $N = 125$).
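
The dynamic statistic of Eq. (3) can be sketched in Python as follows. The expected value and standard deviation of $d^*_n$ are taken in the paper from Campanelli (2022c); since those values are not reproduced here, the sketch estimates them by Monte Carlo simulation, which is our assumption, as are the function names and the simulated input digits. The sum runs over $n = N_0, \ldots, N$ and therefore has $N - N_0 + 1$ terms, which gives the 76 degrees of freedom listed in Tab. I for $N = 125$ and $N_0 = 50$.

```python
import math
import random
import statistics

# Benford probabilities P_B(d) of Eq. (1) for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def running_distances(digits):
    """d*_n of Eq. (2) for every prefix n = 1..len(digits) of a first-digit sequence."""
    tallies = [0] * 9
    distances = []
    for n, d in enumerate(digits, start=1):
        tallies[d - 1] += 1
        squared = sum((t / n - b) ** 2 for t, b in zip(tallies, BENFORD))
        distances.append(math.sqrt(n * squared))
    return distances


def benford_reference(n_max, trials=500, seed=1):
    """Monte Carlo estimates of the mean and standard deviation of d*_n, n = 1..n_max."""
    rng = random.Random(seed)
    runs = [
        running_distances(rng.choices(range(1, 10), weights=BENFORD, k=n_max))
        for _ in range(trials)
    ]
    means = [statistics.fmean(column) for column in zip(*runs)]
    sigmas = [statistics.stdev(column) for column in zip(*runs)]
    return means, sigmas


def dynamic_chi_square(digits, n0, means, sigmas):
    """Chi-square of Eq. (3), summed over n = n0..N, and the number of terms in the sum."""
    observed = running_distances(digits)
    total = 0.0
    for n in range(n0, len(digits) + 1):
        z = (observed[n - 1] - means[n - 1]) / sigmas[n - 1]
        total += z * z
    return total, len(digits) - n0 + 1


# Simulated first digits standing in for 125 days of counts (not CDC data).
rng = random.Random(2022)
toy_digits = rng.choices(range(1, 10), weights=BENFORD, k=125)
means, sigmas = benford_reference(n_max=125)
chi2, nu = dynamic_chi_square(toy_digits, n0=50, means=means, sigmas=sigmas)
print(f"chi2 = {chi2:.2f}, nu = {nu}, reduced chi2 = {chi2 / nu:.3f}")
```

A prefix of an independent Benford sequence is itself a Benford sample of size $n$, so one simulated sequence of length $N$ supplies a draw of $d^*_n$ for every $n$ at once.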

In the middle and right panels of Fig. 1 we show, respectively, the $p$ values and scores of the Euclidean distance statistic $d^*_n$ as a function of $n$. The (blue) continuous line in the right panel represents the expected value of $d^*_n$ for Benford’s distribution, while the (blue) dashed lines show the corresponding one-sigma interval.

As is clear from the figure, the null hypothesis of conformance to Benford’s law is never rejected at the 10% level of significance. Moreover, the values of the observed $d^*_n$ are relatively close to the ones expected for Benford’s distribution. This closeness can be quantified by using Eq. (3), which gives $\chi^2 = 41.51$ for 76 degrees of freedom. This corresponds to a reduced $\chi^2$ score as low as $\chi^2_{\rm red} = \chi^2/\nu = 0.5462$ and to a $p$ value as large as $p(\chi^2) = 0.9996$. These values, reported in Tab. I for convenience, show that the temporal series of the Monkeypox daily case counts in the USA is in very close agreement with Benford’s distribution.
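
The arithmetic behind these two numbers can be retraced with the Python standard library alone. The sketch below assumes that $p(\chi^2)$ is the upper-tail probability of a $\chi^2$ distribution with $\nu$ degrees of freedom, and evaluates it through the power series of the regularized lower incomplete gamma function; the function name `chi2_sf` is ours.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with dof degrees of freedom.

    Computes 1 - P(a, z) with a = dof/2 and z = x/2, where
    P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_k z^k / ((a + 1) ... (a + k)).
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total and k < 100_000:
        k += 1
        term *= z / (a + k)
        total += term
    log_prefactor = a * math.log(z) - z - math.lgamma(a + 1.0)
    return 1.0 - math.exp(log_prefactor) * total


chi2_value, nu = 41.51, 76      # values quoted in the text
reduced = chi2_value / nu       # reduced chi-square
p_value = chi2_sf(chi2_value, nu)

print(f"reduced chi2 = {reduced:.4f}")
print(f"p(chi2)      = {p_value:.4f}")
```

The reduced score reproduces 0.5462, and the tail probability comes out close to the 0.9996 reported in Tab. I.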

3. Conclusions

We analyzed the first-digit distribution of the daily case counts for the 2022 Monkeypox outbreak in the USA. To test the null hypothesis of conformance to Benford’s distribution, we used the “Euclidean distance test”, which has been proposed by the author elsewhere to specifically quantify the goodness of fit of a data sample to Benford’s law.

Our results are consistent with earlier results on the first-digit distribution of case counts for other infectious diseases, such as Covid-19, according to which such a distribution follows Benford’s law. In particular, the temporal series of the Monkeypox daily cases in the USA conforms to Benford’s distribution with a remarkably high $p$ value of about 0.9996.

References

Benford F. (1938). The Law of Anomalous Numbers. Proceedings of the American Philosophical Society 78: 551-572.

Campanelli L. (2022a). Breaking Benford’s law: A statistical analysis of Covid-19 data using the Euclidean distance statistic. To appear in Statistics in Transition new series.

Campanelli L. (2022b). A Statistical Cryptanalysis of the Beale Ciphers. To appear in Cryptologia.

Campanelli L. (2022c). On the Euclidean Distance Statistic of Benford’s Law. Communications in Statistics - Theory and Methods. DOI: 10.1080/03610926.2022.2082480.

CDC (2022). U.S. Monkeypox Case Trends Reported to CDC. https://www.cdc.gov/poxvirus/monkeypox/response/2022/ (accessed on 2022-09-24).

Cho W. K. T., Gaines B. J. (2007). Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance. Am. Stat. 61: 218-223.

Farhadi N. (2021). Can we rely on COVID-19 data? An assessment of data from over 200 countries worldwide. Sci. Prog. 104: 1-19.

Hill T. P. (1995a). The significant-digit phenomenon. Am. Math. Mon. 102: 322-327.

Hill T. P. (1995b). Base-invariance implies Benford’s law. Proc. Am. Math. Soc. 123: 887-895.

Hill T. P. (1995c). A statistical derivation of the significant-digit law. Stat. Sci. 10: 354-363.

Miller S. J. (ed.) (2015). Benford’s Law: Theory and Applications. Princeton University Press. Princeton.

Morrow J. (2014). Benford’s Law, Families of Distributions and a Test Basis. Centre for Economic Performance. London.

Nigrini M. (1996). A taxpayer compliance application of Benford’s law. Journal of the American Taxation Association 18: 72-91.

Pérez-González F., Abdallah C. T., Heileman G. L. (2007). Benford’s Law in Image Processing. IEEE International Conference on Image Processing, 405-408.

Roukema B. F. (2013). A first-digit anomaly in the 2009 Iranian presidential election. J. Appl. Stat. 41: 1, 164-199.

Sambridge M., Jackson A. (2020). National COVID numbers - Benford’s law looks for errors. Nature 581: 384.

Sambridge M., Tkalčić H., Jackson A. (2010). Benford’s law in the natural sciences. Geophys. Res. Lett. 37: L22301.

Wase V. (2021). Benford’s law in the Beale ciphers. Cryptologia 45: 3, 282-286.

WHO (2022). https://www.who.int/emergencies/situations/monkeypox-outbreak-2022 (accessed on 2022-09-24).