Was There Any Widespread Fraud in 2020 Presidential
Election? What does Benford's Law say?
Deeya Datta¹ and David Banks#
¹Gwinnett School of Mathematics, Science and Technology, Lawrenceville, GA, USA
#Advisor
ABSTRACT
Fair elections free of interference are integral to any functioning democracy, and widespread election fraud is a serious threat to a free republic. While electoral fraud is much more prevalent in illiberal democracies, the U.S. has recently faced such an accusation. Although he was unable to provide any concrete evidence, former U.S. President Donald Trump accused his opponent, Joe Biden, now president, of electoral fraud after the 2020 presidential election. Fortunately, election forensics can often test the validity of such allegations. In this paper, I apply Benford’s law, a rule that is expected to hold for large sets of naturally occurring numbers such as untampered electoral data. Using this law and a basic statistical analysis of county-level votes for the candidates of the two major parties, I complete a forensic analysis to investigate Mr. Trump’s allegation. The investigation finds no evidence supporting it.
Introduction
Fair elections free from interference are a primary requirement for a proper democracy. Any occurrence of voting fraud threatens the integrity of a country, and when the people no longer believe in the integrity of their government, democracy itself is threatened. Although accusations of voter fraud between political candidates are frequent in other countries, until the 2020 U.S. presidential election no presidential candidate had ever accused his or her opponent of electoral fraud in the United States. Despite many election security experts confirming that there were no major irregularities in that election, the incumbent candidate, former president Donald Trump, repeatedly made accusations of fraud against his opponent. The incumbent and his associates brought multiple lawsuits challenging the 2020 presidential election results; however, judges nationwide threw these cases out for lack of evidence. Although Mr. Trump’s accusations failed to gain any legal traction, they did generate an atmosphere of doubt, distrust, confusion and “what-ifs” about the integrity of the election. This motivated me to investigate scientifically whether such complaints of election fraud had any basis. Election experts believe with a high degree of confidence that widespread fraud in an election can be detected by statistical analysis of the candidates’ vote counts. In this study, the vote counts are compared with Benford’s Law, a simple but effective concept.
Generally, if individuals are asked to pick an integer from 1 to 9 and they choose at random, each of the nine integers has the same theoretical probability, 1/9, of being chosen. Empirically, though, when a sample of 900 individuals is asked to participate in this experiment, the nine integers may not each be chosen exactly 100 times. In terms of relative frequency, each integer will be chosen by approximately 1/9 of the participants, and the more individuals participate, the closer the relative frequencies tend to get to 1/9.
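The experiment described above can be simulated in a few lines. The following is only an illustrative sketch: the function names, the random seed and the sample sizes are assumptions made for this example and are not part of the study.

```python
import random
from collections import Counter

def relative_frequencies(n_people, seed=0):
    """Simulate n_people each picking an integer from 1 to 9 uniformly at random."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    picks = Counter(rng.randint(1, 9) for _ in range(n_people))
    return {digit: picks[digit] / n_people for digit in range(1, 10)}

def largest_gap(n_people, seed=0):
    """Largest absolute difference between any relative frequency and 1/9."""
    freqs = relative_frequencies(n_people, seed)
    return max(abs(f - 1 / 9) for f in freqs.values())

if __name__ == "__main__":
    # The gap from 1/9 tends to shrink as the (hypothetical) sample grows.
    for n in (900, 90_000, 900_000):
        print(n, round(largest_gap(n), 5))
```

With 900 participants the frequencies typically differ from 1/9 by about a percentage point; with many more participants the gap becomes much smaller.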
Volume 10 Issue 3 (2021)
ISSN: 2167-1907
www.JSR.org
Although the integers all have an equal chance of being chosen in the above experiment, this property does not hold for large sets of numbers arising in the real world, because their digits are not generated by such a uniform random draw. To take a closer look, we must consider naturally occurring multi-digit numbers such as utility bill amounts, invoice totals, tax return amounts or counts of ballots. In these numbers, surprisingly, the integers 1 to 9 do not each appear as the leading digit with probability 1/9: the integer 1 appears as the leading digit much more often than the integer 9 does. This phenomenon, which may at first seem strange, appears to hold for many natural data sets. It was first discovered in 1881 by the astronomer Simon Newcomb and rediscovered decades later, in 1938, by the physicist Frank Benford. Benford published a formula for the relative frequency of the integers 1 to 9 as the leading digit of a number. He also provided a formula for the relative frequency of the integers 0 to 9 as the second, third or fourth digit in a large set of naturally occurring numbers. This formula, known as Benford’s Law, says that the leading digit of a naturally occurring number is most likely to be 1 and least likely to be 9. Specifically, there is about a 30% chance that the leading digit is 1, and the probability decreases steadily until it reaches about 5% for the integer 9. The exact probabilities of the integers 1 through 9 as the leading digit, according to Benford’s Law, are presented in Table 1 below (Benford, 1938).
Table 1. Probability of each first digit in regular numbers by Benford’s Law
First digit   1      2      3      4      5      6      7      8      9
Probability   .3010  .1761  .1249  .0969  .0792  .0669  .0580  .0512  .0458
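The first-digit probabilities follow the standard closed form of Benford’s Law, P(d) = log10(1 + 1/d) for d = 1, ..., 9. The short sketch below evaluates that formula; the function and variable names are chosen for this illustration only.

```python
import math

def first_digit_probability(digit):
    """Benford probability that the leading digit equals `digit` (1 through 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("the leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Probabilities rounded to four decimals, keyed by leading digit.
first_digit_table = {d: round(first_digit_probability(d), 4) for d in range(1, 10)}

if __name__ == "__main__":
    for digit, probability in first_digit_table.items():
        print(digit, f"{probability:.4f}")
```

The nine probabilities sum to 1 because the terms log10((d + 1)/d) telescope to log10(10).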
Benford’s law also specifies the proportions of naturally occurring numbers that have each of the integers 0 to 9 as the second digit, and likewise for the third and fourth digits in a large set of such numbers. These probabilities are given in the row titled “Benford” in Tables 3 through 5 below.
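For digit positions beyond the first, the usual generalization of Benford’s Law sums the first-digit style terms over every possible prefix: for position n ≥ 2, P(d) is the sum of log10(1 + 1/(10k + d)) over k from 10^(n-2) to 10^(n-1) - 1. The sketch below implements that generalization; the function name is an assumption made for this illustration.

```python
import math

def nth_digit_probability(position, digit):
    """Benford probability that the digit at `position` (1 = leading) equals `digit`."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if position == 1:
        if not 1 <= digit <= 9:
            raise ValueError("the leading digit must be between 1 and 9")
        return math.log10(1 + 1 / digit)
    if not 0 <= digit <= 9:
        raise ValueError("later digits must be between 0 and 9")
    # k runs over every possible block of digits that can precede this position.
    start = 10 ** (position - 2)
    stop = 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * k + digit)) for k in range(start, stop))

if __name__ == "__main__":
    for position in (2, 3, 4):
        row = [f"{nth_digit_probability(position, d):.5f}" for d in range(10)]
        print(position, " ".join(row))
```

The distributions flatten quickly: by the fourth digit every probability is within a few ten-thousandths of 0.1.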
A statistical test can be used to check any natural data set for deviations from Benford’s Law, such as the vote counts of a presidential candidate across the counties of the U.S. A significant deviation may be evidence of some type of manipulation of the reported counts, in this case the number of votes received by a candidate. In the event of widespread illegal “ballot stuffing”, the reported vote counts for many locations (for example, counties) run the risk of violating Benford’s law. Diekmann (2007) notes that researchers working with tax data, financial data and survey interviews have reported success in identifying fraudulent information by using Benford’s law. To gain insight into the procedure used to identify fraud in these data sets, I read several of Diekmann’s citations (for example, Carslaw 1988, Berton 1995, Nigrini 1996).
In my effort to explore the usefulness of Benford’s law in detecting data fabrication, I considered the vote counts recorded in each U.S. county for the two major presidential candidates in the 2008, 2012 and 2020 elections. For 2008 and 2012 I used data from the MIT Election Data and Science Lab, which I imported automatically; for 2020 I used the county-wise vote counts reported separately for each state on the Politico website, which I imported manually state by state (MIT Election Data and Science Lab, 2021; Politico, 2020). Together these data sets give the vote count for each candidate in each county. I broke each count into its individual digits and then tabulated, for each digit position, how many counts had each integer in that position. Finally, I compared the resulting tables of proportions (Tables 2-5) with their top rows, which show Benford’s law for that digit position, and ran a chi-square test to determine whether there was a significant difference.
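The digit-tabulation step can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the sample counts are hypothetical, and skipping counts that are too short to have the requested digit is an assumption, since the text does not say how such counties were handled.

```python
from collections import Counter

def digit_at(count, position):
    """Digit of `count` at `position` (1 = leading), or None if it has too few digits."""
    text = str(count)
    if count <= 0 or len(text) < position:
        return None
    return int(text[position - 1])

def digit_proportions(counts, position):
    """Relative frequency of each digit at `position` across all usable counts."""
    digits = [digit_at(c, position) for c in counts]
    digits = [d for d in digits if d is not None]  # assumption: short counts are skipped
    if not digits:
        raise ValueError("no count has a digit at this position")
    tally = Counter(digits)
    lowest = 1 if position == 1 else 0  # a leading digit is never 0
    return {d: tally[d] / len(digits) for d in range(lowest, 10)}

if __name__ == "__main__":
    sample_counts = [1523, 287, 45012, 9, 1188]  # hypothetical county vote counts
    print(digit_proportions(sample_counts, 1))
    print(digit_proportions(sample_counts, 2))
```

Each row of Tables 2 through 5 corresponds to one such dictionary of proportions, computed from a candidate's real county counts rather than the toy list above.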
Application of Benford's law asserting integrity of the 2020 presidential election
Benford’s law is a sound method of detecting electoral fraud and has already been used as evidence of election fraud in foreign elections (Mebane 2009). The statistical method used to examine agreement (or disagreement) between the observed digit frequencies and Benford’s law is Pearson’s chi-square test. Essentially, the more the data deviate from the law, the larger the chi-square value will be.
To test statistically whether a data set obeys or violates Benford’s Law, we use Pearson’s chi-square statistic. This statistic measures the discrepancy between a set of observed counts and the corresponding counts predicted by a hypothesis or a scientific theory. Specifically, it is given by
$$\chi^2 = \sum_{i=1}^{9} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{9} \frac{O_i^2}{E_i} - N,$$
where $N$ is the number of counties in our data, $O_i$ denotes the number of those $N$ counties whose vote count has leading digit $i$, and $E_i$ denotes the number of the $N$ counties that Benford’s first-digit law predicts will have $i$ as the leading digit of their vote counts. We can calculate $E_i$ by multiplying the probability for digit $i$ in Table 1 by the total number of counties $N$. The symbol $\sum$ means that we sum over all 9 possible leading digits (1 through 9). The second equality holds because the observed counts and the expected counts both sum to $N$. The chi-square statistics for the second, third and fourth digits are calculated similarly, summing over the ten digits 0 through 9 and using the probabilities in the row “Benford” of Table 3, Table 4 and Table 5, respectively.
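As a rough sketch of this calculation, the code below computes the statistic from a list of observed digit counts in both of its algebraic forms. The function names and the example counts are hypothetical and serve only to illustrate the formula.

```python
import math

# Benford first-digit probabilities for the digits 1 through 9.
BENFORD_FIRST = [math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_square(observed, probabilities):
    """Pearson statistic: sum of (O_i - E_i)^2 / E_i with E_i = N * p_i."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_shortcut(observed, probabilities):
    """Equivalent form: sum of O_i^2 / E_i, minus N."""
    n = sum(observed)
    return sum(o * o / (n * p) for o, p in zip(observed, probabilities)) - n

if __name__ == "__main__":
    # Hypothetical counts of counties by leading digit (not data from the study).
    close_to_benford = [301, 176, 125, 97, 79, 67, 58, 51, 46]
    uniform_digits = [100] * 9
    print(chi_square(close_to_benford, BENFORD_FIRST))
    print(chi_square(uniform_digits, BENFORD_FIRST))
```

Counts close to Benford’s proportions give a value near zero, while counts spread evenly over the nine digits give a value far above the thresholds discussed in the text.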
Applying the chi-square test to the county-wise vote counts of the presidential candidates in the 2008, 2012 and 2020 elections yields no chi-square value large enough to indicate severe disagreement. Biden’s county-wise votes in the 2020 election give a chi-square value of 14.5. With nine first-digit categories the test has 8 degrees of freedom, and a value below the 5% critical value of 15.51 does not provide substantial evidence of a violation of Benford’s law. The corresponding value for President Trump’s vote counts is 7.9. For the county-wise vote counts of the candidates in 2008 and 2012, the values are 13.6 (Obama 2008), 10.6 (McCain 2008), 15.2 (Obama 2012) and 13.4 (Romney 2012). None of the six values exceeds the threshold of 15.51, so there is no substantive evidence of large-scale election irregularities in these years. These chi-square values are displayed in the last column of Table 2.
Table 2. Use of Benford’s Law for the first digit to check for any anomalies of total county vote counts of presidential candidates of two major parties in 2008, 2012 and 2020 elections. Entries are the proportions of counties whose vote count has the given first digit.
Name         1       2       3       4       5       6       7       8       9       Chi-sq
Benford      .30103  .17609  .12494  .09691  .07918  .06695  .05799  .05115  .04576
Biden        .29862  .17711  .12472  .11218  .06590  .06654  .05754  .05111  .04629  14.5
Trump        .29797  .16618  .12118  .09804  .08229  .07168  .05497  .05754  .05014  7.9
Obama 2012   .28834  .18980  .13150  .09854  .07731  .06369  .05070  .04658  .05355  15.2
Romney       .28834  .16477  .12769  .10741  .08777  .07130  .05608  .05450  .04214  13.4
Obama 2008   .28440  .18770  .13253  .10526  .07800  .06531  .06024  .04344  .04312  13.6
McCain       .28472  .16836  .12777  .10590  .08846  .06817  .05739  .05358  .04566  10.6
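Because each observed count is the number of counties times a proportion, the chi-square statistic can also be written as N times the sum of (p_i - q_i)^2 / q_i, where p_i is an observed proportion and q_i the Benford probability. The sketch below applies this to the Biden row of Table 2. The number of counties is not stated in the text, so `n_counties` is a hypothetical parameter, and the function names are likewise assumptions of this illustration.

```python
# Rows copied from Table 2 (first digits 1 through 9).
BENFORD_FIRST = [.30103, .17609, .12494, .09691, .07918, .06695, .05799, .05115, .04576]
BIDEN_FIRST = [.29862, .17711, .12472, .11218, .06590, .06654, .05754, .05111, .04629]

def per_digit_terms(observed, expected):
    """Per-digit contributions (p_i - q_i)^2 / q_i, before scaling by N."""
    return [(p - q) ** 2 / q for p, q in zip(observed, expected)]

def chi_square_from_proportions(observed, expected, n_counties):
    """Chi-square statistic implied by proportions and a given number of counties."""
    return n_counties * sum(per_digit_terms(observed, expected))

if __name__ == "__main__":
    terms = per_digit_terms(BIDEN_FIRST, BENFORD_FIRST)
    for digit, term in enumerate(terms, start=1):
        print(digit, f"{term:.6f}")
    # n_counties=3000 is a placeholder, not a figure taken from the study.
    print(chi_square_from_proportions(BIDEN_FIRST, BENFORD_FIRST, n_counties=3000))
```

Listing the per-digit terms shows which digits drive a statistic; for this row nearly all of the discrepancy comes from the digits 4 and 5.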
Notably, there were no accusations of electoral fraud in 2008 or 2012, yet Benford’s Law holds up just as well for those elections as it does for the 2020 presidential election. These results lend a degree of confidence that this statistical method works. To provide more information about each candidate’s vote counts, Table 2 compares the observed relative frequency of each digit with what Benford’s Law predicts. To take the analysis a step further, the second, third and fourth digits of the vote counts may be analyzed to detect any fabrication or falsification of votes. This is a valid statistical process according to Diekmann (2007), who states that tests based on the second and higher digits of Benford’s law are equally useful in detecting falsification in a data set.
In a historical context, Mebane (2009) applied Benford’s law for the second digit to detect irregularities in the June 2009 election of Iranian President Mahmoud Ahmadinejad and found severe deviations between the data and Benford’s law. Following Mebane (2009), we used Benford’s law for the second digit to analyze the vote count data for 2008, 2012 and 2020. This is again a chi-square test, now with ten digit categories (0 through 9) and therefore 9 degrees of freedom. It yields six chi-square values, one for the county-level votes of each of the six candidates (Biden, Trump, Obama 2008 and 2012, Romney and McCain), which are reported in Table 3. None of the six exceeds 16.92, the critical value at the 5% significance level. These results provide no evidence for the allegation that widespread voting irregularities occurred in 2020. Table 3 also lists the relative frequencies of the integers 0 to 9 as the second digit of the county-wise vote counts for these candidates, alongside the probabilities of these digits under Benford’s second-digit law.
Table 3. Use of Benford’s Law for the second digit to check for any anomalies of total county vote counts of presidential candidates of two major parties in 2008, 2012 and 2020 elections. Entries are the proportions of counties whose vote count has the given second digit.
Name         0       1       2       3       4       5       6       7       8       9       Chi-sq
Benford      .11968  .11389  .10882  .10433  .10031  .09668  .09337  .09035  .08757  .08500  n/a
Biden        .11901  .11451  .11097  .10357  .10292  .09006  .09649  .08942  .08138  .09167  5.1
Trump        .12247  .11700  .10897  .09997  .10479  .09740  .09765  .08679  .08132  .08454  3.9
Obama 2012   .11319  .11160  .10526  .10273  .10114  .09417  .09004  .09892  .09417  .08878  7.0
Romney       .11217  .10393  .11470  .09379  .10424  .10044  .09854  .08682  .09474  .09062  13.9
Obama 2008   .12496  .11608  .10339  .10022  .10657  .09959  .08595  .08214  .08754  .09356  10.7
McCain       .12016  .10463  .10653  .10304  .10051  .09480  .08909  .10051  .08497  .09575  11.4
Following Diekmann’s recommendation, the third and fourth digits of the data set were also investigated.
The third digit chi-square values and the relative frequencies for the six election data sets are reported in Table 4
below. None of the six third digit chi-square values is large enough to cross the threshold of significance to indicate
any widespread irregularities in these counts.
Table 4. Use of Benford’s Law for the third digit to check for any anomalies of total county vote counts of presidential candidates of two major parties in 2008, 2012 and 2020 elections. Entries are the proportions of counties whose vote count has the given third digit.
Name         0       1       2       3       4       5       6       7       8       9       Chi-sq
Benford      .10178  .10138  .10097  .10057  .10018  .09979  .09940  .09902  .09864  .09827
Biden        .09788  .10799  .09331  .09559  .09396  .10375  .10701  .09135  .10930  .09984  13.2
Trump        .10486  .10389  .10068  .09874  .10068  .09778  .10646  .10164  .09907  .08620  7.1
Obama 2012   .09586  .09522  .11189  .10484  .09554  .10516  .09362  .09683  .10741  .09362  12.4
Romney       .09353  .09226  .10399  .10526  .09987  .09575  .09987  .09734  .09892  .11319  13.4
Obama 2008   .10185  .10536  .10473  .10728  .10057  .09259  .10185  .09387  .09642  .09547  5.4
McCain       .10216  .10057  .09613  .09898  .09772  .09676  .10279  .10279  .10565  .09645  3.8
The chi-square values for the fourth digit are nearly as convincing as those for the other digits and again show no evidence of election irregularities. From smallest to largest, they are 4.1 (Trump), 5.7 (Obama 2008), 6.0 (Biden), 9.2 (Obama 2012), 12.7 (McCain) and 17.0 (Romney). Only the value for Mitt Romney in the 2012 election exceeds the cut-off of 16.92, and only barely. For the 2020 candidates in particular, both chi-square values (4.1 and 6.0) are well under the threshold.
Table 5. Use of Benford’s Law for the fourth digit to check for any anomalies of total county vote counts of presidential candidates of two major parties in 2008, 2012 and 2020 elections. Entries are the proportions of counties whose vote count has the given fourth digit.
Name         0       1       2       3       4       5       6       7       8       9       Chi-sq
Benford      .10018  .10014  .10010  .10006  .10002  .09998  .09994  .09990  .09986  .09982
Biden        .10775  .09583  .09543  .10378  .10497  .09543  .10457  .09861  .10139  .09225  6.0
Trump        .09749  .10088  .09715  .10564  .10292  .10462  .09511  .10020  .10190  .09409  4.1
Obama 2012   .09538  .10674  .10522  .10901  .08933  .09841  .09349  .10030  .09841  .10371  9.2
Romney       .10368  .09862  .09591  .08443  .11077  .10706  .10942  .09895  .09389  .09726  17.0
Obama 2008   .10011  .09426  .10742  .10047  .09828  .10815  .09682  .09646  .09536  .10267  5.7
McCain       .09076  .10430  .10566  .10091  .08703  .11006  .10261  .10159  .10058  .09651  12.7
To depict the numerical findings of Tables 2 through 5 graphically, we plotted the probabilities and relative frequencies from these tables. The plots display the probabilities predicted by Benford’s law together with the corresponding observed relative frequencies based on the county votes of the presidential candidates, and so provide a partial visual summary of the analysis in Tables 2 to 5 above. Figure 1 presents results for the 2020 election and Figure 2 presents results for the 2012 election. Figure 1 compares the probabilities specified by Benford’s law with the respective relative frequencies based on Biden votes and Trump votes. The four panels of this figure correspond to Benford’s law for the first, second, third and fourth digit. The solid green lines display the probabilities according to Benford’s law; the relative frequencies for the Democratic candidate are depicted by broken blue lines and those for the Republican candidate by dotted red lines. The panels of both figures also display the relevant chi-square values from Tables 2 through 5.
Investigating election irregularity by comparing frequencies of first four digits of intra-party candidates
The preceding investigation of Benford’s law for the six candidates in three election cycles did not show any evidence of widespread voting irregularities. To investigate further, the relative frequencies of the digits of the vote counts of same-party candidates from different election cycles can be compared. Comparing Obama with Biden, Romney with Trump, and McCain with Trump on each of the first four digits gives twelve chi-square values, shown in Table 6 below. Only one of the twelve barely exceeds the significance threshold (the third-digit comparison of Romney vs. Trump, with a value of 17.46). It is unlikely that this marginal disagreement indicates any election irregularity. In total, this statistical exploration computed 36 chi-square values (6 in each of Tables 2 to 5, and 12 in Table 6), and only two of them are marginally significant. However, at the 5% significance level each test has a 5% chance of exceeding its threshold even when there are no irregularities at all, so one would expect about two of the 36 values (36 × 0.05 = 1.8) to do so by chance. Therefore, two significant chi-square values, both borderline, do not convey any evidence of election irregularities.
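The chance-exceedance argument can be checked with a short calculation. The sketch below treats the 36 tests as independent, each with a 5% false-alarm rate; independence is a simplifying assumption made only for this illustration, since the tests reuse the same vote counts. The function names are also illustrative.

```python
import math

def expected_exceedances(n_tests, alpha):
    """Expected number of tests that exceed their threshold by chance alone."""
    return n_tests * alpha

def prob_at_least(k, n_tests, alpha):
    """Binomial probability that at least k of n_tests exceed the threshold by chance."""
    below_k = sum(
        math.comb(n_tests, j) * alpha ** j * (1 - alpha) ** (n_tests - j)
        for j in range(k)
    )
    return 1 - below_k

if __name__ == "__main__":
    print(expected_exceedances(36, 0.05))
    print(prob_at_least(2, 36, 0.05))
```

Under these assumptions, seeing two or more borderline exceedances among 36 tests is more likely than not, which is consistent with reading the two flagged values as chance fluctuations.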
Table 6. Comparison of leading digits of county vote totals for intra-party presidential candidates of major parties in 2008, 2012 and 2020 elections. The table displays chi-square statistics comparing various candidates based on first-, second-, third- and fourth-digit frequencies. (*) marks the one value that exceeds the significance threshold.
Comparison of candidates from the same party
Digit    Obama vs. Biden   Romney vs. Trump   McCain vs. Trump
First    11.98             5.41               4.72
Second   6.65              8.99               9.42
Third    13.06             17.46 (*)          3.44
Fourth   11.87             12.69              7.56
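The text does not spell out how two candidates’ digit frequencies were compared. One common choice, shown here purely as an assumption and not as the method used in the study, is the chi-square test of homogeneity, which compares each candidate’s digit counts with the counts expected from the pooled proportions. The function name and the example counts are hypothetical.

```python
def homogeneity_chi_square(counts_a, counts_b):
    """Two-sample chi-square statistic comparing two lists of digit counts."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both candidates need counts for the same digits")
    n_a, n_b = sum(counts_a), sum(counts_b)
    total = n_a + n_b
    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        pooled = a + b
        if pooled == 0:
            continue  # a digit that neither candidate shows contributes nothing
        expected_a = n_a * pooled / total
        expected_b = n_b * pooled / total
        statistic += (a - expected_a) ** 2 / expected_a
        statistic += (b - expected_b) ** 2 / expected_b
    return statistic

if __name__ == "__main__":
    # Hypothetical first-digit counts for two candidates (not data from the study).
    candidate_a = [290, 180, 125, 100, 80, 65, 60, 55, 45]
    candidate_b = [300, 170, 130, 95, 85, 70, 55, 50, 45]
    print(homogeneity_chi_square(candidate_a, candidate_b))
```

With k digit categories this statistic has k - 1 degrees of freedom, so the first-digit comparison would use the 15.51 threshold and the later digits the 16.92 threshold quoted earlier.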
Figure 1. Visual Exploration of 2020 Election Integrity by Benford's Law
Figure 2. Visual Exploration of 2012 Election Integrity by Benford's Law
Acknowledgements
I’d like to thank Professor David Banks of Duke University for his encouragement and guidance on this research.
References
Benford, F. (1938). The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, 78(4), 551-572. Retrieved from http://www.jstor.org/stable/984802
Berton, L. (1995, Jul 10). He's got their number: Scholar uses math to foil financial fraud. Wall Street Journal. Retrieved from https://www.proquest.com/newspapers/hes-got-their-number-scholar-uses-math-foil/docview/398472965/se-2?accountid=14537
Carslaw, C. A. P. N. (1988). Anomalies in Income Numbers: Evidence of Goal Oriented Behavior. The Accounting Review, 63(2), 321-327. Retrieved June 29, 2021, from http://www.jstor.org/stable/248109
Diekmann, A. (2007). Not the First Digit! Using Benford's Law to detect fraudulent scientific data. Journal of Applied Statistics, 34(3), 321-329. Retrieved from https://doi.org/10.1080/02664760601004940
Mebane, W. R., Jr. (2009). Note on the presidential election in Iran, June 2009. University of Michigan, June 29, 2009, 22-23. Retrieved from http://www-personal.umich.edu/~wmebane/note29jun2009.pdf
MIT Election Data and Science Lab. (2021). County presidential election returns 2000-2020. Retrieved from https://doi.org/10.7910/DVN/VOQCHQ
Newcomb, S. (1881). Note on the Frequency of Use of the Different Digits in Natural Numbers. American Journal of Mathematics, 4(1), 39-40. Retrieved from www.jstor.org/stable/2369148
Nigrini, M. J. (1996). A Taxpayer Compliance Application of Benford's Law. The Journal of the American Taxation Association, 18, 72-91. Retrieved from https://www.econbiz.de/Record/a-taxpayer-compliance-application-of-benford-s-law-nigrini-mark-john/10001202885
Politico. (2020). Live 2020 election results: Presidency, Senate and House. Retrieved from https://www.politico.com/2020-election/results/
Schäfer, C., Schräpler, J.-P., Müller, K.-R., & Wagner, G. G. (2004). Automatic identification of faked and fraudulent interviews in surveys by two different methods (Working Paper No. 441). DIW Discussion Papers. Retrieved from https://www.econstor.eu/handle/10419/18293