ANALYSES
54
Benford's Law and Possibilities for Its Use in Governmental Statistics¹
Richard Hindls² | University of Economics, Prague, Czech Republic
Stanislava Hronová³ | University of Economics, Prague, Czech Republic
1 This article was written thanks to support from the Institutional Support to Long-Term Conceptual Development of Research Organisation, the Faculty of Informatics and Statistics of the University of Economics, Prague.
2 Faculty of Informatics and Statistics, Department of Statistics and Probability, nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic. E-mail: hindls@vse.cz.
3 Faculty of Informatics and Statistics, Department of Economic Statistics, nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic. E-mail: hronova@vse.cz.
Abstract
Benford's Law (sometimes also called Benford's Distribution or Benford's Test) is one of the possible tools for verification of a data structure in a given file regarding the relative frequencies of occurrence of the first (or second, etc.) digit from the left. If it is used as a goodness-of-fit test on sample data, there are usually no problems with its interpretation. However, certain factual questions arise in connection with validity of Benford's Law in large data sets in governmental statistics; such questions should be resolved before the law is used. In this paper we discuss the application potential of Benford's Law when working with extensive data sets in the areas of economic and social statistics.
Keywords
Benford's Law, goodness-of-fit test, Z-test, national accounts
JEL code
E22, C43
INTRODUCTION
Correctness and indisputability of macroeconomic data is one of the basic principles in governmental statistics. These attributes are achieved by the use of verified methods to collect and process data, attested procedures, and balance computations with the aid of all available sources of information. The national accounts system is one of the “tools” we use for verifying the meaningfulness and cohesion of governmental statistics. National accounts are a system of inter-related macroeconomic statistical data, arranged in the form of integrated economic accounts. We can compare this system with a crossword puzzle in which indices stand for letters. In other words, each entry contributes to the total value of one index in its row and to that of a different index in its column, just as each letter in a crossword puzzle is part of both a “down” and an “across” word. This arrangement of data ensures that all items are inter-related and balanced – nothing is lost and nothing is used to excess. Without disputing the national accounts of any country, it is clear that a balanced inter-related system of data can also be created from fictitious or even incorrect data items. Other tools are therefore needed for verifying that the items of national accounts are indeed correct. In addition to the usual factual and logical checks on the data sources and procedures, such verification can be supported by certain formal tools. Benford’s Distribution is one of them.

STATISTIKA 2015, 95 (2)
1 WHAT IS BENFORD’S LAW?
The substance of Benford’s Law can easily be expressed in words: in a given set of data, the probability of occurrence as the first digit from the left is different for each of the digits 1, 2, ..., 9. Numbers starting with one occur more often than those starting with two, which are in turn more frequent than those starting with three, etc., and numbers starting with nine are the least frequent ones. This observation is hard to believe at first sight. However, its validity has been empirically confirmed (first in 1881, and then again in 1938). Thanks to a new mathematical approach developed at the end of the 20th century, this law has found its place in the theory of probability. Many successful applications, including the testing of mathematical models and computer designs as well as error detection in accounting, have indicated its validity.
1.1 Historical Note
By an irony of fate, it was not Frank Benford who assisted at the birth of the distribution that is now called Benford’s. Neither was he the first who tried to prove it mathematically. As a matter of fact, it was Simon Newcomb who, in the late 19th century, first defined a distribution governing the occurrence of numbers with a given digit as the first one from the left. R. A. Raimi (from the late 1960s) and T. P. Hill (in the 1990s) tried to put forth a mathematical proof of this specific law.
Curiosity and imagination, besides knowledge and experience, undoubtedly play an important role in scientific discoveries. This was also the case of the distribution (law) later called Benford’s. The American mathematician and astronomer Simon Newcomb noticed in a library that the first pages of books of logarithm tables were much more worn than the rest. From this observation he concluded that students look up logarithms of numbers beginning with one much more often than those beginning with two, the latter more often than those beginning with three, etc., and from that he deduced that the probability of occurrence is largest for numbers beginning with one, smaller for numbers beginning with two, etc. Empirically he derived⁴ the following formula for the probability of occurrence of numbers in which digit d stands first from the left:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad \text{for } d = 1, 2, \ldots, 9. \tag{1}$$

This rule means that the probability of occurrence of a number beginning with one is 0.3010, beginning with two 0.1761, etc., down to the probability of a number beginning with nine, which is 0.0458. He also derived probabilities corresponding to the digit second from the left (now, of course, zero has to be included); the mutual differences are significantly smaller for digits 0, 1, …, 9 at the second position (the probability of zero is 0.1197, and that of nine is 0.0850).⁵

Nowadays Newcomb’s paper has hundreds of citations, but in its time it passed practically without notice and more or less fell into oblivion. Many years later the American physicist Frank Benford also noticed the irregular wear of the pages of logarithmic table books, and derived the same logarithmic formula for the first and second digits from the left. In 1938 he published his conclusions based on studying a large number of data sets from different areas (hydrology, chemistry, but also baseball or the daily press – Benford, 1938).

4 Cf. Newcomb (1881).
5 Cf. Table 1.
Unlike Newcomb’s paper, Frank Benford’s met with a certain amount of attention, perhaps thanks to the recognition of his name in physics. Newcomb had been forgotten by then, and the logarithmic relationship for occurrence of the first (and second) digit from the left was “christened” Benford’s.

The wider use of Benford’s Law in the second half of the 20th century brought about a number of questions concerning its validity. There were data sets (from natural sciences, economics, but also everyday life) in which Benford’s Law was valid, but it was always possible to find situations in which it had to be rejected (phone numbers from a certain area, shoe or clothing sizes, etc.). Naturally, a question arose whether Benford’s Law can or cannot be proved mathematically. In particular, T. P. Hill (Hill, 1995a; Hill, 1995b; and Hill, 1998) and R. A. Raimi (Raimi, 1969a; Raimi, 1969b; and Raimi, 1976) tried to find such a proof, but no strict mathematical proof was found.⁶ If nothing else, their theoretical efforts led to an approximate formulation of the validity of Benford’s Law: if we take random samples from arbitrary distributions, the collection of these random samples approximately obeys Benford’s Law.⁷
1.2 Theoretical basis
Formula (1), first derived by Newcomb and later again by Benford, has a more general validity; or rather, it can be adapted into a form which defines the occurrence of any digit at the second, third, etc. positions. In this connection, however, we have to ask whether such occurrence is considered independently of the preceding digit(s) from the left, or conditionally on them. In the former case we deal with probabilities of independent events, while in the latter conditional probabilities have to be used.
Occurrence of a digit from 1, 2, …, 9 at the first position from the left is governed by Formula (1), but occurrence of a digit from 0, 1, …, 9 at the second position from the left (on the assumption that it is independent of occurrence of a particular digit at the first position from the left) is given as

$$P(d) = \sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + d}\right), \quad \text{for } d = 0, 1, \ldots, 9. \tag{2}$$

Regarding independent occurrences of digits from 0, 1, …, 9 at the third and following positions, the last formula can be generalised:

$$P(d_k) = \sum_{d_1=1}^{9} \sum_{d_2=0}^{9} \cdots \sum_{d_{k-1}=0}^{9} \log_{10}\left(1 + \frac{1}{\sum_{i=1}^{k} d_i \cdot 10^{k-i}}\right), \quad \text{for } d_k = 0, 1, \ldots, 9, \tag{3}$$

and the mutual differences between probabilities of occurrence of a particular digit get smaller already at the second position from the left; starting at the fifth position (independent of the preceding ones), Benford’s Law approaches the uniform multinomial distribution. Table 1 shows the changes in the probability values for independent occurrence of digits 0, 1, …, 9 at the first to fifth positions from the left.

The results presented above imply that, starting from the third position from the left, differences in probability values are very small, and only the occurrence of digits at the first and second positions from the left is interesting from the viewpoint of practical applications.

6 Perhaps the best characterisation is that by R. A. Raimi in the conclusion of his paper (Raimi, 1969b, p. 347). Referring to the validity of Benford's Law for addresses of 5 000 people from a "Who is Who" publication, he says: "Why should the street addresses of a thousand famous men obey the logarithm law? I know no answer to this question".
7 Cf. Hill (1998) and Raimi (1969b).
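As a rough numerical check of Formulas (1) to (3), the following Python sketch recomputes the probabilities by position. It is an illustration only, not the authors' own computation: the function names `first_digit_prob` and `position_prob` are hypothetical, and the nested sums of Formula (3) are replaced by a single sum over all possible leading blocks d_1 … d_{k-1}, which is an equivalent rearrangement.

```python
import math


def first_digit_prob(d: int) -> float:
    """Formula (1): probability that the first digit from the left equals d."""
    if not 1 <= d <= 9:
        raise ValueError("the first digit must be in 1..9")
    return math.log10(1 + 1 / d)


def position_prob(d: int, k: int) -> float:
    """Formulas (2) and (3): probability of digit d at position k from the left.

    For k >= 2 the nested sums over d_1, ..., d_{k-1} are written as one sum
    over the integer m formed by those leading digits (10**(k-2) <= m < 10**(k-1)).
    """
    if k == 1:
        return first_digit_prob(d)
    lo, hi = 10 ** (k - 2), 10 ** (k - 1)
    return sum(math.log10(1 + 1 / (10 * m + d)) for m in range(lo, hi))


# Print one row per digit for positions j = 1..5 ("x": zero cannot be a first digit).
for d in range(10):
    first = "x" if d == 0 else f"{position_prob(d, 1):.4f}"
    rest = [f"{position_prob(d, k):.4f}" for k in range(2, 6)]
    print(d, first, *rest)
```

Rounded to four decimals, the printed rows can be compared with the entries of Table 1.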
Table 1 Probability of occurrence for digit d at the jth position from the left

d    j = 1     j = 2     j = 3     j = 4     j = 5
0    x         0.1197    0.1018    0.1002    0.1000
1    0.3010    0.1139    0.1014    0.1001    0.1000
2    0.1761    0.1088    0.1010    0.1001    0.1000
3    0.1249    0.1043    0.1006    0.1001    0.1000
4    0.0969    0.1003    0.1002    0.1000    0.1000
5    0.0792    0.0967    0.0998    0.1000    0.1000
6    0.0669    0.0934    0.0994    0.0999    0.1000
7    0.0580    0.0904    0.0990    0.0999    0.1000
8    0.0512    0.0876    0.0986    0.0999    0.1000
9    0.0458    0.0850    0.0983    0.0998    0.1000

Source: Authors' own calculations

Another situation arises when the probability of occurrence of a digit from 0, 1, …, 9 at the second position from the left is conditional on occurrence of a particular digit from 1, 2, …, 9 at the first position from the left. The conditional probability of occurrence of d2 at the second position from the left, on the condition that the first digit from the left is d1, equals

$$P(d_2 \mid d_1) = \frac{\log_{10}\left(1 + \frac{1}{10 d_1 + d_2}\right)}{\log_{10}\left(1 + \frac{1}{d_1}\right)}, \quad \text{for } d_1 = 1, 2, \ldots, 9, \text{ and for } d_2 = 0, 1, \ldots, 9. \tag{4}$$

For example, the probability of “2” occurring at the second position on condition of “3” being the first digit from the left is

$$P(D_2 = 2 \mid D_1 = 3) = \frac{\log_{10}\left(1 + \frac{1}{32}\right)}{\log_{10}\left(1 + \frac{1}{3}\right)} = \frac{0.0134}{0.1249} = 0.1070.$$

Values of conditional probability for pairs of digits calculated with the aid of Formula (4) are shown in Table 2.

Table 2 Conditional probability values of occurrence for d2 on condition d1

Rows: d1 (first digit from the left); columns: d2 (second digit from the left)

d1   0        1        2        3        4        5        6        7        8        9
1    0.1375   0.1255   0.1155   0.1069   0.0995   0.0931   0.0875   0.0825   0.0780   0.0740
2    0.1203   0.1147   0.1096   0.1050   0.1007   0.0967   0.0931   0.0897   0.0865   0.0836
3    0.1140   0.1104   0.1070   0.1038   0.1008   0.0979   0.0952   0.0927   0.0903   0.0880
4    0.1107   0.1080   0.1055   0.1030   0.1007   0.0985   0.0964   0.0943   0.0924   0.0905
5    0.1086   0.1065   0.1045   0.1025   0.1006   0.0988   0.0971   0.0954   0.0938   0.0922
6    0.1072   0.1055   0.1038   0.1022   0.1006   0.0990   0.0976   0.0961   0.0947   0.0933
7    0.1062   0.1047   0.1033   0.1019   0.1005   0.0992   0.0979   0.0966   0.0954   0.0942
8    0.1055   0.1042   0.1029   0.1017   0.1005   0.0993   0.0982   0.0970   0.0959   0.0949
9    0.1049   0.1037   0.1026   0.1015   0.1004   0.0994   0.0984   0.0973   0.0964   0.0954

Source: Authors' own calculations
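The worked example for Formula (4) can be reproduced with a few lines of Python. This is a minimal sketch; the function name `second_given_first` is a hypothetical choice made for this illustration.

```python
import math


def second_given_first(d2: int, d1: int) -> float:
    """Formula (4): probability of d2 at the second position given d1 at the first."""
    joint = math.log10(1 + 1 / (10 * d1 + d2))  # probability of the leading pair d1 d2
    marginal = math.log10(1 + 1 / d1)           # Formula (1) for the first digit d1
    return joint / marginal


# The example from the text: "2" at the second position, given "3" at the first.
print(f"{second_given_first(2, 3):.4f}")

# For any fixed first digit, the ten conditional probabilities should sum to one.
for d1 in range(1, 10):
    row_total = sum(second_given_first(d2, d1) for d2 in range(10))
    print(d1, f"{row_total:.6f}")
```

The row totals offer a simple sanity check on Table 2: each row of conditional probabilities should add up to one, apart from rounding.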
The relationships considered above for Benford’s Law are valid for arbitrary data sets and are invariant with respect to a change of radix base or units of measurement. Equivalently expressed, data sets governed by Benford’s Law will remain governed by it even if expressed in a base other than decimal, or in other units of measurement (physical, currency, etc.), or if the original data items are all multiplied by an arbitrary constant. This fact implies that any arithmetical operations carried out on data governed by Benford’s Law will again be governed by the same law.⁸
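The invariance with respect to units of measurement can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: the synthetic data are drawn log-uniformly over six decades (a textbook case that follows Benford's Law), the conversion factor 2.54 is an arbitrary choice, and all names are hypothetical. It does not cover the change of radix base.

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """First significant digit of a positive number.

    Formatting with 17 significant digits avoids rounding a value just
    below a power of ten up into the next decade.
    """
    return int(f"{x:.16e}"[0])


def first_digit_freqs(values):
    """Relative frequencies of first digits 1..9 in a list of positive numbers."""
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return [counts[d] / n for d in range(1, 10)]


random.seed(0)  # fixed seed, only to make the illustration repeatable
data = [10 ** random.uniform(0, 6) for _ in range(100_000)]  # synthetic data (assumption)
rescaled = [2.54 * v for v in data]                          # e.g. a change of units

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
original_freqs = first_digit_freqs(data)
rescaled_freqs = first_digit_freqs(rescaled)

# Columns: digit, Benford probability, frequency before and after rescaling.
for d in range(1, 10):
    print(d, f"{benford[d - 1]:.4f}", f"{original_freqs[d - 1]:.4f}", f"{rescaled_freqs[d - 1]:.4f}")
```

Both frequency columns should stay close to the theoretical probabilities, up to sampling noise.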
The fact that we have at our disposal Benford’s Distribution of the first (and second) digit from the left⁹ provides us with an option to check any data set for a fit to the data structure governed by Benford’s Law. The best choice for such a procedure is the χ² goodness-of-fit test, which can be used as a standard hypothesis test if the respective data set comes from a random sample. The tested hypothesis, denoted by H0, asserts the fit of the empirical distribution with Benford’s Law, and the alternative hypothesis H1 claims the contrary. The test criterion is the statistic

$$G = n \sum_{d=1}^{9} \frac{(p_d - \pi_d)^2}{\pi_d}, \tag{5}$$

which has, under validity of H0, approximately the χ²[8] distribution, and
where π_d – theoretical relative frequencies under Benford’s Law;
p_d – empirical relative frequencies; and
n – sample size.

The critical values are the respective quantiles of χ²[8]; on a 5% significance level, the 95% quantile will be used, that is, χ²_{0.95}[8] = 15.5. For a test of the fit at the second position the procedure would be similar, but there are ten groups and nine degrees of freedom. If the underlying sample is small, we also have to respect the condition of a sufficient frequency count in each “cell” (nπ_d ≥ 5).

Another option for testing the fit of sample data to Benford’s Law is the use of Z-statistics; this procedure again verifies the fit between empirical and theoretical frequencies, but separately for each digit, not as a whole. Under hypothesis H0, the following Z-statistic has an approximately normal distribution:

$$Z_d = \frac{\left|p_d - \pi_d\right| - \frac{1}{2n}}{\sqrt{\frac{\pi_d (1 - \pi_d)}{n}}}, \tag{6}$$

where π_d – theoretical relative frequencies under Benford’s Law;
p_d – empirical relative frequencies; and
n – sample size.

The critical value (in this case, separate for each digit) is the respective quantile u_{1–α/2} of the standard normal distribution. On a 5% significance level, we get u_{0.975} = 1.96. Kossovsky (2015) recommends that the two-tailed test should always be used, i.e., the critical value given by quantile u_{1–α/2}, because the absolute value stands in the numerator of Formula (6), and therefore it is not necessary to distinguish between directions of the deviation from Benford’s Law (it means that both lower and higher relative frequencies than the theoretical value under Benford’s Law admit the same interpretation).

Although both tests lead to conclusions that are intuitively similar, there is a difference between them. Namely, the former (G-statistic) comprehensively assesses the validity of Benford’s Law for a given set of first digits (possibly second ones as well). The particular digit for which the deviation from Benford’s Law is the highest must be looked up among the values

$$\frac{(p_d - \pi_d)^2}{\pi_d}, \quad \text{for } d = 1, 2, \ldots, 9, \text{ or } d = 0, 1, \ldots, 9.$$

The second approach (Z_d-statistics) evaluates the deviation for each individual first digit independently, and it is immediately obvious which first digits do or do not comply with Benford’s Law. The same considerations of course apply to testing the fit of empirical data to Benford’s Law for the second digit from the left.

8 Cf., e.g., Watrin et al. (2008).
9 For the above-mentioned reasons we are not going to consider more positions from the left.
10 Cf. Kossovsky (2015).

Mean Absolute Difference (MAD) is also often used to test the fit to Benford’s Law. This approach, however, goes beyond standard hypothesis testing because the distribution of the MAD statistic is unknown. The mean absolute difference value (for the case of the first digit from the left)¹¹ is

$$MAD = \frac{1}{9} \sum_{d=1}^{9} \left|p_d - \pi_d\right|, \tag{7}$$

where π_d – theoretical relative frequencies under Benford’s Law;
p_d – empirical relative frequencies.

Since we do not know the distribution of the MAD statistic, empirical threshold values¹² are used for evaluating the outcome for MAD – cf. Table 3.

Table 3 Degrees of fit for MAD statistics

MAD value        Degree of fit between empirical and theoretical (Benford’s) distributions
0.000 – 0.006    Close fit
0.006 – 0.012    Acceptable fit
0.012 – 0.015    Loose fit
0.015 plus       No fit

Source: Nigrini (2011)

11 For testing the second digit from the left, the calculation is similar but there are ten groups.
12 Cf. Nigrini (2011).
13 The fact that validity of Benford's Law has not been proved mathematically is also a frequent topic.
14 From among the most recent ones, we refer to Miller (2015) – it is a very good presentation of applications and experience with them, especially in the areas of economy, accounting, and also natural sciences.

Unlike the previous approaches, which are instances of classical statistical inference, the MAD statistic is more suitable for verifying the fit in a data set that is not considered a random sample because all data items in the given area are included. This is often the case when checking extensive sets of corporate accounting and macroeconomic data.
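The three characteristics from Formulas (5), (6) and (7) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the counts are invented for demonstration, and the handling of the boundaries between the intervals of Table 3 (each upper bound treated as exclusive) is an assumption, since the table does not specify it.

```python
import math

# Benford probabilities for the first digit, Formula (1)
BENFORD_FIRST = [math.log10(1 + 1 / d) for d in range(1, 10)]


def g_statistic(counts, probs=BENFORD_FIRST):
    """Formula (5): G = n * sum((p_d - pi_d)^2 / pi_d)."""
    n = sum(counts)
    return n * sum((c / n - pi) ** 2 / pi for c, pi in zip(counts, probs))


def z_statistics(counts, probs=BENFORD_FIRST):
    """Formula (6): one Z_d per digit, including the 1/(2n) term in the numerator."""
    n = sum(counts)
    return [
        (abs(c / n - pi) - 1 / (2 * n)) / math.sqrt(pi * (1 - pi) / n)
        for c, pi in zip(counts, probs)
    ]


def mad(counts, probs=BENFORD_FIRST):
    """Formula (7): mean absolute difference between p_d and pi_d."""
    n = sum(counts)
    return sum(abs(c / n - pi) for c, pi in zip(counts, probs)) / len(probs)


def mad_degree_of_fit(value):
    """Degrees of fit from Table 3 (exclusive upper bounds are an assumption)."""
    if value < 0.006:
        return "Close fit"
    if value < 0.012:
        return "Acceptable fit"
    if value < 0.015:
        return "Loose fit"
    return "No fit"


# Hypothetical first-digit counts for digits 1..9 (not data from this paper)
counts = [280, 190, 130, 100, 80, 70, 60, 50, 40]

g = g_statistic(counts)
print(f"G = {g:.2f}, exceeds 15.5: {g > 15.5}")  # 15.5 is the critical value quoted in the text
for d, z in enumerate(z_statistics(counts), start=1):
    print(f"digit {d}: Z = {z:.3f}, exceeds 1.96: {z > 1.96}")
m = mad(counts)
print(f"MAD = {m:.4f} ({mad_degree_of_fit(m)})")
```

For the second digit, the same functions could be called with ten counts and the ten probabilities from Formula (2) passed as `probs`; the critical value for G would then be the one for nine degrees of freedom.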
2 PRACTICAL APPLICATIONS
The simplicity and, undoubtedly, a certain degree of mystery of Benford’s Law¹³ have led to a large volume of literature on this subject.¹⁴ Most often, discussions appear about the use of Benford’s Law in checking accounting and macroeconomic data.
Using Benford’s Law for verification of accounting data correctness is one of the approaches that have recently often been used in financial auditing and (tax) inspections. However, we have to realise that this approach never will and never can substitute for the professional, comprehensive and extensive effort carried out by auditors and inspectors – it can only help them find the “weak points”. If an accounting data set deviates from Benford’s Law, this mere fact is not evidence of data falsification or improper manipulations. It is just an indicator of where the attention of auditors/inspectors should be focused. If there is such a deviation, the total fit according to (5) is usually not assessed, but deviations of individual digits are evaluated to show where the attention should be focused. In other words, tests of fit to Benford’s Law should only be employed in auditing and inspections as an auxiliary tool in addition to standard procedures, or as the first step in searching for possible instances of data falsification. All authors who deal with the use of Benford’s Law in auditing, taxes and inspections agree on the statement made in the preceding sentence.¹⁵
Benford’s Law has a similar application potential in the area of macroeconomic data. Literature in this area is substantially less extensive than in the previous case, but interesting approaches and results can be found here as well. Undoubtedly the best-known contribution to this discussion is that of Rauch et al. (2011). The authors of that paper focus on verification of Benford’s Law validity for selected data of national accounts in the 27 member states of the European Union in the period from 1999 to 2009 (data in the ESA 1995 methodology). Aware of the problem implied by the large power of a goodness-of-fit test applied to extensive data sets, they decided on a “descriptive” approach based on ordering the member states according to their values of the total deviation from Benford’s Law (5). The position of each state on this scale may, in their opinion, help Eurostat decide to what extent and in what direction its verification procedures should be used. Their analysis (based on relative frequencies of occurrence for the first digit from the left) showed that the least trustworthy from the viewpoint of Benford’s Law (more exactly, of the average value of the G-statistic) were the national accounts data of not only Greece, but also of Belgium, Romania, and Latvia. On the other hand, the best fit to Benford’s Law was identified for national accounts data of Luxembourg, Portugal, the Netherlands, Hungary, Poland, and the Czech Republic.
Those excellent results of the Czech Republic inspired us to verify the validity of Benford’s Law on new data of national accounts processed and published by the Czech Statistical Office according to the ESA 2010. Our ambition is not to prove the validity of Benford’s Law in a wider context of national accounts time series, in which even more favourable results would certainly be achieved, but to illustrate the possibilities of this tool in checking data quality. The data set we tested for fit to Benford’s Law for the first and second digits from the left was that of national accounts data of the Czech Republic in 2013 (the preliminary report for 2013). Altogether there were 2 817 digits at the first position from the left, and 2 729 digits at the second position. Statistics (5), (6), and (7) are used for testing the fit. The results for the first digit from the left are shown in Table 4.
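Before any of the statistics can be computed, the first and second digits have to be tallied from the raw values. The paper does not describe how this was done, so the following Python sketch is purely an assumption about one possible procedure: zeros are skipped, signs and decimal points are ignored, and a value with a single significant digit contributes no second digit (which is one possible reason why a data set can yield fewer second digits than first digits). The function name `digit_counts` is hypothetical, and the inputs are assumed to be finite numbers.

```python
from collections import Counter
from decimal import Decimal


def digit_counts(values):
    """Tally first and second significant digits of finite, non-zero numbers."""
    first, second = Counter(), Counter()
    for value in values:
        # The coefficient digits of a Decimal carry no sign, leading zeros or decimal point.
        digits = Decimal(str(value)).as_tuple().digits
        if digits[0] == 0:  # the value is zero and has no leading digit
            continue
        first[digits[0]] += 1
        if len(digits) > 1:
            second[digits[1]] += 1
    return first, second


# Hypothetical values, chosen only to show the behaviour of the function
first, second = digit_counts([123, 45, 7, 0, -0.0042, 1200])
print(sorted(first.items()))   # first-digit counts
print(sorted(second.items()))  # second-digit counts (the value 7 contributes none)
```

The resulting counts for digits 1 to 9 (or 0 to 9 for the second position) are the absolute frequencies n_d that enter the test statistics.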
15 Cf., e.g., Carslaw (1988), Nigrini (2005), Nigrini (1996), Guan et al. (2006), Niskanen and Keloharju (2000) or Watrin et al. (2008).
16 Cf., e.g., Nye and Moul (2007) or Gonzales-Garcia and Pastor (2009).
17 Generally, data sets connected with the Stability and Growth Pact were considered. Altogether there were 36 691 numerals in 297 sets.
18 Nonetheless, the problem with Greece's national accounts had been known before. As early as in 2002, Eurostat twice rejected data of the general government in Greece due to untrustworthiness, and again in 2004 (cf. Report by Eurostat on the Revision of the Greek Government Deficit and Debt Figures – <http://ec.europa.eu/eurostat/documents/4187653/5765001/GREECE-EN.PDF>).
19 Data of the Czech Republic only showed a significant deviation from Benford's Law in 2002, when the value of the test criterion (5) exceeded the critical value of 15.5.
The entries in Table 4 clearly show that, regarding the first digit from the left, the data of the national accounts of the Czech Republic in 2013 comply with Benford’s Law according to all three characteristics. In the goodness-of-fit test we obtain the statistic G = 13.00, which is smaller than the critical value of χ²_{0.95}[8] = 15.5; hence the hypothesis that the empirical and theoretical (Benford’s) distributions are identical is not rejected. The values of the Z_d-statistics for each of the digits are all smaller than the critical value of the standard normal distribution (u_{0.975} = 1.96). We can therefore observe that, for none of the digits, the differences between the empirical and theoretical frequencies are deemed statistically significant. The MAD characteristic also indicates a good fit (cf. Table 3) of the data structure of the national accounts of the Czech Republic in 2013 to Benford’s Law. Figure 1 illustrates the fit between the empirical frequencies and theoretical probabilities for the first digit from the left.
Table 4 Fit to Benford's Law – first digit from the left

First digit      Absolute         Relative         Probability
from the left    frequency n_d    frequency p_d    π_d            G            Z_d         MAD
1                858              0.305            0.301          0.000042     0.390146    0.004
2                517              0.184            0.176          0.000314     1.011605    0.007
3                384              0.136            0.125          0.001036     1.797649    0.011
4                262              0.093            0.097          0.000157     0.668436    0.004
5                198              0.070            0.079          0.000999     1.713259    0.009
6                181              0.064            0.067          0.000108     0.534417    0.003
7                180              0.064            0.058          0.000601     1.300798    0.006
8                124              0.044            0.051          0.000995     1.675934    0.007
9                113              0.040            0.046          0.000696     1.388463    0.006
Total            2 817            1.000            1.000          13.004949    x           0.006

Source: <www.czso.cz>, authors' own calculations
Figure 1 Fit to Benford's Law – first digit from the left (chart comparing empirical and theoretical relative frequencies for first digits 1 to 9; vertical axis from 0.000 to 0.350)
Source: <www.czso.cz>, authors' own calculations
The results of the comparison between the data structure of the national accounts of the Czech Republic in 2013 and Benford’s Law for the second digit from the left are shown in Table 5.

Table 5 Fit to Benford's Law – second digit from the left

Second digit     Absolute         Relative         Probability
from the left    frequency n_d    frequency p_d    π_d            G            Z_d          MAD
0                374              0.137            0.120          0.002422     2.710896*    0.017
1                318              0.117            0.114          0.000056     0.385125     0.003
2                307              0.112            0.109          0.000112     0.555222     0.003
3                267              0.098            0.104          0.000365     1.023155     0.006
4                314              0.115            0.100          0.002268     2.590616*    0.015
5                236              0.086            0.097          0.001141     1.824810     0.011
6                211              0.077            0.093          0.002644     2.787807*    0.016
7                227              0.083            0.090          0.000517     1.211364     0.007
8                235              0.086            0.088          0.000041     0.314340     0.002
9                240              0.088            0.085          0.000102     0.517204     0.003
Total            2 729            1.000            1.000          19.774976    x            0.009

* Z_d exceeds the critical value of 1.96.
Source: <www.czso.cz>, authors' own calculations

The entries in Table 5 show that the national accounts data of the Czech Republic in 2013 do not fully comply with Benford’s Distribution regarding the second digit from the left. In the goodness-of-fit test we obtain the statistic G = 19.77, which is higher than the critical value of χ²_{0.95}[9] = 16.9; hence the hypothesis that the empirical and theoretical (Benford’s) distributions are identical is rejected. The values of the Z_d-statistics show that deviations from the probabilities given by Benford’s Law (marked with an asterisk in Table 5) are present for digits 0, 4, and 6; for them, the corresponding values of the Z_d-statistics are larger than the critical value, which is the quantile of the standard normal distribution (u_{0.975} = 1.96); hence these deviations are deemed statistically significant. The MAD characteristic indicates “only” an acceptable fit (cf. Table 3) of the data structure of the national accounts of the Czech Republic in 2013 to Benford’s Law.
Figure 2 Fit to Benford's Law – second digit from the left (chart comparing empirical and theoretical relative frequencies for second digits 0 to 9; vertical axis from 0.000 to 0.160)
Source: <www.czso.cz>, authors' own calculations
Let us recapitulate the evaluation of the fit of the national accounts data of the Czech Republic in 2013 to Benford’s Law: with respect to the second digit from the left, the fit has not been proved and the differences are significant for digits 0, 4, and 6. However, these deviations (digits 0 and 4 occur more frequently and digit 6 less frequently than expected) do not enable us to draw any principal conclusions because this phenomenon is related to a preliminary report. It will be interesting to re-evaluate the situation when the final report for 2013 has been published. We can also see in Figure 2 that the differences for the second digit from the left are not of a principal nature.
CONCLUSIONS
As already stated above, the role of Benford’s Law is that of a detection and indicator tool. Deviations of empirical data, i.e., of relative frequencies of occurrence for digits 1, 2, …, 9 as the first (or second) digit from the left, from Benford’s Law at the beginning of the verification process are not, as such, manifestations of infringement of (say, accounting) rules. At the beginning of the analysis, such deviations are just partial signals that there is a certain discrepancy from Benford’s Law. Nothing more, and nothing less. Such a signal may be used as a recommendation in what direction subsequent analysis should be carried out. Namely, it should focus on the items (accounts, subsets, etc.) for which the highest degree of deviation is shown, e.g., within the Z-test, Formula (6).

Different situations may arise. Either the revealed deviations are explained in a factual and prescribed way (if the deviation is not random) or no such explanation is identified. In the latter case, it should be seriously investigated why and how the deviation occurred. From experience, a number of instances are known in which unexplained deviations led to identification of principal departures from prescribed procedures, and even forensic proceedings were initiated against the parties concerned.

The described approach is open to discussion. Economists, auditors, accountants, etc. have varied opinions about the detection potential of Benford’s Law. On the one hand, there are zealous advocates of the notion that a signal triggered by a deviation from Benford’s Law in, say, macroeconomic data (i.e., data on the macroeconomic level) or accounting data (i.e., on the corporate level) is a really serious event to which proper attention should be given because it will lead to the root from which errors, sometimes fully intentional, stem. On the other hand, there are those who feel that the detection role of Benford’s Law is a mere formality because the root of the errors will be discovered anyway.

Trust in the detection and signalling roles of Benford’s Law thus mainly depends on the level of personal experience of those who may use this checking approach. A theoretical dispute aimed at creating a feeling that Benford’s Law is useful usually misses this target. This observation is based on the practical experience of the authors of the present paper.
References
BENFORD, F. The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, Vol. 78, No. 4, 1938, pp. 551–572.
CARSLAW, C. A. P. N. Anomalies in Income Numbers: Evidence of Goal Oriented Behavior. The Accounting Review, Vol. 63, No. 2, 1988, pp. 321–366.
GONZALES-GARCIA, J., PASTOR, G. Benford’s Law and Macroeconomic Data Quality [online]. IMF Working Paper 09/10, 2009, International Monetary Fund. <https://www.imf.org/external/pubs/ft/wp/2009/wp0910.pdf>.
GUAN, L. et al. Auditing, Integral Approach to Quarterly Reporting, and Cosmetic Earnings Management. Managerial Auditing Journal, Vol. 21, No. 6, 2006, pp. 569–581.
HILL, T. P. A Statistical Derivation of the Significant-Digit Law. Statistical Science, Vol. 10, No. 4, 1995a, pp. 354–363.
HILL, T. P. Base-Invariance Implies Benford’s Law. Proceedings of the American Mathematical Society, Vol. 123, No. 3, 1995b, pp. 887–895.
HILL, T. P. The First Digit Phenomenon. American Scientist, Vol. 86, 1998, pp. 358–363.
KNUTH, D. E. The Art of Computer Programming. 3rd ed. Addison-Wesley, Reading, MA, Vol. 2, 1997, pp. 253–264.
KOSSOVSKY, A. E. Benford’s Law. Singapore: World Scientific Publishing, 2015. ISBN 978-98-145-8368-8.
MILLER, S. J. (ed.) Benford’s Law. Theory and Applications. Princeton University Press, 2015. ISBN 978-0-691-14761-1.
NEWCOMB, S. Note on the Frequency of Use of the Di erent Digits in Natural Numbers. American Journal of Mathemat-
ics, Vol. 4, No. 1, 1881, pp. 39–40.
NIGRINI, M. J. A Taxpayer Compliance Application of Benford’s Law: Tests and Statistics for Auditors. Journal of the Ameri-
can Taxation Association, Vol. 18, No. 1, 1996, pp. 72–79.
NIGRINI, M. J. An Assessment of the Change in the Incidence of Earnings Management around the Enron-Andersen
Episod e. Review of Accounting and Finance, Vol. 4, No. 1, 2005, pp. 92–110.
NIGRINI, M. J. Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations. John Wiley, 2011. ISBN
978-0-470-89046-2.
NISKANEN, J., KELOHARJU, M. Earnings cosmetics in a tax-driven accounting environment: evidence from Finnish public firms. European Accounting Review, Vol. 9, No. 3, 2000, pp. 443–452.
NYE, J., MOUL, C. The Political Economy of Numbers: On the Application of Benford’s Law to International Macroeconomic Statistics. The B. E. Journal of Macroeconomics, Vol. 7, No. 1, 2007, pp. 1–14.
RAIMI, R. A. The Peculiar Distribution of First Digits. Scientific American, Vol. 221, No. 6, 1969a, pp. 109–119.
RAIMI, R. A. On Distribution of First Significant Figures. American Mathematical Monthly, Vol. 76, No. 4, 1969b,
pp. 342–348.
RAIMI, R. A. The First Digit Problem. American Mathematical Monthly, Vol. 83, No. 7, 1976, pp. 521–538.
RAUCH, B. et al. Fact and Fiction in EU-Governmental Economic Data. German Economic Review, Vol. 12, No. 3, 2011,
pp. 243–255.
WATRIN, CH. et al. Benford’s Law. An Instrument for Selecting Tax Audit Targets. Review of Managerial Science, Vol. 3, No. 3, 2008, pp. 219–237.
... The law thus states that the probability of the digit 1 occurring in the first position is $\log_{10} 2 \approx 0.301$, the probability of the digit 2 in the first position is $\log_{10}(3/2) \approx 0.176$, the probability of the digit 3 in the first position is $\log_{10}(4/3) \approx 0.125$, and so on up to the digit 9, for which $\log_{10}(10/9) \approx 0.046$. In data sets containing at least hundreds of numbers, the digits in the first position therefore occur with the relative frequencies given in Tab. 1 and Fig. 1. Tab. 1: Expected relative frequencies of digits according to Benford's Law (Nigrini, 1996). Benford's Law can be formulated in a more general form describing the probability of occurrence of the second digit (Berger, 2011; Hindls, 2015): ...
... In conclusion, we can therefore state that the relative frequencies of occurrence of the first and second digits are in good agreement with Benford's Law. 9 Benford: yes or no? ...
... . The goodness-of-fit test does not reject the null hypothesis; the distribution of first digits may conform to Benford's Law (Fig. 3).9 In biology, economics and engineering disciplines, a significance level of α = 0.05 is usually used, i.e. ...
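The first-digit probabilities and the goodness-of-fit test mentioned in the excerpts above can be illustrated with a short sketch. It is not taken from any of the cited papers: the function names, the `.15e` formatting trick for extracting the leading digit, and the sample of powers of two are assumptions chosen for illustration. The constant 15.507 is the standard chi-square critical value for 8 degrees of freedom at α = 0.05.

```python
import math
from collections import Counter

# Critical value of the chi-square distribution, 8 degrees of freedom, alpha = 0.05.
CRITICAL_VALUE_8_DF_5_PCT = 15.507


def benford_first_digit_probs():
    """Expected probability of each leading digit d = 1..9: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero number (hypothetical helper)."""
    # In scientific notation the first character is the first significant digit.
    return int(f"{abs(x):.15e}"[0])


def chi_square_statistic(values):
    """Pearson chi-square statistic of observed leading digits against Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    observed = Counter(digits)
    expected = benford_first_digit_probs()
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected.items()
    )


if __name__ == "__main__":
    for d, p in benford_first_digit_probs().items():
        print(d, round(p, 3))

    # Illustrative sample only: powers of two are a classic Benford-like sequence.
    sample = [2 ** k for k in range(1, 201)]
    stat = chi_square_statistic(sample)
    print("chi-square:", stat, "reject H0:", stat > CRITICAL_VALUE_8_DF_5_PCT)
```

A statistic below the critical value means the null hypothesis of conformity with Benford's Law is not rejected at the 5% level; as the article itself stresses, such a result on a large official data set still needs factual interpretation.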
Article
This article deals with Benford's Law, also known as the first-digit law, which is one of the most mysterious laws of nature. The article provides the basic characteristics of the law and a simple, intuitive explanation of why and when the law applies. The last part focuses on using the law when data are suspected of having been manipulated.
... After that, the method was used by Michalski and Stoltz to detect errors in macroeconomic data [25]. Application of Benford's Law and possibilities for its use in international and governmental macroeconomic statistics can be found in [16,20,32]. A guide for detecting errors in transaction data was given in [30,31]. ...
Article
This paper studies the fundamental properties of Benford’s Law, which describes the distribution of first digits within data sets. The purpose of the research is to identify additional distributions, beyond those already investigated, that conform to the Benford distribution. As a main contribution, we state and prove, using a new approach, that the Pareto distribution and an appropriate constant times the Weibull density function obey Benford’s Law under some parameter constraints. Further, using statistical tests and simulation, we quantify how the fit varies as the parameters of the Pareto distribution change. As Benford’s Law is one of the main approaches used in practice for detecting data manipulation and fraud, we apply that methodology to examine possible manipulation in a set of data from the financial reports of three private hospitals operating in Serbia. Moreover, we present the conformity of the Weibull distribution to Benford’s Law through the analysis of real-world data, in which the Weibull distribution demonstrates a good fit, even though the proof of that conformity is a known result in the literature. Because the Pareto and Weibull distributions are commonly employed for modeling in various fields, the demonstrated adherence of both to Benford’s Law can be utilized in many practical studies.
... This research was applied to identify data corruption in social sciences [4]. Starting with [5][6][7], deviations from BL were used to expose possible cases of tax evasion [8], election fraud [9], fabrication of clinical data [10], or misrepresentation of data in government statistics [11]. There are further examples in other fields, such as astrophysics [12], atomic physics [13], biochemistry [14], library science [15], material sciences [16], theology [17], or epidemiology [18], where a recent application used BL to identify distorted COVID-19 counts [19]. ...
Article
Benford’s law (BL) specifies the expected digit distributions of data in social sciences, such as demographic or financial data. We focused on the first-digit distribution and hypothesized that it would apply to data on locations of animals freely moving in a natural habitat. We believe that animal movement in natural habitats may differ with respect to BL from movement in more restricted areas (e.g., game preserve). To verify the BL-hypothesis for natural habitats, during 2015–2018, we collected telemetry data of twenty individuals of wild red deer from an alpine region of Austria. For each animal, we recorded the distances between successive position records. Collecting these data for each animal in weekly logbooks resulted in 1132 samples of size 65 on average. The weekly logbook data displayed a BL-like distribution of the leading digits. However, the data did not follow BL perfectly; for 9% (99) of the 1132 weekly logbooks, the chi-square test refuted the BL-hypothesis. A Monte Carlo simulation confirmed that this deviation from BL could not be explained by spurious tests, where a deviation from BL occurred by chance.
... Newcomb-Benford Law has applications in various fields of economics but the most important one is as a tool for forensic accounting and fraud detection, Nigrini (1996). Other applications of Newcomb-Benford Law are for campaign fraud detection, Cho and Gaines (2007), governmental statistics inspection, Hindls and Hronová (2015), fraudulent scientific data, Diekmann (2007) and for inspection whether countries falsify their economic data strategically, Michalski and Stoltz (2013). Jošić and Žmuk (2018) used Benford's Law for psychological pricing detection. ...
Article
The COVID-19 infection started in Wuhan, China, and spread all over the world, creating a global healthcare and economic crisis. Countries all over the world are fighting hard against this pandemic; however, there are doubts about the reported numbers of cases. In this paper the Newcomb-Benford Law is used to detect possibly false reported numbers of COVID-19 cases. The analysis in which all countries were observed together raised the suspicion that countries may intentionally falsify their data on new COVID-19 cases. When the analysis was carried out at the level of individual countries, it showed that most countries do not deliberately understate their numbers of new COVID-19 cases. It was found that the distributions of COVID-19 data for 15% to 19% of countries in the first-digit analysis, and for 30% to 39% of countries in the last-digit analysis, do not conform to the Newcomb-Benford Law distribution. Further investigation should be made in this field in order to validate the results of this research. The results obtained in this paper can be important for economic and health policy makers in guiding COVID-19 surveillance and implementing public health policy measures.
... The idea of using Benford's law in fraud detection has also been applied to campaign fraud detection, Cho and Gaines (2007), Deckert, Myagkov and Ordeshook (2011). The application of Benford's law and possibilities for its use in international and governmental macroeconomic statistics can be found in the papers of Nye and Moul (2007) and Hindls and Hronová (2015). It was found that World Bank international GDP data and purchasing power parity corrected Penn World tables for OECD countries conformed well to Benford's Law. ...
Article
This paper presents the application of Benford's law to the detection of psychological pricing. Benford's law is a naturally occurring law which states that digits have predictable frequencies of appearance, with the digit one having the highest frequency. Psychological pricing is a marketing pricing strategy in which prices are set so as to have a psychological impact on certain consumers. To investigate the application of Benford's law to psychological pricing detection, the first and last digits of prices are examined. To test whether the first digits of the observed prices follow the Benford's law distribution and whether the last digits follow the discrete uniform distribution, the mean absolute deviation measure, chi-square tests and Kolmogorov-Smirnov Z tests are used. Results of the analysis conducted on three price data sets show that the most dominant first digits are 1 and 2, whereas the most dominant last digits are 0, 5 and 9. The chi-square tests and Kolmogorov-Smirnov Z tests show that, at the 5% significance level, none of the three observed price data sets has a first-digit distribution that fits the Benford's law distribution. Likewise, the mean absolute deviation values show large differences between the last-digit distributions and the discrete uniform distribution, implying psychological pricing in all price data sets.
Article
A new method is proposed for estimating ecosystem naturalness. Three species-plot ($S \times A$) data sets were used: the Sultan mountain sub-district ($BS$, $60 \times 96$), the Dedegül mountain sub-district ($BD$, $89 \times 119$), and the Beyşehir Watershed ($B$, $98 \times 215$), which comprises both sub-districts. First, the chi-square test ($\chi^2$) was applied to assess the goodness of fit between the observed first-digit probabilities ($d_1p_o$) and the theoretical probabilities of Benford’s Law ($d_1p_e$). It was found that $\chi^2(e_{BS}) = 16.579$ and $\chi^2(e_{BD}) = 2.406$. Second, to find the best-fitting theoretical probabilities for $BS$ and $BD$, the generalized Benford’s Law ($GB(d;\gamma)$) was applied. Minimal $\chi^2$ values were obtained at $\gamma = 0.65$ and $\gamma = 0.07$ for $BS$ and $BD$ respectively ($\chi^2(e_{BS\gamma}) = 4.992$, $\chi^2(e_{BD\gamma}) = 2.209$). As expected, the $\chi^2$ values of the sub-districts decreased under the generalized Benford’s Law, with the most dramatic decrease occurring in $BS$. Because the sub-districts have different numbers of sample plots, two random iterative processes of 10000 repetitions each were performed on the $B$ data set, one for each sub-district's number of sample plots. As a result, 10000 $\chi^2$ values were obtained for each sub-district. The averages of those values ($\bar{\chi}^2_k(E_{BS\gamma}) = 6.747$ and $\bar{\chi}^2_k(E_{BD\gamma}) = 6.176$) were then used to calculate a calibration coefficient for each sub-district: $\bar{\chi}^2_k(E_{\max\gamma}) / \bar{\chi}^2_k(E_{BS\gamma}) = 1$ for $BS$ and $\bar{\chi}^2_k(E_{\max\gamma}) / \bar{\chi}^2_k(E_{BD\gamma}) = 1.093$ for $BD$. With these coefficients, the naturalness values of $BS$ and $BD$ were found to be 4.992 and 2.414 respectively. Since the perfect naturalness value is theoretically equal to 0, the results indicate that $BD$ ecosystems are more natural than $BS$ ecosystems. Keywords: Hemeroby, Naturalness, Forest ecosystems, First digit rule, Generalization, Randomization
Chapter
In order to detect evidence of fraud effectively, it is essential for the auditor to be aware of new and differentiated methods. Thus, the auditor can identify and assess the risks of material misstatement so that auditing is as reliable as possible. In this sense, the relevance of the application of the Benford's Law arises in order to demonstrate that the identification of situations of greater risk of fraud is appropriate in auditing. The objective of this study is to analyze the behavior of 27,058 Portuguese companies.
Chapter
In this paper, we combine interviews, surveys, and panel data regression to find factors affecting corporate income tax (CIT) non-compliance. This study is based on the analysis of 105 Vietnamese companies which were inspected by tax officials in 2011–2015. The results show that the following seven factors affect CIT non-compliance: the ratio of Working Capital/Total Assets, Turnover/Total Assets, Previous Loss, Inventories/Total Assets, Accounts Receivable/Turnover, size of the enterprise, and debt fines for tax administrative/tax amounts payable in the period. The article shows that the information from the financial statements can help the tax officials detect the CIT non-compliance, and suggest appropriate tax management policies for enterprises in Vietnam.
Chapter
In this study, we apply statistical methods based on Benford’s Law for checking accuracy of tax reports. Specifically, instead of the usual practice of randomly selecting documents for detailed scrutiny, the proposed method selects the most suspicious document. Our experience of using this method has shown that its application has drastically increased the probability of detecting tax fraud. This method is relatively easy to use, it is based on a simple Excel-based algorithm.
Book
Benford's law states that the leading digits of many data sets are not uniformly distributed from one through nine, but rather exhibit a profound bias. This bias is evident in everything from electricity bills and street addresses to stock prices, population numbers, mortality rates, and the lengths of rivers. Here, Steven Miller brings together many of the world's leading experts on Benford's law to demonstrate the many useful techniques that arise from the law, show how truly multidisciplinary it is, and encourage collaboration. Beginning with the general theory, the contributors explain the prevalence of the bias, highlighting explanations for when systems should and should not follow Benford's law and how quickly such behavior sets in. They go on to discuss important applications in disciplines ranging from accounting and economics to psychology and the natural sciences. The contributors describe how Benford's law has been successfully used to expose fraud in elections, medical tests, tax filings, and financial reports. Additionally, numerous problems, background materials, and technical details are available online to help instructors create courses around the book. Emphasizing common challenges and techniques across the disciplines, this accessible book shows how Benford's law can serve as a productive meeting ground for researchers and practitioners in diverse fields.
Article
In 2001 Enron filed amended financial statements setting off a chain of events starting with its bankruptcy filing and including the conviction of Arthur Andersen for obstruction of justice. The end of 2001 and the first half of 2002 included a heightened level of publicity for the accounting practices of listed companies. This paper addresses whether there was a detectable change in the incidence of earnings management around this time period. Earnings reports released in 2001 and 2002 were analyzed. The results showed that revenue numbers were subject to upwards management. Benford's Law was used to detect such manipulations. Earnings Per Share (EPS) numbers showed a marked discontinuity in the distribution around zero which is consistent with upwards management. The results also showed a tendency towards neat round EPS numbers such as 0.10, 0.20, etc. The overall results are consistent with a small but noticeable increase in earnings management in 2002. Enron's reported numbers are reviewed and these show a strong tendency towards making financial thresholds.