
Available via license: CC BY-NC-SA


ANALYSES


Benford's Law and Possibilities for Its Use in Governmental Statistics¹

Richard Hindls² | University of Economics, Prague, Czech Republic
Stanislava Hronová³ | University of Economics, Prague, Czech Republic

1 This article was written thanks to support from the Institutional Support to Long-Term Conceptual Development of Research Organisation, the Faculty of Informatics and Statistics of the University of Economics, Prague.

2 Faculty of Informatics and Statistics, Department of Statistics and Probability, nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic. E-mail: hindls@vse.cz.

3 Faculty of Informatics and Statistics, Department of Economic Statistics, nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic. E-mail: hronova@vse.cz.

Abstract

Benford's Law (sometimes also called Benford's Distribution or Benford's Test) is one of the possible tools for verification of a data structure in a given file regarding the relative frequencies of occurrence of the first (or second, etc.) digit from the left. If it is used as a goodness-of-fit test on sample data, there are usually no problems with its interpretation. However, certain factual questions arise in connection with the validity of Benford's Law in large data sets in governmental statistics; such questions should be resolved before the law is used. In this paper we discuss the application potential of Benford's Law when working with extensive data sets in the areas of economic and social statistics.

Keywords

Benford's Law, goodness-of-fit test, Z-test, national accounts

JEL code

E22, C43

INTRODUCTION

Correctness and indisputability of macroeconomic data are among the basic principles in governmental statistics. These attributes are achieved by the use of verified methods to collect and process data, attested procedures, and balance computations with the aid of all available sources of information. The national accounts system is one of the "tools" we use for verifying the meaningfulness and cohesion of the governmental statistics. National accounts is a system of inter-related macroeconomic statistical data, arranged in the form of integrated economic accounts. We can compare this system with a crossword puzzle in which indices stand for letters. In other words, each entry contributes to the total value of one index in its row and to that of a different index in its column, just as a letter in a crossword puzzle belongs to both a "down" and an "across" word. This arrangement of data ensures that all items are inter-related and balanced – nothing is lost and nothing is used to excess. Without disputing the national accounts of any country, it is clear that a balanced inter-related system of data can be created from fictitious or even incorrect data items. Other tools are suitable for verifying that the items of national accounts are indeed correct. In addition to the usual factual and logical checks on the data sources and procedures, such verification can be supported by certain formal tools. Benford's Distribution is one of them.

STATISTIKA 2015, 95 (2)

1 WHAT IS BENFORD’S LAW?

The substance of Benford's Law can easily be expressed in words: in a given set of data, the probability of occurrence as the first digit from the left is different for each of the digits 1, 2, ..., 9. Numbers starting with one occur more often than those starting with two, which are in turn more frequent than those starting with three, etc., and numbers starting with nine are the least frequent ones. This observation is hard to believe at first sight. However, its validity has been empirically confirmed (first in 1881, and then again in 1938). Thanks to a new mathematical approach developed at the end of the 20th century, this law found its way into the theory of probability. Many successful applications, including testing mathematical models and computer designs, as well as error detection in accounting, have indicated its validity.

1.1 Historical Note

By an irony of fate, it was not Frank Benford who assisted at the birth of the distribution that is now called Benford's. Neither was he the first who tried to prove it mathematically. As a matter of fact, it was Simon Newcomb who, in the late 19th century, first defined a distribution governing the occurrence of numbers with a given digit as the first one from the left. R. A. Raimi (in the 1960s and 1970s) and T. P. Hill (in the 1990s) tried to put forth a mathematical proof of this specific law.

Curiosity and imagination, besides knowledge and experience, undoubtedly play an important role in scientific discoveries. This was also the case of the distribution (law) later called Benford's. American mathematician and astronomer Simon Newcomb noticed in a library that the beginning pages in logarithm table books are much more worn out than the rest. On the basis of this observation he realised that students much more often look up logarithms of numbers beginning with one than those beginning with two, the latter more often than those beginning with three, etc., and from that he deduced: the probability of occurrence for numbers beginning with one is largest, and larger than that for numbers beginning with two, etc. Empirically he derived⁴ the following formula for the probability of occurrence for numbers in which digit d stands first from the left:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad \text{for } d = 1, 2, \ldots, 9. \tag{1}$$

This rule means that the probability of occurrence of a number beginning with one is 0.3010, beginning with two 0.1761, etc., down to the probability of a number beginning with nine, which is 0.0458. He also derived probabilities corresponding to the digit second from the left (now, of course, zero has to be included); mutual differences are significantly lower for digits 0, 1, …, 9 at the second position: the probability of zero is 0.1197, and that of nine is 0.0850.⁵

Nowadays Newcomb's paper has hundreds of citations, but in its time it passed practically without notice and more or less fell into oblivion. Many years later American physicist Frank Benford also noticed the irregular wear of logarithmic table books' pages, and derived the same logarithmic formula for the first and second digits from the left. In 1938 he published his conclusions based on studying a large number of data sets from different areas (hydrology, chemistry, but also baseball or the daily press – Benford, 1938).

4 Cf. Newcomb (1881).

5 Cf. Table 1.
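Formula (1) is easy to check numerically. The following minimal Python sketch evaluates it for each digit and reproduces the probabilities quoted above; the function name `first_digit_probability` is an illustrative choice and does not come from the paper.

```python
import math


def first_digit_probability(d: int) -> float:
    """Probability that d (1..9) is the first digit from the left, Formula (1)."""
    if not 1 <= d <= 9:
        raise ValueError("the first digit must lie between 1 and 9")
    return math.log10(1 + 1 / d)


# Probabilities for all nine possible first digits
first_digit_probs = {d: first_digit_probability(d) for d in range(1, 10)}

for d, p in first_digit_probs.items():
    print(d, f"{p:.4f}")
```

Because the nine logarithmic terms telescope to log10(10), the probabilities sum to one, which is a quick sanity check on any implementation.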


Unlike Newcomb's paper, Frank Benford's attracted a certain amount of attention, perhaps thanks to recognition of his name in physics. Newcomb had been forgotten by then and the logarithmic relationship for occurrence of the first (and second) digit from the left was "christened" Benford's.

The wider use of Benford's Law in the second half of the 20th century brought about a number of questions concerning its validity. There were data sets (from natural sciences, economics, but also everyday life) in which Benford's Law was valid, but it was always possible to find situations for its rejection (phone numbers from a certain area, shoe or clothing sizes, etc.). Naturally, a question arose whether Benford's Law can or cannot be proved mathematically. In particular, T. P. Hill (Hill, 1995a; Hill, 1995b; and Hill, 1998), and R. A. Raimi (Raimi, 1969a; Raimi, 1969b; and Raimi, 1976) tried to find such a proof, but no strict mathematical proof was found.⁶ If nothing else, their theoretical efforts led to an approximate formulation of Benford's Law validity: if we take random samples from arbitrary distributions, the collection of these random samples approximately obeys Benford's Law.⁷

1.2 Theoretical basis

Formula (1), first derived by Newcomb and later again by Benford, has a more general validity; or rather, it can be adapted into a form which defines occurrence of any digit at the second, third, etc. positions. In this connection, however, we have to ask whether such occurrence is taken independently of the preceding digit(s) from the left or is conditional on them. In the former case we deal with probabilities of independent events, while in the latter conditional probabilities are to be used.

Occurrence of a digit from 1, 2, …, 9 at the first position from the left is governed by Formula (1), but occurrence of a digit from 0, 1, …, 9 at the second position from the left (on the assumption that it is independent of occurrence of a particular digit at the first position from the left) is given as

$$P(d) = \sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + d}\right), \quad \text{for } d = 0, 1, \ldots, 9. \tag{2}$$

Regarding independent occurrences of digits from 0, 1, …, 9 at the third and following positions, the last formula can be generalised:

$$P(d_k) = \sum_{d_1=1}^{9} \sum_{d_2=0}^{9} \cdots \sum_{d_{k-1}=0}^{9} \log_{10}\left(1 + \frac{1}{\sum_{i=1}^{k} d_i \cdot 10^{k-i}}\right), \quad \text{for } d_k = 0, 1, \ldots, 9, \tag{3}$$

and the mutual differences between probabilities of occurrence of a particular digit get smaller already at the second position from the left; and starting at the fifth position (independent of the preceding ones) Benford's Law approaches the uniform multinomial distribution. Table 1 shows the changes in the probability values for independent occurrence of digits 0, 1, …, 9 at the first to fifth positions from the left.

The results presented above imply that, starting from the third position from the left, differences in probability values are very small and only the occurrence of digits at the first and second positions from the left is interesting from the viewpoint of practical applications.

6 Perhaps the best characterisation is that by R. A. Raimi in the conclusion of his paper (Raimi, 1969b, p. 347). Referring to the validity of Benford's Law for addresses of 5 000 people from a "Who is Who" publication, he says: "Why should the street addresses of a thousand famous men obey the logarithm law? I know no answer to this question".

7 Cf. Hill (1998) and Raimi (1969b).
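Formulas (2) and (3) can be illustrated with a short computation. The sketch below is only one possible way of organising it: it treats the block of preceding digits as a single integer prefix, which is equivalent to the nested sums in Formula (3). The names `digit_probability` and `table_1` are hypothetical choices made for this example.

```python
import math


def digit_probability(d: int, position: int) -> float:
    """Probability of digit d at a given position from the left (1-based),
    irrespective of the preceding digits, following Formulas (1) to (3)."""
    if position < 1:
        raise ValueError("position must be at least 1")
    if position == 1:
        if not 1 <= d <= 9:
            raise ValueError("the first digit must lie between 1 and 9")
        return math.log10(1 + 1 / d)
    if not 0 <= d <= 9:
        raise ValueError("a digit must lie between 0 and 9")
    # All possible blocks of preceding digits are the integers with
    # (position - 1) digits, e.g. 1..9 for position 2, 10..99 for position 3.
    lowest = 10 ** (position - 2)
    highest = 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * prefix + d)) for prefix in range(lowest, highest))


# Probabilities for positions 1 to 5, in the layout of Table 1
table_1 = {
    position: {d: digit_probability(d, position) for d in range(0 if position > 1 else 1, 10)}
    for position in range(1, 6)
}

for d in range(10):
    row = ["x" if d == 0 and j == 1 else f"{table_1[j][d]:.4f}" for j in range(1, 6)]
    print(d, *row)
```

The number of terms grows tenfold with each further position, which is harmless here because the probabilities are practically uniform from the fifth position onwards.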


Another situation arises when the probability of occurrence of a digit from 0, 1, …, 9 at the second position from the left is conditional on occurrence of a particular digit from 1, 2, …, 9 at the first position from the left. The conditional probability of occurrence for $d_2$ at the second position from the left on the condition that the first digit from the left is $d_1$ equals

$$P(d_2 / d_1) = \frac{\log_{10}\left(1 + \frac{1}{10 d_1 + d_2}\right)}{\log_{10}\left(1 + \frac{1}{d_1}\right)}, \quad \text{for } d_1 = 1, 2, \ldots, 9, \text{ and for } d_2 = 0, 1, \ldots, 9. \tag{4}$$

For example, the probability of "2" occurring at the second position on condition of "3" being the first digit from the left is

$$P(D_2 = 2 / D_1 = 3) = \frac{\log_{10}\left(1 + \frac{1}{32}\right)}{\log_{10}\left(1 + \frac{1}{3}\right)} = \frac{0.0134}{0.1249} = 0.1070.$$

Values of conditional probability for pairs of digits calculated with the aid of Formula (4) are shown in Table 2.

Table 1 Probability of occurrence for digit d at the jth position from the left

d    j = 1    j = 2    j = 3    j = 4    j = 5
0    x        0.1197   0.1018   0.1002   0.1000
1    0.3010   0.1139   0.1014   0.1001   0.1000
2    0.1761   0.1088   0.1010   0.1001   0.1000
3    0.1249   0.1043   0.1006   0.1001   0.1000
4    0.0969   0.1003   0.1002   0.1000   0.1000
5    0.0792   0.0967   0.0998   0.1000   0.1000
6    0.0669   0.0934   0.0994   0.0999   0.1000
7    0.0580   0.0904   0.0990   0.0999   0.1000
8    0.0512   0.0876   0.0986   0.0999   0.1000
9    0.0458   0.0850   0.0983   0.0998   0.1000

Source: Authors' own calculations

Table 2 Conditional probability values of occurrence for d2 on condition d1

d1 (first digit    d2 (second digit from the left)
from the left)     0       1       2       3       4       5       6       7       8       9
1                  0.1375  0.1255  0.1155  0.1069  0.0995  0.0931  0.0875  0.0825  0.0780  0.0740
2                  0.1203  0.1147  0.1096  0.1050  0.1007  0.0967  0.0931  0.0897  0.0865  0.0836
3                  0.1140  0.1104  0.1070  0.1038  0.1008  0.0979  0.0952  0.0927  0.0903  0.0880
4                  0.1107  0.1080  0.1055  0.1030  0.1007  0.0985  0.0964  0.0943  0.0924  0.0905
5                  0.1086  0.1065  0.1045  0.1025  0.1006  0.0988  0.0971  0.0954  0.0938  0.0922
6                  0.1072  0.1055  0.1038  0.1022  0.1006  0.0990  0.0976  0.0961  0.0947  0.0933
7                  0.1062  0.1047  0.1033  0.1019  0.1005  0.0992  0.0979  0.0966  0.0954  0.0942
8                  0.1055  0.1042  0.1029  0.1017  0.1005  0.0993  0.0982  0.0970  0.0959  0.0949
9                  0.1049  0.1037  0.1026  0.1015  0.1004  0.0994  0.0984  0.0973  0.0964  0.0954

Source: Authors' own calculations
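The conditional probabilities of Formula (4) can be recomputed in a few lines. The following sketch, with the hypothetical function name `conditional_second_digit`, reproduces the worked example for the pair (3, 2) and builds the rows of Table 2.

```python
import math


def conditional_second_digit(d2: int, d1: int) -> float:
    """P(d2 / d1): probability of second digit d2 given first digit d1, Formula (4)."""
    if not 1 <= d1 <= 9:
        raise ValueError("the first digit must lie between 1 and 9")
    if not 0 <= d2 <= 9:
        raise ValueError("the second digit must lie between 0 and 9")
    joint = math.log10(1 + 1 / (10 * d1 + d2))  # probability of the leading pair d1 d2
    marginal = math.log10(1 + 1 / d1)           # probability of the first digit d1
    return joint / marginal


# One row per first digit d1, one column per second digit d2 (layout of Table 2)
table_2 = {d1: [conditional_second_digit(d2, d1) for d2 in range(10)] for d1 in range(1, 10)}

print(f"P(D2 = 2 / D1 = 3) = {conditional_second_digit(2, 3):.4f}")
for d1, row in table_2.items():
    print(d1, " ".join(f"{p:.4f}" for p in row))
```

Each row sums to one, because the ten joint probabilities for a fixed first digit add up to the marginal probability of that digit.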


The relationships considered above for Benford's Law are valid for arbitrary data sets and are invariant with respect to the change of radix base or units of measurement. Equivalently expressed, data sets governed by Benford's Law will remain governed by it even if expressed in a base other than decimal, or in other units of measurement (physical, currency, etc.) or if the original data items are all multiplied by an arbitrary constant. This fact implies that the results of arithmetical operations carried out on data governed by Benford's Law will again be governed by the same law.⁸
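The invariance with respect to multiplication by a constant can be illustrated by a small simulation. The sketch below rests on assumptions that are not taken from the paper: the data are synthetic values spread log-uniformly over six decimal orders of magnitude (such data follow Formula (1) by construction), and the factor 27.02 is a hypothetical conversion rate. Only rescaling by a constant is demonstrated here.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x: float) -> int:
    """First significant digit of a positive number."""
    exponent = math.floor(math.log10(x))
    digit = int(x / 10 ** exponent)
    return min(max(digit, 1), 9)  # guard against floating-point rounding near powers of ten


def max_deviation(values) -> float:
    """Largest absolute gap between empirical first-digit frequencies and Formula (1)."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    n = len(values)
    return max(abs(c / n - pi) for c, pi in zip(counts, BENFORD))


rng = random.Random(2015)  # fixed seed so that the sketch is reproducible
values = [10 ** rng.uniform(0, 6) for _ in range(100_000)]  # synthetic data (assumption)
rescaled = [v * 27.02 for v in values]                      # hypothetical conversion factor

dev_original = max_deviation(values)
dev_rescaled = max_deviation(rescaled)
print(f"{dev_original:.4f} {dev_rescaled:.4f}")
```

Both deviations stay at the level of sampling noise, so the rescaled series fits Formula (1) about as well as the original one.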

The fact that we have at our disposal Benford's Distribution of the first (and second) digit from the left⁹ provides us with an option to check any data set for a fit to the data structure governed by Benford's Law. The best choice for such a procedure is the $\chi^2$ goodness-of-fit test, which can be used as a standard hypothesis test if the respective data set comes from a random sample. The tested hypothesis, denoted by $H_0$, asserts the fit of the empirical distribution with Benford's Law, and the alternative hypothesis $H_1$ claims the contrary. The test criterion is the statistic

$$G = n \sum_{d=1}^{9} \frac{(p_d - \pi_d)^2}{\pi_d}, \tag{5}$$

which has, under validity of $H_0$, approximate distribution $\chi^2[8]$, and
where $\pi_d$ – theoretical relative frequencies under Benford's Law;
$p_d$ – empirical relative frequencies; and
$n$ – sample size.

The critical values are the respective quantiles of $\chi^2[8]$; on a 5% significance level, the 95% quantile will be used, that is, $\chi^2_{0.95}[8] = 15.5$. For a test of the fit at the second position the procedure would be similar, but there are ten groups and nine degrees of freedom. If the underlying sample is small, we also have to respect the condition of a sufficient frequency count in each "cell" ($n\pi_d \geq 5$).

Another option for testing the fit of sample data to Benford's Law is the use of Z-statistics; this procedure again verifies the fit between empirical and theoretical frequencies, but separately for each digit, not as a whole. Under hypothesis $H_0$, the following Z-statistic has approximate normal distribution

$$Z_d = \frac{|p_d - \pi_d| - \frac{1}{2n}}{\sqrt{\frac{\pi_d (1 - \pi_d)}{n}}}, \tag{6}$$

where $\pi_d$ – theoretical relative frequencies under Benford's Law;
$p_d$ – empirical relative frequencies; and
$n$ – sample size.

The critical value (in this case, separate for each digit) is the respective quantile $u_{1-\alpha/2}$ of the standard normal distribution. On a 5% significance level, we get $u_{0.975} = 1.96$. Kossovsky (2015) recommends that the two-tailed test should always be used, i.e., the critical value given by quantile $u_{1-\alpha/2}$, because the absolute value stands in the numerator in Formula (6), and therefore it is not necessary to distinguish between directions of the deviation from Benford's Law (it means that both lower and higher relative frequencies than the theoretical value under Benford's Law admit the same interpretation).

Although both tests lead to conclusions that are intuitively similar, there is a difference between them. Namely, the former (G-statistic) comprehensively assesses the validity of Benford's Law for a given set of first digits (possibly second ones as well). The particular digit for which the deviation from Benford's Law is the highest must be looked up among the values

$$\frac{(p_d - \pi_d)^2}{\pi_d}, \quad \text{for } d = 1, 2, \ldots, 9, \text{ or } d = 0, 1, \ldots, 9.$$

The second approach ($Z_d$-statistics) evaluates the deviation for each individual first digit independently, and it is immediately obvious which first digits do or do not comply with Benford's Law. The same considerations of course apply to testing the fit of empirical data to Benford's Law for the second digit from the left.

Mean Absolute Difference (MAD) is also often used to test the fit to Benford's Law. This approach, however, goes beyond standard hypothesis testing because the distribution of the MAD statistic is unknown. The mean absolute difference value (for the case of the first digit from the left)¹¹ is

$$MAD = \frac{\sum_{d=1}^{9} |p_d - \pi_d|}{9}, \tag{7}$$

where $\pi_d$ – theoretical relative frequencies under Benford's Law;
$p_d$ – empirical relative frequencies.

Since we do not know the distribution of the MAD statistic, empirical threshold values¹² are used for evaluating the outcome for MAD – cf. Table 3.

Table 3 Degrees of fit for MAD statistics

MAD value        Degree of fit between empirical and theoretical (Benford's) distributions
0.000 – 0.006    Close fit
0.006 – 0.012    Acceptable fit
0.012 – 0.015    Loose fit
0.015 plus       No fit

Source: Nigrini (2011)

8 Cf., e.g., Watrin et al. (2008).

9 For the above-mentioned reasons we are not going to consider more positions from the left.

10 Cf. Kossovsky (2015).

11 For testing the second digit from the left, the calculation is similar but there are ten groups.

12 Cf. Nigrini (2011).

13 The fact that validity of Benford's Law has not been proved mathematically is also a frequent topic.

14 From among the most recent ones, we refer to Miller (2015) – it is a very good presentation of applications and experience with them, especially in the areas of economy, accounting, and also natural sciences.

Unlike the previous approaches, which are instances of classical statistical inference, the MAD statistic is more suitable for verifying the fit in a data set not considered a random sample because all data items in the given area are included. This is often the case when checking extensive sets in corporate accounting and macroeconomic data.
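A minimal sketch of the MAD check of Formula (7), together with the thresholds of Table 3, is given below. Several points are assumptions made for the example only: the function names are hypothetical, the powers of two serve as stand-in data in place of real accounting or macroeconomic figures, and a value lying exactly on a boundary of Table 3 is assigned to the better category.

```python
import math

BENFORD_FIRST = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(value: int) -> int:
    """First digit from the left of a non-zero integer."""
    return int(str(abs(value))[0])


def mean_absolute_difference(values) -> float:
    """MAD between empirical first-digit frequencies and Benford's Law, Formula (7)."""
    counts = [0] * 9
    for v in values:
        counts[leading_digit(v) - 1] += 1
    n = sum(counts)
    return sum(abs(c / n - pi) for c, pi in zip(counts, BENFORD_FIRST)) / 9


def degree_of_fit(mad: float) -> str:
    """Verbal evaluation of a MAD value using the thresholds of Table 3."""
    if mad <= 0.006:
        return "Close fit"
    if mad <= 0.012:
        return "Acceptable fit"
    if mad <= 0.015:
        return "Loose fit"
    return "No fit"


sample = [2 ** k for k in range(1, 1001)]  # stand-in data (assumption), not from the paper
mad = mean_absolute_difference(sample)
print(f"MAD = {mad:.4f}: {degree_of_fit(mad)}")
```

For the second digit from the left the same steps apply with ten groups and the probabilities of Formula (2).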

2 PRACTICAL APPLICATIONS

The simplicity and, undoubtedly, a certain degree of mystery of Benford's Law¹³ have led to a large volume of literature on this subject.¹⁴ Most often, discussions appear about the use of Benford's Law in checking accounting and macroeconomic data.


Using Benford's Law for verification of accounting data correctness is one of the approaches that have recently been often used in financial auditing and (tax) inspections. However, we have to realise that this approach never will and never can substitute for professional, comprehensive and extensive effort carried out by auditors and inspectors – it can only help them find the "weak points". If an accounting data set deviates from Benford's Law, this mere fact is not evidence of data falsification or improper manipulations. It is just an indicator of where attention of auditors/inspectors should be focused. If there is such a deviation, the total fit according to (5) is usually not assessed, but deviations of individual digits are evaluated to show where the attention should be focused. In other words, tests of fit to Benford's Law should only be employed in auditing and inspections as an auxiliary tool in addition to standard procedures, or as the first step in searching for possible instances of data falsification. All authors who deal with the use of Benford's Law in auditing, taxes and inspections agree on the statement cited in the preceding sentence.¹⁵

Benford's Law has a similar application potential in the area of macroeconomic data. Literature in this area is substantially less extensive than in the previous case, but interesting approaches and results can be found here as well. Undoubtedly the best-known contribution to the discussion on Benford's Law is that of Rauch et al. (2011). The authors of that paper focus on verification of Benford's Law validity for selected data of national accounts in 27 member states of the European Union in the period from 1999 to 2009 (data in the ESA 1995 methodology). Aware of the problem implied by the large power of a goodness-of-fit test applied to extensive data sets, they decided for a "descriptive" approach based on ordering the member states according to their values of the total deviation from Benford's Law (5). The position of each state on this scale may, in their opinion, be of assistance to Eurostat in deciding to what extent and in what direction its verification procedures should be used. Their analysis (based on relative frequencies of occurrence for the first digit from the left) showed that the least trustworthy, from the Benford's Law viewpoint (more exactly, by the average value of the G-statistic), were the national accounts data of not only Greece, but also of Belgium, Romania, and Latvia. On the other hand, the best fit to Benford's Law was identified for national accounts data of Luxembourg, Portugal, the Netherlands, Hungary, Poland, and the Czech Republic.

Those excellent results of the Czech Republic inspired us to verify the validity of Benford's Law on new data of national accounts processed and published by the Czech Statistical Office according to the ESA 2010. Our ambition is not to prove the validity of Benford's Law in a wider context of national accounts time series, in which even more favourable results would certainly be achieved, but to illustrate the possibilities of this tool in checking data quality. The data set we tested for fit to Benford's Law for the first and second digits from the left was that of national accounts data of the Czech Republic in 2013 (the preliminary report for 2013). Altogether there were 2 817 digits at the first position from the left, and 2 729 digits at the second position. Statistics (5), (6), and (7) are used for testing the fit. The results for the first digit from the left are shown in Table 4.

15 Cf., e.g., Carslaw (1988), Nigrini (2005), Nigrini (1996), Guan et al. (2006), Niskanen and Keloharju (2000) or Watrin et al. (2008).

16 Cf., e.g., Nye and Moul (2007) or Gonzales-Garcia and Pastor (2009).

17 Generally, data sets connected with the Stability and Growth Pact were considered. Altogether there were 36 691 numerals in 297 sets.

18 Nonetheless, the problem with Greece's national accounts had been known before. As early as in 2002, Eurostat twice rejected data of the general government in Greece due to untrustworthiness, and again in 2004 (cf. Report by Eurostat on the Revision of the Greek Government Deficit and Debt Figures – <http://ec.europa.eu/eurostat/documents/4187653/5765001/GREECE-EN.PDF>).

19 Data of the Czech Republic only showed a significant deviation from Benford's Law in 2002, when the value of the test criterion (5) exceeded the critical value of 15.5.


The entries in Table 4 clearly show that, regarding the first digit from the left, the data of the national accounts of the Czech Republic in 2013 comply with Benford's Law for all three characteristics. In the goodness-of-fit test we obtain the statistic G = 13.00, which is smaller than the critical value of $\chi^2_{0.95}[8] = 15.5$; hence the hypothesis that the empirical and theoretical (Benford's) distributions are identical cannot be rejected. The values of the $Z_d$-statistics for each of the digits are all smaller than the critical value of the standard normal distribution ($u_{0.975} = 1.96$). We can therefore observe that, for none of the digits, the differences between the empirical and theoretical frequencies are deemed statistically significant. The MAD characteristic also indicates a good fit (cf. Table 3) of the data structure of the national accounts of the Czech Republic in 2013 to Benford's Law. Figure 1 illustrates the fit between the empirical frequencies and theoretical probabilities for the first digit from the left.

Table 4 Fit to Benford's Law – first digit from the left

First digit      Absolute        Relative        Probability
from the left    frequency n_d   frequency p_d   π_d           G           Z_d        MAD
1                858             0.305           0.301         0.000042    0.390146   0.004
2                517             0.184           0.176         0.000314    1.011605   0.007
3                384             0.136           0.125         0.001036    1.797649   0.011
4                262             0.093           0.097         0.000157    0.668436   0.004
5                198             0.070           0.079         0.000999    1.713259   0.009
6                181             0.064           0.067         0.000108    0.534417   0.003
7                180             0.064           0.058         0.000601    1.300798   0.006
8                124             0.044           0.051         0.000995    1.675934   0.007
9                113             0.040           0.046         0.000696    1.388463   0.006
Total            2 817           1.000           1.000         13.004949   x          0.006

Source: <www.czso.cz>, authors' own calculations
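The goodness-of-fit statistic of Formula (5) can be recomputed directly from the absolute frequencies printed in Table 4. The sketch below is an independent recalculation, not the authors' original procedure; it uses unrounded probabilities from Formula (1) and the critical value quoted in the text. Computed this way, the per-digit contributions agree with the G column of Table 4, while their sum multiplied by n comes out at roughly 13.9 rather than the printed 13.00. Either value lies below 15.5, so the conclusion is not affected.

```python
import math

# Absolute frequencies of first digits 1..9 as printed in Table 4
observed = [858, 517, 384, 262, 198, 181, 180, 124, 113]
CRITICAL_VALUE = 15.5  # 95% quantile of chi-square with 8 degrees of freedom, as quoted in the text

n = sum(observed)
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
relative = [count / n for count in observed]

# Per-digit contributions (p_d - pi_d)^2 / pi_d and the statistic G of Formula (5)
contributions = [(p - pi) ** 2 / pi for p, pi in zip(relative, benford)]
g_statistic = n * sum(contributions)

# Condition of a sufficient expected frequency in each cell
cells_ok = all(n * pi >= 5 for pi in benford)

print(f"n = {n}, G = {g_statistic:.2f}, H0 rejected: {g_statistic > CRITICAL_VALUE}")
```

The largest contribution belongs to digit 3, which is also the digit with the highest $Z_d$ value in Table 4.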

Figure 1 Fit to Benford's Law – first digit from the left

[Chart: empirical and theoretical relative frequencies (vertical axis, 0.000 to 0.350) by first digit from the left (horizontal axis, 1 to 9)]

Source: <www.czso.cz>, authors' own calculations


The results of the comparison between the data structure of the national accounts of the Czech Republic in 2013 and Benford's Law for the second digit from the left are shown in Table 5.

Table 5 Fit to Benford's Law – second digit from the left

Second digit     Absolute        Relative        Probability
from the left    frequency n_d   frequency p_d   π_d           G           Z_d        MAD
0                374             0.137           0.120         0.002422    2.710896   0.017
1                318             0.117           0.114         0.000056    0.385125   0.003
2                307             0.112           0.109         0.000112    0.555222   0.003
3                267             0.098           0.104         0.000365    1.023155   0.006
4                314             0.115           0.100         0.002268    2.590616   0.015
5                236             0.086           0.097         0.001141    1.824810   0.011
6                211             0.077           0.093         0.002644    2.787807   0.016
7                227             0.083           0.090         0.000517    1.211364   0.007
8                235             0.086           0.088         0.000041    0.314340   0.002
9                240             0.088           0.085         0.000102    0.517204   0.003
Total            2 729           1.000           1.000         19.774976   x          0.009

Source: <www.czso.cz>, authors' own calculations

Items in Table 5 show that national accounts data of the Czech Republic in 2013 do not fully comply with Benford's Distribution regarding the second digit from the left. In the goodness-of-fit test we obtain the statistic G = 19.77, which is higher than the critical value of $\chi^2_{0.95}[9] = 16.9$; hence the hypothesis that the empirical and theoretical (Benford's) distributions are identical is rejected. The values of the $Z_d$-statistics show that deviations from the probabilities given by Benford's Law are present for digits 0, 4, and 6 (cf. the $Z_d$ column of Table 5); for them, the corresponding values of the $Z_d$-statistics are larger than the critical value, which is the quantile of the standard normal distribution ($u_{0.975} = 1.96$); hence these deviations are deemed statistically significant. The MAD characteristic indicates "only" acceptable fit (cf. Table 3) of the data structure of the national accounts of the Czech Republic in 2013 to Benford's Law.
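The digit-by-digit test of Formula (6) can be sketched in the same way for the second digit, using the absolute frequencies printed in Table 5 and the probabilities of Formula (2). The names `z_statistic` and `flagged` are hypothetical. The sketch works with unrounded probabilities, whereas the printed $Z_d$ values appear to be based on probabilities rounded to three decimal places, so small differences in the second decimal place are to be expected.

```python
import math

# Absolute frequencies of second digits 0..9 as printed in Table 5
observed = [374, 318, 307, 267, 314, 236, 211, 227, 235, 240]
CRITICAL_VALUE = 1.96  # quantile u_0.975 of the standard normal distribution

n = sum(observed)

# Second-digit probabilities according to Formula (2)
second_digit_probs = [
    sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)
]


def z_statistic(p: float, pi: float, n: int) -> float:
    """Z statistic of Formula (6), including the 1/(2n) correction term."""
    return (abs(p - pi) - 1 / (2 * n)) / math.sqrt(pi * (1 - pi) / n)


z_values = [z_statistic(count / n, pi, n) for count, pi in zip(observed, second_digit_probs)]
flagged = [d for d, z in enumerate(z_values) if z > CRITICAL_VALUE]

for d, z in enumerate(z_values):
    print(d, f"{z:.2f}", "significant" if z > CRITICAL_VALUE else "")
```

The list `flagged` contains the digits whose deviation exceeds the critical value, which is where a follow-up check of the underlying items would start.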

Figure 2 Fit to Benford's Law – second digit from the left

[Chart: empirical and theoretical relative frequencies (vertical axis, 0.000 to 0.160) by second digit from the left (horizontal axis, 0 to 9)]

Source: <www.czso.cz>, authors' own calculations


Let us recapitulate the evaluation of the fit of the national accounts data of the Czech Republic in 2013 to Benford's Law: with respect to the second digit from the left, the fit has not been proved and the differences are significant for digits 0, 4, and 6. However, these deviations (digits 0 and 4 occur more often, and digit 6 less often, than Benford's Law predicts) do not enable us to draw any fundamental conclusions because this phenomenon is related to a preliminary report. It will be interesting to re-evaluate the situation when the final report of 2013 has been published. We can also see in Figure 2 that the differences for the second digit from the left are not of a fundamental nature.

CONCLUSIONS

As already stated above, the role of Benford's Law is that of a detection and indicator tool. Deviations of empirical data, i.e., relative frequencies of occurrence for digits 1, 2, …, 9 as the first digit (or 0, 1, …, 9 as the second digit) from the left, from Benford's Law at the beginning of the verification process are not, as such, manifestations of infringement on (say, accounting) rules. At the beginning of the analysis, such deviations are just partial signals that there is a certain discrepancy from Benford's Law. Nothing more, and nothing less. Such a signal may be used as a recommendation in what direction subsequent analysis should be carried out. Namely, it should focus on the items (accounts, subsets, etc.) for which the highest degree of deviation is shown, e.g., within the Z-test – Formula (6).

Different situations may arise. Either the revealed deviations are explained in a factual and prescribed way (if the deviation is not random) or no such explanation is identified. In the latter case, it should be seriously investigated why and how the deviation occurred. From experience, a number of instances are known in which unexplained deviations led to identification of fundamental departures from prescribed procedures and even forensic proceedings were initiated against the parties concerned.

The described approach is open to discussion. Economists, auditors, accountants etc. have varied opinions about the detection potential of Benford's Law. On the one hand there are zealous advocates of the notion that a signal triggered by a deviation from Benford's Law in, say, macroeconomic data (i.e., data on the macroeconomic level) or accounting data (i.e., on the corporate level) is a really serious event to which proper attention should be given because it will lead to the root from which errors – sometimes fully intentional – stem. On the other hand, there are those who feel that the detection role of Benford's Law is a mere formality because the root of the errors will be discovered anyway.

Trust in the detection and signalling roles of Benford's Law thus mainly depends on the level of personal experience of those who may use this checking approach. A theoretical dispute aimed at creating a feeling that Benford's Law is useful usually misses this target. This observation is based on practical experience of the authors of the present paper.

References

BENFORD, F. The Law of Anomalous Numbers. Proceedings of the American Philosophical Society, Vol. 78, No. 4, 1938, pp. 551–572.

CARSLAW, C. A. P. N. Anomalies in Income Numbers: Evidence of Goal Oriented Behavior. The Accounting Review, Vol. 63, No. 2, 1988, pp. 321–366.

GONZALES-GARCIA, J., PASTOR, G. Benford's Law and Macroeconomic Data Quality [online]. IMF Working Paper 09/10, 2009, International Monetary Fund. <https://www.imf.org/external/pubs/ft/wp/2009/wp0910.pdf>.

GUAN, L. et al. Auditing, Integral Approach to Quarterly Reporting, and Cosmetic Earnings Management. Managerial Auditing Journal, Vol. 21, No. 6, 2006, pp. 569–581.

HILL, T. P. A Statistical Derivation of the Significant-Digit Law. Statistical Science, Vol. 10, No. 4, 1995a, pp. 354–363.

HILL, T. P. Base-Invariance Implies Benford's Law. Proceedings of the American Mathematical Society, Vol. 123, No. 3, 1995b, pp. 887–895.

HILL, T. P. The First Digit Phenomenon. American Scientist, Vol. 86, 1998, pp. 358–363.

KNUTH, D. E. The Art of Computer Programming. 3rd ed. Addison-Wesley, Reading, MA, Vol. 2, 1997, pp. 253–264.

KOSSOVSKY, A. E. Benford's Law. Singapore: World Scientific Publishing, 2015. ISBN 978-98-145-8368-8.

MILLER, S. J. (ed.) Benford's Law. Theory and Applications. Princeton University Press, 2015. ISBN 978-0-691-14761-1.

NEWCOMB, S. Note on the Frequency of Use of the Different Digits in Natural Numbers. American Journal of Mathematics, Vol. 4, No. 1, 1881, pp. 39–40.

NIGRINI, M. J. A Taxpayer Compliance Application of Benford's Law: Tests and Statistics for Auditors. Journal of the American Taxation Association, Vol. 18, No. 1, 1996, pp. 72–79.

NIGRINI, M. J. An Assessment of the Change in the Incidence of Earnings Management around the Enron-Andersen Episode. Review of Accounting and Finance, Vol. 4, No. 1, 2005, pp. 92–110.

NIGRINI, M. J. Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations. John Wiley, 2011. ISBN 978-0-470-89046-2.

NISKANEN, J., KELOHARJU, M. Earnings cosmetics in a tax-driven accounting environment: evidence from Finnish public firms. European Accounting Review, Vol. 9, No. 3, 2000, pp. 443–452.

NYE, J., MOUL, C. The Political Economy of Numbers: On the Application of Benford's Law to International Macroeconomic Statistics. The B. E. Journal of Macroeconomics, Vol. 7, No. 1, 2007, pp. 1–14.

RAIMI, R. A. The Peculiar Distribution of First Digits. Scientific American, Vol. 221, No. 6, 1969a, pp. 109–119.

RAIMI, R. A. On Distribution of First Significant Figures. American Mathematical Monthly, Vol. 76, No. 4, 1969b, pp. 342–348.

RAIMI, R. A. The First Digit Problem. American Mathematical Monthly, Vol. 83, No. 7, 1976, pp. 521–538.

RAUCH, B. et al. Fact and Fiction in EU-Governmental Economic Data. German Economic Review, Vol. 12, No. 3, 2011, pp. 243–255.

WATRIN, CH. et al. Benford's Law. An Instrument for Selecting Tax Audit Targets. Review of Managerial Science, Vol. 3, No. 3, 2008, pp. 219–237.