Content uploaded by Mihaela Denisa Coman

Author content

All content in this area was uploaded by Mihaela Denisa Coman on Oct 18, 2018

Content may be subject to copyright.

Journal of Science and Arts Year 18, No. 1(42), pp. 167-172, 2018

ISSN: 1844 – 9581 Mathematics Section

ORIGINAL PAPER

USING BENFORD’S LAW IN THE ANALYSIS

OF SOCIO-ECONOMIC DATA

DAN-MARIUS COMAN1*, MARIA-GABRIELA HORGA2, ALEXANDRA DANILA2,

MIHAELA-DENISA COMAN3

_________________________________________________

Manuscript received: 12.09.2017; Accepted paper: 22.11.2017;

Published online: 30.03.2018.

Abstract: The article represents a theoretical research which takes into account the

statistical model known as Benford’s Law, followed by a quantitative application-based

research in which we have used Microsoft Excel elements (statistical and graphical functions)

on a set of socio-economic data in order to emphasize how easy it is for a wide range of users

to apply the theoretical concepts mentioned.

Keywords: Benford’s Law, statistical test, spreadsheet

1. INTRODUCTION

The statistical model popularized by Benford [1] in 1938 and developed by Newcomb

[8] is researched and discussed in a wide range of articles and highlights the universality of

applying the method on a set of data such as: electricity bill payment [2], accounting fraud

detection and financial statement audits [7], socio-demographic studies [9]. This research

aims to use the statistical model popularized by Benford in order to show how easy it is to

apply it to a set of data by utilizing a software application (Microsoft Excel) available in the

application package a computer is usually equipped with.

2. OBJECTIVES AND METHODOLOGY OF THE RESEARCH

The article represents a positive research approach, in which the theoretical concept is

reflected and practically demonstrated by an example in the socio-economic field. The

method employed in this research was the consultation of the specialized literature presented

in the bibliography, along with the development and detailing of the theoretical concept by

means of a practical example. Through its practical nature, the analytical procedure based on

the quantitative model of Benford’s law highlights a statistical relationship and falls under the

pragmatic research.

1 Valahia University of Targoviste, Faculty of Economic Sciences, 130004 Targoviste, Romania.

* Corresponding author E-mail: cmnmarius@yahoo.com.

2 Ovidius University of Constanta, Faculty of Economic Sciences, 900470 Constanta, Romania.

E-mail: alexandradanila14@yahoo.com , gabi_horga@gmail.com.

3 Valahia University of Targoviste, Institute of Multidisciplinary Research for Science and Technology, 130004

Targoviste, Romania. E-mail: cmndenisa@yahoo.com.

Using Benford’s Law in … Dan-Marius Coman

www.josa.ro Mathematics Section

168

3. MATHEMATICAL FOUNDATIONS OF BENFORD’S LAW

Benford’s law refers to the frequency distribution of digits in various situations of

occurrence of numerical data within real-life sources of data. The digit 1 occurs as leading

digit in 30% of the cases, while greater digits occur with smaller probability. The distribution

of first digits is as large as the interval of a logarithmic scale and the results apply to a wide

variety of data. Mathematically, the distribution law applies to base-10 numbers, but there are

applications of the law for the distribution of other numbers in other bases or for the second or

the following digits of a number.

A number string follows Benford’s distribution if the distribution probability of one

digit (di) of the numbers (d=d0d1.....) represented in the in the b-base number system satisfies

the law:

()=1 + 1

,= 1 … ( 1)

(1)

()= log

1 + 1

+,= 0 . . ( 1)

(2)

()= log1 + 1

+ +,

= 0 . . ( 1)

(3)

Numerically, the first digits of a number following Benford’s law have the distribution

in Table 1. Table 1. Data distribution according to Benford’s law.

Digit

Probability

1

0.30103

2

0.17609

3

0.12494

4

0.09691

5

0.07918

6

0.06695

7

0.05799

8

0.05115

9

0.04576

The probability p(d) is proportional to the width of the interval between d and d+1 on

a logarithmic scale, so it will comply with the expected distribution of the mantissa of the

logarithm of that particular number, but not of that number proper, as it is probabilistically

uniformly distributed. The distribution law is named after the physicist Frank Benford who

formulated it intuitively in 1938. According to [3], Benford’s law was explained in several

ways:

• Consequence of the exponential growth process: in principle, it is based on the

assumption that the mantissa of logarithms of numbers is uniformly distributed. This is

correct if the numbers are themselves distributed on multiple orders of magnitude;

• Scalar invariance: the distribution of digits in a real list does not depend on the

unit of measurement used, in other words, multiplication by the same constant will not affect

Using Benford’s Law in … Dan-Marius Coman

ISSN: 1844 – 9581 Mathematics Section

169

distribution. The phenomenon is called scalar invariance and the variables which are log-

normally distributed comply with this property;

• Multiple probability distributions.

The model of Benford’s law applies to datasets which are distributed on several orders

of magnitude. A direct consequence resulting from this observation is that the model is not

valid if we want to check the values in a list (for example bills, payments) between two limit

values (for example between 50,000 and 100,000) or above a minimal value or below a

maximal value.

The particular criteria [10] which limit the application of Benford’s law have been

emphasized on accounting datasets and refer to:

• Assigning numbers for bills, cheques;

• Influencing numbers by human subjective decisions (prices like 9.99);

• Accounts with maximum and minimum limits.

4. PRACTICAL APPLICATION

In addition to applications in the field of fundamental sciences [5] Benford’s law has

found an application in fraud detection in all economic activities. That is why the model has

been included in the CAATS (Computer Auditing Techniques) analytical procedures.

In this article, Benford’s model is applied in order to detect abnormal results in a list

which include data related to the number of people in small towns of the United States of

America [11]. The main goal of applying Benford’s model in the dataset established is

determined by the verification of the following research hypothesis:

“The chosen dataset follows the distribution of Benford’s law.”

The dataset comprises 3,141 observations on the number of people in small towns

located in the United States of America. For space reasons, Fig. 1 presents only a screenshot

of the list used in the analysis.

Figure 1. Screenshot of the dataset.

Using Benford’s Law in … Dan-Marius Coman

www.josa.ro Mathematics Section

170

Benford’s law has been applied in Microsoft Excel, which has options (functions,

graphs) to make the necessary calculations. Figures 3 and 4 show the tables in which all the

operations required by this analytical method have been performed.

The calculation of the frequency of occurrence of each digit from 1 to 9 is done in

Table 2.

The formulas which have determined the calculation of the occurrence frequency of

each digit are the following:

• Determining the first digit of numbers in the data series

FLOOR(10^MOD(LOG(Data!A8,10),1),1);

• Creating a contingency table in which every digit determined at the previous point

is represented by the digit 1 on the position corresponding to the digit sequence in the table

header IF(ISERROR($B5) = TRUE,0,IF($B5=3,IF($A5 <= $C$1,1,0),0));

• Determining the sum for each digit in the table header SUM(D5:D4004).

Table 2. Determining the frequency of occurrence of each digit.

Frequency of occurrence of

digits (1 ..9)

992 534 424 260 228 210 173 175 145

Digits 1…9

First digit extracted

from the data series

1 2 3 4 5 6 7 8 9

3

0

0

1

0

0

0

0

0

0

9

0

0

0

0

0

0

0

0

1

2

0

1

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

3

0

0

1

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

2

0

1

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

3

0

0

1

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

3

0

0

1

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

2

0

1

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

4

0

0

0

1

0

0

0

0

0

5

0

0

0

0

1

0

0

0

0

1

1

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

0

The elements determined at 3 in the previous paragraph help us make the table in

which we shall apply the elements of Benford’s law. The table of analysis is illustrated in

Table 3.

Using Benford’s Law in … Dan-Marius Coman

ISSN: 1844 – 9581 Mathematics Section

171

Table 3. Determining the elements of Benford’s law.

Digit Observed

Probability

% of

Observed

Probability

Expected

Probability

(Benford

models)

% of Expected

Probability

(Benford

models)

Variation

(%)

1

992

31.58

946

30.10

4.68

2

534

17.00

553

17.61

-3.58

3

424

13.50

392

12.49

7.45

4

260

8.28

304

9.69

-17.07

5

228

7.26

249

7.92

-9.08

6

210

6.69

210

6.69

-0.13

7

173

5.51

182

5.80

-5.29

8

175

5.57

161

5.12

8.19

9

145

4.62

144

4.58

0.88

Total

3141

100

3141

100

In Table 3, the frequency of occurrence of the first digit has been inserted from Table

2 by copy-paste and, based on this information, we have determined:

• The proportion of elements observed in the total sample, B2/$B$11*100;

• The number of expected elements according to Benford’s law,

LOG((A2+1)/A2,10)*$B$11;

• The proportion of expected elements according to Benford’s law,

(LOG((A2+1)/A2,10))*100;

• The variation of observed elements in relation to the expected elements, ((B2-

D2)/B2)*100.

At a first view of the Table 3, one may note that the greatest variation between the

probability of observed elements and the probability of expected elements according to

Benford’s law manifests for digit 4 (-17.07%). This is also shown by Fig. 2, in which we see

that the statistical distribution of the first digit follows the theoretical distribution in a similar

manner; we should, however, mention that in the case of the digit 4 there is a greater variation

between statistical distribution and theoretical distribution.

Figure 2. First digit vs. Benford’s Law.

31.58

17.00

13.50

8.28

7.26

6.69

5.51

5.57

4.62

30.10

17.61

12.49

9.69

7.92

6.69

5.80

5.12

4.58

-3.00

2.00

7.00

12.00

17.00

22.00

27.00

32.00

1

2

3

4

5

6

7

8

9

Firstdigit in county population vs. Benford's Law

% of Observed Probability

% of Expected Probability (Benford model)

Using Benford’s Law in … Dan-Marius Coman

www.josa.ro Mathematics Section

172

The observation makes us consider the research question: the chosen dataset follows

the distribution of Benford’s law, which we can transpose in the following hypothesis:

(1) H0: The first digit of the dataset on the observed population follows the theoretical

distribution of Benford’s law

(2) H1: The first digit of the dataset on the observed population does not follow the

theoretical distribution of Benford’s law

To test this hypothesis, we shall use the chi-square test [4] which allows the

comparison between two distributions (observed vs. theoretical) with a view to establishing

either an independence between them or their homogeneity. The chi-square test formula is:

= ( )

(4)

The application of the calculation formula was done in table 4, obtaining a value of

15.43. The obtained value is interpreted by comparing it with the corresponding value in the

chi-square table. Since in this case there are nine variants, the number of degrees of freedom

is c-1, i.e. 8.

Table 4. Determining the elements of the chi-square test.

Digit Observed

Probability

% of

Observed

Probability

Expected

Probability

(Benford

models)

% of Expected

Probability

(Benford

models)

Variation

(%) Chi

square

1

992

31.58

946

30.10

4.68

2.28

2

534

17.00

553

17.61

-3.58

0.66

3

424

13.50

392

12.49

7.45

2.54

4

260

8.28

304

9.69

-17.07

6.47

5

228

7.26

249

7.92

-9.08

1.72

6

210

6.69

210

6.69

-0.13

0.00

7

173

5.51

182

5.80

-5.29

0.46

8

175

5.57

161

5.12

8.19

1.28

9

145

4.62

144

4.58

0.88

0.01

Total

3141

100

3141

100

15.43

The chi-square value of 15.43 is less than that in the Table 4 (15.5073) at 8 degrees of

freedom and a significance threshold p = 0.9847, calculated based on the Microsoft Excel

function CHISQ.DIST(G11,8,TRUE).

Based on chi-square distribution, the probability to reject the null hypothesis is

calculated. Usually the research question is accepted if the probability (p) of rejecting the

hypothesis H0 is less than 5%. In the context of p=0.9847 > 0.05, which denotes that there are

no significant differences between the observed distribution and the theoretical one and

there is no sufficient evidence to believe that the variation observed in the case of digit 4

between the two distributions may be due to the occurrence of a mystification (alteration) of

data reporting regarding the town population.

In the specialized literature [6] a number of problems are reported when comparing

observed and theoretical distributions. Among the observations noted are: issues regarding the

choice of the number of classes, issues on the width of frequency classes, issues regarding the

number of observations inside each frequency class. All these elements lead, in the case of

Benford’s law, to the calculation of an additional test called mean absolute deviation (MAD),

which measures the average of absolute deviation of the frequencies of each digit from

Using Benford’s Law in … Dan-Marius Coman

ISSN: 1844 – 9581 Mathematics Section

173

Benford’s ideal frequency. In this application, MAD is calculated in Table 5 based on the

following formula:

= | |

(5)

The value obtained by applying the formula may fall into three classes according to

the following algorithm: 0.000 < MAD >0.006 – close conformity; 0.006 < MAD < 0.012 –

acceptable conformity; 0.012 < MAD < 0.015 – marginal conformity; MAD > 0.015 – non

conformity.

Table 5. MAD Calculation

Digit Observed

Probability

% of

Observed

Probability

Expected

Probability

(Benford

models)

% of

Expected

Probability

(Benford

models)

Variation

(%) Chi

square MAD

1

992

31.58

946

30.10

4.68

2.28

1.48

2

534

17.00

553

17.61

-3.58

0.66

0.61

3

424

13.50

392

12.49

7.45

2.54

1.01

4

260

8.28

304

9.69

-17.07

6.47

1.41

5

228

7.26

249

7.92

-9.08

1.72

0.66

6

210

6.69

210

6.69

-0.13

0.00

0.01

7

173

5.51

182

5.80

-5.29

0.46

0.29

8

175

5.57

161

5.12

8.19

1.28

0.46

9

145

4.62

144

4.58

0.88

0.01

0.04

Total

3141

100

3141

100

15.43

0.6625

Following MAD calculation, the value 0.6625 is classified as non-conformity, which

supports the idea that there are no significant differences between observed and theoretical

distributions, which has also been shown by the chi-square test.

5. CONCLUSIONS

Benford’s law is a powerful tool, integrated into dedicated applications (e.g. Idea

Casewear), for auditors in identifying the risk of economic fraud, but the example presented is

not intended to highlight abnormal aspects in reporting socio-economic data on the number of

inhabitants of towns in the USA. This paper aims to present, in a simple way, how to use an

easy working tool in order to calculate specific elements of Benford’s law: chi-square test,

mean absolute deviation, the graph of correlation of the researched data series with the

expected data series.

For regular users, beginning practitioners, who do not have access to specialized

applications, the example presented may be a good start in acquiring the skills of using

Microsoft Excel spreadsheet to apply statistical tests for fraud detection in judicial accounting

expertise or in auditing.

Using Benford’s Law in … Dan-Marius Coman

www.josa.ro Mathematics Section

174

REFERENCES

[1] Benford, F., Proceedings of the American Philosophical Society, 4(78), 551, 1938.

[2] Christian, C., Gupta, S., Lin, S. M., National Taxa Journal, 4(46), 487, 1993.

[3] Coracioni, A., Audit Financiar, 11(104), 23, 2013.

[4] Florea, N.V., Mihai, D.C., Journal of Science and Arts, 1(38), 81, 2017.

[5] Jäntschi, L., Bolboaca, S., Stoenoiu, C., Bulletin UASM Agriculture, 66(1), 82, 2009.

[6] Jäntschi L., Proceedings of International Conference on Recent Achievements in

Mechatronics, Automation, Computer Science and Robotics, 239, 2011.

[7] Marcini, S., Hamilton T., Journal of the Risk and Uncertainty, 1(32), 57, 2006.

[8] Newcomb, S., American Journal of Mathematics, 4, 39, 1881.

[9] Sandron, F., Population, 57(4-5), 755, 2002.

[10] Shalini, T., Kinjal, M., IOSR Journal of Economics and Finance, 1-9, 2014.

[11] https://introductorystats.wordpress.com/2011/11/25/benfords-law-and-us-census-data-

part-ii/," last accessed 27.01.2018.