SAJEMS NS 9 (2006) No 3 341

USING BENFORD’S LAW TO DETECT DATA ERROR AND FRAUD:

AN EXAMINATION OF COMPANIES LISTED ON THE JOHANNESBURG

STOCK EXCHANGE

AD Saville1

Gordon Institute of Business Science, University of Pretoria

Abstract

Accounting numbers generally obey a mathematical law called Benford’s Law, and this outcome is

so unexpected that manipulators of information generally fail to observe the law. Armed with this

knowledge, it becomes possible to detect the occurrence of accounting data that are presented

fraudulently. However, the law also allows for the possibility of detecting instances where data are

presented containing errors. Given this backdrop, this paper uses data drawn from companies

listed on the Johannesburg Stock Exchange to test the hypothesis that Benford’s Law can be used

to identify false or fraudulent reporting of accounting data. The results support the argument that

Benford’s Law can be used effectively to detect accounting error and fraud. Accordingly, the findings

are of particular relevance to auditors, shareholders, financial analysts, investment managers, private

investors and other users of publicly reported accounting data, such as the revenue services.

JEL M 40

1

Introduction

Albert Einstein was playing his violin in

a duet with Werner Heisenberg, who was

accompanying him on the piano. After a while

Heisenberg slammed his hands down on the

keys and said: ‘It’s one, two, one, two, Einstein!

Can’t you count?’

(Arthur, 1993)

From the mid-1990s investment markets

witnessed a surge in the incidence of exposed

accounting frauds and irregularities which, in

turn, prompted a significant tightening in the

regulatory environment as part of a regulatory

effort to stamp out the occurrence of accounting

deceit.2 Although recent evidence suggests that

this regulatory response has been effective in

reducing the occurrence of dishonest accounting,

the impact has not been comprehensive.

Moreover, the experience of the past decade

demonstrates that, whilst the country, industry

and business detail behind the data distortions

vary, the cases share a common harmful ailment:

accounting frauds have resulted in considerable

destruction of investor wealth.3 In addition,

recent evidence shows that the number and

size of companies that are disclosing accounting

irregularities and frauds have grown with time.

For example, the number of restatements due

to accounting irregularities in the United States

(US) increased by over 150 percent between

1997 and 2001 (Floyd, 2003, 5). Moreover, the

median size of companies making restatements

in the US, measured by market capitalisation,

increased from $500 million in 1997 to $2 billion

in 2002 (Floyd, 2003, 7). In South Africa, the

trends have been similar, with a growing number

of firms reporting accounting irregularities and

frauds over the past decade.

Against this backdrop, and as noted, numerous

efforts are being made to improve accounting

standards and auditing practices. Regulators

are also at pains to make firms’ managers and

directors more sensitive to the consequences

of financial malpractice. However, the pace

of progress is slow and the effects unfinished.

Moreover, human behaviour is such that

fraudulent practices will linger even in a world

of perfect accounting systems and watertight

auditing practices. Thus, those interested in the

accuracy of publicly reported accounting data

– including auditors, shareholders, financial

analysts, investment managers, private investors

and other users of publicly reported accounting

data, such as the revenue services – must remain

vigilant for fraudulent accounting practices.

Helpfully, at this juncture, a little known but

powerful mathematical law, called Benford’s

Law (Benford, 1938), presents itself as a

potentially potent tool for rooting out fraudulent

practices from a wide array of information sets,

including accounting data. Significantly, the law

has been used in a range of international settings

to detect data error and fraud, including the case

of accounting data. Despite this potential, it is

surprising to find that whilst Benford’s Law has

been used by practitioners in the South African

setting, no attempt has been made to publish

evidence on the effectiveness of Benford’s

Law in detecting accounting data error or

fraud in a domestic setting. This paper aims to

address this gap in research by exploring the

relevance of Benford’s Law in the detection of

anomalies in data presented by firms listed on

the Johannesburg Stock Exchange (JSE).

The remainder of the paper is divided into

five sections. Section 2 provides an overview

of Benford’s Law, while Section 3 examines

the mechanics of employing Benford’s Law to

detect accounting data irregularities as well as

the data set employed in this study. Section 4 is

devoted to analysing the results, and provides

comment on the reliability and relevance of the

tool as a detector of fraudulent or erroneous

accounting data. On this score, the findings of

this study suggest that Benford’s Law has the

capacity to play a helpful role in assisting users

of accounting data to detect error or fraud in

financial information. These findings are in line

with expectations and concur with the results of

similar studies carried out in other countries.

Some comment is also made in this section on

areas for further research. Section 6 is devoted

to concluding remarks.

2

An overview of Benford’s Law

In 1881, the astronomer-mathematician

Simon Newcomb published a short article

in the ‘American Journal of Mathematics’

describing his observation that books of

logarithms were more worn in the beginning and

progressively unspoiled throughout (Newcomb,

1881). From this, Newcomb inferred that

researchers (including fellow astronomers and

mathematicians, as well as biologists, sociologists

and other scholars) using the logarithmic tables

were looking up numbers starting with the digit 1

more often than numbers starting with the digit

2. Similarly, Newcomb inferred that researchers

were looking up numbers starting with the digit 2

more often than those beginning with the digit 3,

and so on (Hill, 1998: 1). After a short heuristic

argument, Newcomb (1881: 40) concluded

that the probability (P) that a number (D1) has

the first significant digit (that is, first non-zero

digit) d1 is:

$$P(D_1 = d_1) = \log_{10}\left(1 + \frac{1}{d_1}\right),$$

where d1 ∈ {1, 2, … , 9}.

From Newcomb’s rule, it can be calculated that

the probability of 1 occurring as the first digit is

0.301 (or 30.1 percent). Similarly, the probability

of 2 being the first digit is 0.176 (17.6 percent).

In this vein, Table 1 shows the probabilities of

first digits based on the above equation. That

the digits are not equally likely comes as a

surprise to most observers. However, it is even

more striking that Newcomb (1881) was able to

claim the existence of an exact rule describing

the distribution of first digits.
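As a minimal sketch, the first-digit rule can be evaluated directly to reproduce the first-digit probabilities quoted above. The function name `first_digit_probability` is illustrative and does not come from Newcomb (1881) or Benford (1938).

```python
import math


def first_digit_probability(d: int) -> float:
    """Probability that the first significant digit equals d under the logarithmic law."""
    if not 1 <= d <= 9:
        raise ValueError("the first significant digit must lie in 1..9")
    return math.log10(1 + 1 / d)


# Evaluate the rule for every admissible first digit.
probabilities = {d: first_digit_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"first digit {d}: {p:.5f}")
```

The nine probabilities sum to one because the logarithms telescope to log10(10).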

Despite the profound insights offered,

Newcomb’s article went unnoticed. However,

more than half a century later, and independently

of Newcomb’s findings, American physicist Frank

Benford made exactly the same observation

about logarithmic books and concluded the

same first-digit law. But Benford went further

than Newcomb by testing his conjecture with

an ‘effort to collect data from as many fields as

possible and to include a wide variety of types

[of data]’ (Benford, 1938: 551). To be more

specific, Benford’s published findings were

based on 20 229 observations from such diverse

data sets as areas of rivers, atomic weights and

street addresses (in all, 20 widely different data

sets were sampled). Benford’s findings indicated

that the data closely fitted the logarithmic law.4

Moreover, apart from this empirical advantage,

Benford’s paper benefited from a second factor:

it was published adjacent to a soon-to-be famous

physics paper. With Newcomb’s contribution

having become completely forgotten, the

logarithmic probability law came to be known

as Benford’s Law.

Table 1

Probabilities of first- and higher-order significant digits

Digit (d)   Probability of      Probability of      Probability of      Probability of
            first significant   second significant  third significant   fourth significant
            digit = d           digit = d           digit = d           digit = d
0           Not applicable      0.11968             0.10178             0.10018
1           0.30103             0.11389             0.10138             0.10014
2           0.17609             0.10882             0.10097             0.10010
3           0.12494             0.10433             0.10057             0.10006
4           0.09691             0.10031             0.10018             0.10002
5           0.07918             0.09668             0.09979             0.09998
6           0.06695             0.09337             0.09940             0.09994
7           0.05799             0.09035             0.09902             0.09990
8           0.05115             0.08757             0.09864             0.09986
9           0.04576             0.08500             0.09827             0.09982

Source: Nigrini (1999: 2)

Before proceeding, it is useful to offer an

intuitive explanation of Benford’s Law. Consider

making a deposit of R100 in a bank account

that pays interest at the rate of 10 percent per

annum. The first digit will continue to be 1

until the account balance rises to R200. This

will take a 100 percent increase which, at an

annual compound rate of 10 percent, would

take about 7.3 years. When the account balance

reaches R200, the first digit will be 2. However,

growing at 10 percent per annum, the account

balance will rise from R200 to R300 in about

4.2 years. Moving from R300 to R400 will take

about three years, and from R900 to R1 000

will require roughly 1.1 years. However, moving

from R1 000 (where the first digit is once again

1) to R2 000 will again take about 7.3 years. Thus lower

digits have higher frequencies of occurrence,

with the law holding with any phenomenon that

has a constant or erratic growth rate (Nigrini,

1999: 2-3).
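The bank-account illustration can be checked with a short calculation. The sketch below assumes annual compounding at a constant rate; the helper name `years_with_first_digit` is hypothetical. It computes how long the balance carries each first digit and what share of a full tenfold growth cycle that represents.

```python
import math


def years_with_first_digit(d: int, rate: float = 0.10) -> float:
    """Years a balance compounding at `rate` per annum spends with first digit d."""
    # Growing from d * 10^k to (d + 1) * 10^k requires a factor of (d + 1) / d.
    return math.log((d + 1) / d) / math.log(1 + rate)


years = {d: years_with_first_digit(d) for d in range(1, 10)}
cycle = sum(years.values())  # years needed for the balance to grow tenfold
shares = {d: years[d] / cycle for d in range(1, 10)}

for d in range(1, 10):
    print(f"digit {d}: {years[d]:.2f} years, share of cycle {shares[d]:.5f}")
```

The growth rate cancels out of each share, which equals log10(1 + 1/d): the time the balance spends on each first digit is exactly the Benford proportion, whatever the constant rate of growth.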

Interestingly, there is also a general significant-

digit law which includes first digits but also

higher order digits (which may be equal to 0)

(Hill, 1996).5 For example, the general law holds

that the probability that the second significant

digit (D2) of a number is equal to d2 is:

$$P(D_2 = d_2) = \sum_{d_1=1}^{9} \log_{10}\left(1 + \frac{1}{10 d_1 + d_2}\right),$$

where d2 ∈ {0, 1, … , 9}.

From this general law it follows that the second

significant digits, although monotonically

decreasing in frequency through the digits (as in

the case of first digits), are much more uniformly

distributed than the first digits. As noted, the

rule holds for higher order digits; to illustrate

this point, Table 1 shows the unconditional

probabilities of occurrence for the second, third

and fourth significant digits.
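The unconditional probabilities in Table 1 can be reproduced by summing the general law over every possible block of leading digits. The following is a sketch under that reading of the general significant-digit law; the function name `nth_digit_probability` is illustrative.

```python
import math


def nth_digit_probability(n: int, d: int) -> float:
    """Unconditional probability that the n-th significant digit equals d."""
    if n < 1 or not 0 <= d <= 9:
        raise ValueError("n must be at least 1 and d must lie in 0..9")
    if n == 1:
        if d == 0:
            raise ValueError("the first significant digit cannot be 0")
        return math.log10(1 + 1 / d)
    # Sum over every block k of n - 1 leading digits (k has no leading zero).
    lo, hi = 10 ** (n - 2), 10 ** (n - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))


# Print the rows of Table 1: first, second, third and fourth significant digits.
for d in range(10):
    first = "n/a" if d == 0 else f"{nth_digit_probability(1, d):.5f}"
    higher = [f"{nth_digit_probability(n, d):.5f}" for n in (2, 3, 4)]
    print(d, first, *higher)
```

For n = 2 the sum runs over d1 = 1, …, 9 and coincides with the second-digit formula given above.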

Furthermore, the general law also specifies

the joint distribution of significant digits. For

instance, the general law allows for calculation

of the probability that the first and second digits

are 1 and 2, respectively. Importantly, the joint

distribution is not purely the probability of

the first digit multiplied by the probability of

the second digit. Rather, the significant digits

are dependent.6 To demonstrate this point, a

simple calculation shows that the unconditional

probability that the second digit is 2 is ≅ 0.109.

But the conditional probability that the second

digit is 2 given that the first digit is 1 is ≅ 0.115

(Hill, 1998: 2). As an aside, Benford’s Law is

the only probability distribution on significant

digits which is invariant under changes of scale

(for example, converting from English to metric

units or from Yen to Euros), or under changes

of base (for example, replacing base 10 by base

8 or base 2, in which case the logarithmic base

10 is replaced by logarithm to the new base)

(Hill, 1996).7
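The dependence between the first two significant digits can be verified numerically. The sketch below assumes the joint form of the general law, in which the probability of the leading pair (d1, d2) is log10(1 + 1/(10 d1 + d2)); the function names are illustrative.

```python
import math


def joint_probability(d1: int, d2: int) -> float:
    """Probability that the first two significant digits are d1 and d2."""
    return math.log10(1 + 1 / (10 * d1 + d2))


def first_digit_probability(d1: int) -> float:
    return math.log10(1 + 1 / d1)


def second_digit_probability(d2: int) -> float:
    # Marginalise the joint law over all nine possible first digits.
    return sum(joint_probability(d1, d2) for d1 in range(1, 10))


unconditional = second_digit_probability(2)
conditional = joint_probability(1, 2) / first_digit_probability(1)
product = first_digit_probability(1) * second_digit_probability(2)

print(f"P(D2 = 2)          = {unconditional:.3f}")
print(f"P(D2 = 2 | D1 = 1) = {conditional:.3f}")
print(f"P(D1 = 1, D2 = 2)  = {joint_probability(1, 2):.5f}")
print(f"P(D1 = 1)P(D2 = 2) = {product:.5f}")
```

The joint probability differs from the product of the two marginal probabilities, which is what it means for the digits to be dependent.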

In proceeding, it is worth noting that in the

65 years since Benford’s article appeared there

have been numerous attempts to ‘prove’ the

law (Hill, 1998: 3). Indeed, by 1990 close on

100 papers had been published focusing on

explaining or deriving the law in theoretical

terms.8 But there have been two main stumbling

blocks to explaining the law. First, some data

sets satisfy the law, whilst others do not. Until

recently, there has not been a clear definition

of a general statistical experiment that would

predict which tables would comply with the

law. Second, although there was some success

in showing that Benford’s Law is the only set of

digital frequencies which remain fixed under

scale changes, none of the proofs were rigorous

as far as probability theory is concerned.

Recently, however, these stumbling blocks have

been removed by the discovery of mathematical

laws of probability which explain and predict

the appearance of the logarithmic distribution

(Hill, 1995a and 1996). In this vein, Hill (1996:

2) shows that if probability distributions are

selected at random, and random samples are

then taken from each of these distributions so

that the overall process is ‘unbiased’, then the

leading significant digits of the combined sample

will converge to Benford’s Law (Hill, 1996: 2).

More specifically, using modern mathematical

probability theory it has been shown that the

frequencies of significant digits will conform to

the law when data distributions are selected at

random and random samples are taken from

these distributions. As an aside, not all writers

are in agreement with Hill’s (1996) conclusion.

Brookes (2002), for instance, is critical of

Benford’s Law. However, Brookes (2002: 4)

acknowledges that in the case of data sets that

consist of ‘quantisized items such as oranges,

cows … trees [and money]’ these criticisms are

not serious.
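A small simulation can illustrate the idea of sampling from randomly selected distributions, although it is not Hill's proof. The sketch below is a deliberately simplified setup: the three distribution families, the six-decade range of scales, the seed and the sample size are all assumptions made for illustration.

```python
import math
import random


def first_digit(x: float) -> int:
    """First significant digit of a positive float, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def sample_from_random_distribution(rng: random.Random) -> float:
    """Select a distribution at random, then draw a single value from it."""
    scale = 10 ** rng.uniform(0, 6)  # assumed: scales spread evenly over six decades
    family = rng.choice(["uniform", "exponential", "lognormal"])
    if family == "uniform":
        return rng.uniform(0, scale)
    if family == "exponential":
        return rng.expovariate(1 / scale)  # exponential with mean `scale`
    return scale * rng.lognormvariate(0, 1)


rng = random.Random(2006)  # fixed seed so the run is repeatable
counts = [0] * 10
n = 0
while n < 20000:
    x = sample_from_random_distribution(rng)
    if x > 0:  # zero has no significant digit
        counts[first_digit(x)] += 1
        n += 1

observed = [counts[d] / n for d in range(1, 10)]
expected = [math.log10(1 + 1 / d) for d in range(1, 10)]
max_gap = max(abs(o - e) for o, e in zip(observed, expected))

for d in range(1, 10):
    print(f"digit {d}: observed {observed[d - 1]:.4f}, Benford {expected[d - 1]:.4f}")
```

None of the three families follows the logarithmic law on its own at a fixed scale. The pooled sample is close to Benford's Law because each draw comes from a distribution whose scale was itself chosen at random.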

Histrionics aside, the theorems alluded to

above explain why many tables of numerical data

follow the logarithmic distribution described by

Benford’s Law and why others do not. The latter

set includes items such as telephone numbers

in a given region that usually begin with the

same few digits, administered numbers such as

personal identity numbers, hourly wage rates,

bank account numbers, postal codes and tax

payer numbers. As already noted, however,

and significantly in the current argument,

the theorem also explains why a surprisingly

diverse collection of information tends to

obey Benford’s Law. Examples of such data

include large accounting tables, stock market

figures, tables of physical constants, numbers

appearing in newspaper articles, demographic

data, numerical computations in computing and

aspects of scientific calculations (Raimi, 1969;

Ashcraft, 1992; Dehaene and Mehler, 1992;

Hill, 1996 and 1998; Ley, 1996; Nigrini, 1999).

The explanation for conformity with Benford’s

Law now is well established: the data sets are

composed of samples from many different

distributions.

Returning to the main focus of this paper, the

prevalence of the logarithmic distribution in

true accounting data sets has led to Benford’s

Law being used in an international setting to

detect fraud or fabrication of data in financial

documents under the hypothesis that when

people fabricate data they do not choose

numbers which follow a logarithmic distribution

(Hill, 1996). Moreover, it is well documented

that people cannot behave truly randomly even

when such behaviour is to their advantage

(Chapanis, 1953; Bakan, 1960; Neuringer,

1986; Hill, 1999). Further to this, recent studies

support the hypothesis that concocted data do

not follow Benford’s Law closely. Nigrini (1996

and 1999) has led the way in this respect, by

amassing extensive empirical evidence of the

occurrence of Benford’s Law in many areas of

accounting data.

On the back of the accumulated evidence,

Nigrini has come to the conclusion that in a wide

variety of accounting situations, the significant-

digit frequencies of true data conform closely to

Benford’s Law (see also Carslaw, 1988; Thomas

1989). Conversely, then, Benford’s Law serves

as an ideal tool for detecting variances between

true accounting data and data that have been

manipulated or that contain errors. However,

apart from providing a tool that can alert users

to possible errors or potential fraud, Benford’s

Law holds a second advantage over other

methods used to detect data corruption: the

law is easily applied (Nigrini, 1999: 1). Such a

tool for testing data conformity is described in

Section 3 below.

3

Application of Benford’s Law

3.1 Test method

The aim of the current study is to test the

potential effectiveness of Benford’s Law in

detecting data error or fraud in accounting

information produced by JSE-listed companies.

As a point of departure, it should be recognised

that testing need not be confined to the first-digit

level. Nigrini and Mittermaier (1997) provide a

review of the range of tests available. To start

with, because of the general law, testing can

be applied to higher-order digits as easily as to

first digits (Nigrini, 1999: 4). The law can also

be used to test joint frequencies, such as the

first-two, first-three or, more generally, first-n

digit combinations. Other tests are available.

For instance, the analyst can test for rounding

of numbers, which suggests estimation. Testing

for duplication of numbers or combinations

of numbers is also a potential investigative

tool that hints at fraudulent or administrative

manipulation. Thus, numbers can be binned

to test for conformity in various ways. Most

commonly, though, testing is done at the level

of first- or first-two significant digits. This paper

tests data conformity with Benford’s Law at

the level of the first-significant digit. This basis

for testing conforms to the broad-level testing

criteria established by Nigrini (2000).

Having identified the test level, the process

turns to establishing whether the observed

digit(s) deviate(s) significantly from the expected

frequencies derived from Benford’s Law. In

this regard, following Nigrini (2000) a simple

regression analysis is employed to assess the

significance of any observed deviations from the

expected frequencies.

Specifically, to test for conformity with

Benford’s Law, a regression line is estimated

of the form:

Yi = β0 + β1Xi + εi

where Yi is the value of the frequency of the

i-th significant digit(s) drawn from the sample

data; β0 and β1 are parameters; Xi is a known

constant, namely the value of the independent

variable (frequency of the ith significant digit[s])

as per Benford’s Law; and εi is a random error

term with mean E{εi} = 0 and variance σ²{εi}

= σ²; and εi and εj are uncorrelated so that

the covariance σij = 0 for all i, j where i ≠ j and

i = 1,2, … , n. A perfect correlation between the

sample data and Benford’s Law would yield:

β0 = 0; and

β1 = 1.

From this, a t-test is used to test the joint null

hypotheses that β0 = 0 and β1 = 1, which are

the necessary conditions for observed data to

conform to Benford’s Law.
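The regression test can be sketched with ordinary least squares written out by hand. The code below is an illustration and not the study's own implementation. It regresses observed first-digit frequencies on the Benford frequencies and checks whether β0 = 0 and β1 = 1 lie within a band of two standard errors around the estimates, the criterion applied in Section 4.1. The names `ols` and `conforms`, the `width` parameter and both example data sets are hypothetical.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def ols(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, se_b0, b1, se_b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - 2)  # residual variance
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
    return b0, se_b0, b1, se_b1


def conforms(observed, width: float = 2.0) -> bool:
    """True when b0 = 0 and b1 = 1 both lie within `width` standard errors."""
    b0, se_b0, b1, se_b1 = ols(BENFORD, observed)
    return abs(b0) <= width * se_b0 and abs(b1 - 1) <= width * se_b1


# Hypothetical first-digit counts for 30 line items (digits 1 to 9).
plausible = [c / 30 for c in (9, 5, 4, 3, 2, 2, 2, 2, 1)]
suspect = [c / 30 for c in (1, 2, 2, 3, 3, 4, 4, 5, 6)]

for label, freqs in (("plausible", plausible), ("suspect", suspect)):
    b0, se_b0, b1, se_b1 = ols(BENFORD, freqs)
    print(f"{label}: b0 = {b0:.2f} (se {se_b0:.2f}), b1 = {b1:.2f} (se {se_b1:.2f}), "
          f"t(b1 = 1) = {(b1 - 1) / se_b1:.2f}, conforms = {conforms(freqs)}")
```

With only nine first-digit bins the regression has seven degrees of freedom, so the width of the band is left as a parameter rather than fixed.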

Given the testing method, it becomes necessary

to establish the data sampling technique

adopted. Unfortunately, Benford (1938) offered

no comment in this regard. Indeed, some writers

have gone so far as to hint at Benford having

mined the data analysed (Scott and Fasli, 2001:

7).9 Elsewhere, little insight is offered into

suitable data sampling techniques. For this

reason, this paper adopts a more ‘classical’

sampling stance by observing principles that are

widely recognised as the basis for generating

adequate samples: the samples used are random

and sufficiently large and variable to deliver

test statistics that offer an appropriate level of

precision. The data set is described below.

3.2 Data set

To test the potential of Benford’s Law to detect

error or fraud in accounting data, two data sets

are employed. The first consists of a sample of

‘errant’ companies that were listed on the JSE

during the five-year period 1 July 1998 to 30 June

2003. These companies are commonly suspected

or known to have committed accounting fraud

or produced erroneous data, and their shares

were either suspended or delisted during the

reference period as a consequence.10 This

sample of 17 so-called ‘errant’ companies is

detailed in Table 2. One firm, Amalia Gold

Mining and Exploration Company Limited, was

dropped from the sample due to lack of data.

Table 2

Errant companies (1 July 1998-30 June 2003)

Company name                                      Date of suspension or delisting
Amlac Limited                                     6 May 2002
Beige Holdings Limited                            27 September 1999*
Essential Beverage Holdings Limited               1 July 2002
Internet Gaming Corporation Limited               4 November 2002
Leisurenet Limited                                6 October 2000
Macmed Limited                                    2 July 2001
Noble Minerals Limited                            1 July 2002
Oxbridge Online Limited                           1 July 2002
REF Finance and Investment Corporation Limited    8 January 2002
Regal Treasury Bank Holdings Limited              27 June 2001
Shawcell Telecommunications Limited               18 January 2002
Taufin Holdings Limited                           2 June 2003
Terrafin Limited                                  24 June 2002
Tigon Limited                                     18 January 2002
Tridelta Magnet Technology Holdings Limited       27 August 2001
Unifer Holdings Limited                           19 June 2002
Whetstone Industrial Holdings Limited             19 April 2001

* Beige Holdings Limited’s suspension was subsequently lifted by the JSE.

Source: Alexander and Oldert (2003)

In order to verify the effectiveness of the above

test of Benford’s Law, data drawn from a control

group of an equal number (17) of companies

were used to test for ‘false positives’. This second

sample consists of a group of firms, as ranked

by Ernst and Young (2002), as having the top

reporting standards amongst listed companies

on the JSE.11 The Ernst and Young survey is

generated annually. For the sake of the tests

conducted in this study, in-sample period data

were drawn from the results of the 2002 survey.

This was done to ensure that the data sets used

are homogeneous. It is also believed that using

the 2002 data set allows for sufficient time to

have elapsed from the date of the survey for any

data anomalies to have been reported or to have

emerged. This sample of so-called ‘compliant’

firms is detailed in Table 3.

Table 3

‘Compliant’ companies (2002)

Company name
ABSA Group Limited
African Bank Investments Limited
African Oxygen Limited
Allan Gray Property Trust
AngloGold Limited
Aveng Limited
Anglovaal Mining Limited
Firstrand Limited
Gold Fields Limited
Illovo Sugar Limited
Kersaf Investments Limited
Liberty Group Limited
Nampak Limited
Nedcor Limited
Pretoria Portland Cement Company Limited
Sanlam Limited
Sasol Limited

Source: Adapted from Ernst and Young (2002)

Data drawn from the financial statements of the two sets of companies are
confined by three additional parameters. First, the tests run are confined to
raw data whose significant number frequencies are expected to follow a
geometric sequence when ordered and counted. Raw accounting data read as line
items are appropriate for testing. Numbers that are a function of more than one
set of other numbers (such as earnings per share, which is a function of
earnings and the number of shares in issue) are not expected to follow
Benford’s Law.12 To ensure data homogeneity, the same line items are used for
all companies, as published by data vendor I-Net Bridge’s Financial Analysis
System (FAS). Moreover, the data that are sampled are ‘as reported’, which thus
excludes all possible influences of adjustments that are typically made by data
vendors in their efforts to standardise accounting data. In proceeding, it
should be noted that the raw data identified satisfy the main criteria for
having expected digit frequencies that are Benford-like, namely: the numbers
describe the sizes of similar phenomena; the numbers have no built-in maximums
or minimums; and the numbers are not assigned numbers (such as bank account
numbers) (Nigrini and Mittermaier, 1997).

Second, because the aim of the tests is to identify data manipulation, it makes
sense to explore statements that are more likely to include errant data. The
most obvious place to search for data error is in the income statement. Thus,
testing is done on first-digit data drawn from the income statement. The other
principal statements produced by firms in their annual financial reports –
namely the cash flow, change of equity and balance sheet statements – are less
prone to manipulation. That said, data error or fraud that arises in the income
statement is likely to percolate into derived statements that include
statements of change in equity and balance sheets. So, to eliminate the
potential for double-counting of errors, the data set is based on income
statement data.

Third, in the case of errant firms, only the last set of publicly reported
information is used. For ‘compliant’ companies, the sample set is drawn from
the 2002 financial year, as explained above.

Thus, two sets of data are produced by the sampling method, namely: (a) income
statement first-digit data drawn from ‘errant’ companies on a per company basis
and (b) income statement first-digit data drawn from ‘compliant’ companies on a
per company basis, with the income statement data consisting of 30 line items
as reported by the companies. Accordingly, the full data set consists of 1 020
income statement observations as reported

by the companies. The sampling method then

binned data on a per company basis, with testing

at the company level justified by the argument

that knowing that a group of companies employ

errant or questionable reporting practices is of

marginal use when compared to the knowledge

that a single company adopts such reporting

practices.

Thus, for each company the reported income

statement data are binned. The binned data

frequencies are then regressed on theoretical

frequencies to test for significant deviations from

Benford’s Law. It is expected that the testing

process would reveal significant deviations from

Benford’s Law in the case of ‘errant’ companies,

whilst the frequencies generated by ‘compliant’

company data are expected to observe Benford’s

Law. In proceeding, it ought to be noted that in

a priori testing, rejection of the null hypothesis

does not prove data error, bias or fraud

– legitimate explanations for deviations are

sometimes found. Rather, a positive test result

signals potential data problems, which the data

user should then employ as grounds for a more

detailed examination of the information. This

argument, however, does not necessarily apply

in the case of backward-looking tests. Related

to this point, it must be recognised that the

unit of analysis is the firm, although clearly it

is not firms that falsify data, but rather agents

of the firm. However, detection of data error

at the firm level is arguably a first, necessary

step required in any search for the existence of

fraudulent company data (this point is returned

to below).
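The binning step described in this section can be sketched as follows. The helper names and the sample line items are hypothetical and are not drawn from the study's data set. Each reported figure is reduced to its first significant digit, and the digits are tallied into the nine relative frequencies that would then be regressed on the theoretical frequencies.

```python
from collections import Counter
from decimal import Decimal


def first_significant_digit(value) -> int:
    """First non-zero digit of a number, ignoring its sign and scale."""
    for digit in Decimal(str(value)).as_tuple().digits:
        if digit != 0:
            return digit
    raise ValueError("zero has no significant digit")


def first_digit_frequencies(values):
    """Relative frequency of first digits 1 to 9 among the non-zero values."""
    counts = Counter(first_significant_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return [counts.get(d, 0) / total for d in range(1, 10)]


# Hypothetical income statement line items for one company (rand thousands).
line_items = [120, 1500, 230, -45, 9, 0, 18.5, 3070, 0.75, 112]
frequencies = first_digit_frequencies(line_items)

for d, f in enumerate(frequencies, start=1):
    print(f"first digit {d}: {f:.3f}")
```

Negative items, such as losses or expenses reported with a minus sign, are binned on their absolute value, and zero entries are skipped because they carry no significant digit. Both choices are assumptions of this sketch.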

4

Test outputs

4.1 Test results

Tables 4 and 5 set out the test results on a per

company basis. Table 4 deals with the results

of tests conducted on ‘errant’ companies, and

shows the estimated values of β0 and β1; the

standard deviation of the estimated values; and

the t-statistic on the estimated values.

The acceptance of the independent null

hypotheses that β0 = 0 and β1 = 1 at the

five percent level of significance is indicated by

an asterisk on cell entries in Table 4. However,

to satisfy the test requirements, it is necessary

that β0 = 0 and β1 = 1 lie within two standard

deviations of the estimated values of β0 and β1.

Accordingly, the test results lead to acceptance of

the null hypothesis that β0 = 0 in 13 of 17 cases.

However, as can be inferred from the estimates

of β1, in all 13 cases the test results reject the null

hypothesis that β1 = 1 at the five percent level.

Hence, the joint requirement that β0 = 0 and β1

= 1 is rejected in all of these cases. As an aside,

there are three instances of significant estimates

of β1. But all three results fail to meet the criteria

of β1 = 1 lying within two standard deviations

of the estimated value of β1. Moreover, none of

these three cases coincide with acceptance of

the null hypothesis that β0 = 0. Further to this,

it is interesting to note that four estimates of β1

carry the wrong sign. These cases hint at ‘extreme’

violation of Benford’s Law: as first-digits increase

from one through to nine, the frequency of first-digits increases. Accordingly,
first-digit distributions in these data sets are highly suspect. That aside,
and in short, none of the data sets tested passes the test conditions
established for conformity with Benford’s Law.

Table 4

Test results on errant company data sets

Company      Estimate of β0   σ      t       Estimate of β1   σ      t
Amlac        –0.11            0.03   –3.75   2.01*            0.22   9.11
Beige        0.17             0.06   3.10    –0.54            0.41   –1.32
Essential    0.13*            0.15   0.89    –0.19            1.10   –0.17
Igaming      0.08*            0.21   0.40    0.25             1.53   0.17
Leisurenet   –0.12            0.05   –2.35   2.07*            0.37   5.55
Macmed       0.09*            0.06   1.42    0.19             0.47   0.40
Noble        –0.28            0.10   –2.66   3.50*            0.77   4.54
Oxbridge     0.11*            0.11   0.96    0.01             0.84   0.02
Refcorp      0.08*            0.09   0.90    0.28             0.65   0.43
Regal        0.13*            0.09   1.46    –0.18            0.66   –0.27
Shawcell     0.16*            0.13   1.18    –0.41            0.98   –0.42
Taufin       0.03*            0.06   0.56    0.70             0.44   1.59
Terrafin     0.02*            0.14   0.17    0.79             1.02   0.78
Tigon        0.08*            0.13   0.59    0.31             0.97   0.32
Tridelta     0.10*            0.16   0.63    0.08             1.20   0.07
Unifer       0.00*            0.07   0.04    0.98             0.49   2.00
Whetstone    –0.02*           0.09   –0.23   1.18             0.66   2.00
Sample       0.04             0.10   0.19    0.65             0.75   1.46

Thus, the preliminary finding, based on the above sample set, is that Benford’s
Law is a useful indicator of the existence of fraudulent or erroneous data. All
17 companies that are believed or found to have generated fraudulent data over
the sample period fail the test of conformity of the distribution of first
significant-digits with Benford’s Law. It is unsurprising to note that the
estimated values based on pooled data for the 17 ‘errant’ companies indicate
that, if measured as a group, the first significant digit frequencies fail to
conform to Benford’s Law.

As an aside, in the case of ‘errant’ companies it is evident that the
non-compliance of the data with Benford’s Law can occur due to manipulation of
line items at any level of the income statement. However, that all companies
fail to satisfy the intercept and slope aspects of the test implies that data
manipulation in the sample occurs in line items that appear close to the top of
the income statement (the overstatement of revenue is the most obvious
culprit). More to the point, the higher up the statement that manipulation
occurs, the greater the deviance of the balance of the statement as the ‘early’
data manipulation has a cascading effect. To put the argument differently,
misstatement of line items that occur low down in the income statement would
mean that a random sample of first digits may comply with Benford’s Law due to
the possible compliance of earlier numbers which, in the case of ‘late’
manipulation would make up the majority of first digits. Thus, from the
findings presented in Table 4 it is inferred that it is more likely that data
manipulation in the current sample occurred early in the income statement
rather than late in the statement. This, then, sharpens the fraud detection
tool as it is not the company that perpetrates a fraud, but rather agents of
the company and, given the above arguments, most likely agents that are able to
influence ‘early’ line items. However, as noted, the unit of analysis in the
current study is the firm, and so a more detailed study is left for
investigation elsewhere.

These comments aside, continuing with the argument, whilst it may be useful to
know that ‘errant’ companies fail to comply with Benford’s Law, the test only
becomes a useful screening tool if it can be shown that ‘compliant’ companies
generate first-digit frequencies that conform to Benford’s Law. Consequently,
the second set of tests ensures against Type II error. Given this backdrop, the
test results for the 17 ‘compliant’ companies are reported in Table 5, which
sets out the estimated values of β0 and β1; the standard deviation of the
estimated values; and the t-statistics on the estimated values.

Table 5
Test results on ‘compliant’ company data sets

Company      Estimate of β0   σ      t       Estimate of β1   σ      t
ABSA         –0.01*           0.03   –0.31   0.96*            0.21   4.50
ABIL         –0.01*           0.05   –0.22   1.09*            0.35   3.11
Afrox        –0.01*           0.03   –0.40   1.12*            0.25   4.42
Allan Gray   –0.07*           0.04   –1.83   1.64*            0.29   5.71
Anglogold    0.04*            0.03   1.12    0.65*            0.25   2.57
Aveng        –0.03*           0.06   0.06    1.31*            0.45   2.93
Avmin        –0.01*           0.05   –0.16   1.07*            0.39   2.77
Firstrand    0.01*            0.05   0.32    0.87*            0.35   2.51
Goldfield    0.04*            0.04   0.91    0.64             0.32   1.96
Illovo       0.02*            0.04   0.40    0.85*            0.31   2.71
Kersaf       0.01*            0.05   0.12    0.94*            0.39   2.41
Liberty      –0.07*           0.04   –1.85   1.59*            0.26   6.09
Nampak       –0.05*           0.03   –1.44   1.43*            0.24   5.87
Nedcor       –0.04*           0.02   –2.13   1.36*            0.14   9.91
PPC          0.05*            0.03   1.64    0.55*            0.23   2.41
Sanlam       –0.04*           0.03   –1.22   1.35*            0.24   5.71
Sasol        0.03*            0.04   0.62    0.77*            0.31   2.48
Sample       –0.01            0.04   –0.26   0.65             0.75   1.46

The results set out in Table 5 for the sample of ‘compliant’ companies indicate that the null hypothesis that β0 = 0 cannot be rejected at the five percent level for any of the companies. Moreover, in all 17 cases, β0 = 0 lies within two standard deviations of the estimated values of β0. Thus, all 17 of the ‘compliant’ companies have significant first-digit frequencies that indicate conformity with Benford’s Law in the case of β0 = 0. In considering the estimates of β1, the coefficient is significant in 16 of the 17 cases. The estimate of β1 on Goldfields (β1 = 0.64) is not found to be significantly different from zero at the five percent level (although the estimate is significant at the 10 percent level). Importantly, of the 16 estimates of β1 that are found to be significantly different from 0, only three estimates fail to meet the further condition that β1 = 1 lies within two standard deviations of the estimated value of β1. Thus, of the set of ‘compliant’ companies, 13 of the 17 firms pass the joint test of β0 = 0 and β1 = 1, indicating conformity with Benford’s Law. It is interesting to note that the estimated values based on pooled data for the 17 ‘compliant’ companies indicate that the group’s first significant-digit frequencies conform to Benford’s Law, with β0 = 0 and β1 = 1 for the group.

As a final comment on the estimated values

of β0 and β1, it is noteworthy that the standard

errors on the estimates in the case of ‘errant’

companies (0.10 and 0.75, respectively) are

more than twice the size of standard errors on

the estimates β0 and β1 in the case of ‘compliant’

companies. This result offers further anecdotal

evidence of the superior ‘quality’ of ‘compliant’

company data over ‘errant’ company data.
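The decision rule applied to the two samples can be illustrated with a short sketch. It assumes (the regression specification is not restated in this part of the paper) that the observed first-digit proportions are regressed on the proportions expected under Benford’s Law, log10(1 + 1/d) for d = 1, ..., 9, and that a data set conforms when β0 = 0 and β1 = 1 each lie within two standard deviations of their estimates and β1 is significantly different from zero. The function names and the critical t-value of 2.365 (two-sided, five percent, seven degrees of freedom) are illustrative assumptions, not the author’s implementation.

```python
import math

# Assumption: 9 digit observations less 2 estimated parameters gives 7 d.f.;
# 2.365 is the two-sided five percent critical value of t with 7 d.f.
T_CRIT = 2.365


def benford_expected():
    """Expected first-digit proportions under Benford's Law for digits 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(value):
    """Leading significant digit of a non-zero number."""
    return int(f"{abs(value):.15e}"[0])


def observed_proportions(values):
    """Share of non-zero values whose leading digit is 1, 2, ..., 9."""
    counts = [0] * 9
    for v in values:
        if v != 0:
            counts[first_digit(v) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]


def ols(x, y):
    """Simple least squares of y on x: returns (b0, se_b0, b1, se_b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + mx * mx / sxx))
    return b0, se_b0, b1, se_b1


def benford_test(values, k=2.0):
    """Hypothetical helper: True when the joint test of b0 = 0, b1 = 1 is passed."""
    b0, se_b0, b1, se_b1 = ols(benford_expected(), observed_proportions(values))
    intercept_ok = abs(b0) <= k * se_b0          # b0 = 0 within k std deviations
    slope_significant = abs(b1) > T_CRIT * se_b1  # b1 differs from zero
    slope_ok = abs(b1 - 1) <= k * se_b1          # b1 = 1 within k std deviations
    return intercept_ok and slope_significant and slope_ok


# Powers of two are a standard example of Benford-like data;
# the integers 100..999 have uniformly distributed first digits.
print(benford_test([2 ** n for n in range(1000)]))
print(benford_test(list(range(100, 1000))))
```

In this sketch a company’s reported line items would take the place of the illustrative series; widening k to 3 mimics the broader criteria mentioned in endnote 13.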

4.2 Implications and limitations

In short, the results of the testing procedure

indicate that conformity to Benford’s Law may

serve as a robust tool forewarning users of

accounting data of the potential existence of

data error or fraud. The results are particularly encouraging in this regard in that the test procedure yielded a false-positive result in only four of 34 cases (11.8 per cent of the sample, all four being ‘compliant’ companies). Put differently, when applied at the time of annual financial reporting to the above sample of ‘errant’ and ‘compliant’ companies, the test of Benford’s Law correctly identified 88.2 per cent of the cases (30 of 34 companies), and correctly identified 100.0 per cent of ‘errant’ cases.13 The

reason for this appears to be elegantly simple:

like supernovae, fraudulent companies give

themselves away by shining more brightly than

their peers as they zealously thrash away their

final moments.
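The percentages quoted above follow directly from the counts reported in the text, and the arithmetic can be checked with a small sketch. The function and key names are hypothetical; the counts (17 ‘errant’ firms all failing, four of 17 ‘compliant’ firms failing under the stated criteria, one under the broader criteria of endnote 13) are taken from the paper. The sketch also reports the false-positive rate among ‘compliant’ firms alone, which the paper does not quote.

```python
def screening_summary(errant_failed, errant_total, compliant_failed, compliant_total):
    """Summarise test outcomes as percentages, rounded to one decimal place."""
    total = errant_total + compliant_total
    correct = errant_failed + (compliant_total - compliant_failed)
    return {
        "false_positive_share_of_sample": round(100 * compliant_failed / total, 1),
        "false_positive_rate_compliant": round(100 * compliant_failed / compliant_total, 1),
        "overall_success_rate": round(100 * correct / total, 1),
        "errant_detection_rate": round(100 * errant_failed / errant_total, 1),
    }


# Stated test criteria: five percent level, two standard deviations.
strict = screening_summary(17, 17, 4, 17)
# Broader criteria of endnote 13: ten percent level, three standard deviations.
broad = screening_summary(17, 17, 1, 17)

print(strict)
print(broad)
```

Measured against the 17 ‘compliant’ firms only, four failures amount to 23.5 per cent, which is a useful complement to the 11.8 per cent share of the full sample.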

Nevertheless, whilst these early results of the

application of Benford’s Law yield encouraging

findings, the test procedure and data set have

limitations that suggest further research is

required. Some of the more obvious limitations

are identified below.

First, the data collection method may include

an obvious source of sample bias in that with

the benefit of hindsight, the status of ‘errant’

and ‘compliant’ companies was known before

testing was conducted. This raises the question

of whether the test method would be as reliable

in the case of live data, that is, as a prediction

tool (where the value of the instrument is

unambiguously greatest). There is no cause to

doubt that this is the case. Nevertheless, testing

of live data would go some way in confirming

the tool’s validity.

Second, and related to this point, the results

reveal that the test functions in a highly

effective fashion in the tails of the distribution

– correctly failing ‘errant’ companies and

passing ‘compliant’ companies. However, the

data set used in this study offers no insight as

to ‘what goes on in between’. Over most of the

sample period there were in excess of 500 listed

companies on the JSE. Thus, this study covers

less than 10 per cent of the population. A broader

study is required to establish the effectiveness

of the tool across all firms. Until such time,

then, the instrument is arguably best used as an

indicator of potential data error or fraud rather

than a corroborator of data problems.

Third, the results offer no guide as to whether

all companies that fail the test ultimately fail

and, if so, what the extent of the lag in time is

between detection and failure.

Fourth, in the international setting, Benford’s Law has been applied to more than accounting data as the basis for detecting data error or fraud. Indeed, the potential applications of the law are wide. For instance, the law has been identified as relevant to the interrogation of design efficiency (Hamming, 1970 and Knuth, 1981 in Scott and Fasli, 2001), the examination of authenticity of mathematical models (Varian, 1972 in Scott and Fasli, 2001; Nigrini, 1996), assessment of the validity of research results (Matthews, 1999: 26) and the examination

of data storage and data management efficiency

(Nigrini, 1999). Moreover, as noted in Section

2, the tool also is applicable as an instrument

for detecting fraud in claims (such as insurance

claims and expense account claims), payments

(bank payments and payroll disbursements)

and tax fraud (income declarations and expense

claims). However, constraints of time confine the present study to a consideration of accounting

data problems amongst listed firms. Broader,

and more detailed, studies of Benford’s Law

should address these limitations.


5

Conclusion

Over the past decade, the frequency of

accounting data error and fraud has increased

in the international and domestic settings.

The adverse economic effects of these data

problems are considered to be material. For

this reason, broad-based efforts are being made

by the accounting and auditing professions and

regulatory authorities to reduce the incidence

of data error and fraud. However, even in a

world where recording and reporting of data

is potentially error free, elements of human

behaviour (such as greed and deceit) will linger

on, causing data error and fraud to persist.

Moreover, the pace at which accounting, auditing and regulatory advances are being made is slow. For these reasons, error

and fraud detection instruments are likely to

remain important instruments in the toolkits

of auditors, shareholders, financial analysts,

investment managers, private investors and other

users of publicly reported accounting data, such

as the revenue services. One such potential tool

is Benford’s Law. However, whilst the potential

effectiveness of the law has been established

in the international literature, the domestic

research environment is silent on the topic.

Accordingly, this paper examines the potential

effectiveness of Benford’s Law in the detection

of data error and fraud in a South African

setting. To examine the case, a simple regression

tool is applied to data generated by a set of 34

companies listed on the JSE. For the sake of

the study, the test sample consists of data drawn

from an equal number of so-called ‘errant’ and

‘compliant’ companies. The results of the study

are convincing, with the tool correctly failing

all 17 of the ‘errant’ companies; four of the 17 ‘compliant’ companies fail the test. Despite the incidence of false-positive results, the number is considered to be sufficiently small (11.8 percent of the full sample) to conclude

that Benford’s Law has the capacity to serve

as an effective indicator of data problems in

accounting information. Moreover, under test

conditions that are broader than the a priori

conditions that were set, the success rate of

the test climbs to 97.1 percent. Further, whilst

the study has some limitations, none of these is

considered to be sufficient to challenge the basic

result: Benford’s Law has the potential to act as

a highly effective detector of data error or fraud

in accounting information.

Endnotes

1 The author would like to thank Kerry Hadfield,

Jim Harris, Warwick Lucas, Zane Spindler, Hunter

Thyne and John Verster for useful contributions

made to this paper; the author also acknowledges

the helpful comments provided by two anonymous

editors. However, the usual caveats apply.

2 International examples embrace a diverse set

of high-profile companies that includes Enron,

WorldCom, Lucent, Adelphia, Ahold, Tyco, Intel,

AOL-Time Warner and Global Crossing. As with

the international environment, the South African

business environment is scattered with examples

of accounting frauds and irregularities. Some

companies that have engaged in such practices are

listed in Section 3 of this paper.

3 D’Agostino and Williams (2002) identify 919

cases of accounting restatements made by listed

companies in the United States (US) between

1 January 1997 and 30 June 2002. The study

finds that losses in market capitalisation of

US$100 billion occurred over the reference period.

See also Floyd (2003), where comment is made on

the growing incidence of accounting irregularities

amongst listed firms.

4 It ought to be noted that the validity of Benford’s

(1938) findings has been drawn into question by

some researchers. For example, Scott and Fasli

(2001: 5) note that Benford’s claim that the tested

data sets conformed to his law rested entirely

on the apparent similarity of the numbers. To

be sure, Benford made no attempt to test the

goodness of fit of the data. However, this has not

led to the rejection of Benford’s Law. Rather,

this shortcoming in Benford’s work has led to the

refinement of our understanding of the types of

data to which the law applies (Scott and Fasli,

2001: 2).

5 In his paper, Newcomb (1881) also determined the

probability of the ten second digits, independent of

the first digits (Brookes, 2002: 1).

6 Hill (1995a) provides the exact formulas of the

joint probability calculations.

7 Pinkham (1961) provided a key development in the

understanding of Benford’s Law by arguing that


for any digit-distribution law to hold consistently, it

would have to be scale invariant. Pinkham’s (1961)

proof was later extended by Hill (1995b).

8 See Raimi (1976) for an early review of the

literature and Scott and Fasli (2001) for a more

recent literature survey. Three main groups of

explanations emerge from these literature surveys.

The first set argues that Benford’s Law is due to

the numbering system that we use to count upward

through the natural numbers. The second group of

mathematical explanations is based on the notion

of ‘randomness’ and the central limit theorem.

The third approach to deriving Benford’s Law is

termed ‘ontological’ because it asks: ‘What form

would a digit law take if such a law existed?’ Scott

and Fasli (2001: 3-5) and Brookes (2002) offer

comment in this regard. That aside, of these three

approaches, the second remains the most widely

accepted plausible explanation for conformity of

a data set to Benford’s Law (Scott and Fasli, 2001:

15).

9 It is noteworthy that the test statistics generated by

Scott and Fasli (2001: 6) to interrogate Benford’s

(1938) results conform to Benford’s Law.

10 The companies identified by the author that are

commonly suspected or believed to have published

false or fraudulent data were supplied by a

group of ten investment brokers and managers

representing five different financial services firms

who dealt in listed companies over the reference

period.

11 It is acknowledged that Ernst and Young’s

‘Excellence in Financial Reporting’ is not intended

by the authors to test or validate the authenticity

(correctness) of the numbers reported in financial

statements. Rather, in the absence of such a tool,

the report is used here as a proxy for indicating the

authenticity of reported accounting data.

12 To illustrate this point, all ending digits in earnings

per share figures are expected to be distributed

with equal probability. Further, first digit counts on

financial ratios, such as return on equity or return

on assets are, in many instances, likely to conform

more closely to a binary distribution than to the

distribution implied by Benford’s Law.

13 It is interesting to note that at the 10.0 per cent

level of significance and allowing for true values

of βi to lie within three standard deviations of the

estimated βi values, all of the 17 ‘errant’ companies

continue to fail the test, whilst the number of false-

positive results declines to one. Thus, under this

set of broader test criteria, the overall success rate

of the test climbs to 97.1 per cent.

References

1 ALEXANDER, E. & OLDERT, N. (eds.) (2003)

Profile’s Stock Exchange Handbook: July-December

2003, Profile Media: Johannesburg.

2 ARTHUR, C. (1993) “Intimate moments in the

lives of great scientists”, New Scientist, No. 1905.

3 ASHCRAFT, M. (1992) “Cognitive arithmetic: A

review of data and theory”, Cognition, 44: 75-106.

4 BAKAN, P. (1960) “Response-tendencies in

attempts to generate binary series”, American

Journal of Psychology, 73: 127-131.

5 BENFORD, F. (1938) “The law of anomalous

numbers”, Proceedings of the American

Philosophical Society, 78: 551-572.

6 BROOKES, D. (2002) “Naked-eye quantum

mechanics: Practical applications of Benford’s law

for integer quantities”, Frequencies Journal of Size

Law Applications, Special Paper, 1: 1-8.

7 CARSLAW, C. (1988) “Anomalies in income

numbers: Evidence of goal oriented behavior”,

Accounting Review, 63: 321-327.

8 CHAPANIS, A. (1953) “Random number guessing

behavior”, American Psychologist, 8: 332.

9 D’AGOSTINO, D.M. & WILLIAMS, O.M. (2002)

Financial Statement Restatements, Report to the

Chairman, Committee on Banking, Housing, and

Urban Affairs. United States General Accounting

Office: Washington.

10 DEHAENE, S. & MEHLER, J. (1992) “Cross-

linguistic regularities in the frequency of number

words”, Cognition, 43: 1-29.

11 ERNST & YOUNG (2002) Excellence in Financial

Reporting, Ernst and Young: Johannesburg.

12 FLOYD, J.J. (2003) An Analysis of Restatement

Matters, Huron Consulting Group: Chicago.

13 HILL, T.P. (1995a) “Base-invariance implies

Benford’s law”, Proceedings of the American

Mathematical Society, 123: 887-895.

14 HILL, T.P. (1995b) “The significant-digit phenomenon”, American Mathematical Monthly, 102(4): 322-327.

15 HILL, T.P. (1996) “A statistical derivation of the significant-digit law”, Statistical Science, 10(4): 354-363.

16 HILL, T.P. (1998) “The first-digit phenomenon”, American Scientist Online, July-August, No. 86: 1-18.

17 HILL, T.P. (1999) “The difficulty of faking data”, Chance, 12(3): 27-31.

18 LEY, E. (1996) “On the peculiar distribution of the

US stock indices”, American Statistician, 50(4): 311-313.

19 MATTHEWS, R. (1999) “The power of one”, New

Scientist, 163(2194): 26.

20 NEURINGER, A. (1986) “Can people behave

‘randomly’?: The role of feedback”, Journal of

Experimental Psychology, 115: 62-75.


21 NEWCOMB, S. (1881) “Note on the frequency

of use of the different digits in natural numbers”,

American Journal of Mathematics, 4: 39-40.

22 NIGRINI, M.J. (1996) “A taxpayer compliance

application of Benford’s law”, Journal of the

American Taxation Association, 18: 72-91.

23 NIGRINI, M.J. (1999) “I’ve got your number: How a mathematical phenomenon can help CPAs uncover fraud and other irregularities”, Journal of Accountancy, May 1999: 1-7.

24 NIGRINI, M.J. (2000) Continuous Auditing, Working Paper, Ernst and Young Centre for Auditing Research and Advanced Technology, University of Kansas.

25 NIGRINI, M.J. & MITTERMAIER, L.J. (1997)

“The use of Benford’s law as an aid in analytical

procedures”, Auditing: A Journal of Practice and

Theory, 16(Fall): 52-67.

26 PINKHAM, R. (1961) “On the distribution of

first significant digits”, Annals of Mathematical

Statistics, 32: 1223-1230.

27 RAIMI, R. (1969) “The peculiar distribution of

first significant digits”, Scientific American, 221(6):

109-120.

28 RAIMI, R. (1976) “The first digit problem”, American

Mathematical Monthly, 83: 521-538.

29 SCOTT, P. & FASLI, M. (2001) “Benford’s law: An

empirical investigation and a novel explanation”,

CSM Technical Report 349, Department of

Computer Science, University of Essex.

30 THOMAS, J. (1989) “Unusual patterns in

reported earnings”, Accounting Review, 64: 773-

787.