All content in this area was uploaded by Werner Hürlimann on Oct 08, 2014

A GENERALIZED BENFORD LAW AND ITS APPLICATION

By Werner Hürlimann, Switzerland

Abstract.

A simple one-parameter analytical extension of Benford’s law for the first digits of numerical
data is constructed. Based on the maximum likelihood method, the fitting capabilities of the
new distribution are illustrated on some interesting and important integer sequences, including
the numeri idonei, the Keith, Princeton, lucky, Ulam and Bell numbers, as well as the
sequence of primes. For the mixture of the considered data sets, Benford’s law is rejected at the
5% significance level, while the generalized Benford law is accepted with a p-value of about 25%.
Confirming the statistical evidence, it is shown that the first digits of the Bell numbers satisfy
Benford’s law.

Keywords : first digit, Benford’s law, uniform distribution, triangular distribution,

two-sided power distribution, integer sequences, mixing sequences, Bell number

MSC 2000 : 11B73, 11B83, 11K31, 62E15, 62E17

1. Introduction.

Since Newcomb(1881) and Benford(1938) it is known that many numerical data sets obey

Benford’s law. To be specific, if the random variable X, which describes the first significant

digit in a numerical table, is Benford distributed, then

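$$P(X = d) = \log\left(1 + d^{-1}\right), \quad d \in \{1, \ldots, 9\}. \tag{1.1}$$

Here and below, $\log$ denotes the logarithm to base 10 and $\ln$ the natural logarithm.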
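As a quick numerical illustration of (1.1), the following minimal Python sketch tabulates the nine first-digit probabilities. The function name `benford_pmf` is chosen here for illustration and does not appear in the paper.

```python
import math

def benford_pmf(d: int) -> float:
    """First-digit probability P(X = d) = log10(1 + 1/d) from (1.1)."""
    if not 1 <= d <= 9:
        raise ValueError("a first significant digit lies in 1..9")
    return math.log10(1 + 1 / d)

# Tabulate the distribution; the nine probabilities add up to 1
# because the sum of log10((d + 1) / d) telescopes to log10(10).
probs = {d: benford_pmf(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, round(p, 4))
```

The digit 1 receives roughly 30% of the mass and the digit 9 less than 5%.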

Mathematical explanations of this law have been proposed by Pinkham(1961),
Cohen(1976), Hill(1995a/b/c,97,98), Allaart(1997), Janvresse and de la Rue(2003). In recent
years an upsurge of applications of Benford’s law has appeared, including work by
Becker(1982), Burke and Kincanon(1991), Buck et al.(1993), Ley(1996), Nigrini(1996/99),
Nigrini and Mittermaier(1997), Vogt(2000), Tolle et al.(2000) and Berger et al.(2002).

Hill(1995c) also suggested shifting attention to probability distributions that follow
or closely approximate Benford’s law. Papers along this path include Leemis et al.(2000) and
Engel and Leuenberger(2003). Some survival distributions that satisfy Benford’s law exactly
are known. However, there do not seem to exist simple analytical distributions that
include Benford’s law as a special case. Combining facts from Leemis et al.(2000) and van Dorp
and Kotz(2002), such a simple one-parameter family of distributions is constructed.

The interest of the enlarged Benford law is two-fold. First, it may provide a better fit of

the data than Benford’s law. Second, it yields a simple statistical procedure to validate

Benford’s law. If Benford’s model is sufficiently “close” to the one-parameter extended

model, then it will be retained. These points will be illustrated by some typical examples.


2. A generalized Benford distribution.

If T denotes a random lifetime with survival distribution $S(t) = P(T \ge t)$, then the value
Y of the first significant digit in the lifetime T has probability distribution

$$P(Y = y) = \sum_{i=-\infty}^{\infty} \left\{ S(y \cdot 10^{i}) - S((y+1) \cdot 10^{i}) \right\}, \quad y \in \{1, \ldots, 9\}. \tag{2.1}$$
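To make (2.1) concrete, here is a minimal Python sketch that truncates the doubly infinite sum and applies it to an exponential lifetime with rate 1. The truncation bounds `i_min`, `i_max` and the function names are assumptions made for this illustration only; the exponential case is the one studied by Engel and Leuenberger(2003).

```python
import math

def first_digit_pmf(survival, y, i_min=-40, i_max=40):
    """Truncated version of the sum (2.1) for first digit y."""
    return sum(
        survival(y * 10.0**i) - survival((y + 1) * 10.0**i)
        for i in range(i_min, i_max + 1)
    )

def exp_survival(t, rate=1.0):
    """Survival function S(t) = P(T >= t) of an exponential lifetime."""
    return math.exp(-rate * t)

pmf = [first_digit_pmf(exp_survival, y) for y in range(1, 10)]
benford = [math.log10(1 + 1 / y) for y in range(1, 10)]
for y, (p, q) in enumerate(zip(pmf, benford), start=1):
    print(y, round(p, 4), round(q, 4))
```

The exponential first-digit probabilities come out close to, but not identical with, the Benford values, which is the kind of near-Benford behaviour the text refers to.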

Alternatively, if D denotes the integer-valued random variable satisfying

$$10^{D} \le T < 10^{D+1}, \tag{2.2}$$

then the first significant digit can be written in terms of T and D as

$$Y = \left[ T \cdot 10^{-D} \right] = \left[ 10^{\log T - D} \right], \tag{2.3}$$

where $[x]$ denotes the greatest integer less than or equal to x. In particular, if the random
variable $Z = \log T - D$ is uniformly distributed as $U(0,1)$, then the first significant digit Y
is exactly Benford distributed. Starting from the uniform random variable $W = U(0,2)$ or the
triangular random variable $W = \mathrm{Triangular}(0,1,2)$ with probability density function
$f_W(w) = w$ if $w \in (0,1)$ and $f_W(w) = 2 - w$ if $w \in [1,2)$, one shows that the random
lifetime $T = 10^{W}$ generates the first digit Benford distribution (Leemis et al.(2000),
Examples 1 and 2).
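The two examples of Leemis et al.(2000) can be checked by simulation. The sketch below is a Monte Carlo illustration, not a proof: it draws W, forms the first digit through the floor formula (2.3) applied to T = 10^W, and compares empirical frequencies with Benford's law. The sample size, the seed and all function names are assumptions of this example.

```python
import math
import random

def first_digit_from_w(w: float) -> int:
    """First digit of T = 10**w via (2.3): Y = [10**(W - D)] with D = [W]."""
    d = math.floor(w)
    # min(...) guards against a floating-point result of exactly 10.0.
    return min(9, math.floor(10 ** (w - d)))

def simulate(draw_w, n, seed=0):
    """Empirical first-digit frequencies for n draws of W."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n):
        counts[first_digit_from_w(draw_w(rng))] += 1
    return [counts[y] / n for y in range(1, 10)]

benford = [math.log10(1 + 1 / y) for y in range(1, 10)]
uniform_freq = simulate(lambda rng: rng.uniform(0.0, 2.0), 200_000)
triangular_freq = simulate(lambda rng: rng.triangular(0.0, 2.0, 1.0), 200_000)

for y in range(9):
    print(y + 1, round(benford[y], 4),
          round(uniform_freq[y], 4), round(triangular_freq[y], 4))
```

With 200,000 draws the sampling error is of order 0.001, so both columns should agree with the Benford column to about two decimals.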

A simple parametric distribution, which includes as special cases both the above uniform
and triangular distributions, is the two-sided power random variable $W = TSP(\alpha, c)$, recently
considered in van Dorp and Kotz(2002), with probability density function

$$f_W(w) = \begin{cases} \dfrac{c}{2} \left( \dfrac{w}{\alpha} \right)^{c-1}, & 0 < w \le \alpha, \\[2ex] \dfrac{c}{2} \left( \dfrac{2-w}{2-\alpha} \right)^{c-1}, & \alpha \le w < 2. \end{cases} \tag{2.4}$$

If $c = 1$ then $W = U(0,2)$, and if $c = 2$, $\alpha = 1$ then $W = \mathrm{Triangular}(0,1,2)$. This
observation shows that the random lifetime $T = 10^{TSP(1,c)}$ will generate first digit
distributions closely related to Benford’s distribution, at least if c is close to 1 or 2.
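The two special cases of (2.4) are easy to confirm numerically. The following sketch, with illustrative names `tsp_pdf` and `integrate` that are not from the paper, evaluates the density and checks by a simple midpoint rule that it integrates to 1 for a non-integer value of c.

```python
def tsp_pdf(w: float, alpha: float, c: float) -> float:
    """Two-sided power density (2.4) on (0, 2) with mode alpha in (0, 2)."""
    if 0 < w <= alpha:
        return 0.5 * c * (w / alpha) ** (c - 1)
    if alpha <= w < 2:
        return 0.5 * c * ((2 - w) / (2 - alpha)) ** (c - 1)
    return 0.0

def integrate(f, a: float, b: float, n: int = 20_000) -> float:
    """Midpoint rule; accurate enough here for a sanity check."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# c = 1 gives the uniform density 1/2 on (0, 2).
print(tsp_pdf(0.3, 1.0, 1.0), tsp_pdf(1.7, 1.0, 1.0))
# c = 2, alpha = 1 gives the triangular density w resp. 2 - w.
print(tsp_pdf(0.3, 1.0, 2.0), tsp_pdf(1.6, 1.0, 2.0))
# Total mass for a non-integer c.
total = integrate(lambda w: tsp_pdf(w, 1.0, 2.5), 0.0, 2.0)
print(round(total, 6))
```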

Theorem 2.1. Let $W = TSP(1, c)$ be the two-sided power random variable with probability
density function

$$f_W(w) = \begin{cases} \dfrac{c}{2}\, w^{c-1}, & 0 < w \le 1, \\[2ex] \dfrac{c}{2}\, (2-w)^{c-1}, & 1 \le w < 2, \end{cases} \tag{2.5}$$

and let the integer-valued random variable D satisfy $D \le W < D + 1$. Then the first digit
random variable $Y = \left[ 10^{W-D} \right]$ has the one-parameter generalized Benford probability density
function

$$f_Y(y) = \frac{1}{2} \left\{ [\log(1+y)]^{c} - [\log y]^{c} - [1 - \log(1+y)]^{c} + [1 - \log y]^{c} \right\}, \quad y \in \{1, \ldots, 9\}. \tag{2.6}$$
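A direct transcription of (2.6) into Python makes the two Benford special cases visible. This is a minimal sketch; the function name `generalized_benford_pmf` is an assumption of this example.

```python
import math

def generalized_benford_pmf(y: int, c: float) -> float:
    """Generalized Benford probability (2.6) for digit y and parameter c > 0."""
    a = math.log10(y)
    # log10(10) is set to exactly 1 so that (1 - b) is never slightly negative.
    b = 1.0 if y == 9 else math.log10(y + 1)
    return 0.5 * (b**c - a**c - (1 - b)**c + (1 - a)**c)

benford = [math.log10(1 + 1 / y) for y in range(1, 10)]
for c in (1.0, 2.0, 2.5):
    pmf = [generalized_benford_pmf(y, c) for y in range(1, 10)]
    gap = max(abs(p - q) for p, q in zip(pmf, benford))
    print(f"c = {c}: total = {sum(pmf):.6f}, max deviation from Benford = {gap:.4f}")
```

For c = 1 and c = 2 the deviation from Benford's law vanishes up to rounding, in line with the uniform and triangular examples; other values of c give a genuinely different distribution that still sums to 1.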


Proof. The probability density function of $T = 10^{W}$ is given by

$$f_T(t) = \begin{cases} \dfrac{c}{2\, t \ln 10} \left( \dfrac{\ln t}{\ln 10} \right)^{c-1}, & 1 < t \le 10, \\[2ex] \dfrac{c}{2\, t \ln 10} \left( 2 - \dfrac{\ln t}{\ln 10} \right)^{c-1}, & 10 \le t < 100. \end{cases}$$

It follows that the first significant digit of T, namely $Y = \left[ T \cdot 10^{-D} \right]$, has probability density
function

$$f_Y(y) = \int_{y}^{y+1} f_T(t)\, dt + \int_{10y}^{10(y+1)} f_T(t)\, dt.$$

Making the change of variable $u = \ln t / \ln 10$, one obtains

$$f_Y(y) = \int_{\log y}^{\log(y+1)} \frac{c}{2}\, u^{c-1}\, du + \int_{1+\log y}^{1+\log(y+1)} \frac{c}{2}\, (2-u)^{c-1}\, du,$$

which upon integration immediately implies (2.6). ◊
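For completeness, the final integration step of the proof can be written out. The following worked lines use only the density (2.5) and the substitution $u = \log t$; they are a spelled-out version of the step, not additional material from the paper.

```latex
\begin{align*}
\int_{\log y}^{\log(y+1)} \tfrac{c}{2}\, u^{c-1}\, du
  &= \tfrac{1}{2} \left\{ [\log(1+y)]^{c} - [\log y]^{c} \right\}, \\
\int_{1+\log y}^{1+\log(y+1)} \tfrac{c}{2}\, (2-u)^{c-1}\, du
  &= \tfrac{1}{2} \left\{ [1-\log y]^{c} - [1-\log(1+y)]^{c} \right\}.
\end{align*}
```

Adding the two lines gives (2.6). For c = 1 the sum collapses to $\log(1+y) - \log y = \log(1 + y^{-1})$, which is Benford's law (1.1).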

3. Fitting the first digit distributions of some integer sequences.

Maximum likelihood estimation of the generalized Benford distribution is straightforward.

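To simplify notation, set $a_y = \log y$, $b_y = \log(1+y)$, $y \in \{1, \ldots, 9\}$, and consider the
derivative of $L_y = \ln f_Y(y)$ with respect to the parameter c, which is given by

$$\frac{\partial}{\partial c} L_y = \begin{cases}
\dfrac{\ln b_1 \cdot b_1^{c} - \ln(1-b_1) \cdot (1-b_1)^{c}}{b_1^{c} - (1-b_1)^{c} + 1}, & y = 1, \\[3ex]
\dfrac{\ln b_y \cdot b_y^{c} - \ln a_y \cdot a_y^{c} - \ln(1-b_y) \cdot (1-b_y)^{c} + \ln(1-a_y) \cdot (1-a_y)^{c}}{b_y^{c} - a_y^{c} - (1-b_y)^{c} + (1-a_y)^{c}}, & y = 2, \ldots, 8, \\[3ex]
\dfrac{-\ln a_9 \cdot a_9^{c} + \ln(1-a_9) \cdot (1-a_9)^{c}}{1 - a_9^{c} + (1-a_9)^{c}}, & y = 9.
\end{cases}$$

The cases $y = 1$ and $y = 9$ are the general formula with $a_1 = 0$ and $b_9 = 1$ inserted.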

If $N_y$, $y \in \{1, \ldots, 9\}$, is the number of observations with first digit y, then the maximum
likelihood estimator (MLE) of c will solve the non-linear equation

$$\sum_{y=1}^{9} N_y \cdot \frac{\partial}{\partial c} L_y = 0. \tag{3.1}$$

In some cases, more than one solution may exist, usually near $c = 1$ and $c = 2$. In this
situation, the solution with the greatest value of the log-likelihood $\ln L = \sum_{y=1}^{9} N_y \cdot \ln f_Y(y)$ will
be chosen as MLE.
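Since (3.1) may have several roots, one practical route is to maximize the log-likelihood directly. The sketch below is one possible approach and not necessarily the procedure used for the tables: it scans a grid of c values (which copes with several local maxima) and then refines the best grid point by ternary search. The search interval [0.05, 6], the grid size and all function names are assumptions of this example; the counts are the first-digit occurrences of the 65 numeri idonei from Table 3.1.

```python
import math

def gb_pmf(y: int, c: float) -> float:
    """Generalized Benford probability (2.6) for digit y and parameter c > 0."""
    a = math.log10(y)
    b = 1.0 if y == 9 else math.log10(y + 1)
    return 0.5 * (b**c - a**c - (1 - b)**c + (1 - a)**c)

def log_likelihood(counts, c: float) -> float:
    """ln L = sum over y of N_y * ln f_Y(y); counts[0] is N_1, ..., counts[8] is N_9."""
    return sum(n * math.log(gb_pmf(y, c)) for y, n in enumerate(counts, start=1))

def fit_c(counts, lo=0.05, hi=6.0, grid=600, refine=60) -> float:
    """Grid search on [lo, hi], then ternary-search refinement near the best point."""
    step = (hi - lo) / grid
    best = max((lo + k * step for k in range(grid + 1)),
               key=lambda c: log_likelihood(counts, c))
    left, right = max(lo, best - step), min(hi, best + step)
    for _ in range(refine):
        m1 = left + (right - left) / 3
        m2 = right - (right - left) / 3
        if log_likelihood(counts, m1) < log_likelihood(counts, m2):
            left = m1
        else:
            right = m2
    return (left + right) / 2

counts = [20, 12, 9, 7, 4, 2, 5, 4, 2]  # numeri idonei, Table 3.1
c_hat = fit_c(counts)
print("c_hat =", round(c_hat, 4))
print("-ln L at c = 1 (Benford) =", round(-log_likelihood(counts, 1.0), 2))
print("-ln L at c_hat           =", round(-log_likelihood(counts, c_hat), 2))
```

The value at c = 1 is the Benford negative log-likelihood, so the last two printed lines can be compared with the corresponding columns of the results table.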


The fitting capabilities of the new distribution are illustrated on some interesting and
important integer sequences. The first digit occurrences of the analyzed sequences are listed
in Table 3.1, followed by some comments on the interest of these sequences. Statistical
results are summarized in Table 3.2. Besides the MLE of the parameter c, we list
the negative log-likelihood, the chi-square values and their corresponding p-values. Finally,
the obtained results are discussed.

Table 3.1 : First digit distributions of some integer sequences

                     Sample    First digit occurrences
Name of sequence     size        1     2     3     4     5     6     7     8     9
Numeri idonei          65       20    12     9     7     4     2     5     4     2
Keith number           71       23    10    10     5     3     5     9     2     4
Princeton number       25        7     2     3     3     2     3     2     1     2
Lucky number           45       19     8     4     2     1     3     4     1     3
Ulam number            44       20     6     3     3     2     3     2     3     2
Fibonacci number      100       30    18    13     9     8     6     5     7     4
Bell number           100       31    15    10    12    10     8     5     6     3
Prime number          168       25    19    19    20    17    18    18    17    15
Mixing sequence       618      175    90    71    61    47    48    50    41    35

According to Euler, a natural number n is a numerus idoneus (plural: numeri idonei) if it fulfills
the following condition: if a number p has a unique representation in the form $p = x^2 + n \cdot y^2$ with
relatively prime numbers x and y, then p is a prime number. Euler discovered the fact that
there are only 65 numeri idonei, with n=1848 the largest one. The proof of finiteness has
been given by Heilbronn and Chowla in 1934 (see Fellmann(1983), p. 37). Keith numbers or
repfigit numbers (replicating Fibonacci digits) have been introduced by Keith(1987) (see also
Pickover(1990/2000), Shiriff(1994), Keith(1994)). A Keith number is an n-digit integer N
with the following property. If a Fibonacci-like sequence (in which each term in the sequence
is the sum of the n previous terms) is formed, with the first n terms being the decimal digits
of the number N, then N itself occurs as a term in the sequence. For example, 47 is a Keith
number since it generates the sequence 4, 7, 11, 18, 29, 47. There are 71 Keith numbers up to
$10^{19}$, as determined by Keith(1998). Moreover, it has been suggested that there is an infinite
number of Keith numbers. The story of Princeton numbers is found in Robbins(1991) (see
also Pickover(2000)). Lucky numbers and Ulam numbers are defined in Guy(1981), p. 60 (for
the latter ones see also Pickover(2000)). The Fibonacci numbers are well-known and need no
special reference. The n-th Bell number represents the number of ways to partition n things
into subsets (e.g. Aigner(1975), Jacobs(1983), Graham et al.(1989), Conway and Guy(1996)).
Prime numbers are ubiquitous and known to everybody. The mixing sequence represents the
aggregate of all previous integer sequences.
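The definition of a Keith number translates directly into a short test. The following Python sketch follows the verbal definition above; the function name `is_keith` is chosen for this illustration, and single-digit numbers are excluded by convention.

```python
def is_keith(n: int) -> bool:
    """Check whether n occurs in the Fibonacci-like sequence seeded by its digits."""
    digits = [int(ch) for ch in str(n)]
    k = len(digits)
    if k < 2:
        return False  # one-digit numbers are not counted as Keith numbers
    terms = digits[:]
    # Each new term is the sum of the k previous terms; stop once n is reached or passed.
    while terms[-1] < n:
        terms.append(sum(terms[-k:]))
    return terms[-1] == n

two_digit_keith = [n for n in range(10, 100) if is_keith(n)]
print(two_digit_keith)
print(is_keith(47), is_keith(48))
```

For 47 the loop produces 4, 7, 11, 18, 29, 47 and stops with a match, exactly as in the example given in the text.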


Table 3.2 : Fitting integer sequences to the Benford and generalized Benford distributions
(p-values in percent)

                                |  Benford                       |  generalized Benford
Name of sequence     MLE of c   |  −ln L      χ²      p-value    |  −ln L      χ²      p-value
Numeri idonei        1.15087    |   127.76    2.59    95.7       |   127.71    2.52    92.5
Keith number         2.63445    |   142.49    9.21    32.4       |   141.98    7.72    35.8
Princeton number     2.81958    |    53.14    3.45    90.3       |    52.83    2.87    89.7
Lucky number         3.15802    |    83.96    7.69    46.4       |    82.78    5.16    64.0
Ulam number          3.48760    |    81.78    6.35    60.8       |    79.83    2.53    92.5
Fibonacci number     2.01802    |   199.22    1.03    99.8       |   199.22    1.02    99.4
Bell number          2.01106    |   200.39    3.07    93.0       |   200.39    3.07    87.9
Prime number         2.75660    |   389.05   45.02    4·10^−5    |   387.04   37.03    4.6·10^−4
Mixing sequence      2.52473    |  1277.65   15.55    4.93       |  1274.66    9.02    25.1
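The Benford chi-square entry for the mixing sequence can be recomputed from the first-digit counts. The sketch below assumes the usual Pearson statistic with 8 degrees of freedom (9 digit classes, no estimated parameter) and uses the closed-form survival function of the chi-square distribution for an even number of degrees of freedom; the helper `chi2_sf_even_df` is written for this example and is not a library call.

```python
import math

observed = [175, 90, 71, 61, 47, 48, 50, 41, 35]  # mixing sequence, digits 1..9
n = sum(observed)
expected = [n * math.log10(1 + 1 / d) for d in range(1, 10)]

# Pearson chi-square statistic against Benford's law.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(chi2, 8)
print("chi-square =", round(chi2, 2))
print("p-value (%) =", round(100 * p_value, 2))
```

The generalized Benford column has one fitted parameter and hence an odd number of degrees of freedom (7), for which this closed form does not apply; a statistics library would be the natural choice there.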

Taking into account the finiteness of the sequence of numeri idonei, it is quite surprising how
well this sequence fits both first digit distributions. Among the Keith, Princeton, lucky and
Ulam numbers, the Princeton ones are the most likely to fit Benford’s law. Unfortunately,
only a relatively small sample of this sequence is available so far. Comparing
p-values, the lucky and Ulam numbers are better fitted by the generalized Benford
distribution. The Fibonacci and Bell numbers are very close to Benford’s law. In fact, both
follow Benford’s law. For the Fibonacci sequence, this is shown in a number of papers
including Brown and Duncan(1970), Wlodarski(1971), Sentance(1973), Webb(1975),
Raimi(1976), Brady(1978), Kunoff(1987). An analytical proof that Bell numbers are Benford
distributed is given in Theorem 4.1. Both the Benford and the generalized Benford distributions
are rejected for the sequence of primes. This statistical evidence is simpler to obtain than the
analytical derivation proposed in Diaconis(1977), p. 74. However, note that the sequence of
primes is Benford distributed with respect to densities other than the usual natural density
(e.g. Whitney(1972), Schatte(1983), Cohen and Katz(1984)). Mixing sequences are
often Benford distributed (e.g. Benford(1938), Raimi(1969/85), Knuth(1969)), a fact proven
under some hypotheses by Hill(1995c), Theorem 3. It is therefore perhaps surprising that
Benford’s law is rejected for our mixing sequence at the 5% significance level. In contrast to this,
the generalized Benford law is accepted with a 25% p-value. Moreover, it is not clear how some
sequences in Table 3.2 will behave as the sample size increases. Further research
is needed to clarify these and other points.

4. The first digits of the Bell numbers satisfy Benford’s law.

To confirm the statistical evidence of Table 3.2, it is shown that the first digits of the Bell

numbers satisfy Benford’s law.
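Before turning to the proof, the empirical side can be reproduced with a few lines of Python. The sketch below generates Bell numbers with the Bell triangle and tallies their first digits. It assumes the indexing b_1 = 1, b_2 = 2, ... and takes the first 100 terms; whether this is exactly the sample behind the table of first-digit occurrences is an assumption, so the printed counts should be compared with that table rather than taken as a restatement of it.

```python
import math

def bell_numbers(count: int) -> list:
    """Return [b_1, ..., b_count] using the Bell triangle (exact integer arithmetic)."""
    row = [1]
    bells = []
    for _ in range(count):
        bells.append(row[-1])          # last entry of each row is a Bell number
        new_row = [row[-1]]            # next row starts with that entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return bells

bells = bell_numbers(100)
counts = [0] * 9
for b in bells:
    counts[int(str(b)[0]) - 1] += 1    # leading decimal digit

benford_expected = [100 * math.log10(1 + 1 / d) for d in range(1, 10)]
for d in range(9):
    print(d + 1, counts[d], round(benford_expected[d], 1))
```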

Theorem 4.1. The sequence of first digits of the Bell numbers is Benford distributed.

Proof. Recall that the Bell numbers satisfy the recurrence relationship
$b_{n+1} = \sum_{k=0}^{n} \binom{n}{k} b_k$, $b_1 = 1$, $b_2 = 2$ (e.g. Graham et al.(1989), p. 359). The sequence
$\{b_n\}$ is Benford distributed if and only if the sequence $\{\log b_n\}$ is uniformly distributed
modulo 1 (e.g. Diaconis(1977), Theorem 1). To show the latter, consider first the sequence

xne

nn

n

nn nn n

=⋅

++

+⋅ −⋅ −

−

ln ln ( )

ln ln

1

21

1

12

,

which by Graham et al.(1989), p. 479 and 573, is an asymptotic estimate of $\{\ln b_n\}$. The
simpler sequence

yne

n

n

nn nn n

=⋅

+

+⋅ −⋅ −

ln ln

ln ln

1

21

1

satisfies the property $\lim_{n \to \infty} (x_n - y_n) = 0$. In the following, we apply some useful results from
the theory of uniformly distributed sequences modulo 1 from Kuipers and Niederreiter(1974),
pp. 1-18. First, if two sequences $\{x_n\}$ and $\{y_n\}$ satisfy asymptotically a relation
$\lim_{n \to \infty} (x_n - y_n) = \text{const.}$, and if $\{y_n\}$ is uniformly distributed modulo 1, then $\{x_n\}$ is also
uniformly distributed modulo 1. Therefore, it suffices to show that $\{y_n / \ln 10\}$ is uniformly
distributed modulo 1. Second, one applies the criterion of Weyl(1916), which says that a
sequence $\{a_n\}$ is uniformly distributed modulo 1 if and only if

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e^{2 \pi i h a_n} = 0 \quad \text{for all } h \in \mathbb{N}_+.$$
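Weyl's criterion is easy to explore numerically. The sketch below computes the normalized exponential sums for two toy sequences that are not taken from the paper: the multiples of the square root of 2, which are known to be uniformly distributed modulo 1, and a constant sequence, which is not. The function name `weyl_mean` and the choice N = 10,000 are assumptions of this illustration; a finite N can only suggest, never prove, the limit behaviour.

```python
import cmath
import math

def weyl_mean(seq, h: int) -> float:
    """|(1/N) * sum over n of exp(2*pi*i*h*a_n)| for the finite sequence seq."""
    total = sum(cmath.exp(2j * math.pi * h * a) for a in seq)
    return abs(total) / len(seq)

N = 10_000
irrational_multiples = [n * math.sqrt(2) for n in range(1, N + 1)]
constant = [0.5] * N

for h in (1, 2, 3):
    print(h,
          round(weyl_mean(irrational_multiples, h), 6),
          round(weyl_mean(constant, h), 6))
```

The first column of means is close to 0 for every h, while the constant sequence stays at 1. The same function can be applied to the values of log b_n for the Bell numbers as an informal check of the statement being proved.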

Third, to show this, we use Van der Corput’s estimate for trigonometric sums. Let a and b
be integers with $a < b$, and let f be twice differentiable on $[a, b)$ with $f''(x) \ge \rho > 0$ or
$f''(x) \le -\rho < 0$ for $x \in [a, b)$. Then

$$\left| \sum_{n=a}^{b} e^{2 \pi i f(n)} \right| \le \left( \left| f'(b) - f'(a) \right| + 2 \right) \cdot \left( \frac{4}{\sqrt{\rho}} + 3 \right).$$

Now, set $f(x) = h \cdot g(x) / \ln 10$, so that $f(n) = h \cdot y_n / \ln 10$, with

()

()

gx x x x x x x x( ) ln ln ln ln ln=− −+ + − +11

1

2

1

2.

One has

() () ( )

[]

gxxxx xx'( ) ln ln ln=++−+

−−

21 1

221 ,

$g''(x) = g_1(x) + g_2(x) + g_3(x)$, with

()

gx x

xgx x

xgx x

xx

122 3 22

21

221

2

2

1

() , () ln ,() ln

ln

=−=⋅ = ⋅ +

+.

We show that $g''(x)$ is decreasing for all $x \ge 1$. This follows easily noting that

gx x

x

13

10

'()=−≤ for

x

≥1, gx x

x

22

10

'() ln

=−≤ for

x

e≥,

()

()

gx xx

xx

3

2

1

2

71 2

10

'() ln (ln )

ln

=− ++

+<, and gx gx

23

0

''

() ()+≤ for

x

≥1.

With $a = 1$, $b = N$, it follows that for all $x \in [1, N]$:

()

gx gN N

N

N

N

NN

N

''( ) ''( ) ln ln

ln

≥≥+⋅+⋅

+

+≥

1

221

2

2

1

1

2

22,

which implies that $f''(x) = h \cdot g''(x) / \ln 10 \ge h / (6N) = \rho > 0$. Van der Corput’s estimate yields

() ()

ehNN

NN N

N

h

if n

n

N

2

1

2

10

1

2

1

21 24

63

π

()

ln ln ln ln

=

∑≤++−

++

⋅+

,

which implies that $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e^{2 \pi i f(n)} = 0$. Therefore, the asymptotically equivalent sequences
$\{y_n / \ln 10\}$, $\{x_n / \ln 10\}$ and $\{\log b_n\}$ are all uniformly distributed modulo 1. The result is
shown. ◊

References.

Aigner, M. (1975). Kombinatorik. I. Grundlagen und Zähltheorie. Springer.

Allaart, P.C. (1997). An invariant-sum characterization of Benford’s law. Journal of Applied

Probability 34, 288-291.

Becker, P. (1982). Patterns in listings of failure-rate and MTTF values and listings of other

data. IEEE Transactions on Reliability R-31, 132-134.

Berger, A., Bunimovich, L.A. and T.P. Hill (2002). One-dimensional dynamical systems and

Benford’s law. Appears in Transactions of the American Mathematical Society.

Benford, F. (1938). The law of anomalous numbers. Proceedings of the American

Philosophical Society 78, 551-572.

Brady, W.G. (1978). More on Benford’s law. Fibonacci Quarterly 16, 51-52.

Brown, J. and R. Duncan (1970). Modulo one uniform distribution of the sequence of

logarithms of certain recursive sequences. Fibonacci Quarterly 8, 482-486.

Buck, B., Merchant, A. and S. Perez (1993). An illustration of Benford’s first digit law using

alpha decay half lives. European Journal of Physics 14, 59-63.

Burke, J. and E. Kincanon (1991). Benford’s law and physical constants : the distribution of

initial digits. American Journal of Physics 59, 952.

Cohen, D. (1976). An explanation of the first digit phenomenon. Journal of Combinatorial
Theory (A) 20, 367-370.

Cohen, D. and T. Katz (1984). Prime numbers and the first digit phenomenon. Journal of

Number Theory 18, 261-268.

Conway, J.H. and R.K. Guy (1996). The Book of Numbers. Springer.

Diaconis, P. (1977). The distribution of leading digits and uniform distribution mod 1. Annals

of Probability 5, 72-81.

Dorp, J.R. van and S. Kotz (2002). The standard two-sided power distribution and its

properties : with applications in financial engineering. The American Statistician

56(2), 90-99.

Engel, H.-A. and C. Leuenberger (2003). Benford’s law for exponential random variables.

Statistics and Probability Letters.

Fellmann, E.A. (1983). Leonhard Euler – Ein Essay über Leben und Werk. In : Leonhard

Euler 1707-1783. Beiträge zu Leben und Werk. Birkhäuser Verlag Basel.

Graham, R.L., Knuth, D.E. and O. Patashnik (1989). Concrete Mathematics. A Foundation

for Computer Science. Addison-Wesley.

Guy, R.K. (1981). Unsolved Problems in Number Theory. Problem Books in Mathematics,

vol. 1. Springer.

Hill, T.P. (1995a). Base-invariance implies Benford’s law. Proceedings of the American

Mathematical Society 123, 887-895.

Hill, T.P. (1995b). The significant-digit phenomenon. Amer. Math. Monthly 102, 322-326.

Hill, T.P. (1995c). A statistical derivation of the significant-digit law. Statistical Science 10,

354-363.

Hill, T.P. (1997). Benford’s law. Encyclopedia of Mathematics Supplement, vol. 1, 102.

Hill, T.P. (1998). The first digit phenomenon. The American Scientist 10(4), 354-363.

Jacobs, K. (1983). Einführung in die Kombinatorik. Walter de Gruyter.

Keith, M. (1987). Repfigit numbers. Journal of Recreational Mathematics 19(1), 41-42.

Keith, M. (1994). All repfigit numbers less than 100 billion. Journal of Recreational

Mathematics 26(3), 181-184.

Keith, M. (1998). Determination of all Keith numbers up to 10^19. Available at
http://users.aol.com/s6sj7gt/keithnum.htm.

Knuth, D. (1969). The Art of Computer Programming, vol. 2, 219-229. (2nd ed.).

Addison-Wesley, Reading, MA, 239-249. (3rd ed.) (1998), 254-262.

Kuipers, L. and H. Niederreiter (1974). Uniform Distribution of Sequences. J. Wiley.

Kunoff, S. (1987). N! has the first digit property. Fibonacci Quarterly 25, 365-367.

Leemis, L.M., Schmeiser, B.W. and D.L. Evans (2000). Survival distributions satisfying

Benford’s law. The American Statistician 54(3), 1-6.

Ley, E. (1996). On the peculiar distribution of the U.S. Stock Indices Digits. The American

Statistician 50(4), 311-313.

Newcomb, S. (1881). Note on the frequency of use of the different digits in natural numbers.

American Journal of Mathematics 4, 39-40.

Nigrini, M. (1996). A taxpayer compliance application of Benford’s law. Journal of the

American Taxation Association 1, 72-91.

Nigrini, M. (1999). I’ve got your number. Journal of Accountancy 187, 79-83.

Nigrini, M. and L. Mittermaier (1997). The use of Benford’s law as an aid in analytical

procedures. Auditing : A Journal of Practice and Theory 16(2), 52-67.

Pickover, C. (1990). All known replicating Fibonacci-digits less than one billion. Journal of

Recreational Mathematics 22(3), 176-178.

Pickover, C. (2000). Wonders of numbers. Oxford University Press.

Pinkham, R.S. (1961). On the distribution of first significant digits. Annals of Mathematical

Statistics 32, 1223-1230.

Raimi, R.A. (1969). The peculiar distribution of first digits. Scientific American, 109-120.

Raimi, R.A. (1976). The first digit problem. American Mathematical Monthly 83, 521-538.


Raimi, R.A. (1985). The first digit phenomenon again. Proceedings of the American

Philosophical Society 129, 211-219.

Robbins, D.P. (1991). The story of 1, 2, 7, 42, 429, 7436, ... . The Mathematical Intelligencer

13(2), 12-19.

Schatte, P. (1983). On H∞-summability and the uniform distribution of sequences. Math.

Nachr. 113, 237-243.

Sentance, W.A. (1973). A further analysis of Benford’s law. Fibonacci Quarterly 11, 490-494.

Shiriff, K. (1994). Computing replicating Fibonacci digits. Journal of Recreational

Mathematics 26(3), 191-192.

Tolle, C., Budzien, J. and R. LaViolette (2000). Do dynamical systems follow Benford’s law ?

Chaos 10, 331-337.

Vogt, W. (2000). Benford’s Gesetz : Steuer- und Budgetsündern auf der Spur – Zahlen lügen

nicht. Schweizer Versicherung 9, 27-29.

Webb, W. (1975). Distribution of the first digits of Fibonacci numbers. Fibonacci Quarterly

13, 334-336.

Weyl, H. (1916). Über die Gleichverteilung von Zahlen mod Eins. Mathematische Annalen

77, 313-352.

Whitney, R.E. (1972). Initial digits for the sequence of primes. American Mathematical

Monthly 79, 150-152.

Wlodarski, J. (1971). Fibonacci and Lucas Numbers tend to obey Benford’s law. Fibonacci

Quarterly 9, 87-88.

Werner Hürlimann

Schönholzweg 24

CH-8409 Winterthur

Switzerland

e-mail : whurlimann@bluewin.ch

URL : www.geocities.com/hurlimann53