All content in this area was uploaded by Werner Hürlimann on Oct 08, 2014
A GENERALIZED BENFORD LAW AND ITS APPLICATION
By Werner Hürlimann, Switzerland
Abstract.
A simple one-parameter analytical extension of Benford’s law for first digits of numerical
data is constructed. Based on the maximum likelihood method, the fitting capabilities of the
new distribution are illustrated on some interesting and important integer sequences including
the numeri idonei, the Keith, Princeton, lucky, Ulam and Bell numbers, as well as the
sequence of primes. For the mixing of the considered data sets, Benford’s law is rejected at the
5% significance level, while the generalized Benford law is accepted with a 25% p-value.
Confirming the statistical evidence, it is shown that the first digits of the Bell numbers satisfy
Benford’s law.
Keywords : first digit, Benford’s law, uniform distribution, triangular distribution,
two-sided power distribution, integer sequences, mixing sequences, Bell number
MSC 2000 : 11B73, 11B83, 11K31, 62E15, 62E17
1. Introduction.
Since Newcomb(1881) and Benford(1938) it is known that many numerical data sets obey
Benford’s law. To be specific, if the random variable X, which describes the first significant
digit in a numerical table, is Benford distributed, then
$$P(X = d) = \log\left(1 + d^{-1}\right), \qquad d \in \{1, \ldots, 9\}. \tag{1.1}$$
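As a quick numerical illustration of (1.1), the following Python sketch tabulates the nine first digit probabilities. It assumes that log denotes the base-10 logarithm; the function name `benford_pmf` is ours and does not come from the cited literature.

```python
import math


def benford_pmf(d: int) -> float:
    """Benford probability P(X = d) = log10(1 + 1/d) for a first digit d in 1..9."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1.0 + 1.0 / d)


# Tabulate the probabilities of the nine possible first digits.
probs = {d: benford_pmf(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, round(p, 4))
```

The probabilities telescope to 1, since their sum equals log 10 - log 1.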
Mathematical explanations of this law have been proposed by Pinkham(1961),
Cohen(1976), Hill(1995a/b/c,97,98), Allaart(1997), Janvresse and de la Rue(2003). In recent
years an upsurge of applications of Benford’s law has appeared, including work by
Becker(1982), Burke and Kincanon(1991), Buck et al.(1993), Ley(1996), Nigrini(1996/99),
Nigrini and Mittermaier(1997), Vogt(2000), Tolle et al.(2000) and Berger et al.(2002).
Hill(1995c) also suggested shifting attention to probability distributions that follow
or closely approximate Benford’s law. Papers along this path include Leemis et al.(2000) and
Engel and Leuenberger(2003). Some survival distributions, which satisfy exactly Benford’s
law, are known. However, there do not seem to exist simple analytical distributions that
include Benford’s law as a special case. Combining facts from Leemis et al.(2000) and Dorp
and Kotz(2002), such a simple one-parameter family of distributions is constructed.
The interest of the enlarged Benford law is two-fold. First, it may provide a better fit of
the data than Benford’s law. Second, it yields a simple statistical procedure to validate
Benford’s law. If Benford’s model is sufficiently “close” to the one-parameter extended
model, then it will be retained. These points will be illustrated by some typical examples.
2. A generalized Benford distribution.
If $T$ denotes a random lifetime with survival distribution $S(t) = P(T \geq t)$, then the value
$Y$ of the first significant digit in the lifetime $T$ has probability distribution
$$P(Y = y) = \sum_{i=-\infty}^{\infty} \left\{ S(y \cdot 10^{i}) - S((y+1) \cdot 10^{i}) \right\}, \qquad y \in \{1, \ldots, 9\}. \tag{2.1}$$
Alternatively, if $D$ denotes the integer-valued random variable satisfying
$$10^{D} \leq T < 10^{D+1}, \tag{2.2}$$
then the first significant digit can be written in terms of $T$ and $D$ as
$$Y = \left[ T \cdot 10^{-D} \right] = \left[ 10^{\log T - D} \right], \tag{2.3}$$
where $[x]$ denotes the greatest integer less than or equal to $x$. In particular, if the random
variable $Z = \log T - D$ is uniformly distributed as $U(0,1)$, then the first significant digit $Y$
is exactly Benford distributed. Starting from the uniform random variable $W = U(0,2)$ or the
triangular random variable $W = Triangular(0,1,2)$ with probability density function
$f_W(w) = w$ if $w \in (0,1)$ and $f_W(w) = 2 - w$ if $w \in [1,2)$, one shows that the random
lifetime $T = 10^{W}$ generates the first digit Benford distribution (Leemis et al.(2000),
Examples 1 and 2).
A simple parametric distribution, which includes as special cases both the above uniform
and triangular distributions, is the two-sided power random variable $W = TSP(\alpha, c)$, recently
considered in Dorp and Kotz(2002), with probability density function
$$f_W(w) = \begin{cases} \dfrac{c}{2} \left( \dfrac{w}{\alpha} \right)^{c-1}, & 0 < w \leq \alpha, \\[2mm] \dfrac{c}{2} \left( \dfrac{2-w}{2-\alpha} \right)^{c-1}, & \alpha \leq w < 2. \end{cases} \tag{2.4}$$
If $c = 1$ then $W = U(0,2)$, and if $c = 2$, $\alpha = 1$ then $W = Triangular(0,1,2)$. This
observation shows that the random lifetime $T = 10^{TSP(1,c)}$ will generate first digit
distributions closely related to Benford’s distribution, at least if $c$ is close to 1 or 2.
Theorem 2.1. Let $W = TSP(1, c)$ be the two-sided power random variable with probability
density function
$$f_W(w) = \begin{cases} \dfrac{c}{2} \, w^{c-1}, & 0 < w \leq 1, \\[2mm] \dfrac{c}{2} \, (2-w)^{c-1}, & 1 \leq w < 2, \end{cases} \tag{2.5}$$
and let the integer-valued random variable $D$ satisfy $D \leq W < D + 1$. Then the first digit
random variable $Y = [10^{W-D}]$ has the one-parameter generalized Benford probability density
function
$$f_Y(y) = \frac{1}{2} \left\{ [\log(1+y)]^{c} - [\log y]^{c} - [1 - \log(1+y)]^{c} + [1 - \log y]^{c} \right\}, \qquad y \in \{1, \ldots, 9\}. \tag{2.6}$$
Proof. Since $W$ takes values in $(0,2)$, the lifetime $T = 10^{W}$ takes values in $(1,100)$, and its
probability density function is given by
$$f_T(t) = \begin{cases} \dfrac{c}{2 \, t \ln 10} \left( \dfrac{\ln t}{\ln 10} \right)^{c-1}, & 1 < t \leq 10, \\[2mm] \dfrac{c}{2 \, t \ln 10} \left( 2 - \dfrac{\ln t}{\ln 10} \right)^{c-1}, & 10 \leq t < 100. \end{cases}$$
It follows that the first significant digit of $T$, namely $Y = [T \cdot 10^{-D}]$, has probability density
function
$$f_Y(y) = \int_{y}^{y+1} f_T(t)\,dt + \int_{10y}^{10(y+1)} f_T(t)\,dt.$$
Making the change of variable $u = \ln t / \ln 10$, one obtains
$$f_Y(y) = \frac{c}{2} \int_{\log y}^{\log(y+1)} u^{c-1}\,du + \frac{c}{2} \int_{1+\log y}^{1+\log(y+1)} (2-u)^{c-1}\,du,$$
which upon integration immediately implies (2.6). ◊
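The closed form (2.6) is easy to evaluate numerically. The following Python sketch computes the generalized Benford probabilities and checks that the cases $c = 1$ and $c = 2$ reduce to Benford’s law (1.1). It assumes base-10 logarithms and $c > 0$; the function names are ours.

```python
import math


def generalized_benford_pmf(y: int, c: float) -> float:
    """Generalized Benford probability f_Y(y) of (2.6) for a digit y and c > 0."""
    if not 1 <= y <= 9:
        raise ValueError("first digit must be between 1 and 9")
    if c <= 0:
        raise ValueError("the parameter c is assumed to be positive")
    a = math.log10(y)      # log y
    b = math.log10(y + 1)  # log(1 + y)
    return 0.5 * (b**c - a**c - (1.0 - b) ** c + (1.0 - a) ** c)


def benford_pmf(y: int) -> float:
    """Benford probability (1.1)."""
    return math.log10(1.0 + 1.0 / y)


# Print the nine probabilities and their sum for a few parameter values.
for c in (1.0, 2.0, 2.5):
    probs = [generalized_benford_pmf(y, c) for y in range(1, 10)]
    print(c, [round(p, 4) for p in probs], round(sum(probs), 12))
```

For every $c > 0$ the nine probabilities sum to 1, because both differences in (2.6) telescope over $y = 1, \ldots, 9$.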
3. Fitting the first digit distributions of some integer sequences.
Maximum likelihood estimation of the generalized Benford distribution is straightforward.
To simplify notation, set $a_y = \log y$, $b_y = \log(1+y)$, $y \in \{1, \ldots, 9\}$, and consider the
derivative of $L_y = \ln f_Y(y)$ with respect to the parameter $c$, which is given by
$$\frac{\partial}{\partial c} L_y = \begin{cases}
\dfrac{\ln b_1 \cdot b_1^{c} - \ln(1-b_1) \cdot (1-b_1)^{c}}{b_1^{c} - (1-b_1)^{c} + 1}, & y = 1, \\[3mm]
\dfrac{\ln b_y \cdot b_y^{c} - \ln a_y \cdot a_y^{c} - \ln(1-b_y) \cdot (1-b_y)^{c} + \ln(1-a_y) \cdot (1-a_y)^{c}}{b_y^{c} - a_y^{c} - (1-b_y)^{c} + (1-a_y)^{c}}, & y = 2, \ldots, 8, \\[3mm]
\dfrac{-\ln a_9 \cdot a_9^{c} + \ln(1-a_9) \cdot (1-a_9)^{c}}{1 - a_9^{c} + (1-a_9)^{c}}, & y = 9.
\end{cases}$$
If $N_y$, $y \in \{1, \ldots, 9\}$, is the number of observations with first digit $y$, then the maximum
likelihood estimator (MLE) of $c$ will solve the non-linear equation
$$\sum_{y=1}^{9} N_y \cdot \frac{\partial}{\partial c} L_y = 0. \tag{3.1}$$
In some cases, more than one solution may exist, usually near $c = 1$ and $c = 2$. In this
situation, the solution with greatest value of the log-likelihood $\ln L = \sum_{y=1}^{9} N_y \cdot \ln f_Y(y)$ will
be chosen as MLE.
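The estimation step can be sketched in Python as follows. Instead of solving the score equation (3.1), this sketch maximizes the log-likelihood directly on a grid, which sidesteps the choice between multiple roots. The search interval $[0.5, 4]$, the grid resolution and all function names are our assumptions; the counts are those of the mixing sequence in Table 3.1.

```python
import math

# First digit counts N_1, ..., N_9 of the mixing sequence (Table 3.1).
COUNTS = [175, 90, 71, 61, 47, 48, 50, 41, 35]


def pmf(y, c):
    """Generalized Benford probability (2.6); c = 1 and c = 2 give Benford's law."""
    a, b = math.log10(y), math.log10(y + 1)
    return 0.5 * (b**c - a**c - (1.0 - b) ** c + (1.0 - a) ** c)


def neg_log_lik(c, counts=COUNTS):
    """Negative log-likelihood -ln L = -sum_y N_y * ln f_Y(y)."""
    return -sum(n * math.log(pmf(y, c)) for y, n in enumerate(counts, start=1))


def grid_mle(counts=COUNTS, lo=0.5, hi=4.0, steps=3500):
    """Crude MLE: evaluate -ln L on an evenly spaced grid and keep the minimizer."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda c: neg_log_lik(c, counts))


c_hat = grid_mle()
print("Benford (c = 1)  -ln L:", round(neg_log_lik(1.0), 2))
print("grid estimate c:", round(c_hat, 3), " -ln L:", round(neg_log_lik(c_hat), 2))
```

A grid of this kind only locates the maximum to within its step size; a root finder applied to (3.1) near the grid estimate would be the natural refinement.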
The fitting capabilities of the new distribution are illustrated on some interesting and
important integer sequences. The first digit occurrences of the analyzed sequences are listed
in Table 3.1. Then, we give some comments on the interest of these sequences. Statistical
results are summarized in Table 3.2. Besides the value of the MLE of the parameter c, we list
the negative log-likelihood, the chi-square values and their corresponding p-values. Finally,
the obtained results are discussed.
Table 3.1 : First digit distributions of some integer sequences

                     Sample            First digit occurrences
Name of sequence       size      1     2     3     4     5     6     7     8     9
Numeri idonei            65     20    12     9     7     4     2     5     4     2
Keith number             71     23    10    10     5     3     5     9     2     4
Princeton number         25      7     2     3     3     2     3     2     1     2
Lucky number             45     19     8     4     2     1     3     4     1     3
Ulam number              44     20     6     3     3     2     3     2     3     2
Fibonacci number        100     30    18    13     9     8     6     5     7     4
Bell number             100     31    15    10    12    10     8     5     6     3
Prime number            168     25    19    19    20    17    18    18    17    15
Mixing sequence         618    175    90    71    61    47    48    50    41    35
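A row of first digit occurrences can be tallied mechanically. The following Python sketch does this for the first 100 Fibonacci numbers, under the assumption (not stated in the table) that the sequence starts with $F_1 = F_2 = 1$; the helper names are ours.

```python
from collections import Counter


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def fibonacci(count):
    """First `count` Fibonacci numbers, assuming F_1 = F_2 = 1."""
    terms, a, b = [], 1, 1
    for _ in range(count):
        terms.append(a)
        a, b = b, a + b
    return terms


# Count how often each digit 1..9 occurs as a first digit.
tally = Counter(first_digit(f) for f in fibonacci(100))
fib_counts = [tally[d] for d in range(1, 10)]
print(fib_counts)
```

The resulting list can be compared with the Fibonacci row of the table; a different starting convention would shift a few counts.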
According to Euler, a natural number $n$ is a numerus idoneus if it fulfills the following
condition. If a number $p$ has a unique representation in the form $p = x^2 + n \cdot y^2$ with
relatively prime numbers $x$ and $y$, then it is a prime number. Euler discovered the fact that
there are only 65 numeri idonei, with $n = 1848$ the largest one. The proof of finiteness was
given by Heilbronn and Chowla in 1934 (see Fellmann(1983), p. 37). Keith numbers or
repfigit numbers (replicating Fibonacci digits) were introduced by Keith(1987) (see also
Pickover(1990/2000), Shiriff(1994), Keith(1994)). A Keith number is an n-digit integer N
with the following property. If a Fibonacci-like sequence (in which each term in the sequence
is the sum of the n previous terms) is formed, with the first n terms being the decimal digits
of the number N, then N itself occurs as a term in the sequence. For example, 47 is a Keith
number since it generates the sequence 4, 7, 11, 18, 29, 47. There are 71 Keith numbers up to
$10^{19}$, as determined by Keith(1998). Moreover, it has been suggested that there are infinitely
many Keith numbers. The story of Princeton numbers is found in Robbins(1991) (see
also Pickover(2000)). Lucky numbers and Ulam numbers are defined in Guy(1981), p. 60 (for
the latter ones see also Pickover(2000)). The Fibonacci numbers are well-known and need no
special reference. The n-th Bell number represents the number of ways to partition n things
into subsets (e.g. Aigner(1975), Jacobs(1983), Graham et al.(1989), Conway and Guy(1996)).
Prime numbers are ubiquitous and known to everybody. The mixing sequence represents the
aggregate of all previous integer sequences.
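The Keith property described above translates directly into a short test. The following Python sketch is only an illustration of the definition, with function names of our choosing and the common convention (assumed here) that one-digit numbers are excluded; it is not the search procedure used in Keith(1998).

```python
def keith_terms(number):
    """Fibonacci-like sequence seeded with the digits of `number`,
    extended until a term reaches or exceeds `number`."""
    digits = [int(ch) for ch in str(number)]
    width = len(digits)
    terms = list(digits)
    while terms[-1] < number:
        # Each new term is the sum of the previous `width` terms.
        terms.append(sum(terms[-width:]))
    return terms


def is_keith(number):
    """True if `number` (with at least two digits) occurs in its own sequence."""
    if number < 10:
        return False
    return keith_terms(number)[-1] == number


print(keith_terms(47))
print([n for n in range(10, 100) if is_keith(n)])
```

For 47 the generated terms are exactly the sequence quoted in the text.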
Table 3.2 : Fitting integer sequences to the Benford and generalized Benford distributions

                                            Benford                            generalized Benford
Name of sequence   MLE $\hat{c}$   $-\ln L$   $\chi^2$   p-value (%)          $-\ln L$   $\chi^2$   p-value (%)
Numeri idonei      1.15087          127.76     2.59      95.7                  127.71     2.52      92.5
Keith number       2.63445          142.49     9.21      32.4                  141.98     7.72      35.8
Princeton number   2.81958           53.14     3.45      90.3                   52.83     2.87      89.7
Lucky number       3.15802           83.96     7.69      46.4                   82.78     5.16      64.0
Ulam number        3.48760           81.78     6.35      60.8                   79.83     2.53      92.5
Fibonacci number   2.01802          199.22     1.03      99.8                  199.22     1.02      99.4
Bell number        2.01106          200.39     3.07      93.0                  200.39     3.07      87.9
Prime number       2.75660          389.05    45.02      $4 \cdot 10^{-5}$     387.04    37.03      $4.6 \cdot 10^{-4}$
Mixing sequence    2.52473         1277.65    15.55      4.93                 1274.66     9.02      25.1
Taking into account the finiteness of the sequence of numeri idonei, it is quite surprising how
well this sequence fits both first digit distributions. Among the Keith, Princeton, lucky and
Ulam numbers, the Princeton ones are the most likely to fit Benford’s law. Unfortunately, the
considered sequence is so far limited to a relatively small sample size. Comparing
p-values, the lucky and Ulam numbers are better fitted by the generalized Benford
distribution. The Fibonacci and Bell numbers are very close to Benford’s law. In fact, both
follow Benford’s law. For the Fibonacci sequence, this is shown in a number of papers
including Brown and Duncan(1970), Wlodarski(1971), Sentance(1973), Webb(1975),
Raimi(1976), Brady(1978), Kunoff(1987). An analytical proof that Bell numbers are Benford
distributed is given in Theorem 4.1. Both the Benford and the generalized Benford
distributions are rejected for the sequence of primes. This statistical evidence is simpler to
obtain than the analytical derivation proposed in
Diaconis(1977), p. 74. However, note that the sequence of primes is Benford distributed with
respect to densities other than the usual natural density (e.g. Whitney(1972),
Schatte(1983), Cohen and Katz(1984)). Despite the observation that mixing sequences are
often Benford distributed (e.g. Benford(1938), Raimi(1969/85), Knuth(1969)), a fact proven
under some hypotheses by Hill(1995c), Theorem 3, it is perhaps surprising that Benford’s law
is rejected for our mixing sequence at the 5% significance level. In contrast to this, the
generalized Benford law is accepted with a 25% p-value. Moreover, it is not clear how some
sequences in Table 3.2 will behave as the sample size increases. To clarify these and other
facts, further research is needed in this area.
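The chi-square test of Benford’s law for the mixing sequence can be sketched in Python as follows. The sketch assumes Pearson’s statistic with 8 degrees of freedom (nine digit classes minus one, no estimated parameter); the paper does not state the degrees of freedom, so this is our assumption. The closed-form tail probability used below is valid only for an even number of degrees of freedom, and all function names are ours.

```python
import math

# First digit counts of the mixing sequence (Table 3.1).
COUNTS = [175, 90, 71, 61, 47, 48, 50, 41, 35]


def benford_pmf(d):
    """Benford probability (1.1)."""
    return math.log10(1.0 + 1.0 / d)


def chi_square_stat(counts, probs):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def chi2_sf_even_df(x, df):
    """Upper tail P(chi2_df > x) for even df:
    exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


stat = chi_square_stat(COUNTS, [benford_pmf(d) for d in range(1, 10)])
p_value = chi2_sf_even_df(stat, df=8)
print(round(stat, 2), round(100 * p_value, 2))
```

Under these assumptions the statistic and the p-value (in percent) can be compared with the Benford columns of the mixing sequence row in Table 3.2. The generalized Benford columns would need an odd number of degrees of freedom if one is subtracted for the estimated parameter, which this closed form does not cover.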
4. The first digits of the Bell numbers satisfy Benford’s law.
To confirm the statistical evidence of Table 3.2, it is shown that the first digits of the Bell
numbers satisfy Benford’s law.
Theorem 4.1. The sequence of first digits of the Bell numbers is Benford distributed.
Proof. Recall that the Bell numbers satisfy the recurrence relationship
$b_{n+1} = \sum_{k=0}^{n} \binom{n}{k} b_k$ (with $b_0 = 1$), so that $b_1 = 1$, $b_2 = 2$
(e.g. Graham et al.(1989), p. 359). The sequence $\{b_n\}$ is Benford distributed if
and only if the sequence $\{\log b_n\}$ is uniformly distributed modulo 1 (e.g. Diaconis(1977),
Theorem 1). To show the latter, consider first the sequence
xne
nn
n
nn nn n
=⋅
++
+⋅ −⋅ −
−
ln ln ( )
ln ln
1
21
1
12
,
which by Graham et al.(1989), pp. 479 and 573, is an asymptotic estimate of $\{\ln b_n\}$. The
simpler sequence
yne
n
n
nn nn n
=⋅
+
+⋅ −⋅ −
ln ln
ln ln
1
21
1
satisfies the property $\lim_{n\to\infty} (x_n - y_n) = 0$. In the following, we apply some useful results from
the theory of uniformly distributed sequences modulo 1 from Kuipers and Niederreiter(1974),
pp. 1-18. First, if two sequences $\{x_n\}$ and $\{y_n\}$ satisfy asymptotically a relation
$\lim_{n\to\infty} (x_n - y_n) = const.$, and if $\{y_n\}$ is uniformly distributed modulo 1, then $\{x_n\}$ is also
uniformly distributed modulo 1. Therefore, it suffices to show that $\{y_n / \ln 10\}$ is uniformly
distributed modulo 1. Second, one applies the criterion of Weyl(1916), which says that a
sequence $\{a_n\}$ is uniformly distributed modulo 1 if and only if
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i h a_n} = 0 \quad \text{for all } h \in N_{+}.$$
Third, to show this, we use Van der Corput’s estimate for trigonometric sums. Let $a$ and $b$
be integers with $a < b$, and let $f$ be twice differentiable on $[a, b)$ with $f''(x) \geq \rho > 0$ or
$f''(x) \leq -\rho < 0$ for $x \in [a, b)$. Then
$$\left| \sum_{n=a}^{b} e^{2\pi i f(n)} \right| \leq \left( |f'(b) - f'(a)| + 2 \right) \cdot \left( \frac{4}{\sqrt{\rho}} + 3 \right).$$
Now, set $f(x) = h \cdot g(x) / \ln 10$ such that $f(n) = h \cdot y_n / \ln 10$ with
()
()
gx x x x x x x x( ) ln ln ln ln ln=− −+ + − +11
1
2
1
2.
One has
() () ( )
[]
gxxxx xx'( ) ln ln ln=++−+
−−
21 1
221 ,
$g''(x) = g_1(x) + g_2(x) + g_3(x)$, with
()
gx x
xgx x
xgx x
xx
122 3 22
21
221
2
2
1
() , () ln ,() ln
ln
=−=⋅ = ⋅ +
+.
We show that $g''(x)$ is decreasing for all $x \geq 1$. This follows easily noting that
gx x
x
13
10
'()=−≤ for
x
≥1, gx x
x
22
10
'() ln
=−≤ for
x
e≥,
()
()
gx xx
xx
3
2
1
2
71 2
10
'() ln (ln )
ln
=− ++
+<, and gx gx
23
0
''
() ()+≤ for
x
≥1.
With $a = 1$, $b = N$, it follows that for all $x \in [1, N]$:
()
gx gN N
N
N
N
NN
N
''( ) ''( ) ln ln
ln
≥≥+⋅+⋅
+
+≥
1
221
2
2
1
1
2
22,
which implies that $f''(x) = h \cdot g''(x) / \ln 10 \geq h/(6N) = \rho > 0$. Van der Corput’s estimate yields
() ()
ehNN
NN N
N
h
if n
n
N
2
1
2
10
1
2
1
21 24
63
π
()
ln ln ln ln
=
∑≤++−
++
⋅+
,
which implies that $\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i f(n)} = 0$. Therefore, the asymptotically equivalent sequences
$\{y_n / \ln 10\}$, $\{x_n / \ln 10\}$ and $\{\log b_n\}$ are all uniformly distributed modulo 1. The result is
shown. ◊
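The statement of Theorem 4.1 can be explored numerically. The following Python sketch generates Bell numbers from the recurrence $b_{n+1} = \sum_{k=0}^{n} \binom{n}{k} b_k$ and tallies their first digits next to the Benford expectation. It assumes the convention $b_0 = 1$ and that the sample consists of $b_1, \ldots, b_{100}$; the indexing behind the Bell row of Table 3.1 is not stated, so the counts may differ slightly from the table. All names are ours.

```python
import math
from collections import Counter


def bell_numbers(count):
    """Return [b_1, ..., b_count] from b_{n+1} = sum_k C(n, k) * b_k, with b_0 = 1."""
    b = [1]  # b_0
    for n in range(count):
        b.append(sum(math.comb(n, k) * b[k] for k in range(n + 1)))
    return b[1:]


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


sample = bell_numbers(100)
tally = Counter(first_digit(b) for b in sample)
for d in range(1, 10):
    expected = len(sample) * math.log10(1.0 + 1.0 / d)
    print(d, tally[d], round(expected, 1))
```

A finite tally of this kind is only a plausibility check; the asymptotic statement rests on the uniform distribution argument of the proof.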
References.
Aigner, M. (1975). Kombinatorik. I. Grundlagen und Zähltheorie. Springer.
Allaart, P.C. (1997). An invariant-sum characterization of Benford’s law. Journal of Applied
Probability 34, 288-291.
Becker, P. (1982). Patterns in listings of failure-rate and MTTF values and listings of other
data. IEEE Transactions on Reliability R-31, 132-134.
Berger, A., Bunimovich, L.A. and T.P. Hill (2002). One-dimensional dynamical systems and
Benford’s law. To appear in Transactions of the American Mathematical Society.
Benford, F. (1938). The law of anomalous numbers. Proceedings of the American
Philosophical Society 78, 551-572.
Brady, W.G. (1978). More on Benford’s law. Fibonacci Quarterly 16, 51-52.
Brown, J. and R. Duncan (1970). Modulo one uniform distribution of the sequence of
logarithms of certain recursive sequences. Fibonacci Quarterly 8, 482-486.
Buck, B., Merchant, A. and S. Perez (1993). An illustration of Benford’s first digit law using
alpha decay half lives. European Journal of Physics 14, 59-63.
Burke, J. and E. Kincanon (1991). Benford’s law and physical constants : the distribution of
initial digits. American Journal of Physics 59, 952.
Cohen, D. (1976). An explanation of the first digit phenomenon. Journal of Combinatorial
Theory (A) 20, 367-370.
Cohen, D. and T. Katz (1984). Prime numbers and the first digit phenomenon. Journal of
Number Theory 18, 261-268.
Conway, J.H. and R.K. Guy (1996). The Book of Numbers. Springer.
Diaconis, P. (1977). The distribution of leading digits and uniform distribution mod 1. Annals
of Probability 5, 72-81.
Dorp, J.R. van and S. Kotz (2002). The standard two-sided power distribution and its
properties : with applications in financial engineering. The American Statistician
56(2), 90-99.
Engel, H.-A. and C. Leuenberger (2003). Benford’s law for exponential random variables.
Statistics and Probability Letters.
Fellmann, E.A. (1983). Leonhard Euler – Ein Essay über Leben und Werk. In : Leonhard
Euler 1707-1783. Beiträge zu Leben und Werk. Birkhäuser Verlag Basel.
Graham, R.L., Knuth, D.E. and O. Patashnik (1989). Concrete Mathematics. A Foundation
for Computer Science. Addison-Wesley.
Guy, R.K. (1981). Unsolved Problems in Number Theory. Problem Books in Mathematics,
vol. 1. Springer.
Hill, T.P. (1995a). Base-invariance implies Benford’s law. Proceedings of the American
Mathematical Society 123, 887-895.
Hill, T.P. (1995b). The significant-digit phenomenon. Amer. Math. Monthly 102, 322-326.
Hill, T.P. (1995c). A statistical derivation of the significant-digit law. Statistical Science 10,
354-363.
Hill, T.P. (1997). Benford’s law. Encyclopedia of Mathematics Supplement, vol. 1, 102.
Hill, T.P. (1998). The first digit phenomenon. The American Scientist 86(4), 358-363.
Jacobs, K. (1983). Einführung in die Kombinatorik. Walter de Gruyter.
Keith, M. (1987). Repfigit numbers. Journal of Recreational Mathematics 19(1), 41-42.
Keith, M. (1994). All repfigit numbers less than 100 billion. Journal of Recreational
Mathematics 26(3), 181-184.
Keith, M. (1998). Determination of all Keith numbers up to $10^{19}$. Available at
http://users.aol.com/s6sj7gt/keithnum.htm.
Knuth, D. (1969). The Art of Computer Programming, vol. 2, 219-229. (2nd ed.).
Addison-Wesley, Reading, MA, 239-249. (3rd ed.) (1998), 254-262.
Kuipers, L. and H. Niederreiter (1974). Uniform Distribution of Sequences. J. Wiley.
Kunoff, S. (1987). N! has the first digit property. Fibonacci Quarterly 25, 365-367.
Leemis, L.M., Schmeiser, B.W. and D.L. Evans (2000). Survival distributions satisfying
Benford’s law. The American Statistician 54(3), 1-6.
Ley, E. (1996). On the peculiar distribution of the U.S. Stock Indices Digits. The American
Statistician 50(4), 311-313.
Newcomb, S. (1881). Note on the frequency of use of the different digits in natural numbers.
American Journal of Mathematics 4, 39-40.
Nigrini, M. (1996). A taxpayer compliance application of Benford’s law. Journal of the
American Taxation Association 1, 72-91.
Nigrini, M. (1999). I’ve got your number. Journal of Accountancy 187, 79-83.
Nigrini, M. and L. Mittermaier (1997). The use of Benford’s law as an aid in analytical
procedures. Auditing : A Journal of Practice and Theory 16(2), 52-67.
Pickover, C. (1990). All known replicating Fibonacci-digits less than one billion. Journal of
Recreational Mathematics 22(3), 176-178.
Pickover, C. (2000). Wonders of numbers. Oxford University Press.
Pinkham, R.S. (1961). On the distribution of first significant digits. Annals of Mathematical
Statistics 32, 1223-1230.
Raimi, R.A. (1969). The peculiar distribution of first digits. Scientific American, 109-120.
Raimi, R.A. (1976). The first digit problem. American Mathematical Monthly 83, 521-538.
Raimi, R.A. (1985). The first digit phenomenon again. Proceedings of the American
Philosophical Society 129, 211-219.
Robbins, D.P. (1991). The story of 1, 2, 7, 42, 429, 7436, ... . The Mathematical Intelligencer
13(2), 12-19.
Schatte, P. (1983). On H∞-summability and the uniform distribution of sequences. Math.
Nachr. 113, 237-243.
Sentance, W.A. (1973). A further analysis of Benford’s law. Fibonacci Quarterly 11, 490-494.
Shiriff, K. (1994). Computing replicating Fibonacci digits. Journal of Recreational
Mathematics 26(3), 191-192.
Tolle, C., Budzien, J. and R. LaViolette (2000). Do dynamical systems follow Benford’s law ?
Chaos 10, 331-337.
Vogt, W. (2000). Benford’s Gesetz : Steuer- und Budgetsündern auf der Spur – Zahlen lügen
nicht. Schweizer Versicherung 9, 27-29.
Webb, W. (1975). Distribution of the first digits of Fibonacci numbers. Fibonacci Quarterly
13, 334-336.
Weyl, H. (1916). Über die Gleichverteilung von Zahlen mod Eins. Mathematische Annalen
77, 313-352.
Whitney, R.E. (1972). Initial digits for the sequence of primes. American Mathematical
Monthly 79, 150-152.
Wlodarski, J. (1971). Fibonacci and Lucas Numbers tend to obey Benford’s law. Fibonacci
Quarterly 9, 87-88.
Werner Hürlimann
Schönholzweg 24
CH-8409 Winterthur
Switzerland
e-mail : whurlimann@bluewin.ch
URL : www.geocities.com/hurlimann53