... Naturally, the question arose whether Benford's Law can be proved mathematically. In particular, T. P. Hill (Hill, 1995a; Hill, 1995b; Hill, 1998) and R. A. Raimi (Raimi, 1969a; Raimi, 1969b; Raimi, 1976) sought such a proof, but no strict mathematical proof was found. 6 If nothing else, their theoretical efforts led to an approximate formulation of the validity of Benford's Law: if we take random samples from arbitrary distributions, the collection of these random samples approximately obeys Benford's Law. ...
Benford's Law (sometimes also called Benford's Distribution or Benford's Test) is one of the possible tools for verifying the structure of data in a given file with regard to the relative frequencies of occurrence of the first (or second, etc.) digit from the left. If it is used as a goodness-of-fit test on sample data, there are usually no problems with its interpretation. However, certain factual questions arise in connection with the validity of Benford's Law in large data sets in governmental statistics; such questions should be resolved before the law is used. In this paper we discuss the application potential of Benford's Law when working with extensive data sets in the areas of economic and social statistics.
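The goodness-of-fit use mentioned above can be sketched as a minimal chi-square test in Python. This is an illustrative sketch only: the helper names and the powers-of-2 sample are our own, and 15.51 is the standard 5% critical value for 8 degrees of freedom.

```python
import math
from collections import Counter

def benford_expected(d: int) -> float:
    """Benford probability of leading digit d: log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number (via scientific notation)."""
    return int(f"{abs(x):e}"[0])

def chi_square_benford(data) -> float:
    """Chi-square statistic of observed leading digits against Benford."""
    values = [x for x in data if x != 0]
    counts = Counter(leading_digit(x) for x in values)
    stat = 0.0
    for d in range(1, 10):
        expected = len(values) * benford_expected(d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat

# Powers of 2 are a classic Benford-conforming sequence: the statistic
# stays far below 15.51, the 5% critical value for 8 degrees of freedom.
sample = [2.0 ** k for k in range(1, 1001)]
print(chi_square_benford(sample) < 15.51)  # → True
```

A statistic above the critical value would flag the data as deviating from the Benford frequencies at the 5% level.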
... Since its second discovery in 1938, many attempts have been made to explain the underlying reason for Benford's law. For theoretical reviews, see the papers by Raimi [23,24,25] and Hill [26,27,28,29,30]. Many breakthroughs have since been achieved in this domain, though a universally accepted final answer is still lacking. ...
The occurrence of digits one through nine as the leftmost nonzero digit of numbers from real-world sources is often not uniformly distributed but is instead distributed according to a logarithmic law, known as Benford's law. Here, we systematically investigate the mantissa distributions of some pulsar quantities and find that for most quantities the first digits conform to this law. However, the barycentric period shows significant deviation from the usual distribution, though it roughly satisfies a generalized Benford's law. Pulsars can therefore serve as an ideal assemblage for studying the first-digit distributions of real-world data, and the observations can be used to constrain theoretical models of pulsar behavior.
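A commonly used form of the generalized Benford's law replaces the underlying 1/x significand density with a power law x^(-alpha), recovering the standard law as alpha → 1. A small sketch, assuming this power-law form (the function name is ours; this is not the pulsar paper's exact parametrization):

```python
import math

def generalized_benford(d: int, alpha: float) -> float:
    """First-digit probability under a power-law significand density
    x**(-alpha) on [1, 10); alpha -> 1 recovers the standard Benford law."""
    if alpha == 1.0:
        return math.log10(1 + 1 / d)
    return ((d + 1) ** (1 - alpha) - d ** (1 - alpha)) / (10 ** (1 - alpha) - 1)

# The nine probabilities sum to 1 for any alpha, and alpha near 1
# stays close to the standard Benford frequencies.
print(abs(sum(generalized_benford(d, 2.0) for d in range(1, 10)) - 1) < 1e-12)  # → True
print(abs(generalized_benford(1, 1.001) - math.log10(2)) < 0.001)               # → True
```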
... Raimi furthered the mathematical understanding of BL, using Banach and other scale invariances (Raimi, 1969a), and also furthered its popularity (Raimi, 1969b). Both papers cite only Benford (1938), which received 4 citations in 1969, whereas neither cites Newcomb (1881), which received no citations at all in that year (Fig. 2). ...
Benford's law is an empirical observation, first reported by Simon Newcomb in 1881 and then independently by Frank Benford in 1938: the first significant digits of numbers in large data sets are often distributed according to a logarithmically decreasing function. Being contrary to intuition, the law was forgotten as a mere curious observation. In the last two decades, however, the relevant literature has grown exponentially, an evolution typical of "Sleeping Beauties" (SBs): publications that go unnoticed (sleep) for a long time and then suddenly become the center of attention (are awakened). In the present study, we show that the Newcomb (1881) and Benford (1938) papers are clearly SBs. The former was in deep sleep for 110 years, whereas the latter was in deep sleep for a comparatively shorter period of 31 years up to 1968, and in a state of less deep sleep for another 27 years up to 1995. Both SBs were awakened in the year 1995 by Hill (1995a). We also show that the waking prince (Hill, 1995a) is quoted more often than the SB he kissed; whether this is a general effect would be useful to study.
... 74 Raimi (1976) provides a selection of early reasonings as to why data may follow the Benford distribution. 75 Consequently, if a set of numbers does not follow the Benford distribution, multiplication by some constant may move the set further away or closer to the Benford frequencies, but no constant exists that turns a non-Benford distribution into a Benford distribution (Raimi, 1969). 76 This also implies that the data are not subject to artificial truncation, psychological barriers (for example, prices concentrated at specific values such as $1.99), or structural shifts. ...
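The scale-invariance claim above can be checked numerically: a minimal sketch (helper names and the choice of pi as the scaling constant are our own) that compares a Benford-conforming set and a non-conforming one before and after multiplication by a constant.

```python
import math
from collections import Counter

def max_benford_gap(data) -> float:
    """Largest absolute deviation of leading-digit frequencies from Benford."""
    digits = [int(f"{abs(x):e}"[0]) for x in data if x != 0]
    counts = Counter(digits)
    return max(abs(counts.get(d, 0) / len(digits) - math.log10(1 + 1 / d))
               for d in range(1, 10))

benford_like = [2.0 ** k for k in range(1, 1001)]  # conforms to Benford
uniform_like = list(range(1, 1001))                # does not conform

# Multiplying a Benford set by a constant leaves it Benford...
print(max_benford_gap([math.pi * x for x in benford_like]) < 0.05)  # → True
# ...while scaling the uniform set by the same constant leaves a large gap;
# per Raimi (1969), no constant can close it.
print(max_benford_gap([math.pi * x for x in uniform_like]) > 0.1)   # → True
```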
... 6 Thus, it could be said that if a given dataset is to have an identifiable leading-digit distribution, then it must follow Benford's Law. 7 However, there is no requirement that all datasets must have a stable distribution of leading digits, and thus not all phenomena follow Benford's Law. ...
Benford's Law describes the finding that the distribution of leading (or leftmost) digits of innumerable datasets follows a well-defined logarithmic trend, rather than an intuitive uniformity. In practice this means that the most common leading digit is 1, with an expected frequency of 30.1%, and the least common is 9, with an expected frequency of 4.6%. The history and development of Benford's Law is inextricably linked to physics, yet there has been a dearth of physics-related Benford datasets reported in the literature. Currently, the most common application of Benford's Law is in detecting number invention and tampering, such as is found in accounting, tax, and voter fraud. We demonstrate that answers to end-of-chapter exercises in physics and chemistry textbooks conform to Benford's Law. Subsequently, we investigate whether this fact can be used to gain an advantage over random guessing in multiple-choice tests, and find that while testbank answers in introductory physics closely conform to Benford's Law, the testbank is nonetheless secure against such a Benford's attack for banal reasons.
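The quoted 30.1% and 4.6% frequencies follow directly from the logarithmic form P(d) = log10(1 + 1/d); a one-line check in Python:

```python
import math

# Benford's law: P(d) = log10(1 + 1/d), giving 30.1% for d = 1
# down to 4.6% for d = 9.
for d in range(1, 10):
    print(d, f"{100 * math.log10(1 + 1 / d):.1f}%")
# → 1 30.1%  2 17.6%  3 12.5%  4 9.7%  5 7.9%
#   6 6.7%   7 5.8%   8 5.1%   9 4.6%
```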
Benford's Law is an interesting and unexpected empirical phenomenon: if we take a large list of numbers from real data, the first digits of these numbers follow a certain non-uniform distribution. This law is actively used in economics and finance to check that the data in financial reports are real and have not been improperly modified by the reporting company. The first challenge is that cheaters know about the law and make sure that their modified data satisfy it. The second challenge is that another application of this law has recently been discovered, namely an application to deep learning, one of the most effective and most promising machine learning techniques. It turns out that the neurons' weights obey this law only at the difficult-to-detect stage when the fit is optimal, and further attempts to fit would lead to undesirable over-fitting. In this paper, we provide a possible solution to both challenges: we show how to use this law to make financial cheating practically impossible, and we provide a qualitative explanation for the effectiveness of Benford's Law in machine learning.
In this paper we first discuss translation invariant measures and probabilities on and then use the results to describe a possible approach to the Benford law.
Probabilistic models of floating point and logarithmic arithmetic are constructed using assumptions with both theoretical and empirical justification. The justification of these assumptions resolves open questions in Hamming (1970) and Bustoz et al. (1979). These models are applied to errors from sums and inner products. A comparison is made between the error analysis properties of floating point and logarithmic computers. We conclude that the logarithmic computer has smaller error confidence intervals for roundoff errors than a floating point computer with the same computer word size and approximately the same number range.
It seems empirically that the first digits of random numbers do not occur with equal frequency; instead, the earlier digits appear more often than the later ones. This peculiarity was first noticed by F. Benford, hence the phenomenon is called Benford's law.
In this note, we fix the set of all positive integers as a model population and sample random integers from this population according to a certain sampling procedure. For polynomial sampling procedures, we prove that randomly sampled integers do not necessarily obey Benford's law, but their Banach limit does. We also prove Benford's law for geometric sampling procedures and for linear recurrence sampling procedures.
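Benford conformance for a linear recurrence can be checked empirically; as an illustrative sketch (not the note's proof), the first digits of the first 1000 Fibonacci numbers track the Benford frequencies closely:

```python
import math
from collections import Counter

# First digits of the first 1000 Fibonacci numbers, a linear recurrence
# that is known to conform to Benford's law.
fibs = [1, 1]
while len(fibs) < 1000:
    fibs.append(fibs[-1] + fibs[-2])

counts = Counter(int(str(f)[0]) for f in fibs)
for d in range(1, 10):
    observed = counts[d] / len(fibs)
    expected = math.log10(1 + 1 / d)
    print(d, f"{observed:.3f}", f"{expected:.3f}")
```

The observed column stays within a couple of percentage points of the expected one, with digit 1 near 30% and digit 9 near 5%.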
The logarithmic distribution is commonly used to model mantissae of floating point numbers. It is known that floating point products of logarithmically distributed mantissae are logarithmically distributed, while floating point sums are not. In this paper a distribution for floating point sums is derived, and for a special case of logarithmically distributed mantissae the deviation of this distribution from the logarithmic distribution is determined.
A generalized logarithmic law is derived for the distribution of the first t significant digits of a random digital integer. This result is then used to determine the distribution of roundoff errors in floating-point operations, which is a mixture of uniform and reciprocal distributions.
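The generalized logarithmic law for the first t digits assigns probability log10(1 + 1/p) to a leading-digit block p. A small illustration of this standard statement (the function name is ours; this is not the paper's derivation of the roundoff mixture):

```python
import math

def prefix_prob(p: int) -> float:
    """Probability that the first significant digits of a number form the
    block p, under the generalized logarithmic law: log10(1 + 1/p)."""
    return math.log10(1 + 1 / p)

print(f"{prefix_prob(1):.4f}")    # → 0.3010  (leading digit 1)
print(f"{prefix_prob(31):.4f}")   # → 0.0138  (leading digits 3, 1)
print(f"{prefix_prob(314):.4f}")  # → 0.0014  (leading digits 3, 1, 4)
```

The probabilities over all t-digit blocks sum to 1, since the logarithms telescope across each decade.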
Zipf's experimental law states that, for a given large piece of text, the product of the relative frequency of a word and its rank in descending frequency order is a constant, shown to be equal to 1 divided by the natural logarithm of the number of different words. It is shown to be approximately equal to Benford's logarithmic distribution of first significant digits in tables of numbers. Eleven samples allow comparison of observed and theoretical frequencies.
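The stated constant can be sketched numerically: if the rank-r word has relative frequency f_r = (1/r)/H_N with H_N the N-th harmonic number, then r·f_r = 1/H_N for every rank, and since H_N ≈ ln N + 0.5772, this is close to 1/ln(N) for large N. A minimal check (the vocabulary size N is a hypothetical choice of ours):

```python
import math

# Zipf constant: r * f_r = 1 / H_N, approximated by 1 / ln(N).
N = 50_000  # hypothetical vocabulary size
H_N = sum(1 / r for r in range(1, N + 1))
zipf_constant = 1 / H_N
print(abs(zipf_constant - 1 / math.log(N)) / zipf_constant < 0.06)  # → True
```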
In the first part of this paper the Arens multiplication on a space of bounded functions is used to simplify and extend results by Day and Frey on amenability of subsemigroups and ideals of a semigroup. For example, it is shown that if S is a left amenable cancellation semigroup, then a subsemigroup A of S is left amenable if and only if each two right ideals of A intersect. The remainder and major portion of this paper is devoted to relations between left invariant means on m(S) and left ideals of βS (the Stone-Čech compactification of S). We find: if μ is a left invariant mean on m(S) and if S has left cancellation, then L(μ), the support of μ considered as a Borel measure on βS, is a left ideal of βS. An application is that if S is a left amenable semigroup and I is a left ideal of βS, then K(I), the w*-closed convex hull of I, contains an extreme left invariant mean; if in addition S has cancellation, then K(I) contains a left invariant mean which is the w*-limit of a net of unweighted finite averages.
Théorie des Opérations Linéaires, Warsaw, 1932 (reprinted by Chelsea).