Article

Entropy, Benford's first digit law, and the distribution of everything

Abstract

The history of the so-called "Benford's Law", which concerns the distribution of the first significant digits in "natural" sets of measurements, is summarized, and its relation to exponential rank-size distributions (associated with geometric progressions of naturally-occurring quantities) is outlined. The physical significance of alternative distributions is then discussed, also considering the associated probability density functions, and it is shown that, under appropriate assumptions, exponential rank-size distributions can be derived from a maximum-entropy principle (in the information-theoretic sense introduced by Shannon). Finally, naturally-occurring samples (e.g., surface areas of islands) are considered in detail, and it is shown that they closely follow exponential rank-size distributions and satisfy both Benford's law and appropriately formulated principles of uniform probability density and maximum information entropy.
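The link between geometric progressions and Benford's law can be checked numerically. The following Python sketch is an illustration, not the paper's method: it assumes the hypothetical progression x_n = 2^n for n = 1, ..., 1000 (ratio r = 2, whose base-10 logarithm is irrational), keeps the terms as exact integers so that rounding cannot change a leading digit, and compares the observed first-digit frequencies with P(d) = log10(1 + 1/d).

```python
import math
from collections import Counter


def benford_pmf() -> dict[int, float]:
    """Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_freqs(values) -> dict[int, float]:
    """Relative frequency of the leading decimal digit of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Hypothetical geometric progression x_n = 2**n, n = 1..1000, kept as exact integers.
progression = [2 ** n for n in range(1, 1001)]
observed = first_digit_freqs(progression)
expected = benford_pmf()

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}   Benford {expected[d]:.3f}")
```

Any ratio whose base-10 logarithm is irrational should behave similarly; a ratio of exactly 10 never does, because every term then starts with the same digit.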

... Benford not only rediscovered it in 1938, but also provided a vast number of data sets that conformed to Newcomb's logarithmic rule (Fewster, 2009; Ciofalo, 2009). Traditional analysis of Benford's law considered its applicability to data sets (Varian, 1972). ...
... In more recent scholarship, though, the focus has shifted to the study of probability distributions that obey Benford's law. These studies demonstrate that, if applicable, Benford's law is invariant under (1) an arbitrary change of scale; (2) an arbitrary raising to a power; and (3) an arbitrary change of the numerical basis (Hill, 1995a,b; Leemis, Schmeiser and Evans, 2000; Grendar, Judge and Schechter, 2007; Fewster, 2009; Ciofalo, 2009). Analyses of distributions satisfying Benford's law have also revealed that Benford compliance should not be expected in every random distribution (Leemis, Schmeiser and Evans, 2000). ...
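Scale and power invariance can be illustrated on the same kind of data. This Python sketch assumes a hypothetical Benford-like sample (the integers 2^n, n = 1, ..., 1000), multiplies every value by 3 and, separately, cubes every value, then reports the largest gap between each set's first-digit frequencies and Benford's law. It is a numerical illustration only, not a proof of the invariance results cited in the passage.

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit_freqs(values):
    """Relative frequency of the leading decimal digit (1..9) of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    return [counts.get(d, 0) / len(values) for d in range(1, 10)]


def max_deviation(freqs):
    """Largest absolute gap between observed frequencies and Benford's law."""
    return max(abs(f - b) for f, b in zip(freqs, BENFORD))


base = [2 ** n for n in range(1, 1001)]  # hypothetical Benford-like sample
scaled = [3 * v for v in base]           # change of scale by a factor of 3
powered = [v ** 3 for v in base]         # raising every value to the third power

for name, sample in [("base", base), ("scaled", scaled), ("powered", powered)]:
    print(name, round(max_deviation(leading_digit_freqs(sample)), 4))
```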
Article
This paper introduces a naive Bayes classifier to detect electoral fraud using digit patterns in vote counts with authentic and synthetic data. The procedure is the following: (1) we create 10,000 simulated electoral contests between two parties using Monte Carlo methods. This training set is composed of two disjoint subsets: one containing electoral returns that follow a Benford distribution, and another where the vote counts are purposively "manipulated" by electoral tampering – a percentage of votes are taken away from one party and given to the other; (2) we calibrate membership values of the simulated elections (i.e. clean or fraudulent) using logistic regression; (3) we recover class-conditional densities using the relative frequencies from the training set; (4) we apply Bayes' rule to class-conditional probabilities and class priors to establish the membership probabilities of authentic observations. To illustrate our technique, we examine elections in the province of Buenos Aires (Argentina) between 1932 and 1942, a period with a checkered history of fraud. Our analysis allows us to successfully classify electoral contests according to their degree of fraud. More generally, our findings indicate that Benford's Law is an effective tool for identifying fraud, even when minimal information (i.e. electoral returns) is available.
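Step (4) of this procedure, applying Bayes' rule to class-conditional probabilities and class priors, can be sketched as a small categorical naive Bayes classifier in Python. Everything concrete here is an assumption for illustration: the two discretised features (labelled "ok" or "dev", standing in for digit-test outcomes), the six training contests, and the Laplace smoothing parameter alpha. The authors' actual features and their logistic-regression calibration step are not reproduced.

```python
import math
from collections import Counter


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Class priors and Laplace-smoothed class-conditional relative frequencies."""
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    n_features = len(rows[0])
    # counts[c][j][v]: how often feature j takes value v among training rows of class c
    counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    seen = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            seen[j].add(v)

    def likelihood(c, j, v):
        total = sum(counts[c][j].values())
        return (counts[c][j][v] + alpha) / (total + alpha * len(seen[j]))

    return classes, priors, likelihood


def posterior(row, classes, priors, likelihood):
    """Bayes' rule: P(c | x) is proportional to P(c) * prod_j P(x_j | c)."""
    logs = {c: math.log(priors[c]) + sum(math.log(likelihood(c, j, v)) for j, v in enumerate(row))
            for c in classes}
    top = max(logs.values())  # subtract the largest log score before exponentiating
    weights = {c: math.exp(s - top) for c, s in logs.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


# Hypothetical training set: two discretised digit-test features per simulated contest.
rows = [("ok", "ok"), ("ok", "ok"), ("ok", "dev"),
        ("dev", "dev"), ("dev", "dev"), ("dev", "ok")]
labels = ["clean", "clean", "clean", "fraud", "fraud", "fraud"]

model = fit_naive_bayes(rows, labels)
print(posterior(("dev", "dev"), *model))
```

Working in log space keeps the product of many small likelihoods from underflowing when more features are added.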
... It has also been proposed that the underlying distribution leading to the equation for ORF(G) is a Benford distribution, and that this gives genomes the following properties [164][165][166]: ...
... In more recent scholarship, though, the focus has shifted to the study of probability distributions that obey Benford's Law. These studies demonstrate that, if applicable, Benford's Law is invariant under (1) an arbitrary change of scale; (2) an arbitrary raising to a power; and (3) an arbitrary change of the numerical basis (Hill 1995a, 1995b; Leemis, Schmeiser, and Evans 2000; Grendar, Judge, and Schechter 2007; Ciofalo 2009; Fewster 2009). From a practical standpoint, Benford's Law is known to work better when the data in the sample cover several orders of magnitude and are not "artificially" biased in favor of any particular value. ...
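The remark that Benford's law works better for data spanning several orders of magnitude is easy to see in simulation. This Python sketch draws two hypothetical log-normal samples with the same median (5) but very different spreads, using a fixed seed, and reports each sample's largest first-digit deviation from Benford's law. The distribution, its parameters, and the sample size are assumptions chosen for illustration.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x: float) -> int:
    """Leading significant decimal digit of a positive float, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def max_benford_gap(sample) -> float:
    """Largest |observed frequency - Benford probability| over the digits 1..9."""
    counts = [0] * 9
    for x in sample:
        counts[first_digit(x) - 1] += 1
    return max(abs(c / len(sample) - b) for c, b in zip(counts, BENFORD))


rng = random.Random(42)  # fixed seed so the illustration is reproducible
mu = math.log(5.0)       # both samples have median exp(mu) = 5
narrow = [rng.lognormvariate(mu, 0.05) for _ in range(20_000)]  # about a tenth of a decade
wide = [rng.lognormvariate(mu, 3.0) for _ in range(20_000)]     # several decades

print("narrow:", round(max_benford_gap(narrow), 3))
print("wide:  ", round(max_benford_gap(wide), 3))
```

The narrow sample starts almost entirely with 4 or 5, so it fails badly; the wide one tracks Benford's law to within sampling noise.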
Data
In this paper, we introduce an innovative method to diagnose electoral fraud using vote counts. Specifically, we use synthetic data to develop and train a fraud detection prototype. We employ a naive Bayes classifier as our learning algorithm and rely on digital analysis to identify the features that are most informative about class distinctions. To evaluate the detection capability of the classifier, we use authentic data drawn from a novel data set of district-level vote counts in the province of Buenos Aires (Argentina) between 1931 and 1941, a period with a checkered history of fraud. Our results corroborate the validity of our approach: The elections considered to be irregular (legitimate) by most historical accounts are unambiguously classified as fraudulent (clean) by the learner. More generally, our findings demonstrate the feasibility of generating and using synthetic data for training and testing an electoral fraud detection system.
Article
In this paper we introduce an innovative method to diagnose electoral fraud using vote counts. First, to circumvent data availability problems and to study a particular type of fraud, we create synthetic data using Monte Carlo methods. Next, we build a supervised machine learning tool and use a Naive Bayes classifier to distinguish between Benford and Benford-deviant data sets. To illustrate our technique, we examine elections in the province of Buenos Aires (Argentina) between 1931 and 1941, a period with a checkered history of fraud. Using a novel dataset of district-level vote counts, we obtain results that corroborate the conventional wisdom: Conservative manipulation of the electoral process, rather than changes in voters' preferences, led to the dramatic electoral shifts during this period. More generally, our findings indicate that Benford's Law can be an effective tool for identifying fraud, even when minimal information (i.e. electoral returns) is available.
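A common way to flag a Benford-deviant data set is Pearson's chi-square statistic on first-digit counts, compared with the upper 5% point of the chi-square distribution with 8 degrees of freedom (about 15.507). This Python sketch is a generic screening test under that assumption, not the naive Bayes classifier the abstract describes; both input lists are hypothetical.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CHI2_CRIT_8DF_5PCT = 15.507  # upper 5% point of chi-square with 9 - 1 = 8 degrees of freedom


def first_digit_chi2(counts):
    """Pearson chi-square of leading digits of positive integers against Benford's law."""
    digits = [int(str(c)[0]) for c in counts if c > 0]  # zero counts have no leading digit
    observed = Counter(digits)
    n = len(digits)
    return sum((observed.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def looks_benford(counts):
    """True if the chi-square statistic stays below the 5% critical value."""
    return first_digit_chi2(counts) < CHI2_CRIT_8DF_5PCT


benford_like = [2 ** n for n in range(1, 1001)]  # hypothetical Benford-conforming counts
narrow_band = list(range(400, 600))              # hypothetical counts confined to 400-599
print(round(first_digit_chi2(benford_like), 2), looks_benford(benford_like))
print(round(first_digit_chi2(narrow_band), 2), looks_benford(narrow_band))
```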
Article
For ordinary‐chondrite (OC) mass distributions, Benford’s law applies to the set of individual objects that survive intact on the Earth’s surface after atmospheric disruption of meteoroids. Among OCs, Antarctic finds conform more closely to Benford’s law than observed falls, Northwest Africa (NWA) finds, or Oman finds, mainly because Antarctic OCs tend to be relatively unweathered (and mostly intact) and have not been aggregated as pairs under collective meteorite names. Deviations from Benford’s law can result from tampering with data sets. The set of OC falls reflects tampering with the original Benford distribution (produced by meteoroid disruption) by the deliberate aggregation of paired individual samples and inefficiencies in the collection of small samples. The sets of NWA and Oman OC finds have been affected by natural “tampering” of the original distributions, principally by terrestrial weathering, which can cause sample disintegration. NWA finds were also affected by non‐systematic collection of samples influenced by commercial considerations; collectors preferred type‐3 OC, as revealed by the high proportions of such specimens among NWA chondrites relative to those among falls and Oman and Antarctic finds. The percentage of type‐4 OC among falls is appreciably lower than in the sets of finds. This suggests that type‐4 chondrites are friable and disintegrate into numerous pieces; these are counted individually for the sets of finds, but collectively for falls. However, the percentages of type‐3 OC are not generally higher for finds, possibly because these samples tend to break into small pieces that are preferentially lost.
Article
Biological organisms must perform computation as they grow, reproduce and evolve. Moreover, ever since Landauer’s bound was proposed, it has been known that all computation has some thermodynamic cost, and that the same computation can be achieved with greater or smaller thermodynamic cost depending on how it is implemented. Accordingly, an important issue concerning the evolution of life is assessing the thermodynamic efficiency of the computations performed by organisms. This issue is interesting both from the perspective of how close life has come to maximally efficient computation (presumably under the pressure of natural selection), and from the practical perspective of what efficiencies engineered biological computers might achieve, especially in comparison with current computational systems. Here we show that the computational efficiency of translation, defined as free energy expended per amino acid operation, outperforms the best supercomputers by several orders of magnitude, and is only about an order of magnitude worse than the Landauer bound. However, this efficiency depends strongly on the size and architecture of the cell in question. In particular, we show that the useful efficiency of an amino acid operation, defined as the bulk energy per amino acid polymerization, decreases for increasing bacterial size and converges to the polymerization cost of the ribosome. This cost of the largest bacteria does not change in cells as we progress through the major evolutionary shifts to both single- and multicellular eukaryotes. However, the rates of total computation per unit mass are non-monotonic in bacteria with increasing cell size, and also change across different biological architectures, including the shift from unicellular to multicellular eukaryotes. This article is part of the themed issue ‘Reconceptualizing the origins of life’.
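The Landauer bound mentioned in this abstract is k_B T ln 2, the minimum heat dissipated per erased bit. A quick Python calculation follows, assuming a temperature of 310 K (roughly physiological; our choice, not the authors'). The abstract's comparison with translation energetics is the authors' result and is not re-derived here; the "ten times" line is plain arithmetic.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact value in the 2019 SI)


def landauer_bound(temperature_k: float) -> float:
    """Minimum heat, in joules, dissipated when one bit is erased at temperature T."""
    return K_B * temperature_k * math.log(2)


e_bit = landauer_bound(310.0)  # assumed temperature of about 37 degC
print(f"k_B T ln 2 at 310 K: {e_bit:.3e} J per bit")
print(f"one order of magnitude above it: {10 * e_bit:.3e} J")
```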
Article
An experimental study was conducted on the heat transfer from hot walls to liquid water sprays. Four full-cone swirl spray nozzles were used at different upstream pressures, giving mass fluxes impinging on the wall, G, from 8 to 80 kg m⁻² s⁻¹, mean droplet velocities, U, from 13 to 28 m s⁻¹ and mean droplet diameters, D, from 0.4 to 2.2 mm. A target consisting of two slabs of beryllium–copper alloy, each 4 × 5 cm in size and 1.1 mm thick, was electrically heated to about 300 °C and then rapidly and symmetrically cooled by water sprays issuing from two identical nozzles. The midplane temperature was measured by a fast-response thin-foil thermocouple, and the experimental data were regularized by Gaussian filtering. The inverse heat conduction problem was then solved by an approximation of the exact Stefan solution to yield the wall temperature T_w and the heat flux q_w transferred to the spray at temperature T_f. As a result, cooling curves expressing the heat flux q_w as a function of T_w − T_f were obtained. The single-phase heat transfer coefficient h and the maximum heat flux q_c were found to depend on the mass flux G and the droplet velocity U, while the droplet size D had a negligible independent influence. Simple correlations for h and q_c were proposed.
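Two operations named in this abstract can be sketched in Python: Gaussian filtering of a noisy, uniformly sampled signal, and the usual definition of a single-phase heat transfer coefficient, h = q_w / (T_w − T_f). The 3-sigma kernel truncation, the renormalisation at the ends of the record, and all numerical inputs are assumptions for illustration, not the authors' processing chain.

```python
import math


def gaussian_smooth(signal, sigma):
    """Gaussian-filter a uniformly sampled signal; sigma is measured in samples.

    The kernel is truncated at 3 sigma and renormalised over the samples that
    exist, so values near the ends of the record are not pulled toward zero.
    """
    half = max(1, int(3 * sigma))
    kernel = [(k, math.exp(-0.5 * (k / sigma) ** 2)) for k in range(-half, half + 1)]
    smoothed = []
    for i in range(len(signal)):
        acc = norm = 0.0
        for k, w in kernel:
            j = i + k
            if 0 <= j < len(signal):
                acc += w * signal[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def heat_transfer_coefficient(q_w, t_w, t_f):
    """h = q_w / (T_w - T_f), in W m^-2 K^-1 when q_w is in W m^-2."""
    return q_w / (t_w - t_f)


# Hypothetical noisy midplane-temperature record (degC): decaying trace plus +/-2 degC jitter.
raw = [20 + 280 * math.exp(-i / 40) + (2 if i % 2 else -2) for i in range(200)]
smooth = gaussian_smooth(raw, sigma=3.0)
print([round(v, 1) for v in smooth[:5]])

# Hypothetical single-phase point on a cooling curve.
print(heat_transfer_coefficient(q_w=2.0e6, t_w=250.0, t_f=20.0))
```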
Article
Recent research has focused on studying the patterns in the digits of closely followed stock market indexes. In this paper we find that the series of 1-day returns on the Dow-Jones Industrial Average Index (DJIA) and the Standard and Poor's Index (S&P) reasonably agrees with Benford's law and therefore belongs to the family of anomalous or outlaw numbers.
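Daily returns are small decimals of either sign, so a first-digit test has to take the first nonzero digit of the absolute value and skip zero returns. This Python sketch shows one way to do that; the price series is hypothetical, and the use of simple (rather than logarithmic) returns is an assumption.

```python
import math


def first_significant_digit(x: float) -> int:
    """Leading nonzero decimal digit of |x|, e.g. -0.0123 -> 1."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("a finite, nonzero value is required")
    # Scientific notation always starts with the first significant digit.
    return int(f"{abs(x):.15e}"[0])


def daily_returns(closes):
    """Simple 1-day returns r_t = P_t / P_(t-1) - 1 from consecutive closing prices."""
    return [today / yesterday - 1 for yesterday, today in zip(closes, closes[1:])]


closes = [100.0, 101.5, 100.2, 100.2, 99.0]  # hypothetical index closes
returns = daily_returns(closes)
digits = [first_significant_digit(r) for r in returns if r != 0]  # a zero return has no first digit
print(returns)
print(digits)
```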
Article
An abstract is not available.
Article
A derivation of Benford's Law or the First-Digit Phenomenon is given assuming only base-invariance of the underlying law. The only base-invariant distributions are shown to be convex combinations of two extremal probabilities, one corresponding to point mass and the other a log-Lebesgue measure. The main tools in the proof are identification of an appropriate mantissa σ-algebra on the positive reals, and results for invariant measures on the circle.
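In base b, Benford's law reads P(D = d) = log_b(1 + 1/d) for d = 1, ..., b − 1, and the probabilities telescope to log_b b = 1. This Python sketch checks that sum and compares the formula with the leading digits of a hypothetical sample (the integers 2^n, n = 1, ..., 1000) written in bases 3, 5 and 10. It illustrates the base-b form of the law only; it does not reproduce the base-invariance proof.

```python
import math
from collections import Counter


def benford_pmf(b: int) -> list[float]:
    """Benford's law in base b: P(D = d) = log_b(1 + 1/d) for d = 1, ..., b - 1."""
    return [math.log(1 + 1 / d, b) for d in range(1, b)]


def leading_digit(n: int, b: int) -> int:
    """Most significant digit of the positive integer n written in base b."""
    while n >= b:
        n //= b
    return n


def max_gap(sample, b):
    """Largest |observed - Benford| over the digits 1..b-1 in base b."""
    counts = Counter(leading_digit(n, b) for n in sample)
    pmf = benford_pmf(b)
    return max(abs(counts.get(d, 0) / len(sample) - pmf[d - 1]) for d in range(1, b))


# Hypothetical sample; log_b 2 is irrational for b = 3, 5 and 10.
sample = [2 ** n for n in range(1, 1001)]
for b in (3, 5, 10):
    print(b, round(sum(benford_pmf(b)), 12), round(max_gap(sample, b), 4))
```

Powers of 2 would be a poor test in base 8, since log_8 2 = 1/3 is rational and the leading digit then cycles through 1, 2 and 4 only.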
Article
Subtitled "An introduction to human ecology," this work attempts systematically to treat "least effort" (and its derivatives) as the principle underlying a multiplicity of individual and collective behaviors, variously but regularly distributed. The general orientation is quantitative, and the principle is widely interpreted and applied. After a brief elaboration of principles and a brief summary of pertinent studies (mostly in psychology), Part One (Language and the structure of the personality) develops 8 chapters on its theme, ranging from regularities within language per se to material on individual psychology. Part Two (Human relations: a case of intraspecies balance) contains chapters on "The economy of geography," "Intranational and international cooperation and conflict," "The distribution of economic power and social status," and "Prestige values and cultural vogues"—all developed in terms of the central theme. 20 pages of references with some annotation, keyed to the index. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
This article will concentrate on decimal (base 10) representations and significant digits; the corresponding analog of (3) for other bases $b > 1$ is simply $\mathrm{Prob}(\text{mantissa (base } b) \le t/b) = \log_b t$ for all $t \in [1, b)$.
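The base-b statement, that the mantissa r in [1/b, 1) with x = r·b^k satisfies Prob(r ≤ t/b) = log_b t for t in [1, b), can be compared with data directly. This Python sketch computes mantissas from logarithms and evaluates the empirical probability at a few values of t, using a hypothetical sample (powers of 3, read in base 8).

```python
import math


def mantissa(x, b=10):
    """The r in [1/b, 1) such that x = r * b**k for some integer k."""
    e = math.log(x, b)
    return b ** (e - math.floor(e) - 1)


def mantissa_cdf_check(sample, b, ts):
    """For each t in [1, b): (t, empirical Prob(mantissa <= t/b), log_b t)."""
    ms = [mantissa(x, b) for x in sample]
    return [(t, sum(m <= t / b for m in ms) / len(ms), math.log(t, b)) for t in ts]


# Hypothetical sample: powers of 3 (exact integers), examined in base 8.
sample = [3 ** n for n in range(1, 1001)]
for t, empirical, predicted in mantissa_cdf_check(sample, 8, [2, 4, 6]):
    print(f"t={t}: empirical {empirical:.3f}, log_8 t = {predicted:.3f}")
```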
Walthoe, J., Hunt, R. and Pearson, M., Looking Out For Number One, Millennium Mathematics Project web pages (http://pass.maths.org.uk/issue9/features/benford/), 2001.
Bogomolny, A., Benford's law and Zipf's law, cut-the-knot web pages (http://www.cut-the-knot.com/do_you_know/zipfLaw.html), 1996-2000.
Browne, M.W., Following Benford's law, or looking out for No. 1, The New York Times on the Web (http://courses.nus.edu.sg/course/mathelmr/080498sci-benford.html), 4 August 1998.
Weisstein, Eric W., Treasure troves of mathematics: Benford's law page, Wolfram Research MathWorld web pages (http://mathworld.wolfram.com/BenfordsLaw.html), 1996-2000.
Nigrini, Mark J. and Mittermaier, Linda J., The use of Benford's law as an aid in analytical procedures, Auditing: A Journal of Practice and Theory, Vol. 16, No. 2, pp. 52-67, Fall 1997.
Anonymous, He's got their number: scholar uses math to foil financial fraud, Wall Street Journal, 10 July 1995.
Salsburg, David, Digit Preferences in the Bible, Chance, Vol. 10, No. 4, pp. 46-48, 1997.
Bolle, Louis and Moreau, Jean C., Spray cooling of hot surfaces, in Multiphase Science and Technology, G.F. Hewitt, J.M. Delhaye and N. Zuber, eds., Hemisphere-McGraw-Hill, New York, Vol. 1, pp. 1-98, 1982.
Zipf, George Kingsley, The Psycho-Biology of Language, Houghton Mifflin, 1935 (reprinted by MIT Press, 1965).
Mandelbrot, Benoit, An informational theory of the statistical structure of languages, in Communication Theory, W. Jackson, ed., pp. 486-502, Butterworths, 1953.
Westrope, Alan, in Entropy VI: "Shannon's Choice" (with apologies to Wm. Styron), web site cypherpunks.venona.com/date/1995/09/msg01803.html.