Article

The Peculiar Distribution of First Digits

... However, Benford's law (BL for short) has a much wider application portfolio, supporting fraud detection in, e.g., accounting and bank transaction registers. The contribution of this paper is the verification of BL's validity for electricity metering data sets (the consumption of electric power), as well as the verification of the first-digit distribution deviation when comparing the BL first-digit distribution in natural and affected datasets [1,3,6-15]. ...
... The value range in the dataset must be large enough, e.g., [14]: ...
... The Formula (2) describes the calculation of the first significant digit probability according to BL [14]: ...
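The formula referenced in the excerpt above is elided here, but the standard statement of BL's first-digit probability is P(d) = log10(1 + 1/d). A minimal sketch of that distribution (our illustration, not the cited paper's code):

```python
import math

def benford_p(d: int) -> float:
    """Probability of first significant digit d under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("first significant digit must be 1..9")
    return math.log10(1 + 1 / d)

# The nine probabilities form a complete distribution: the sum telescopes
# to log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10) = 1.
probs = {d: benford_p(d) for d in range(1, 10)}
```

Digit 1 gets about 30.1% and digit 9 about 4.6%, matching the figures quoted throughout the citing literature.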
Article
Full-text available
Benford’s law can be used as a method to detect non-natural changes in data sets with certain properties; in our case, the dataset was collected from electricity metering devices. In this paper, we present the theoretical background behind this law. We applied Benford’s law’s first-digit probability distribution test to electricity metering data sets acquired from smart electricity meters, i.e., natural data of electricity consumption acquired during a specific time interval. We present the results of Benford’s law distribution for an original measured dataset with no artificial intervention and a set of results for different kinds of affected datasets created by simulated artificial intervention. Comparing these two dataset types with each other and with the theoretical probability distribution provided proof that Benford’s law can be applied to this kind of data and that it can extract the dataset’s artificial manipulation markers. As presented in the results part of the article, non-affected datasets mostly have a deviation from BL theoretical probability values below 10%, rarely between 10% and 20%. On the other hand, simulated affected datasets show deviations mostly above 20%, often approximately 70%, and rarely below 20%, the latter only when a small part of the original dataset (10%) is affected, which represents only a small magnitude of intervention.
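The deviation figures quoted above (below 10% for natural data, above 20% for manipulated data) can be reproduced in outline: count leading digits, then express each digit's relative deviation from the theoretical BL probability. A sketch under the assumption that deviation is measured per digit as |observed − expected| / expected; the 7%-growth sample is our stand-in for metering data:

```python
import math
from collections import Counter

def first_digit(x: float) -> int:
    """First significant digit of a nonzero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def digit_deviations(data):
    """Per-digit relative deviation (%) from Benford's first-digit law."""
    counts = Counter(first_digit(x) for x in data if x != 0)
    n = sum(counts.values())
    return {d: 100 * abs(counts[d] / n - math.log10(1 + 1 / d))
                 / math.log10(1 + 1 / d)
            for d in range(1, 10)}

# A geometric series (7% growth) is close to Benford, so the deviation for
# the most frequent digits stays small, well under the 20% threshold the
# abstract associates with manipulated data.
sample = [1.07 ** k for k in range(500)]
devs = digit_deviations(sample)
```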
... He gave numerous examples of data from many sources, including newspaper items, areas of rivers, street addresses, cost data, and populations. We give another example in Table 1, namely data based on the populations of 117 cities in Indiana, from Wikipedia based on the 2000 census. For a good history up to 1975 with many references, see Raimi [17]. More references will be given at the end of the paper. ...
... In fact, equation (2) holds for all but countably many real numbers r in (1,10). Theorem 1 appears in Raimi [17] and [18], and it essentially appears in Diaconis [5]. ...
... Figure 2 shows a picture of the circle of circumference 1 and a picture of the regions corresponding to where D = d for d = 1, 2, . . . , 9. Old-timers will be interested that Raimi [17] included a picture similar to our second picture in Figure 2 and that he discussed its relationship to a circular slide rule. This result is false if x is rational, because then the sequence (kx mod 1) takes on only finitely many values. ...
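The circle picture described above corresponds to reading off the fractional part of log10 x: the first digit is d exactly when {log10 x} lands on the arc [log10 d, log10(d+1)), whose length is Benford's probability for d. A small sketch of that correspondence (our illustration, not the cited paper's code):

```python
import math

def first_digit_via_mantissa(x: float) -> int:
    """Read the first significant digit off the circle of circumference 1:
    digit d occupies the arc [log10(d), log10(d+1))."""
    frac = math.log10(abs(x)) % 1.0      # position on the circle
    for d in range(1, 10):
        if frac < math.log10(d + 1):
            return d
    return 9  # unreachable: frac < 1.0 = log10(10)

# Arc lengths reproduce Benford's probabilities exactly:
# log10(d+1) - log10(d) = log10(1 + 1/d).
arc = {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}
```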
Article
Often data in the real world have the property that the first digit 1 appears about 30% of the time, the first digit 2 appears about 17% of the time, and so on with the first digit 9 appearing about 5% of the time. This phenomenon is known as Benford's law. This paper provides a simple explanation, suitable for nonmathematicians, of why Benford's law holds for data that have been growing (or shrinking) exponentially over time. Two theorems verify that Benford's law holds if the initial values and rates of growth of the data appear at random.
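The mechanism the abstract describes is easy to simulate: a quantity growing exponentially spends far more time with leading digit 1 than with leading digit 9. A hedged sketch with illustrative parameters of our choosing (a 3% growth rate over 1000 periods), not the paper's own theorems:

```python
from collections import Counter

def leading_digit(x: float) -> int:
    # Scientific notation puts the first significant digit first, e.g. '4.2e-03'.
    return int(f"{abs(x):e}"[0])

# An account balance growing 3% per period, observed over 1000 periods.
balances = [100 * 1.03 ** t for t in range(1000)]
counts = Counter(leading_digit(b) for b in balances)
share = {d: counts[d] / 1000 for d in range(1, 10)}
# Digit 1 leads roughly 30% of the periods; digit 9 well under 10%.
```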
... Since then a large variety of publications [57-70] have appeared in the literature by mathematicians, statisticians, economists, engineers, physicists, electioneers and amateurs, as observed by Raimi in Ref. [63], and by the end of 2021 the 'Benford Online Bibliography' [67] contained more than 1400 entries. Burgos and Santos, in an article published in 2021 [71], mentioned that the applications of the Law cover topics as varied and prosaic as the study of the genome, atomic, nuclear and particle physics, astrophysics, quantum correlations, toxic emissions, biophysics, medicine, dynamical systems, the distinction of chaos from noise, statistical physics, scientific citations, tax audits, electoral or scientific frauds, gross domestic product, stock markets, inflation data, climate change, the world wide web, internet traffic, social networks, textbook exercises, image processing, religious activities, dates of birth, hydrology and geology, fragmentation processes, the first letters of words and, of course, the number of Covid-19 infections in the world's states. ...
... This same application can be valid for prices, for example, the values (in Reais) of 1,274 invoices from the Securities and Exchange Commission, shown in Graph 1. Authors such as Hill (1995a, 1995b), Pinkham (1961) and Raimi (1969) have shown that the NB-Law applies to numerical data that are invariant as to scale and random in nature. In this sense, financial flow data have received considerable attention in the literature as amenable to applications using this law. ...
... Pinkham (1961), "On the distribution of first significant digits" (Statistics): showed that the NB-Law applies to data of a numerical nature. Raimi (1969): "The peculiar distribution of first significant digits". Varian (1972): "Benford's Law" (Social Sciences). ...
Article
Full-text available
This article uses the Newcomb-Benford Law (NB-Law), or Law of Anomalous Numbers, to analyze the values of Electronic Bidding Processes that occurred in the Purchasing Portal of the Brazilian Federal Government and are contained in the newly created website “DadosAbertos.gov.br”. In the analysis, all services contracted in the period from 2014 to 2018 were considered. The objective of the research was to analyze the conformity of the Electronic Auctions to the NB-Law, aiming to verify anomalies, which represent signs of fraud. It can be said that there was a statistically significant anomaly in the analysis of the first digit of the values bid in the Electronic Bidding Tenders. It is also noted that the trading sessions with first digits 4, 8 and 9 are those with the largest differences between expected and observed values, strengthening the hypothesis that these represent the trading sessions with the highest incidence/probability of deviations, to be tested in future studies. The study, a pioneer in this type of analysis of this Brazilian data source, aims to contribute to the literature focused on the detection of accounting or financial fraud in the public sector. The results collected here can also contribute to the practice of inspection in public management. It is recommended to deepen studies based on the Economics of Corruption, on issues that involve the detection of fraud, so that corruption becomes unviable or inopportune.
... Crowdfunding is often referred to as crowdfinancing or crowdinvesting. Crowdfunding is defined as the process of obtaining needed services, ideas, or content by soliciting monetary contributions from a large group of people, especially from an online community, rather than from employees or suppliers [8]. Nowadays, crowdfunding is often performed via Internet-mediated registries, but the concept can also be executed through mail-order subscriptions, benefit events, and other channels. ...
... Benford's Law is a law of digital frequency, that is, of the probability that the digits which appear follow the Benford distribution. In detail, in large sets of figures, the frequency of 1 as the leading digit is not 1/9 but 30.1%, and the frequency of figures beginning with 2 is 17.6%, while the lowest frequency belongs to 9, at 4.6% [8]. Benford's law was not generated from a strict mathematical proof, but obtained from many statistics. ...
Chapter
In order to achieve the pre-set funding goal, some entrepreneurs may engage in malicious fraud, that is, using fraudulent textual descriptions to attract monetary contributions from the crowd. Thus, fraud is inevitable in the online financial market. Fraudulent texts are not strictly equivalent to low-quality campaigns, but fraudulent content can jeopardize users’ perceptions of project quality. The fraudulent text thus has great drawbacks for the development of the crowdfunding model, leading investors to lose confidence in this newborn financing model. Through text mining, four indicators are adopted to measure the linguistic features related to fraud, and 126,593 campaigns from Kickstarter are employed to estimate the impact of fraud-related linguistic features on fundraising outcomes. Multiple text levels are selected as study objects, including the abstract, the detailed description and the reward narratives. The results show that, in general, lower levels of fraud-related linguistic features attract investors to contribute more money; the predictive model also validates this conclusion. However, some fraud indicators have no significant negative impacts on financing, or even show positive influences. Moreover, the more detailed the delivery terms in the reward text, the higher the ratio of successful funding. This study provides a guideline for founders to generate attractive descriptions for crowdfunding campaigns.
... As already noted, however, and significantly in the current argument, the theorem also explains why a surprisingly diverse collection of information tends to obey Benford's Law. Examples of such data include large accounting tables, stock market figures, tables of physical constants, numbers appearing in newspaper articles, demographic data, numerical computations in computing and aspects of scientific calculations (Raimi, 1969; Ashcraft, 1992; Dehaene and Mehler, 1992; Hill, 1996 and 1998; Ley, 1996; Nigrini, 1999). The explanation for conformity with Benford's Law now is well established: the data sets are composed of samples from many different distributions. ...
... Pinkham's (1961) proof was later extended by Hill (1995b). See Raimi (1976) for an early review of the literature and Scott and Fasli (2002) for a more recent literature survey. Three main groups of explanations emerge from these literature surveys. ...
Article
Full-text available
Accounting numbers generally obey a mathematical law called Benford's Law, and this outcome is so unexpected that manipulators of information generally fail to observe the law. Armed with this knowledge, it becomes possible to detect the occurrence of accounting data that are presented fraudulently. However, the law also allows for the possibility of detecting instances where data are presented containing errors. Given this backdrop, this paper uses data drawn from companies listed on the Johannesburg Stock Exchange to test the hypothesis that Benford's Law can be used to identify false or fraudulent reporting of accounting data. The results support the argument that Benford's Law can be used effectively to detect accounting error and fraud. Accordingly, the findings are of particular relevance to auditors, shareholders, financial analysts, investment managers, private investors and other users of publicly reported accounting data, such as the revenue services.
... Numerous other publications [3][4][5][6] have analyzed data sets representing diverse phenomena from α-particle decay lifetimes to stock market prices and have observed similar patterns. Two popular and easily accessible articles, and good introductions to the topic, are by Raimi [7] and Hill [8]. The probability distribution log[(n + 1)/n] for the first digit of a set of numbers has come to be called Benford's law in the literature. ...
... Benford [2] first noted that bodies of numerical data display a strong tendency to fall into geometric series. Raimi [3,7] discusses this feature in some detail. Although a geometric series does not represent a function of a continuous variable x, as do those of the previous examples, it is still possible to define the functional dependence as follows. ...
Article
Full-text available
Benford’s law, which gives the probability distribution of first digits of a set of numbers, is examined from underlying distribution functions representing physical phenomena. Data satisfying the power law function y(x) ∝ 1/x retain the same probability distribution of first digits when the data are subject to a scale change in the variable x. Exponential functions are shown to exhibit approximate invariance under scale change. Results are tested and examined using the data from the areas of 4013 lakes and 415 β-decay half-lives.
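The invariance claimed for y(x) ∝ 1/x can be checked numerically: sample from a 1/x density on [1, 10⁴], multiply every value by an arbitrary constant, and the first-digit distribution is essentially unchanged. A sketch using inverse-transform sampling (our construction, with an arbitrary scale factor of 3.7):

```python
import math
import random
from collections import Counter

random.seed(1)

def sample_reciprocal(n: int, decades: int = 4):
    """Inverse-transform sampling from a density proportional to 1/x
    on [1, 10**decades] (equivalently, log10(x) uniform)."""
    return [10 ** (decades * random.random()) for _ in range(n)]

def fd_share(data):
    """Observed share of each first significant digit."""
    counts = Counter(int(f"{x:e}"[0]) for x in data)
    return {d: counts[d] / len(data) for d in range(1, 10)}

xs = sample_reciprocal(20000)
before = fd_share(xs)
after = fd_share([3.7 * x for x in xs])   # arbitrary change of scale/units
# Both distributions sit near Benford: digit 1 close to 30% in each.
```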
... Three excellent non-technical expositions of Benford's law include Refs. [3][4][5]. ...
... (25) and (26) (ϵ = 0). Secondly, note that the tail distribution function of Eq. (25) is asymptotically harmonic ...
... Raimi furthered the mathematical understanding of BL, using Banach and other scale invariances (Raimi, 1969a), and also furthered its popularity (Raimi, 1969b). Both papers cite only Benford (1938), which received 4 citations in 1969, whereas none cited Newcomb (1881), which received no citations at all in this year (Fig. 2). ...
Preprint
Full-text available
Benford's law is an empirical observation, first reported by Simon Newcomb in 1881 and then independently by Frank Benford in 1938: the first significant digits of numbers in large data are often distributed according to a logarithmically decreasing function. Being contrary to intuition, the law was forgotten as a mere curious observation. However, in the last two decades, the relevant literature has grown exponentially, an evolution typical of "Sleeping Beauties" (SBs): publications that go unnoticed (sleep) for a long time and then suddenly become the center of attention (are awakened). In the present study, we show that the Newcomb (1881) and Benford (1938) papers are clearly SBs. The former was in deep sleep for 110 years, whereas the latter was in deep sleep for a comparatively shorter period of 31 years up to 1968, and in a state of less deep sleep for another 27 years up to 1995. Both SBs were awakened in the year 1995 by Hill (1995a). In so doing, we show that the waking prince (Hill, 1995a) is more often quoted than the SB whom he kissed; whether this is a general effect beyond the Benford's law case remains to be usefully studied.
... The first research to address the theoretical underpinning of the Log10 formula, EQ1, as a reasonable and appropriate surrogate for data generating processes only starts to appear some 25 years after Benford's paper. The groundbreaking work is offered by Pinkham (1961), Adhikari & Sarkar (1968), Duncan (1969), and Raimi (1969). However, Hill (1995a, b, 1996) is usually credited with providing the conclusive theoretical support for the Why, How, and When questions posed above. ...
Article
A central feature in the montage of executing the internal audit, in a non-forensic context, is a random sample of sufficient size to create the evidence needed to decide whether extended investigative procedures are warranted. This is critical not only in maintaining a best-practices profile for the department of internal auditing but also in “partnering” with the external auditors by assuring, in so far as possible, that the work of the internal audit group can be accepted as audit evidence by the external auditors, so conserving scarce organizational resources. In fact, this partnering is one of the long-term goals of the PCAOB for controlling the cost of the external certification audit. However, a contentious issue in sampling is: How should the internal auditor ferret out those accounts that are likely candidates for discovery sampling? This is the point of departure of our paper, where we present a simple and validated protocol, based upon the digital frequency paradigm introduced by Newcomb & Benford and popularized in the audit context by Nigrini, to identify accounts under audit that seem reasonable candidates for extended-procedures discovery sampling testing. The protocol centers around a triage point in Cartesian space developed from samples of account profiles reported in the literature. We validated the logic of the protocol by using a holdback sample. Finally, we have coded the account identification protocol as a Decision Support System in VBA: Excel™ that is available from the authors for free without restriction to its use. JEL Code: M4
... Based on research in the field of probability theory, Hill [10-12], Pinkham [13] and Raimi [14] demonstrated that NBL data sets have the following properties: (a) scale and base invariance; (b) they come from a choice provided by a variety of different data sources. This result is reached from a rigorous analysis of central limit theory, in the form of theorems for the mantissas of random variables under the effect of multiplication. ...
Article
Full-text available
This paper aims to introduce a data science approach for guiding auditors to accurately select regions suspected of fraud in the distribution of welfare program benefits. The technique relies on Newcomb–Benford's Law (NBL) for significant digits. Bolsa Família data were analysed from the Federal Government Transparency Portal, a tool that aims to increase fiscal transparency of the Brazilian Government through open budget data. The methodology consists in submitting four data samples to null-hypothesis statistical methods and thereby evaluating conformity with the law, as well as the summation test, which looks for excessively large numbers in the dataset. The research results in this paper are that beneficiaries' cash transfers per se are not a good test variable; that, once payment data are grouped by municipality, they fit the NBL; and finally that, when submitted to the summation test, the distribution of the Bolsa Família payments in several municipalities shows some evidence of fraud. In this sense, we conclude the NBL can be an appropriate method for fraud investigation of welfare programs' benefit distribution when beneficiaries' payments are geographically grouped.
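The summation test mentioned in the abstract (due to Nigrini) groups records by first digit and sums the record values; for Benford-like data, each of the nine groups is expected to carry roughly equal total value (about 1/9 each), so a digit group with an outsized sum flags excessively large numbers. A minimal sketch with synthetic log-uniform "payments" standing in for the Bolsa Família records:

```python
import random
from collections import defaultdict

random.seed(7)

def first_digit(x: float) -> int:
    return int(f"{abs(x):e}"[0])

def summation_test(values):
    """Nigrini-style summation test: share of total value carried by each
    first-digit group (expected roughly 1/9 each for Benford-like data)."""
    sums = defaultdict(float)
    for v in values:
        sums[first_digit(v)] += v
    total = sum(sums.values())
    return {d: sums[d] / total for d in range(1, 10)}

# Synthetic Benford-like payments over four orders of magnitude.
payments = [10 ** (4 * random.random()) for _ in range(50000)]
shares = summation_test(payments)
# Each digit group carries close to 1/9 of the total value; a group far
# above 1/9 would flag abnormally large amounts behind that digit.
```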
... He provided some brilliant interpretations of the phenomenon, which were extensively replicated in the literature. Raimi (1969) is also remembered for criticising some physics scholars, the ones mentioned above, for denying that the origin of the law lies in nature, and for bringing Benford's law to publication in a high-ranked journal. The conclusion of the paper simulated the scale invariance assumption and reported many questions that still remained open from the Pinkham (1961) study. ...
Thesis
Purpose – The purpose of this research is to examine whether Italian municipally owned utility entities (MOUs) engage in earnings management around the time of local elections. – Design/methodology/approach – A total of 506 Italian unlisted MOUs were examined between 2009 and 2014. The distribution of total-revenue digits was examined using first-two-digits tests based on Benford’s law, and 3036 observations comprised the data set. – Findings – The empirical results show that firms tend to engage in cosmetic earnings management (CEM) during election periods. There is evidence of total revenue manipulation, as well as indications of patterns of revenue rounding linked to political connections. – Originality/value – This is the second study investigating pre-electoral earnings management in MOUs, extending the work of Capalbo et al. (2020). It applies the emerging Benford-based approach of detecting CEM in a new context, namely the Italian jurisdiction. It presents a novel research design to broaden the well-known relationship between the political process and accounting numbers beyond the political cost hypothesis. – Keywords – Forensic accounting, Benford’s law, Fraud investigation, Cosmetic earnings management, Municipal elections, Municipally owned entities. – Paper type – Research dissertation
... Moreover, even after nearly 140 years and a large amount of mental effort by mathematicians, physicists and natural philosophers, we still do not have a convincing answer for why nature is this way. As Raimi wrote in his popular Scientific American essay, "Thus all the explanations given so far seem to lack something of finality … the answer remains obscure." ...
... Pinkham (1961) showed that multiplication by a constant did not change the distribution of the first digits. Raimi (1969) also examined the distribution of the first digits. In 1972, Varian, an economist, suggested that Benford's Law can be used as a test of random data in a social science context. ...
Article
Full-text available
Benford’s Law, which has a logarithmic base, is a simple but effective analytical examination tool for researchers. It lets researchers determine abnormalities in numerical data clusters. Benford’s Law as a numerical analytic test is a mathematical comparison which reveals unnatural deviations in data analysis. For that reason, it has various application areas such as auditing and finance. Investors may use Benford’s Law for finding out financial frauds and abuses. This paper aims to test whether stock market indexes and stock values in the Istanbul Stock Exchange (BIST) follow Benford’s Law of Anomalous Numbers or not. All indexes’ closing values and stocks’ closing values were examined, along with stocks’ monthly gains and losses. In this study, it was found that the series of monthly gains and losses on the twenty-eight indexes reasonably agree with Benford’s Law.
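A standard way to formalize "reasonably agree with Benford's Law", as tested on the BIST series above, is Pearson's chi-square goodness-of-fit statistic against the nine Benford probabilities, with 8 degrees of freedom. A self-contained sketch using only the standard library; 15.507 is the usual df = 8, α = 0.05 critical value:

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CHI2_CRIT_8DF_5PCT = 15.507   # chi-square critical value, df=8, alpha=0.05

def chi_square_benford(digit_counts: Counter) -> float:
    """Pearson chi-square statistic of observed first-digit counts vs Benford."""
    n = sum(digit_counts.values())
    return sum((digit_counts[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d])
               for d in range(1, 10))

# A sample proportioned exactly like Benford yields a statistic near zero,
# so the null hypothesis of conformity is not rejected.
ideal = Counter({d: round(10000 * BENFORD[d]) for d in range(1, 10)})
stat = chi_square_benford(ideal)
conforms = stat < CHI2_CRIT_8DF_5PCT
```

A uniformly distributed sample of similar size, by contrast, fails the test by a wide margin.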
... Further statistical proof of the law was provided by the work of Hill (1995). Hill (1998), in answering the question of why some datasets obey the law, pointed to the work of Raimi (1969), which showed that combinations of many different tables always come closest to the logarithmic distribution. From the foregoing it could be argued that Benford's distribution is an empirically observable phenomenon and, most importantly, that datasets conforming to Benford's law are second-generation distributions. ...
Conference Paper
Full-text available
The census survey collects data over a wide range of topics, providing a rich source of information, and is available at a variety of geographical boundaries which change over time. These changes present a challenge for use in multi-temporal analysis operations. Redistribution to desired boundaries offers a solution, and therefore effort should be made to quantify or characterise the error which can be attributed to these redistribution efforts. The study explored the use of Benford’s law in the characterisation of errors and the effectiveness of a selected redistribution technique. Contrary to the widely reported universality of this law, the results indicate a lack of conformity to the expected theoretical distribution for all of the sample sets tested. From the analysis, it is evident that Benford’s distribution could be employed as a reference point, thus offering an alternative for error characterisation of spatial data redistribution techniques, especially when dealing with changing spatial extents in census surveys.
... The first research to address the theoretic underpinning of the Log10 formula (EQ1), as a reasonable and appropriate surrogate for un-perturbed data generating processes only starts to appear some 25 years after Benford's paper. In the 1960s, various aspects of the theoretical context of the Newcomb-Benford curiosity are offered by: Pinkham (1961), Adhikari and Sarkar (1968), Duncan (1969), and Raimi (1969). However, Hill (1995a, b, 1996) is usually credited with providing the conclusive theoretical support for the Why and Conditional questions posed above. ...
Article
Full-text available
Introduction: Circa 1996 Theodore Hill offered a definitive proof that under certain conditions a data generating process is likely to produce observations that follow the Newcomb-Benford Log10 (N-B) first digit profile. The central feature of Hill's proof is the mixing property, from which base invariance for scale transformations seems to follow. Further, it has been observed that small datasets are often not part of the N-B profile set. Study Précis: This suggests that, if indeed the mixing process underlies the generation of the N-B profile, one should be able to take small non-conforming base-invariant datasets that are generated by uncorrupted processes and aggregate them to form datasets that conform to the N-B profile. Results: We demonstrate mixing convergence and find a systematic movement from Non-Conformity to Conformity at a transition point on the order of 250 data points. Impact: We suggest the practical importance of the Hill-Mixing result for the certification audit. We have all of these tests, datasets and results coded in a Decision Support System in VBA: Excel that is available from the authors free without restriction to its use.
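The convergence claim above (non-conformity for small samples, conformity past roughly 250 points) can be probed with a simple conformity score. Here we use the mean absolute deviation (MAD) between observed and Benford first-digit proportions, a measure popularized by Nigrini; the dataset sizes and the log-uniform generator are our illustrative assumptions, not the paper's setup:

```python
import math
import random
from collections import Counter

random.seed(42)

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def mad_from_benford(values) -> float:
    """Mean absolute deviation of first-digit proportions from Benford."""
    counts = Counter(int(f"{abs(v):e}"[0]) for v in values)
    n = sum(counts.values())
    return sum(abs(counts[d] / n - BENFORD[d - 1]) for d in range(1, 10)) / 9

def small_dataset(size=25):
    """One small Benford-like (log-uniform) dataset."""
    return [10 ** (3 * random.random()) for _ in range(size)]

# Pooling 40 small datasets (1000 points in total) drives the MAD down,
# mirroring the movement toward conformity the abstract reports.
pooled = mad_from_benford([v for _ in range(40) for v in small_dataset()])
```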
... The first research to address the theoretic underpinning of the Log10 formula, EQ1, as a reasonable and appropriate surrogate for data generating processes only starts to appear some 25 years after Benford's paper. The groundbreaking work is offered by Pinkham (1961), Adhikari and Sarkar (1968), Duncan (1969), and Raimi (1969). However, Hill (1995a, b, 1996, 1998) is usually credited with providing the conclusive theoretical support for the Why, How, and When questions posed above. ...
... Naturally, a question arose whether Benford's Law can or cannot be proved mathematically. In particular, T. P. Hill (Hill, 1995a; Hill, 1995b; and Hill, 1998) and R. A. Raimi (Raimi, 1969a; Raimi, 1969b; and Raimi, 1976) tried to find such a proof, but no strict mathematical proof was found. If nothing else, their theoretical efforts led to an approximate formulation of Benford's Law validity: if we take random samples from arbitrary distributions, the collection of these random samples approximately obeys Benford's Law. ...
Article
Full-text available
Benford's Law (sometimes also called Benford's Distribution or Benford's Test) is one of the possible tools for verification of a data structure in a given file regarding the relative frequencies of occurrence of the first (or second, etc.) digit from the left. If it is used as a goodness-of-fit test on sample data, there are usually no problems with its interpretation. However, certain factual questions arise in connection with the validity of Benford's Law in large data sets in governmental statistics; such questions should be resolved before the law is used. In this paper we discuss the application potential of Benford's Law when working with extensive data sets in the areas of economic and social statistics.
... Several articles have summarized most of the known datasets that follow the BL prediction, including river lengths, population distributions, atomic weights, x-ray volts, American League baseball statistics, black-body radiation, the masses of exoplanets, postal codes, and death rates [5-9]. Large datasets of variables that span many orders of magnitude are often seen to follow the distribution [3,7,10]. ...
Article
Full-text available
Working with a large temporal dataset spanning several decades often represents a challenging task, especially when the record is heterogeneous and incomplete. The use of statistical laws could potentially overcome these problems. Here we apply Benford's Law (also called the "First-Digit Law") to the traveled distances of tropical cyclones since 1842. The record of tropical cyclones has been extensively impacted by improvements in detection capabilities over the past decades. We have found that, while the first-digit distribution for the entire record follows Benford's Law prediction, specific changes such as satellite detection have had serious impacts on the dataset. The least-square misfit measure is used as a proxy to observe temporal variations, allowing us to assess data quality and homogeneity over the entire record, and at the same time over specific periods. Such information is crucial when running climatic models and Benford's Law could potentially be used to overcome and correct for data heterogeneity and/or to select the most appropriate part of the record for detailed studies.
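The least-square misfit used as a temporal proxy in the cyclone study can be written down directly: for each time window, sum the squared differences between observed first-digit proportions and the Benford prediction. A sketch of that measure (the example windows are our synthetic substitutes for traveled-distance records):

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def ls_misfit(values) -> float:
    """Least-squares misfit: sum of squared differences between observed
    first-digit proportions and the Benford prediction."""
    counts = Counter(int(f"{abs(v):e}"[0]) for v in values if v)
    n = sum(counts.values())
    return sum((counts[d] / n - BENFORD[d]) ** 2 for d in range(1, 10))

# A Benford-like window scores near zero; a window whose values all start
# with the digit 9 (a crude stand-in for biased reporting) scores near one.
benford_like = [10 ** (0.0301 * k) for k in range(1000)]
biased = [9000 + k for k in range(1000)]
```

Tracking this misfit over sliding windows is one way to localize the detection-capability changes the abstract describes.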
... The first research to address the theoretic underpinning of the Log10 formula, EQ1, as a reasonable and appropriate surrogate for data generating processes only starts to appear some 25 years after Benford's paper. The first early groundbreaking work is offered by Pinkham (1961), Adhikari and Sarkar (1968), Duncan (1969), and Raimi (1969). However, Hill (1995a, b, 1996) is usually credited with providing the conclusive theoretical support (Note 1) for the Why, How, and When questions posed above. ...
Article
Full-text available
The basis of the certification audit, in a non-forensic context, is a random sample of sufficient size to create the evidence needed to justify the audit opinion. This has been clearly stated in all of the relevant SAS pronouncements issued by the AICPA and also by the PCAOB through AS 5. However, there has been a dearth of specifics on the critical selection of the set of accounts that are reasonable targets for the statistical sampling needed if extended audit procedures seem warranted. In this paper, we present a simple and validated protocol based upon the digital frequency testing introduced by Newcomb & Benford and popularized in the audit context by Nigrini to identify accounts under audit that seem reasonable candidates for extended-procedures testing. The montage of the protocol centers around the parametric test of proportions, the equations for which were introduced by Nigrini. We validated the logic of the protocol by using a holdback sample. Finally, we have coded the account identification protocol as a Decision Support System in VBA: Excel™ that is available from the authors free without restriction to its use.
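The parametric test of proportions that the protocol centers on is, in Nigrini-style digit testing, a per-digit z-statistic with a continuity correction; a hedged sketch of that statistic (the flagging threshold 1.96 is the usual two-sided 5% normal cutoff, and the example counts are invented):

```python
import math

def benford_z(observed_count: int, n: int, d: int) -> float:
    """Z-statistic with continuity correction for first digit d, in the
    style of Nigrini's digit tests; z > 1.96 flags the digit at the 5% level."""
    p = math.log10(1 + 1 / d)                 # Benford expectation for digit d
    p_hat = observed_count / n                # observed proportion
    se = math.sqrt(p * (1 - p) / n)           # standard error under the null
    num = abs(p_hat - p) - 1 / (2 * n)        # continuity correction
    return max(num, 0.0) / se

# Digit 1 appearing only 200 times in 1000 records (20% vs the expected
# 30.1%) is flagged; 301 times in 1000 is not.
z_low = benford_z(200, 1000, 1)
z_ok = benford_z(301, 1000, 1)
flagged = z_low > 1.96
```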
... Research in the field of Probability Theory (Hill, 1995, 1996; Pinkham, 1961; Raimi, 1969) shows that the NB Law applies to data sets with the following properties: (a) they are scale invariant; (b) they arise from a selection drawn from a variety of different sources. This result comes from a more rigorous analysis of the Central Limit Theorem, in the form of theorems for the mantissas of random variables under the effect of multiplication. ...
Article
Full-text available
The question investigated in this article is structured as follows: Are there significant deviations in the distribution of the first and second digits of state public expenditures relative to the behavior predicted by the Newcomb-Benford Law? The objective of the article is to detect significant deviations in the distribution of the first and second digits of state public expenditures relative to the standard distribution defined by the Newcomb-Benford Law (NB Law). The article develops an interdisciplinary, exploratory methodology to analyze 134,281 commitment notes issued by 20 management units of two states. For the data analysis, an accounting-metrics model based on hypothesis tests was applied, assessing the conformity between the observed distribution and the one predicted by the NB Law. The research found significant deviations in the first-digit distribution: an excess of occurrences of the digits 7 and 8, and a scarcity of occurrences of the digits 9 and 6, relative to the proportions expected under the NB Law. This behavior suggests a tendency to evade public bidding processes. The second-digit analysis, unprecedented for the Brazilian case, showed a significant excess of occurrences of the digits 0 and 5, indicating the use of rounding in setting commitment values. The NB Law proves viable, useful, and practical for the work of external control bodies, especially in audit planning and in determining the audited sample.
Preprint
A simple method to derive parametric analytical extensions of Benford's law for the first digits of numerical data is proposed. Two generalized Benford distributions are considered, namely the two-sided power Benford distribution and the new Pareto Benford distribution. The fitting capabilities of these generalized Benford distributions are illustrated and compared on some interesting and important integer sequences.
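For orientation, a widely used one-parameter power-law generalization of Benford's law (a simpler family than, and not necessarily identical to, the two distributions studied in the preprint) can be sketched as:

```python
import math

def generalized_benford(d, alpha):
    """One-parameter generalized Benford probability for first digit d.

    alpha = 0 gives the uniform distribution 1/9, and alpha -> 1
    recovers the classic Benford law log10(1 + 1/d).
    """
    if abs(alpha - 1.0) < 1e-12:
        return math.log10(1 + 1 / d)
    beta = 1 - alpha
    # Telescoping numerator guarantees the nine probabilities sum to 1.
    return ((d + 1) ** beta - d ** beta) / (10 ** beta - 1)
```

Varying alpha between 0 and 1 interpolates between the uniform and the classic Benford first-digit distributions, which illustrates the kind of extra fitting freedom such parametric families provide.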
Conference Paper
Full-text available
The data used in social science research are, more often than not, impossible to replicate. In a survey sent by the British Medical Journal to more than 2,700 researchers and to 9,036 clinicians and academics who had submitted articles to the BMJ or served as peer reviewers, 13% of the researchers who responded (the response rate was 31%) admitted knowing colleagues who had "inappropriately adjusted, excluded, altered, or fabricated data" in order to get their studies published. One way to mitigate the problem of external validity is to resort to meta-analysis, but there are others. In this paper we propose a strategy for prior validation of the data used in any quantitative research. Its effectiveness is shown in data analyses carried out in other fields where control is possible, and its application is demonstrated in a case of anorexia. CONCEPTS OF LYING If, when reviewing research data, we found that the height of a ten-year-old child was 2 meters, we would suspect an error. We know or intuit the distribution of the variable and can easily detect that the reported value is extreme. But in many cases we know nothing about the variable, or the research presents as true astonishing data about something we cannot access, such as the unexpected harm caused by the grossly excessive hours spent on the Internet by people with a certain disorder in Finland. How could we know whether they were invented? We distinguish three concepts. (a) Lying: stating what is false with the intention to deceive. (b) Bullshit: the content of propositions that seek to persuade without any claim to truth, as in advertising or politics.
(c) Humbug: associated with intentional fraud, close to a swindle. The difference between the three modes of acting lies in the intention: in the first case one seeks to convince of something false, in the second to convince of what one believes true, and in the third to obtain a benefit with false or incomplete information.
Article
In this study, we deploy various methods for the analysis of digits and provide rigorous empirical evidence that most banks’ off-balance sheet items show only partial conformity to Benford’s law in their first leading (significant) digits. The accounting records also show scarce compliance with Benford’s law in their second leading digits. Most of these banking activities emerge with values that are manipulated downward by several percent and with excessive rounding, with disproportionate usage of 0 and 5 in their last three digits, regardless of whether the items are traded on designated exchanges, handled only Over-the-Counter, or represent business relationships between commercial banks and customers. Overall, we expose widespread though modest artificial deflation in the recorded values of banks’ OBS items and a unique phenomenon of significant overuse of the numbers 0 and 5 in different digits, with strong violations of Benford’s law. We further notice that, for the majority of banks’ off-balance sheet items, key regulatory developments, such as the three Basel Accords, have had a meaningful and continuous impact on the overall reduction of the anomalous appearances of the number 5 in the first and of 0 and 5 in the second leading digits. At present, however, we still observe irregular spreads (well above the norm) of the numbers 0 and 5 in the second leading digits.
Article
For ordinary-chondrite (OC) mass distributions, Benford’s law applies to the set of individual objects that survive intact on the Earth’s surface after atmospheric disruption of meteoroids. Among OCs, Antarctic finds conform more closely to Benford’s law than observed falls, Northwest Africa (NWA) finds, or Oman finds, mainly because Antarctic OCs tend to be relatively unweathered (and mostly intact) and have not been aggregated as pairs under collective meteorite names. Deviations from Benford’s law can result from tampering with data sets. The set of OC falls reflects tampering with the original Benford distribution (produced by meteoroid disruption) by the deliberate aggregation of paired individual samples and inefficiencies in the collection of small samples. The sets of NWA and Oman OC finds have been affected by natural “tampering” of the original distributions, principally by terrestrial weathering, which can cause sample disintegration. NWA finds were also affected by non-systematic collection of samples influenced by commercial considerations; collectors preferred type-3 OC, as revealed by the high proportions of such specimens among NWA chondrites relative to those among falls and Oman and Antarctic finds. The percentage of type-4 OC among falls is appreciably lower than in the sets of finds. This suggests that type-4 chondrites are friable and disintegrate into numerous pieces; these are counted individually for the sets of finds, but collectively for falls. However, the reason the percentages of type-3 OC are not generally higher for finds may be that these samples tend to break into small pieces that are preferentially lost.
Article
Full-text available
This study demonstrates, through a case study, the applicability of the Newcomb-Benford Law to the process of controlling organizations' balance sheets. Applicability was demonstrated by means of a graphical analysis of the frequencies of the observed data against the Newcomb-Benford pattern, with the results verified against the accounting model, in a document analysis based on the published economic results and the IR (Investor Relations) website from the first quarter of 2008 to the first quarter of 2020, compared with an earlier study covering 2008 to 2015, the period investigated in the Lava Jato (Car Wash) operation. The subject is Petrobras of Brazil, a state-controlled mixed-economy company operating in the exploration, production, refining, marketing, and transportation of oil, natural gas, and their derivatives. It was verified again that analysis of balance-sheet balances supports deeper monitoring and control of an organization's economic and financial results, allowing its managers to detect possible distortions over the period analyzed. The relevance of the work was confirmed by the distortions found, currently in smaller quantities than before for the second, third, and fourth digits, indicating a need to monitor and control economic and financial results so that managers can achieve greater rigor in inspection, monitoring, and transparency.
Article
We examine whether audit quality inputs are related to the conformity of financial statements to Benford’s law. We find that overall financial statement conformity increases with audit fees, nonaudit fees, and audit report lag, and decreases with audit firm tenure. We also find that these audit quality inputs are more strongly associated with income statement conformity than with cash flow statement conformity. Our findings document the role that auditing plays in enhancing the conformity of financial statements to Benford’s law.
Article
Benford's and Zipf's Laws have been investigated in large data sets of pore properties of 206 lab-made porous solids, including spinels, aluminas, silicas, aluminophosphates, aluminometalates, and MCM and SBA materials. Such properties, like the mean pore diameters, the mean pore lengths, and the mean pore anisotropies, were obtained by combining the specific surface areas and the specific pore volumes of the solids. All those parameters exhibit a distribution around a central value, and their first digits obey, more or less, Benford's Law. Compliance with the Law depends on the spread of each distribution and improves exponentially with its standard deviation. The above data sets do not follow Zipf's Law because they refer to totally independent systems without any internal coherence property linked to their evolution. On the contrary, Zipf's Law holds over many orders of magnitude for the differential pore numbers and the differential pore radii of 324 experimental points estimated for lab-made spinels and MCM silica, volcanic and magmatic porous rocks, and a typical soil. The underlying reason is that those data sets exhibit an internal property of coherence, which is the mechanism of pore development. This Zipfian behavior leads to impressive Benford distributions of the first digits of the above two differential quantities. It is assumed that extended Zipfian distributions lead to good Benford's Laws, but the reverse is not true, since Benfordness can also be observed in other distributions as long as they are sufficiently populated and extensively spread.
Article
Full-text available
This paper used the Newcomb-Benford Law (NB Law) to analyze 210,899 contracts issued by sixty management units in two states in the Brazilian Northeast in 2010. In this article we seek to address the following question: What proposal emerges from the need to identify financial deviations over time in terms of NB Law compliance in continuous auditing scenarios? To this end, the goal of the paper is to analyze this compliance with the aim of identifying deviations over time. The analysis focuses on first significant digit distribution. Graphical analysis of observed frequencies and time series of relative discrepancies reveals the formation of typical patterns of divergences from Public Tenders Law (Federal Law n. 8.666/93). From the results obtained, we conclude that time series analysis of NB Law compliance can improve the accuracy of sampling procedures in continuous auditing.
Article
Full-text available
Benford’s law is an empirical observation, first reported by Simon Newcomb in 1881 and then independently by Frank Benford in 1938: the first significant digits of numbers in large data sets are often distributed according to a logarithmically decreasing function. Being contrary to intuition, the law was forgotten as a mere curious observation. However, in the last two decades the relevant literature has grown exponentially, an evolution typical of "Sleeping Beauties" (SBs): publications that go unnoticed (sleep) for a long time and then suddenly become the center of attention (are awakened). In the present study, we show that the Newcomb (1881) and Benford (1938) papers are clearly SBs. The former was in deep sleep for 110 years, whereas the latter was in deep sleep for a comparatively shorter period of 31 years up to 1968, and in a state of less deep sleep for another 27 years up to 1995. Both SBs were awakened in the year 1995 by Hill (1995a). In so doing, we show that the waking prince (Hill, 1995a) is more often quoted than the SB whom he kissed; in this Benford's-law case, we wonder whether this is a general effect that could be usefully studied.
Article
A derivation of Benford's Law, or the First-Digit Phenomenon, is given assuming only base-invariance of the underlying law. The only base-invariant distributions are shown to be convex combinations of two extremal probabilities, one corresponding to point mass and the other a log-Lebesgue measure. The main tools in the proof are the identification of an appropriate mantissa σ-algebra on the positive reals, and results for invariant measures on the circle.
Article
The Prime Numbers are well known for their paradoxical stand regarding Benford's Law. On one hand, they adamantly refuse to obey the law of Benford in the usual sense, namely that of a natural density of the proportion of primes with d as the leading digit; yet on the other hand, the Dirichlet density for the subset of all primes with d as the leading digit is indeed LOG(1 + 1/d). In this article the superficiality of the Dirichlet density result is demonstrated and explained in terms of other well-known and established results in the discipline of Benford's Law, conceptually concluding that prime numbers cannot be considered Benford at all, in spite of the Dirichlet density result. In addition, a detailed examination of the digital behavior of prime numbers is outlined, showing a distinct digital development pattern, from a slight preference for low digits at the start for small primes to complete digital equality for large primes in the limit as the prime number sequence goes to infinity. Finally, an exact analytical expression for the density of the logarithms of primes is derived and shown to be always on the rise at the macro level, an observation that is also confirmed empirically.
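The "slight preference for low digits" among small primes is easy to observe empirically; a minimal sketch (the bound of one million and the counting helpers are our choices for illustration):

```python
def primes_below(n):
    """Sieve of Eratosthenes: all primes strictly below n."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def first_digit_counts(numbers):
    """Tally of leading digits 1..9 across a list of positive integers."""
    counts = {d: 0 for d in range(1, 10)}
    for x in numbers:
        counts[int(str(x)[0])] += 1
    return counts

counts = first_digit_counts(primes_below(1_000_000))
# Low digits lead only mildly; the share of leading-1 primes is far
# below the ~30% that Benford's Law would predict.
```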
Article
In a world where technology has made written communication rapid, languages are being shaped more and more by the requirements of the new cyberspace medium. One of the most conspicuous of these is the proclivity towards efficiency and economy, as evident in the constant production of compressed forms (abbreviations of words and phrases, acronyms, etc.) in the written language of chatrooms and of other such virtual linguistic communities. Is this a new linguistic phenomenon responding to new technologies? Or, is it a contemporary manifestation of an inbuilt "principle of least effort" in communication systems? And is it spreading to the language generally? This paper will look at this question as it concerns the Italian language today, assessing its implications in the light of the history of the language through the ages.
Article
Benford distributions of leading digits arise in a multitude of everyday settings, yet the establishment of the distribution’s genesis to date requires unnecessarily restrictive postulates of scale or base invariance. We derive Benford’s law from the point of view of the limit of Pareto probability densities. We also provide empirical explanations of the Benford distribution’s pervasiveness in practice.
Article
Benford’s Law, or The First Digit Law as it is commonly known, has been a fascination for many generations. This counter-intuitive law proposes that, given a sequence of numbers (usually from a data set like the lengths of rivers, heights of mountains, populations of nations, or any source of data from real life), the first digit is ‘1’ roughly 30% of the time. Many mathematical sequences, such as the Fibonacci sequence, also follow Benford’s Law. Benford’s Law has some interesting applications, especially in fraud detection!
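The Fibonacci claim is straightforward to verify numerically; a quick sketch, assuming the convention F1 = F2 = 1:

```python
import math

def fibonacci_first_digits(n):
    """First digits of the first n Fibonacci numbers (F1 = F2 = 1)."""
    a, b = 1, 1
    digits = []
    for _ in range(n):
        digits.append(int(str(a)[0]))
        a, b = b, a + b
    return digits

digits = fibonacci_first_digits(1000)
share_of_ones = digits.count(1) / len(digits)
# share_of_ones lands very close to the Benford prediction
# log10(2) ~= 0.301 for the leading digit 1.
```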
Article
The integrity of elections relies on fair procedures at different stages of the election process, and fraud can occur in many instances and different forms. This paper provides a general approach for the detection of fraud. While most existing contributions focus on a single instance and form of fraud, we propose a more encompassing approach, testing for several empirical implications of different possible forms of fraud. To illustrate this approach we rely on a case of electoral irregularities in one of the oldest democracies: in a Swiss referendum in 2011, one in twelve municipalities irregularly destroyed the ballots, rendering a recount impossible. We do not know whether this happened due to sloppiness or to cover possible fraudulent actions. However, one of our statistical tests points to irregularities in some of the municipalities that lost their ballots: they reported significantly fewer empty ballots than the other municipalities. Relying on several tests leads to the well-known multiple comparisons problem. We show two strategies and illustrate the strengths and weaknesses of each potential way to deal with multiple tests.
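One standard remedy for the multiple comparisons problem mentioned above is a step-down correction; a minimal sketch of Holm-Bonferroni (one of several possible strategies, not necessarily either of the two the authors examine):

```python
def holm_rejections(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns the (sorted) indices of hypotheses rejected while
    controlling the family-wise error rate at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for rank, i in enumerate(order):
        # Compare the rank-th smallest p-value against alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            rejected.append(i)
        else:
            break  # step-down: stop at the first non-rejection
    return sorted(rejected)
```

Because later thresholds are larger than the plain Bonferroni cutoff alpha/m, Holm's procedure rejects at least as many hypotheses while offering the same family-wise error guarantee.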