Moderating “Cry Wolf” Events with Excess MAD
in Benford’s Law Research and Practice
Bradley J. Barney
Visiting Assistant Professor
Department of Statistics
223 TMCB
Brigham Young University
Provo, UT 84602
Phone 801-422-4505
barney@stat.byu.edu
Kurt S. Schulzke (corresponding author)
Associate Professor of Accounting & Business Law
Kennesaw State University
560 Parliament Garden Way NW
Kennesaw, GA 30144, Mail Stop 0402
Phone 470-578-6379
Kurt_Schulzke@kennesaw.edu
Abstract
False positives or “Type I errors,” wherein test results indicate fraud where none actually
exists, have been described as a costly “cry wolf problem” in auditing practice. Benford’s Law,
used in screening for financial statement manipulation, is especially prone to false positives
when applied to small and moderately sized data sets. Relying in part on Monte Carlo
simulations, we describe with greater precision than extant literature the mathematical
correlation between N and mean absolute deviation (MAD), a statistic increasingly used for
assessing deviation from Benford’s Law. We recommend replacing MAD with an alternative,
Excess MAD, that explicitly adjusts for N in estimating deviation from Benford's Law. Applying
nonparametric, generalized additive modelling to public company financial statement numbers,
we demonstrate the differing outcomes expected from Excess MAD and MAD and produce
evidence suggesting that, despite Sarbanes-Oxley and Dodd-Frank legislation, Benford’s Law
conformity of public company financial statement numbers remained relatively stable across four
decades beginning in 1970.
INTRODUCTION
False positives (a.k.a. “Type I errors”) have been described as a costly “cry wolf
problem” in fraud auditing (McKee 2010). While most practitioners and researchers intuit that
false positives should be avoided as much as possible, evidence suggests that many lack a solid
understanding of the probabilistic linkage between test positives and negatives and audit
efficiency and effectiveness. Factors exacerbating the cry wolf problem include (a) the post-Enron proliferation of “fraud risk factors” in professional auditing standards, mostly without probabilistic guidance (PCAOB AS 2401; McKee 2010), (b) auditor focus on costly false
negatives in derogation of the costs of concomitant false positives (see, e.g., Wilks and
Zimbelman 2004), and (c) disregard or misapplication of Bayes’ Rule, which mediates the cost
and utility of Benford’s Law as a fraud diagnostic. This paper proposes a new statistic, Excess
MAD,1 which facilitates better calibration of Benford’s Law false positives through Bayes’ Rule.
Bayes’ Rule can be used to calculate P(S|D), the conditional probability that
manipulation has occurred given the observation of a particular risk factor or set of factors
(Durtschi, Hillison, and Pacini 2004; McKee 2010). Where P(S), P(D), and P(S^c) are,
respectively, the unconditional probabilities of the state (“S”) of fraud, the observation of a
diagnostic (“D”) signal or risk factor, and S-complement (“S^c”) or nonoccurrence of S, P(S|D)
is given as in Equation 1.

P(S|D) = P(D|S)P(S) / [P(D|S)P(S) + P(D|S^c)P(S^c)]   (1)
1 MAD stands for “mean absolute deviation,” a statistic often used in Benford’s Law
research because of its advantages in the Benford’s Law context over traditional NHST statistics
(Nigrini 2012; Gorard 2013). MAD is explained in detail below.
The probability, P(S), may also be termed the base rate or prior probability of
manipulation (which might be fraudulent or not), meaning the probability of manipulation prior
to observing a diagnostic signal or evidence, in Bayesian parlance. Meanwhile, P(S|D), P(D|S),
and P(D|S^c) are, respectively, the conditional probabilities of (a) manipulation given
observation of the diagnostic signal, (b) observation of the diagnostic given the state of
manipulation, and (c) observation of the diagnostic given no manipulation (a.k.a. “false
positive”) (Durtschi et al. 2004, 28; Kruschke 2015). If P(S) is relatively low,2 then P(S|D) is
more effectively increased by reducing P(D|S^c) than by increasing P(D|S), meaning that a
premium is placed on reducing false positives.
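The asymmetry just described can be sketched numerically. The following Python function implements Equation 1; the parameter values in the usage lines are illustrative assumptions, not figures taken from the paper.

```python
def p_fraud_given_signal(p_s, p_d_given_s, p_d_given_sc):
    """Bayes' Rule (Equation 1): P(S|D) for a binary fraud state.

    p_s          -- prior (base rate) of manipulation, P(S)
    p_d_given_s  -- true-positive rate of the diagnostic, P(D|S)
    p_d_given_sc -- false-positive rate of the diagnostic, P(D|S^c)
    """
    numerator = p_d_given_s * p_s
    return numerator / (numerator + p_d_given_sc * (1 - p_s))

# With a low base rate, halving the false-positive rate raises P(S|D)
# far more than a comparable increase in the true-positive rate.
# (Hypothetical parameters for illustration only.)
base = p_fraud_given_signal(0.01, 0.75, 0.10)
fewer_fp = p_fraud_given_signal(0.01, 0.75, 0.05)   # cut P(D|S^c) in half
better_tp = p_fraud_given_signal(0.01, 0.85, 0.10)  # raise P(D|S) instead
```

Running the three calls shows `fewer_fp` well above `better_tp`, which is the sense in which a premium is placed on reducing false positives when P(S) is low.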
Bayes’ Rule and the interaction among its basic parameters are often ignored or
misinterpreted in auditing practice and research. In practice, Bayesian errors can be extremely
costly, both out-of-pocket, where entry-level big-firm staff are billed out at $300 or more per
hour, and in terms of litigation and reputation risk.
As a diagnostic for fraud and other accounting anomalies, Benford’s Law fits squarely
within the Bayesian analytical framework. This study seeks to reduce P(D|S^c) (the probability of
false positives) by better calibrating the assessment of deviation from Benford’s Law thereby
increasing the reliability and reducing the costs associated with using Benford’s Law as a fraud
screen. However, as the Benford’s Law P(D|S^c) falls, Bayesian inference allows fraud
2 Financial statement manipulation typically comes to light only through regulatory
enforcement actions or investigative reporting, leaving latent the overall base rate. However,
recent research suggests that between 20 and 30 percent of U.S. GAAP financial statements may
be afflicted by within-GAAP manipulation alone (Dichev et al. 2013, 24). Others have estimated
the fraud (as opposed to within-GAAP manipulation) base rate at between 0.3 percent (McKee
2010, 61-62) and 38 percent (Bishop 2001, 13).
prevention and detection resources to be progressively reallocated to investigative targets that
harbor higher probabilities of fraud.3
In the audit context, false positives might arise through the unmoderated use of Benford’s
Law output automatically generated by software like IDEA®, a leading audit analytics package.
Output for a sample table of 1,166 bank transactions included with IDEA® appears in Figure 1.
[Insert Fig. 1]
The Figure 1 inset shows a cutaway of individual transactions flagged by the software as
“highly suspicious” and presumably requiring further investigation. Why is this problematic? It
is because this audit tool is a blunt instrument likely to proliferate costly false positives: It
automatically classifies as “highly suspicious” all transactions comprising every two-digit bar that
exceeds by an arbitrary amount the Benford’s Law frequency for that two-digit pair, no matter what
the characteristics of the individual transactions, even if the data conforms overall to Benford’s
Law.4
In an abstract case, reducing the probability of false positives may—though not
necessarily will—come at the cost of increasing the probability of false negatives. However, an
arbitrarily high probability of false positives argues for some movement toward more false
negatives. Knowing how far to move requires a basic understanding of Bayes’ Rule inputs and
the relative costs of false positives and negatives.
3 Bayesian inference iteratively reallocates probability density (belief) toward outcomes
that remain possible in light of additional data. This sensibility is famously captured by the
fictional Sherlock Holmes who said that “[W]hen you have eliminated the impossible, whatever
remains, however improbable, must be the truth.” (Kruschke 2015, 16).
4 The tool becomes somewhat more refined for N > 10,000, above which the user may
invoke “fuzzy logic” cluster analysis to reduce the suspicious transaction count.
With this background, we investigated the following four questions unanswered by extant
literature:
RQ 1: What is the precise mathematical relationship between the MAD from Benford’s
Law frequencies and N?
RQ 2: Does the answer to RQ 1 suggest any alternative to MAD as a measure of
deviation from Benford’s Law?
RQ 3: Should practitioners and researchers expect Benford’s Law outcomes from a MAD
alternative to differ from MAD outcomes?
RQ 4: Did the Benford’s Law conformity of public company financial statements
noticeably improve in the 1970-to-2013 interval as measured by Excess MAD, as might be expected
after implementation of the Sarbanes-Oxley Act (U.S. Congress 2002) (hereafter “SOX”) and the
Dodd-Frank Act (U.S. Congress 2010) (hereafter “DFA”)?
Investigating RQ 1, we rely in part on Monte Carlo simulations to explore the MAD-N
relationship in sampled Benford distributions (hereafter “Benford sets”), seeking to reduce the
ambiguity prevailing in related professional and academic literature.5 Responding to RQ 2, we
propose a new MAD-based statistic, Excess MAD, as an alternative estimator of deviation from
Benford's Law that explicitly adjusts for N, facilitating Benford’s Law tests of data sets of
nearly any size and avoiding some of the disadvantages of unalloyed MAD and common
null hypothesis significance test (“NHST”) statistics. Finally, we investigate RQ 3 and RQ 4
5 For evidence of such ambiguity in the practice realm, we again look to IDEA® 10, the
help menu of which claims that the first-two digits “test” under Benford’s Law is “generally used
in databases with fewer than 10,000 records” (IDEA® 10, Help). First, it is unclear whether the
“test” referenced here is conducted using MAD or an NHST statistic. If MAD, the opposite
would be more accurate: the first-two digits test can be used in databases with more than 10,000
records and should generally not be used in databases with fewer than 3,000 records except as
described in more detail below. If NHST, then N < 10,000 would be appropriate.
using nonparametric generalized additive modelling (hereafter “GAM”) applied to four decades
of actual public company financial statement numbers.
The remainder of the article is organized as follows. Part II engages literature relevant to
research questions and methodologies. Part III elaborates methods and results for RQs 1 and 2
(which did not utilize data) as well as the methods and data used for RQs 3 and 4. Part IV
outlines results for RQs 3 and 4. Part V discusses results for RQs 3 and 4, while Part VI offers
concluding remarks.
LITERATURE REVIEW
Benford’s Law prescribes expected frequencies for leading digits in naturally occurring
collections of numbers (Fewster 2009). Accountingspecific research on Benford’s Law began
nearly thirty years ago with Carslaw (1988), who used deviations from Benford’s Law
frequencies to examine the psychological motivations underlying the tendency of managers to
manipulate the earnings numbers of New Zealand-based companies. Carslaw found a bias in
favor of numbers just above key cognitive reference points (27). The extended chronicle of
Benford’s Law research is available in multiple sources (e.g., Nigrini 2011, 89-97; Amiram,
Bozanic and Rouen 2015).
The tendency of unaltered financial statement numbers to conform to Benford’s Law is
well-established, and nonconformity has been identified as a fraud risk factor (Durtschi et al.
2004; Nigrini 2012; Amiram et al. 2015).6 In assessing conformity, Nigrini (2012) indicates that
the first and second digits tests are highly aggregated and, therefore, of limited analytical
6 On the other hand, Benford’s Law tests—like other diagnostic tests—can also result in
false negatives, where fraud exists but is not detected (Nigrini 2012, 213).
usefulness except in datasets too small to allow for testing of the first two digits in combination
(75, 78, 87). Further, the first-two digits test embodies all of the information offered by the first
and second digits tests while providing higher resolution (more detail) and better focus (Nigrini
2012). Nevertheless, recent research has tested first digits alone (Amiram et al. 2015), second
digits alone (Kinnunen and Koskela 2003), first digits and first-two digits (Alali and Romero
2013), or digits two through six (Guan et al. 2008).
Suffice it to say, a stream of accounting and auditing research supports Benford’s Law as a
diagnostic tool useful in detecting some anomalies in financial statement data (Durtschi et al.
2004; Nigrini and Miller 2009; Nigrini 2011; Nigrini 2012; Amiram et al. 2015).7 Yet, despite
the evident usefulness of Benford’s Law, confusion persists among researchers and practitioners
over the relationship between sample size (hereafter “N”) and the significance of observed
deviations from Benford’s Law.
One source of confusion involves the interpretation of Bayes’ Rule, which mediates how
Benford’s Law readings impact the costs and probabilistic success of audits and
forensic investigations. For example, Durtschi et al. (2004) correctly calculated P(S|D) = .085
but misinterpreted Bayes’ Rule by incorrectly describing .085 as “the probability of finding
fraud” and as a “9 percent chance of discovery” (29). In fact, in context, the .085 is the
conditional probability of fraud (not of discovering the fraud) given the presence of the
Benford’s Law diagnostic signal. Using the Durtschi et al. parameters, the correct conditional
7 Benford’s Law stands apart from indirect earnings management (hereafter EM)
diagnostics; unlike discretionary accruals, interview, or “real” EM methods (e.g., Dichev,
Graham and Rajgopal (2013); Cohen, Dey and Lys 2008), Benford’s Law operates as a direct,
high-altitude screen for numeric manipulation.
probability of finding fraud (i.e., receiving the diagnostic signal) given the existence of fraud is
0.75, roughly 66 percentage points higher than .085.
A second source of confusion relates to the fact that Benford sets come into measurable
existence only in sufficiently large data sets (Nigrini 2012). Because of the large Ns required to
achieve reliable Benford’s Law readings (Nigrini 2012; Amiram et al. 2015), Benford’s Law
tests on data sets of small-to-moderate size are prone to false positive signals. The precise
meaning of “large” and the interaction of large Ns with statistical results in Benford’s Law tests
merits closer scrutiny.
The large Ns required by Benford’s Law tests can lead to excess statistical power in
NHST-based studies. Because Benford’s Law is only a model and not an exact description of the
truth, it is possible (especially with large Ns) for subtle but practically insignificant deviations
from the rigors of Benford’s Law to be flagged by NHST statistics (Hochster 2008; Gelman and
Weakliem 2009; Nigrini 2012, 151; Seaman, Seaman and Allen 2015). As a consequence,
traditional NHST statistics—including z, Chi-square, and KS statistics—are all essentially
unusable for Ns beyond a certain threshold (Geyer and Williamson 2004, 234; Nigrini and Miller
2009, 310; Nigrini 2012, 150-158).8
For example, in Chi-square tests, Nigrini (2012) states that excess power “starts being
noticeable” for N > 5,000 (154). But this prescription assumes knowledge of the effect size (what
auditors might call quantitative materiality)—in terms of some measure of deviation from
Benford’s Law—that Chi-square is designed to detect. However, because consensus is absent on
the appropriate effect size for Benford’s tests, the N that triggers excess power remains elusive.
8 A larger data set will produce higher Chi-square statistics than a smaller data set, ceteris
paribus (Nigrini and Miller 2009, 313), except where the null hypothesis is really true, in which
case Chi-square’s expected value does not change with N.
Given the necessity of large Ns for the achievement of a Benford set, the most workable
alternative to NHST tests appears to be MAD (Nigrini 2012, 158-160), which is mathematically
expressed for the first two digits as follows:

MAD = (1/90) Σ_{k=10..99} |X_k/N − p_k|   (2)

where X_k is the observed count of records whose first-two digits equal k and p_k is the
corresponding Benford’s Law proportion.
MAD has certain notable advantages over NHST statistics but suffers from some disadvantages.
Both are discussed in more detail in Part III.
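For concreteness, the first-two digits MAD can be computed with a few lines of Python. This is a minimal sketch: the digit-extraction helper and the filtering of zero and near-zero values are our own illustrative choices, not part of any cited implementation.

```python
import math
from collections import Counter

# Benford's Law proportions for first-two digits k = 10..99.
BENFORD_P = {k: math.log10(1 + 1 / k) for k in range(10, 100)}

def first_two_digits(x):
    """Leading two digits of a nonzero number (e.g., 0.0345 -> 34)."""
    s = f"{abs(x):.10e}"          # scientific notation, e.g., '3.4500000000e-02'
    return int(s[0] + s[2])       # first digit, skip the decimal point, second digit

def mad_first_two(values):
    """First-two digits MAD: mean |observed - expected| proportion over 90 cells."""
    digits = [first_two_digits(v) for v in values if v and abs(v) >= 1e-9]
    n = len(digits)
    counts = Counter(digits)
    return sum(abs(counts.get(k, 0) / n - p) for k, p in BENFORD_P.items()) / 90
```

For example, `mad_first_two` applied to a list whose values all begin with the digits 10 returns a large MAD, since 89 of the 90 expected proportions are then entirely unmatched.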
MAD scores are generally understood to grow as N falls below thresholds vaguely
pegged between 10,000 and 1,000 observations for testing the first two digits (Johnson and
Weggenmann 2013, 37; Nigrini 2011). Nigrini has also suggested a “general rule” that at least
1,000 observations are required for “good conformity” to Benford’s Law but that 3,000 should
provide a “good” fit and that one should not attempt first-two digits tests on fewer than 300 records
(Nigrini 2012, 20). Thus, current literature presents a mostly confusing picture of how big or
small N must be for reliable Benford’s Law tests.
With this background, Johnson and Ireland (2007) used Z statistics, Chi-square statistics,
and MAD (though without predetermined critical values9) to test, individually and jointly, the
first, second, and third digits of 22 income statement accounts of Compustat-listed firms for
years 1998 to 2003. They found that most accounts diverged significantly from Benford’s Law,
especially rental income (N = 4,294) and loss provisions (N = 4,756). In addition, they found
evidence that upward revenue manipulation appears more frequently than downward expense
manipulation, and that manipulation of digits to the left (e.g., first digit) appears less frequently
than manipulation of digits to the right (e.g., third digit).
9 Critical values for MAD are discussed in Part III.
While Johnson and Ireland (2007) are unusual in disclosing the observed MAD values
and associated Ns, they do not mention the correlation between the MADs and Ns. Our
regression analysis of their Tables 3-5 found correlations between −0.68 and −0.76 (for all Ns)
and between −0.64 and −0.78 (for Ns > 5000). To illustrate, Figure 2 plots the regression of Table
3 MADs against Ns > 5000, yielding an r-squared of 0.6083, or a correlation of −0.78. This correlation
finding suggests that the study’s “significant” results may be driven more by low Ns than by
actual manipulation of financial statement amounts.
[Insert Figure 2]
Jordan et al. (2008) used Z statistics to test the second digits of 2006 earnings numbers of
1,002 U.S. companies, finding that, despite SOX, managers continued to manipulate the positive
(but not negative) earnings of small (but not large) companies with low (but not high) debt
leverage in order to meet cognitive income reference points (109-110). Because this study did
not attempt to test the first two digits together but tested only second digits, N = 1,002 should be
sufficient to achieve Benford set status.
Guan, Lin and Fang (2008) used Z statistics to test the second through sixth digits of
positive and negative Taiwanese earnings numbers from the 1981-2005 interval, finding
“pervasive evidence” of rounding to achieve psychological reference points (28). While the
study discloses all Ns (ranging from 16,073 to 1,250) and related Z statistics, the lack of general
agreement on practical or noteworthy effect size makes it difficult to determine whether any of
the tests is overpowered.
Tilden and Janes (2012) used Z statistics to assess the conformity of the first digits of net
sales, net income, inventory, and allowance for doubtful accounts of all listed U.S. companies in
eight separate recessionary periods between 1950 and 2001 (4, 11), finding evidence that net
income and doubtful accounts were manipulated, with weaker evidence of sales and inventory
manipulation (5). While the paper does not directly disclose Ns, they can be inferred. Some are
large, as in Table 3, where N ≈ 25,440, and Table 4, where N ≈ 23,060 for the test period.
Whether these Ns resulted in excess power (and, therefore, questionable results) is difficult to
say because effect size is not identified. For example, first digit 1s should have a relative
frequency of 30.1% (Nigrini 2012, 3). What if the observed frequency is 30.4%? Or 32.3%?
Without specifying where to draw the practical line on the materiality of Benford’s deviations, it
is impossible to say if a given N provides too much power.
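The dependence of statistical power on N can be illustrated with the hypothetical 30.4-percent-observed versus 30.1-percent-expected figures above. The sketch below considers only the single-cell Chi-square contribution for first digit 1; it is our illustration, not a calculation from Tilden and Janes.

```python
def chi_sq_contribution(n, observed_prop, expected_prop):
    """Chi-square contribution of one digit cell for a sample of size n:
    (O - E)^2 / E with O and E expressed as counts."""
    observed, expected = n * observed_prop, n * expected_prop
    return (observed - expected) ** 2 / expected

# A fixed 0.3-point deviation (30.4% observed vs. 30.1% expected for first
# digit 1) contributes ten times more to the Chi-square statistic at
# N = 100,000 than at N = 10,000, even though the effect size is unchanged.
small = chi_sq_contribution(10_000, 0.304, 0.301)
large = chi_sq_contribution(100_000, 0.304, 0.301)
```

Because the contribution grows linearly in N for a fixed proportional deviation, any nonzero deviation eventually becomes “significant” once N is large enough, which is precisely the excess-power concern.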
Wilson (2012) used Chi-square and Z statistics to assess 2009 net sales and earnings
numbers of 5,989 public U.S. companies, first partitioning them into positive and negative
earnings groups. The study found insufficient evidence of nonconformity but encouraged
subsequent research spanning several years (63). Because the results were not significant, the
excess power question is moot.
Johnson and Weggenmann (2013) used a “refined MAD” algorithm in an effort to
mitigate the tendency of Benford’s Law to generate false positives for small data sets, testing 450
observations from state government financial data. We attempted to replicate the algorithm in
order to assess its effectiveness but were unable to do so exactly. Because the paper offered no
evidence or argument that the refined MAD algorithm is mathematically, statistically, or
practically superior to unrefined MAD, we did not pursue it further.
Alali and Romero (2013) tested 24,453 public company firm years spread over six periods
between 2001 and 2010. The study’s 2007 and 2008 baskets contained the smallest numbers of
firm years among the six periods, 2,168 and 2,002, respectively. The six periods were further
segmented into Big Four and nonBig Four auditees and various industry bins for deeper
analysis. Because of reviewer objections to excess statistical power, the authors abandoned
Chi-square and Z statistics in favor of MAD because “MAD computation is not affected by sample
size” (9). Beyond this misreading of the statistical dynamics of MAD, the authors also reversed
the statistical power concept, defining excess power as “failure to reject the null that fraud does
not exist” (9).10 While the paper provides insufficient information on subsample sizes to reach a
firm conclusion, we believe that some of its significant Benford's Law findings may be
jeopardized by small Ns, such as the barely nonconforming 2007 current assets and 2008 total
assets MADs reported in Table 3 (17).
When Amiram et al. (2015) used MAD and the KS statistic to test the first digits of all
numbers in most U.S. public company financial statements during the 2001-2011 interval, they
found overall conformity (25-26). In addition, they found that if revenue alone is manipulated,
such manipulation is likely to reduce overall financial statement conformity at the firm level
(27), that financial statements containing known misstatements exhibited lower conformity
before restatement than after (29), and that conformity among restating firms was higher in
post-restatement than pre-restatement years (30-31).
However, like Alali and Romero, Amiram et al. (2015) both relies on and perpetuates the
misconception that the MAD statistic is “scale invariant,” i.e., that it does not depend on N (9-10).
They cite no basis for this interpretation, but it seems likely the result of misreading
Pinkham’s scale invariance theorem. Pinkham’s theorem states that if all numbers in a Benford
set are multiplied by a nonzero constant, the transformed set will also be a Benford set (Nigrini
2012, 3134). In other words, for Benford’s Law tests, the unit of measure (a.k.a. “scale,” as in
10 Excess power causes rejection of the null, not failure to reject the null (Hochster 2008;
Gelman and Weakliem 2009; Nigrini 2012, 151; Seaman, Seaman and Allen 2015).
gallons, liters, or barrels) does not matter but N does. We have not ascertained whether the
findings of Amiram et al. (2015) are jeopardized by this apparent misreading.
In relation to the nonparametric GAM methodology employed here, Kukuk and
Rönnberg (2013) reported on a study of corporate credit default models that uses GAM. Pana et
al. (2015) utilized GAM in studying the impact of online services on credit unions, and
Lajbcygier and Shen (2008) used GAM to examine the causes of asset growth in hedge funds.
Further, a November 2015 search of the ProQuest Business database on the phrase “generalized
additive model” produced 250 peer reviewed articles with publication dates of 2005 or later.
Thus, despite its relative novelty in accounting journals, GAM is a common method of statistical
analysis.
METHODOLOGIES
This study addresses four separate research questions using distinct methods. Methods,
results, and discussion for RQ 1 and RQ 2 are presented here in Part III because the answers
were developed through mathematical derivation, not statistical analysis of actual data. In
contrast, RQs 3 and 4 required analysis of actual data; therefore, their data and methods are set
forth in Part III, their results in Part IV, and discussion in Part V.
Because of analytical advantages associated with the first-two digits test, our
investigation focuses on the behavior of the first two digits with the understanding that the same
theories are also scalable to other digits singly or in combination. For a population of numbers
that conforms to Benford’s Law, the probability that the first two digits of a data value equal k,
for k = 10, 11, …, 99, is given by

Pr(D1D2 = k) = log10(1 + 1/k)   (3)
(Nigrini 2012, 5). These expected frequencies are graphically displayed in Figure 3, which
demonstrates that the frequency with which the first two digits equal k is expected to
diminish continually from k = 10 to k = 99. Thus, an overabundance of, say, 99s as the leading
digits should indicate an increased probability of manipulation.
[Insert Figure 3]
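Equation 3 is straightforward to implement. The following sketch also checks two properties visible in Figure 3: the 90 first-two digit probabilities sum to one and decline monotonically from k = 10 to k = 99.

```python
import math

def benford_first_two(k):
    """Equation 3: Pr(first two digits = k) for k in 10..99."""
    if not 10 <= k <= 99:
        raise ValueError("first-two digits must lie in 10..99")
    return math.log10(1 + 1 / k)

# The sum telescopes: sum of log10((k+1)/k) for k = 10..99 = log10(100/10) = 1.
probs = [benford_first_two(k) for k in range(10, 100)]
```

The monotone decline is why an overabundance of high digit pairs such as 99 stands out against the expected pattern.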
As outlined in Part II, assessing the statistical and practical conformity of a data set to
Benford’s Law has proven difficult. Given the disadvantages of traditional NHST statistics like
Chi-square (Equation 4), MAD (Equation 2) appears to offer a reasonable alternative.

Pearson’s Chi-square statistic = Σ_{k=10..99} (O_k − E_k)² / E_k   (4)

where O_k and E_k denote the observed and Benford-expected counts for first-two digits k.
Like Chi-square, MAD is based on differences between observed and expected counts.
However, MAD and Chi-square differ in some respects. One notable distinction is that MAD is
based on the absolute magnitude of the differences instead of the squares of the differences;
therefore, MAD is less sensitive than Chi-square to a single large difference. Another distinction
is that MAD may be interpreted as the degree of deviation between the observed data distribution
and Benford’s Law (i.e., an effect size), whereas the Chi-square statistic is more strongly
impacted by N and is not advocated as a measure of the degree of conformity.
One disadvantage of MAD is that there is no empirical consensus as to how large the
MAD must be to signal a practically meaningful deviation from Benford’s Law (Nigrini 2012,
159; Amiram et al. 2015, 33). We do not attempt to answer the effect size question in this paper.
However, for frequencies of the firsttwo digits, Nigrini (2012) has proposed approximate MAD
values of 0.0018 and 0.0022 as the upper boundaries of “acceptable” and “marginally
acceptable” conformity, respectively, with the “nonconformity” zone beginning at 0.0022 (160).
Without endorsing the abstract suitability of these cutoffs, we use them as benchmarks in this
study.
RQ 1: The MAD-N Relationship
To probe the relationship between N and MAD, we simulated data randomly sampled
from a known Benford set with N varying between 500 and 3,500 in increments of 20.
Specifically, for a given N, a (pseudo)random sample of first-two digits was generated based
on the probabilities prescribed under Benford’s Law. For each sample, we calculated the first-two
digits MAD value and created indicator variables to track whether the MAD exceeded 0.0018
(Nigrini’s “marginal conformity” cutoff) or 0.0022 (the “nonconformity” cutoff). At each N, this
process was replicated 25,000 times, allowing accurate Monte Carlo estimates of various
percentiles for the MAD sampling distribution as well as estimates of the probability that MAD
exceeds the marginal or nonconformity cutoffs. Figure 4 displays the estimated 5th, 25th, 50th,
75th, and 95th percentiles of MAD as a function of N overlaid by Nigrini’s 0.0018 and 0.0022
thresholds for reference. Figure 5 depicts the estimated probability that the MAD exceeds
Nigrini’s marginally acceptable or unacceptable cutoffs as a function of N.
[Insert Figure 4]
[Insert Figure 5]
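A compact sketch of this Monte Carlo design might look as follows. For brevity it uses far fewer replications than the 25,000 used in the study, and the function names are ours.

```python
import math
import random

# Benford first-two digit proportions, indexed 0..89 for k = 10..99.
P = [math.log10(1 + 1 / k) for k in range(10, 100)]

def simulated_mad(n, rng):
    """First-two digits MAD of one random sample of size n from a true Benford set."""
    counts = [0] * 90
    for idx in rng.choices(range(90), weights=P, k=n):
        counts[idx] += 1
    return sum(abs(c / n - p) for c, p in zip(counts, P)) / 90

def exceedance_rate(n, cutoff, reps=2000, seed=1):
    """Monte Carlo estimate of Pr(MAD > cutoff) when the population is exactly Benford."""
    rng = random.Random(seed)
    return sum(simulated_mad(n, rng) > cutoff for _ in range(reps)) / reps
```

Calling `exceedance_rate(500, 0.0022)` returns a rate near one, while the same cutoff at a much larger N returns a rate near zero, reproducing the qualitative pattern of Figure 5.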
Figures 4 and 5 suggest three key takeaways. First, they corroborate the assertion that the
thresholds or critical conformity values for the first-two digits MAD depend on N unless N is
sufficiently large. Second, the first-two digits MAD is virtually guaranteed to exceed 0.0022 if N
is 500 or less, even if the population of numbers exactly conforms to Benford’s Law. Third, N
must be at least 2,000 (3,000) to ensure no more than a very small chance that the MAD value of
a Benford set exceeds .0022 (.0018). In summary, because of the demonstrated negative
correlation between MAD and N, it is fair to conclude that Nigrini’s conformity thresholds, to
the extent that they are otherwise valid, must be raised for N less than 3,000 and arguably for N
greater than 3,000.
A deeper understanding of how and why N may influence first-two digits MAD values is
achieved by deriving the expected first-two digits MAD for a Benford set. That is, for a fixed sample
of size N from a Benford set, the expected MAD value is

E(MAD) = (1/90) Σ_{k=10..99} Σ_{x=0}^{N} |x/N − p_k| (N choose x) p_k^x (1 − p_k)^(N−x)   (5)

where p_k = log10(1 + 1/k). The application of this formula is computationally demanding for
very large values of N (say, N > 1,000,000), but for N ≥ 500 it is well approximated11 by Equation 6.

E(MAD) ≈ 0.0793/√N   (6)
Equation 6 clearly reveals that, for a sample from a Benford set, the MAD is expected to
decrease as N grows. Illustrating the accuracy of this simple E(MAD) approximation, Table 1
displays both the exact E(MAD) values and the approximations for Benford sets of different sizes.
The pairs are very similar to six decimal places if N ≥ 1,000.
[Insert Table 1]
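The exact and approximate E(MAD) computations can be sketched as follows, assuming the approximation constant 0.0793 ≈ √(2/π) × 8.9502 / 90 implied by the footnote 11 derivation. The log-space binomial pmf evaluation is our implementation choice to avoid overflow for large N.

```python
import math

def exact_e_mad(n):
    """Exact expected first-two digits MAD (Equation 5) for a Benford set of size n.

    Evaluates the binomial pmf in log space so large factorials do not overflow.
    """
    total = 0.0
    for k in range(10, 100):
        p = math.log10(1 + 1 / k)
        log_p, log_q = math.log(p), math.log(1 - p)
        e_abs = 0.0
        for x in range(n + 1):
            log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                       + x * log_p + (n - x) * log_q)
            e_abs += abs(x / n - p) * math.exp(log_pmf)
        total += e_abs
    return total / 90

def approx_e_mad(n):
    """Approximate E(MAD) ~ 0.0793 / sqrt(n), per the footnote 11 constant."""
    return 0.0793 / math.sqrt(n)
```

Comparing `exact_e_mad(1000)` with `approx_e_mad(1000)` reproduces the close agreement reported in Table 1.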
While prior literature suggests that the first-two digits MAD value ought to be employed
only for large Ns, it does not directly elucidate the probabilistic relationship between N and
11. The approximation is readily derived by recognizing that E(MAD) may be expressed
as the average over the index k of terms having the form E|X_k/N − p_k|, and by then recognizing that
in each such term the asymptotic distribution of X_k is normal with a mean of Np_k and a variance
of Np_k(1 − p_k). As such, E|X_k/N − p_k| is approximately equal to √(2p_k(1 − p_k)/(πN)). The average of
these 90 approximations over the index k is thus given by (1/90)√(2/(πN)) Σ_k √(p_k(1 − p_k)), and
because Σ_{k=10}^{99} √(p_k(1 − p_k)) = 8.9502, then E(MAD) ≈ 0.0793/√N.
MAD or the deterministic relationship between N and the mean or E(MAD). If N is not
particularly large, then Nigrini’s MAD thresholds ought to be elevated. For example, for N ≤
1,000, even a “pure” Benford set is expected to have an unacceptably high MAD. But how
should the thresholds be adjusted? Fortunately, our RQ 2 results suggest that it may not be
necessary to engage this thorny question.
RQ 2: Excess MAD
In place of adjusting the MAD thresholds to mitigate the effect of N on MAD, we
propose entirely replacing MAD with the excess of MAD over the E(MAD) corresponding to a
Benford set of the same N as the actual data being tested. The excess represents the portion of
the observed MAD above that attributable to chance alone. Thus,

Excess MAD = MAD − E(MAD).

For practical application, at least for N > 1,000, it is advantageous to approximate Excess
MAD using Equation 6:

Excess MAD ≈ MAD − 0.0793/√N.
For a Benford set, Excess MAD will have an expected value of 0, regardless of N.
Therefore, it is easier to justify a uniform threshold as being appropriate for “marginal
acceptance” (or for “unacceptability”) if based on the Excess MAD. At an even more basic level,
the sign of Excess MAD is a potentially meaningful signal. If Excess MAD < 0, then the
first-two digits MAD is less than expected by chance and, consequently, is evidence of conformity.
On the other hand, if Excess MAD > 0, its magnitude is a direct measure of nonconformity. In
audit and forensic accounting settings, using Excess MAD in place of MAD can offset the
tendency toward too many false positives (e.g., where MAD > .0018) when N is not large,
thereby facilitating more effective and efficient deployment of audit or investigative resources.
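Assuming the Equation 6 approximation E(MAD) ≈ 0.0793/√N, Excess MAD reduces to a one-line adjustment; the example MAD value of 0.0020 below is hypothetical, chosen to show how the same observed MAD reads differently at different N.

```python
import math

def expected_mad(n):
    """Approximate E(MAD) for a Benford set of size n (Equation 6 constant)."""
    return 0.0793 / math.sqrt(n)

def excess_mad(observed_mad, n):
    """Excess MAD = MAD - E(MAD): observed deviation beyond chance alone.

    Negative values are evidence of conformity; positive values measure
    nonconformity directly, on a scale comparable across sample sizes.
    """
    return observed_mad - expected_mad(n)

# The same observed MAD of 0.0020 is below chance expectation at N = 800
# (conforming) but above it at N = 5,000 (nonconforming).
small_sample = excess_mad(0.0020, 800)
large_sample = excess_mad(0.0020, 5000)
```

This is exactly the moderation the paper proposes: rather than flagging every small data set whose raw MAD exceeds 0.0018, the analyst subtracts the deviation expected by chance at that N before interpreting the result.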
Importantly, Excess MAD answers a different question than the Chi-square statistic: how
much more do the sample proportions vary from Benford's Law than is expected because
of chance? As N grows, the variation expected from chance alone approaches 0, but Excess
MAD approaches 0 only if the null hypothesis is true. Chi-square, on the other hand, is
suited to assessing not the magnitude of the difference between the true and observed proportions
but its statistical significance; unless Benford's Law perfectly captures reality, Chi-square
tends to grow with N.
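The contrast can be sketched numerically. Holding a fixed, hypothetical set of slightly non-Benford proportions constant and varying only N, Pearson's Chi-square grows in proportion to N, while Excess MAD converges toward the (small) true deviation. The perturbation below is illustrative only, not drawn from the paper's data:

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(10, 100)]

# A fixed, slightly non-Benford population: shift a little probability
# mass from digit pair 10 to digit pair 99 (purely illustrative).
ALT = BENFORD[:]
ALT[0] -= 0.002
ALT[-1] += 0.002

def chi_square(props, n):
    """Pearson Chi-square statistic for observed proportions at size n."""
    return sum(n * (p - b) ** 2 / b for p, b in zip(props, BENFORD))

def excess_mad(props, n):
    """Observed MAD minus the approximate E(MAD) = 1/sqrt(158.8*n)."""
    mad = sum(abs(p - b) for p, b in zip(props, BENFORD)) / 90
    return mad - 1 / math.sqrt(158.8 * n)

# Same proportions, larger N: Chi-square scales linearly with N ...
for n in (1_000, 100_000):
    print(n, round(chi_square(ALT, n), 2), round(excess_mad(ALT, n), 6))
# ... while Excess MAD converges toward the fixed true MAD of ~4.4e-05.
```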
RQs 3 and 4: MAD vs Excess MAD
Data
Investigating RQ 3 and RQ 4, we compared MAD to Excess MAD for financial data
covering fiscal years 1970–2013, inclusive, obtained from the merged CRSP/Compustat
fundamentals annual database, considering only consolidated financial statements in U.S. dollars
for firms headquartered in the United States. Some firms in the CRSP/Compustat database had
duplicate records for the same fiscal year: one in "industrial" format and another in "financial
services" format. For firms with two-digit SIC codes equal to 60–64 or 67 (finance, insurance, or
investment), we used the financial-services-format data and used the industrial format for all
others. Another form of duplication appeared to be related to mergers or stock splits. Here, we
used probabilistic record linkage strategies (Wright 2011) to remove likely duplicates. The
portion of firm years thus removed was relatively small (1 percent of the final total).
Rather than considering all numbers reported by a firm in a given year, as did Amiram et
al. (2015), we focused on ten income statement and balance sheet variables: REVENUE, COGS,
EBIT, NET INCOME, RECEIVABLES, INVENTORY, NET PP&E, INTANGIBLE ASSETS,
CURRENT LIABILITIES, and TOTAL ASSETS. These were selected based on financial
accounting theory or prior research (e.g., Johnson and Ireland 2007; Beasley et al. 2010, 16;
Kearns et al. 2011; Tilden and Janes 2012, 4; Scholz 2014, 7) suggesting that they either have
been or are likely to be frequently manipulated. For each variable, available firm years were split
into positive- and negative-value subsets, recognizing that manipulative motives are expected to
differ based on the typical sign or valence of the financial statement amount (Guan et al. 2008;
Alali and Romero 2013, 27).
If a desired value was not reported or had fewer than two available digits, the observation
was ignored for that variable because it was not possible to determine the first two digits.12
While this implies that sufficiently small values were necessarily excluded, we did not exclude
values that were atypically large, because the presence of outliers is not inconsistent with
Benford's Law.13 However, because both MAD and Excess MAD remain prone to notably
higher variability as N decreases, N—in this case, the firm years available for each variable—
required some minimum permissible threshold. Thus, if the number of observations for a
variable's positive (or negative) valence was less than 1,000 in a given year, we ignored that year in the
variable's analysis of positive (negative) numbers. This led to a substantial reduction in the
number of negative-valence firm years analyzed.
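The screening rules above can be sketched as follows. This is a hypothetical Python helper of our own, not the actual CRSP/Compustat processing code; values are assumed to be reported in $ thousands, per footnote 12:

```python
def has_two_digits(value_thousands):
    """With values rounded to the nearest $ thousand, two leading digits
    are determinable only when the magnitude is at least $10 thousand."""
    return value_thousands is not None and abs(value_thousands) >= 10

def valence_subsets(values, min_n=1000):
    """Split a variable's firm-year values into positive- and negative-
    valence subsets, dropping any subset below the minimum-N cutoff."""
    pos = [v for v in values if has_two_digits(v) and v > 0]
    neg = [v for v in values if has_two_digits(v) and v < 0]
    return (pos if len(pos) >= min_n else None,
            neg if len(neg) >= min_n else None)

# Example: 1,500 usable positive values, too few negatives to analyze.
pos, neg = valence_subsets([250.0] * 1_500 + [-42.0] * 30)
print(len(pos), neg)   # 1500 None
```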
Methods
For all available years, we began by computing the MAD between the predicted
Benford's Law frequencies and the actual first-two-digits frequencies on a variable-by-variable
basis, separately for positive and negative valences. After calculating and plotting the
MAD and Excess MAD time series for each of the variables, we sought to assess whether any
temporal patterns emerged within each time series.

12. We accessed the merged CRSP/Compustat database via WRDS. For the variables
examined, the observations had been rounded to the nearest thousand. As such, to have two
usable digits we required the reported magnitude to be at least ten thousand dollars.

13. Fewster (2009) states that for a Benford set, "the distribution of X should span several
orders of magnitude" and be "reasonably smooth" (28). This, coupled with the characteristic
skew of Benford's data, explains why "outliers" should be expected.
To assess temporal patterns, we initially considered linear spline regression
models with pre-specified knots, i.e., firm years in which there is either a sudden change in the
average level and/or a sudden change in the slope of the linear trend. However, many knots are
possible because of the breadth and depth of potentially influential economic and regulatory
events. Further compounding this modeling complexity, linear spline results could be adversely
affected if the knots are not correctly specified. A more tenable alternative, then, was to avoid
assuming that the landmark knot years are known a priori. For similar reasons, we opted against
an event study approach. Given these considerations, we chose generalized additive modeling
(GAM), a nonparametric statistical technique that enables flexible modeling of the mean
structure, allowing mean levels to change over time in a relatively smooth manner without
restricting the timing of such changes.
Because GAM is rare in the accounting literature, we briefly describe the methodology here.
Further information may be found in Ruppert, Wand, and Carroll (2003) and Wood (2006). The
idea behind the generalized additive model (GAM) is to make minimal assumptions about the
nature of the relationship between the variables. For example, the model relating a response
variable, y, to an explanatory variable, x, might be posited as

y = f(x) + ε   (7)

where it is assumed that the deviations, ε, have a mean of 0 but where f(x) has few restrictions.
The characterizing feature of such models is that they attempt to strike a reasonable balance
between fitting the observed data well (i.e., having relatively small error terms) and maintaining a
sufficiently "smooth" functional form f(·). The rationale for the former is self-evident, while the
purpose of the latter is to avoid overfitting—that is, making the model results too reliant on the
particular data set used and thereby limiting their generalizability. The degree to which overfitting
is mitigated depends on a smoothing parameter that may be selected empirically using automatic
methods beyond the scope of this discussion.
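The flavor of the approach can be conveyed with a minimal penalized-spline smoother. This is a numpy sketch of the idea only, with illustrative function names and synthetic data; the paper's actual fits use R's mgcv, which also selects the smoothing parameter automatically and produces the confidence intervals and AR(1) error structure described later:

```python
import numpy as np

def fit_pspline(x, y, n_knots=8, lam=10.0):
    """Minimal penalized-spline smoother in the spirit of a GAM mean
    function f(x): a truncated-line basis with a ridge penalty on the
    knot coefficients. The smoothing parameter `lam` is fixed here,
    not selected automatically as mgcv would do."""
    t = x - x.mean()                      # center for numerical stability
    knots = np.quantile(t, np.linspace(0, 1, n_knots + 2)[1:-1])
    def basis(tt):
        return np.column_stack([np.ones_like(tt), tt] +
                               [np.maximum(tt - k, 0.0) for k in knots])
    X = basis(t)
    # Penalize only the hinge coefficients, which control wiggliness.
    D = np.diag([0.0, 0.0] + [1.0] * n_knots)
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return lambda x_new: basis(np.asarray(x_new, float) - x.mean()) @ beta

# Smooth a noisy yearly series (synthetic data, for illustration only).
rng = np.random.default_rng(0)
years = np.arange(1970, 2014, dtype=float)
y = 0.002 + 0.0005 * np.sin((years - 1970) / 7) + rng.normal(0, 2e-4, 44)
f = fit_pspline(years, y)
```

Raising `lam` pulls the fit toward a straight line; lowering it lets the curve chase the noise, which is exactly the overfitting the penalty guards against.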
The R package mgcv (Wood 2006) contains many functions to readily fit GAMs.14 Using
this suite of statistical functionality, we fitted a GAM to each variable's yearly MAD values,
with the year as the explanatory variable. In addition to estimating the value of the assumed
underlying relationship (i.e., f(x)), we also obtained approximate 95-percent confidence
intervals on the "true" value of f(x) for every year considered.15 This capability greatly increases
the utility of GAMs because it allows one to assess whether a simpler relationship would suffice.
For example, if all of the annual pointwise confidence intervals encompass a common
MAD value, then arguably there is no relationship at all, as f(x) could be a flat line. If a single
straight line would fit entirely within all of the pointwise confidence intervals, then f(x) could
be a straight line indicating that a linear model should suffice. Thus, a statistically significant
trend is supported if, over a given domain, no flat line fits entirely within the confidence interval
bounds. Similarly, a statistically significant change in a trend is supported if, over the domain, no
straight line fits within the bounds. While it would be convenient if there were a p-value
associated with the significance of each curve, the complexities of the nonparametric modeling
used to adequately fit these data while also accounting for autocorrelation do not lend themselves
well to the direct calculation of a p-value. Nonetheless, the stated guidelines relating to the
pattern(s) in the confidence bounds are adequate for the purposes of this study.

14. R is an open-source software environment for statistical analysis and graphing. It is
widely used by statisticians and data analysts in commercial and academic settings (e.g., Poynter,
Winder, and Tai 2015; Kruschke 2015). More information about R is available at
https://www.r-project.org/.

15. Each interval was constructed by extending two standard errors above/below the
estimated value of f(x) for the year. The estimate of f(x) was created by estimating an overall
mean response and the (smoothed) systematic year-specific deviation from this estimate. The
uncertainty in the overall mean and the uncertainty in year-specific trends from this overall mean
were both incorporated in the computation of the standard error.
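The flat-line criterion described above reduces to a one-line check (a Python sketch of the logic with a hypothetical helper name, not mgcv output processing): a horizontal line fits inside every pointwise band exactly when the largest lower bound does not exceed the smallest upper bound.

```python
def flat_line_fits(lower, upper):
    """True if some horizontal line lies inside every pointwise confidence
    band: equivalently, the largest lower bound does not exceed the
    smallest upper bound."""
    return max(lower) <= min(upper)

# Bands that all overlap near a common value admit a flat line, so there
# is no compelling evidence of a changing mean level ...
print(flat_line_fits([0.0018, 0.0019, 0.0017], [0.0022, 0.0021, 0.0023]))  # True
# ... while bands that drift apart admit none, supporting a changing mean.
print(flat_line_fits([0.0010, 0.0020], [0.0015, 0.0025]))                  # False
```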
Because the overwhelming majority of firms represented in any given year were also
represented in the next year, the raw financial statement numbers would be expected to exhibit
some year-to-year autocorrelation. Likewise, it is possible that conformity to Benford's
Law at a given firm exhibits some temporal correlation. To account for the potential correlation
from year to year, the error terms in the GAMs were allowed to follow a first-order
autoregressive (AR(1)) process, allowing the strength of the dependence to decay as the time
interval between a pair of measurements grows.
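The AR(1) error structure can be sketched as follows (illustrative Python with assumed parameter values; in the actual fits, mgcv estimates the autoregressive parameter from the data rather than fixing it):

```python
import random

def ar1_errors(n, rho=0.6, sigma=1.0, seed=42):
    """Simulate AR(1) errors e_t = rho*e_{t-1} + w_t with white noise w_t.
    The correlation between errors k years apart decays roughly as rho**k,
    so dependence weakens as the gap between measurements grows."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0.0, sigma))
    return e

errors = ar1_errors(500)
```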
RQS 3 AND 4 RESULTS
Descriptive Statistics
Investigation of RQ 3 and RQ 4 required calculating MAD and Excess MAD for
the actual financial statement data described above. For each variable, Table 2 reports the mean,
median, and N after applying the appropriate data restrictions. The data for each variable are heavily
skewed away from zero, as expected for data conforming to Benford's Law. We analyzed
positive and negative valences as separate variables. Negative valences not reported in this table
had fewer than 200 available observations.16
[Insert Table 2]
Figures 6 and 7 display the number of firm years available for analysis by variable, by
year. A year's MAD value was not considered (or shown in Figures 6 or 7) if fewer than 1,000
observations were available for that year, a cutoff demarcated in the Figure 6 and 7 plots by a
dashed gray line.

16. Consistent with their typical positive balances under GAAP, variables having fewer
than 200 available firm years of negative values were total revenue (N = 128); COGS (14);
TOTAL ASSETS (0); RECEIVABLES (2); INVENTORY (0); NET PP&E (0); INTANGIBLE
ASSETS (1); and CURRENT LIABILITIES (0).
[Insert Figure 6]
[Insert Figure 7]
This data cutoff has a notable effect on negative versions of all variables considered
except NET INCOME and EBIT—only these variables had at least 1,000 firms with a negative
value for any year.
MAD–N Correlation
Plots resulting from the nonparametric GAM analysis of MAD and Excess MAD appear
in Figures 8a–8b (income statement amounts) and Figures 9a–9b (balance sheet amounts). In each
figure, the plots on the left depict MAD trends with reference lines at the .0018 and .0022
thresholds. Plots on the right depict Excess MAD with a horizontal reference line at 0. Along
with the estimated means, 95-percent pointwise confidence intervals are displayed. Recall that if
any horizontal line falls within the confidence bounds for all years, compelling evidence is
lacking to support a change in the mean MAD (or mean Excess MAD) level over time. In
contrast, if no horizontal line falls within these bounds, that would constitute evidence of a
changing mean level. A line of slope 0 and intercept > 0 would indicate a persistent level of
nonconformity.
[Insert Figures 8a–8b]
[Insert Figures 9a–9b]
Consider the plots depicting MAD. The MAD quantities for positive valences did not rise
above the nonconforming level of 0.0022 proposed by Nigrini (2012) but rose above 0.0018 for
EBIT (1981, 1990) and negative NET INCOME (1983). While the MAD values tended overall
to be acceptably low, substantial commonality appears in the patterns over time across variables.
MAD values noticeably decreased during the first half of the 1970s, consistent with the
relatively small number of firm years available in 1970–1973. All variables except negative EBIT,
negative NET INCOME, and INTANGIBLES experienced minor fluctuations around a slightly
decreasing trend from the mid-1970s until the late 1990s. MAD values generally increased
slightly starting in about 1999.
Two variables had sufficient negative values to permit a meaningful trend analysis: NET
INCOME and EBIT. For these two variables, two features stand out in Figure 8. First, the
MADs are substantially higher for negative valences than for positive ones. Second, while the
MAD conformity levels increased after the late 1990s, the comparatively wider confidence
intervals indicate that this apparent trend for negative valences is not as precisely estimated as
related trends for positive valences. This is consistent with the visually evident wider dispersion
of negative MAD observations. Based only on the Figure 8 MAD plots, it is tempting to
conclude that there has been a change in the average level of conformity with Benford’s Law
over time, with less conformity since the turn of the century. Without more information, we
might think that SOX, implemented in 2003, drove MADs and manipulation up.
However, a striking pattern emerges on comparing the Ns to the observed MAD values:
for all variables, higher Ns tend to be associated with lower MAD values. This association
calls into question the utility of uniform MAD cutoffs (MAD conformity thresholds unadjusted
for N). To explore the association further, we regressed N against MAD by variable, finding
correlations ranging from −0.71 to −0.90. In general, we found a significant negative relationship
between the Ns and the observed first-two-digits MADs presented in Figures 8 and 9. We would
expect a similar result from regressing variances against N. The observed dispersion of MADs
around the plotted means is also most likely an artifact of N.
Excess MAD
Given the variation in the firm years available for each fiscal year and the predictable
effect that fluctuating Ns17 can have on MAD, Excess MAD paints a different picture of
temporal patterns using the same underlying data. In most panels (right sides of Figures 8 and 9),
the estimated slopes and intercepts for Excess MAD are very similar to those of the horizontal
reference line at 0, suggesting that (1) the excess two-digit MAD is relatively constant over time
for most financials we considered, and (2) the observed two-digit MAD values are close to those
expected of a Benford set.
The only clear exception to overall stability appears in positive NET INCOME, for which
the negative slope signals slightly decreasing deviation. Depending on the observer's view of
practical effect size, less pronounced changes might also be noted for
RECEIVABLES, negative NET INCOME, and INTANGIBLES. For RECEIVABLES, the faint
positive slope hints at slightly increasing deviation, while negative NET INCOME appears to
begin above the reference line and to climb, suggesting a baseline of slightly increasing
deviation. Deviation of INTANGIBLES may decrease slightly. Overall, however, most variables
show fairly constant Excess MAD values across the study period, with intercepts close to the
reference line.
RQS 3 AND 4 DISCUSSION
Comparing MAD to Excess MAD
The results illustrate how Excess MAD can tell a different, more accurate story than
MAD. As our discussion of MAD, N, and E(MAD) suggests, whatever trends appear in the left-side
plots of Figures 8 and 9 most likely result, either primarily or in significant part, from the N
trends observed in Figures 6 and 7. Thus, the left-side plots of Figures 8 and 9 serve to illustrate
the virtues of approaching MAD in Benford's Law analysis with care and skepticism. For a more
reliable measure of the Benford's Law conformity of the underlying numbers, we must turn
elsewhere.

17. The cause of fluctuating Ns is a question for another study; however, the decline in N
after the year 2000 suggests possible impact of the dot-com crash or the September 11, 2001
terrorist attacks on New York's World Trade Center.
To better assess the Benford’s Law conformity of the financial numbers studied here, we
analyzed the relationship between N and Benford’s Law conformity as measured by MAD,
finding strong evidence that—at least in this application—the annual MADs for each of the
variables are negatively correlated with N. This is consistent with Monte Carlo simulation results
demonstrating that even pure Benford sets are expected to exceed Nigrini’s uniform cutoffs with
probabilities displayed in Figure 5.
Exploration of MAD transformations that might mitigate MAD's sensitivity to N points to
the expected value of MAD, styled "E(MAD)," which for a Benford set can be reliably calculated
for specific Ns, either directly or through a simple approximation (Equation 8). E(MAD), in turn,
informs Excess MAD, a MAD transformation equal to the excess of observed MAD over
E(MAD), which mitigates the N-sensitivity of MAD. Finally, we calculated Excess MADs for
our variables and fit GAM models to them, yielding the Excess MAD plots in Figures 8 and 9.
The relative consistency of the Excess MADs over time, despite pronounced fluctuations
in the unadjusted MADs, is of paramount importance in the interpretation of Benford’s Law
conformity: with respect to the data examined in this study, the year-to-year MAD values shown
in Figures 8 and 9 were affected by whether adjustments were made for the Ns. Going forward,
this outcome should inform Benford’s Law research and practice.
We recommend that Excess MAD be used for all Benford's Law tests based on MAD or
a MAD derivative. This recommendation is grounded in part in the differential inferences to
which MAD and Excess MAD lead. In addition, claims—express or implied—that MAD is
"scale invariant" (e.g., Johnson and Ireland 2007; Alali and Romero 2013; Amiram et al. 2015;
IDEA® 10) suggest that researchers and practitioners often do not make explicit adjustments for
N when interpreting MAD. Yet comparisons, across populations and studies, of MAD values based
on differing Ns have the potential to mislead as to the existence and comparative degree of
deviation from Benford's Law. This is especially true for N < 3,000 because of MAD's proven
susceptibility to spuriously exceeding the naïve 0.0018 marginal conformity threshold. Indeed,
relying on unalloyed MAD to test for Benford's Law conformity inflates the base rate of
detection (not manipulation) by a function of the N-based probability that MAD will exceed the
naïve threshold by chance alone.
Stability of Excess MAD, 1970–2013
Testing Excess MAD against actual numbers, we conjectured that 2003 and 2010 might
be inflection points because of the implementation of SOX in 2003 and DFA in 2010,18 one
widely publicized purpose of which was to reduce management manipulation of financial
statements.19 Thus, intervals of special interest were pre-2003, 2003 to 2009, and post-2009. If
SOX and DFA are fully effective, then post-2002 and post-2009 financial statement numbers should
be more reliable and less biased20 than pre-2003 and pre-2010 financial statements. A decrease in
MAD levels subsequent to the implementation of SOX or DFA would be consistent with
declining bias and increasing reliability, whereas higher MAD levels would suggest the opposite.

18. While SOX was enacted in 2002, its provisions did not take effect until 2003, whereas
DFA was enacted and at least partially implemented in 2010.

19. For example, in his 2003 congressional testimony, SEC Chairman William H.
Donaldson identified as a motivation for SOX the popular pre-SOX misperception "that
uninterrupted earnings growth was the hallmark of sound corporate progress [which] caused too
many managers to adjust financial results with the purpose of meeting projected results. . ."
(Donaldson 2003).

20. The term "bias" is used here to denote the extent to which the financial statements
depart from economic reality.
For both MAD and Excess MAD, we employed a nonparametric model to avoid making a
priori assumptions about which years might actually be inflection points. One price of this
flexibility is an increased potential to overfit the observed data. The inclusion of the pointwise
95-percent confidence intervals around each fitted curve is critical to evaluating whether an
apparent change in the trend is statistically significant or, conversely, consistent with the
variability that is to be expected in a statistical analysis even when there is no change in the
trend. Across the four decades of the study period, MAD figures fluctuated markedly. In contrast, Excess
MAD remained relatively stable throughout, with the possible minor exceptions noted in Part IV.
Notably, in response to RQ 4 and contrary to our expectations, the straight nonparametric mean
functions and related confidence intervals for Excess MAD in Figures 8 and 9 do not support a
significant change in market-wide Benford's Law conformity levels following implementation of
SOX or DFA.
CONCLUSION
Extant forensic accounting literature related to Benford’s Law offers contradictory and
incomplete guidance on the evaluation of deviation from Benford’s Law and the mathematical
relationship between MAD and sample size or N. The literature also suggests that researchers
and practitioners do not adequately account for sample size in tests of Benford’s Law
conformity.
We demonstrate in detail the effect N can have on MAD—namely, the tendency of MAD
to increase as N decreases. While this mathematical relationship may seem obvious to
researchers steeped in the alchemy of mathematical statistics, we point to evidence that it is
anything but obvious to most forensic accounting researchers and practitioners, some of whom
explicitly disavow it. If current guidelines for the interpretation of MAD are utilized, small-to-moderate
Ns are at increased risk of being flagged as not conforming with Benford's Law, and an
accompanying rise in false positives should be anticipated.
This paper seeks to neutralize the impact of the N/MAD correlation on MAD-based
analysis by empowering practitioners and scholars to achieve a balance between false positives
and false negatives that suits their practice or research objectives. Our findings can thus assist
practitioners in effectively targeting investigative or audit resources, thereby minimizing the "cry
wolf" problem with its attendant inefficiencies. To this end, we propose a new MAD statistic,
Excess MAD, which for N ≥ 500 is easily approximated by

Excess MAD ≈ MAD − 1/√(158.8 × N).

Excess MAD appears to avoid the sample-size conundrum that afflicts MAD-based measures
while retaining their advantages over traditional NHST statistical measures of Benford's Law
conformity.
We also use generalized additive modeling (GAM) to provide evidence that the
Benford's Law conformity of public company financial statement numbers has remained
relatively stable since 1970, with some exceptions. Positive NET INCOME clearly experienced
a noticeable increase in conformity, while, with very small effect sizes (i.e., low materiality
levels), the conformity of RECEIVABLES, negative NET INCOME, and INTANGIBLES may
have decreased, decreased, and increased, respectively. These findings of little or no change in
Benford's Law conformity over time, particularly in 2003 and 2010, suggest that SOX and DFA
may not be achieving one of their primary objectives—reducing the overall level of financial
statement manipulation among public companies.
While we have added considerable clarity to the relationship between MAD and N, a
critical question remains for future research regarding the conformity of financial statement
numbers to Benford's Law: Where should researchers and practitioners draw the line on the
materiality of Benford's Law results—or, statistically speaking, on the minimum practical effect size?
Excess MAD, as a direct measure of effect size, may facilitate investigation and eventual
resolution of the effect-size question but does not on its own resolve it. Answering this question
will permit further optimization of tools and methods for assessing conformity, thereby
sharpening investigative efforts.
References
Alali, F., and S. Romero. 2013. Benford's Law: Analyzing a Decade of Financial Data. Journal of
Emerging Technologies in Accounting 10: 1–39.
Amiram, D., Z. Bozanic, and E. Rouen. 2015. Financial Statement Errors: Evidence from the
Distributional Properties of Financial Statement Numbers. Review of Accounting Studies
20: 1540–1593.
Beasley, M., J. Carcello, D. Hermanson, and T. Neal. 2010. Fraudulent Financial Reporting
1998–2007: An Analysis of U.S. Public Companies (White paper). Committee of
Sponsoring Organizations of the Treadway Commission (COSO). Retrieved from
http://www.coso.org/documents/COSOFRAUDSTUDY2010_001.pdf.
Bishop, T. 2001. Auditing for fraud: Implications of current market trends and potential
responses. The Auditor's Report 24 (2): 13–15.
Carslaw, C. 1988. Anomalies in income numbers: Evidence of goal oriented behavior. The
Accounting Review 63 (2): 321–327.
Cohen, D. A., A. Dey, and T. Z. Lys. 2008. Real and accrual-based earnings management in the
pre- and post-Sarbanes-Oxley periods. The Accounting Review 83 (3): 757–787.
Dichev, I., J. Graham, C. Harvey, and S. Rajgopal. 2013. Earnings Quality: Evidence from the
Field. Journal of Accounting and Economics 56 (2–3): 1–33.
Donaldson, W. 2003. Testimony Concerning Implementation of the Sarbanes-Oxley Act of 2002
before the Senate Committee on Banking, Housing, and Urban Affairs. September 9, 2003.
Available at: https://www.sec.gov/news/testimony/090903tswhd.htm (last visited July 11,
2016).
Durtschi, C., W. Hillison, and C. Pacini. 2004. The effective use of Benford's Law in detecting
fraud in accounting data. Journal of Forensic Accounting 5: 17–33.
Fewster, R. 2009. A Simple Explanation of Benford's Law. The American Statistician 63: 26–32.
Gelman, A., and D. Weakliem. 2009. Of Beauty, Sex and Power: Too Little Attention Has Been
Paid to the Statistical Challenges in Estimating Small Effects. American Scientist 97 (4):
310–316.
Geyer, C., and P. Williamson. 2004. Detecting fraud in data sets using Benford's Law.
Communications in Statistics: Simulation and Computation 33 (1): 229–246.
Gorard, S. 2013. The possible advantages of the mean absolute deviation "effect" size. Social
Research Update 65 (Winter 2013): 1–4.
Guan, L., F. Lin, and W. Fang. 2008. Goal-oriented earnings management: Evidence from
Taiwanese firms. Emerging Markets Finance and Trade 44 (4): 19–32.
Hochster, H. S. 2008. The Power of "P": On Overpowered Clinical Trials and "Positive" Results.
Gastrointestinal Cancer Research: GCR 2 (2): 108–109.
IDEA® 10. 2016. Help, Benford's Law Analysis Test.
Johnson, C., and T. Ireland. 2007. An empirical examination of manipulation in components of
the income statement. Journal of Forensic Accounting 8 (1): 1–28.
Johnson, G., and J. Weggenmann. 2013. Exploratory Research Applying Benford's Law to
Selected Balances in the Financial Statements of State Governments. Academy of
Accounting and Financial Studies Journal 17 (3): 31–44.
Kearns, G., K. Barker, and S. Danese. 2011. Developing a Forensic Continuous Audit Model.
The Journal of Digital Forensics, Security and Law 6 (2): 25–47.
Kinnunen, J., and M. Koskela. 2003. Who is Miss World in cosmetic earnings management?
Journal of International Accounting Research 2 (1): 39–68.
Kruschke, J. 2015. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd ed.
New York, NY: Elsevier.
Kukuk, M., and M. Rönnberg. 2013. Corporate credit default models: A mixed logit
approach. Review of Quantitative Finance and Accounting 40 (3): 467–483.
Lajbcygier, P., and E. Shen. 2008. Incentives for asset growth: The different causes of monthly
inflows and outflows of surviving managed futures funds. Journal of Derivatives and
Hedge Funds 13 (4): 287–303.
Law, P. 2009. Advancement to partnership in public accounting firms in Hong
Kong. Managerial Auditing Journal 24 (8): 792–805.
McKee, T. 2010. The 'Cry Wolf' Problem in Current Fraud Auditing Standards. CPA Journal
80 (1): 60.
Nigrini, M. 2011. Forensic Analytics: Methods and Techniques for Forensic Accounting
Investigations. Hoboken, NJ: John Wiley and Sons.
Nigrini, M. 2012. Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud
Detection. Hoboken, NJ: John Wiley and Sons.
Nigrini, M., and S. Miller. 2009. Data Diagnostics Using Second-Order Tests of Benford's
Law. Auditing: A Journal of Practice and Theory 28 (2): 305–324.
Pana, E., S. Vitzthum, and D. Willis. 2015. The impact of internet-based services on credit
unions: A propensity score matching approach. Review of Quantitative Finance and
Accounting 44 (2): 329–352.
Poynter, J., J. Winder, and T. Tai. 2015. An analysis of co-movements in industrial sector indices
over the last 30 years. Review of Quantitative Finance and Accounting 44 (1): 69–88.
Public Company Accounting Oversight Board (PCAOB). 2016. AS 2401, Consideration of
Fraud in a Financial Statement Audit.
Ruppert, D., M. Wand, and R. Carroll. 2003. Semiparametric Regression. New York, NY:
Cambridge University Press.
Scholz, S. 2014. Financial Restatement Trends in the United States: 2003–2012. Center for Audit
Quality.
Seaman, C., J. Seaman, and I. Allen. 2015. The Significance of Power: Avoid mistakenly
rejecting the null hypothesis in statistical trials. QualityProgress.com, Statistics Roundtable.
Available at: http://asq.org/quality-progress/2015/07/statistics-roundtable/the-significance-of-power.html
(last accessed July 10, 2015).
Tilden, C., and T. Janes. 2012. Empirical evidence of financial statement manipulation during
economic recessions. Journal of Finance and Accountancy 10: 1–15.
U.S. House of Representatives. 2002. Sarbanes-Oxley Act of 2002. Public Law 107-204 [H.R.
3763]. Washington, DC: Government Printing Office.
U.S. House of Representatives. 2010. Dodd-Frank Wall Street Reform and Consumer Protection
Act. Public Law 111-203. Washington, DC: Government Printing Office.
Wallace, W. 2002. Assessing the quality of data used for benchmarking and decision-making.
The Journal of Government Financial Management 51 (3): 16–22.
Wilson, T. 2012. Further Evidence on the Extent of Cosmetic Earnings Management by U.S.
Firms. Academy of Accounting and Financial Studies Journal 16 (3): 57–64.
Wood, S. 2006. Generalized Additive Models: An Introduction with R. Boca Raton, FL:
Chapman and Hall/CRC Press.
Wright, G. 2011. Probabilistic Record Linkage in SAS. Available at:
www.wuss.org/proceedings11/Papers_Wright_G_76128.pdf.
Tables and Figures
Figure 1
Benford's Law first-two digits results for 1,166 sample banking transactions. The "Preview
Database" feature (inset) tables the transactions beginning with "59" (see spike, left of inset)
that were automatically identified as "highly suspicious". Source: IDEA® 10
Figure 2
Regression of MADs against Ns > 5000 obtained
from Table 3 of Johnson and Ireland (2007)
Figure 3
Benford's Law relative frequencies for first-two-digit pairs
Figure 4
Monte Carlo estimates of selected percentiles from the sampling distribution of the
first-two-digit MAD as a function of sample size, for data sampled from a population of numbers that
exactly follows Benford's Law
Figure 5
Monte Carlo estimates of the probability that the first-two-digit MAD exceeds Nigrini
thresholds, as a function of N for data sampled from a Benford set
Table 1 – Comparison of exact and approximated E(MAD) values
for data sampled from a Benford distribution, by N

N            Exact       Approximated
500          0.003552    0.003549
1,000        0.00251     0.002509
3,000        0.001449    0.001449
10,000       0.000793    0.000793
100,000      0.000251    0.000251
1,000,000    0.000079    0.000079
Table 2
Descriptive Statistics for Tested Variables

Variable             Valence    N          Mean (millions)   Median (millions)
REVENUE              +          268,503    1,109             76
COGS                 +          241,882    744               45
EBIT                 +          169,736    163               11
EBIT                 −          75,505     17                2
NET INCOME           +          159,349    99                6
NET INCOME           −          89,168     45                4
RECEIVABLES          +          234,220    182               10
INVENTORY            +          197,819    123               10
PP&E (NET)           +          267,560    489               15
INTANGIBLE ASSETS    +          133,749    406               9
TOTAL ASSETS         +          277,436    2,614             88
CURRENT LIAB         +          244,336    249               14
Figure 6
Firm years by variable by fiscal year — income statement variables
Figure 7
Firm years by variable by fiscal year — balance sheet variables
Figure 8 (a)
MAD (Excess MAD) estimated mean functions and approximate pointwise 95%
confidence intervals relating the fiscal year (x-axis) to the MAD (Excess MAD) for selected
income statement variables
Figure 8 (b)
MAD (Excess MAD) estimated mean functions and approximate pointwise 95%
confidence intervals relating the fiscal year (x-axis) to the MAD (Excess MAD) for selected
income statement variables
Figure 9 (a)
MAD (Excess MAD) estimated mean functions and approximate pointwise 95%
confidence intervals relating the fiscal year (x-axis) to the MAD (Excess MAD) for selected
balance sheet variables
Figure 9 (b)
MAD (Excess MAD) estimated mean functions and approximate pointwise 95%
confidence intervals relating the fiscal year (x-axis) to the MAD (Excess MAD) for selected
balance sheet variables