Published in International Journal of Forecasting, 10 (1994), 495-506
Judgmental Decomposition: When Does It Work?
Donald G. MacGregor
Decision Research, Eugene, OR
J. Scott Armstrong
The Wharton School, University of Pennsylvania, Philadelphia, PA
Abstract
We hypothesized that multiplicative decomposition would improve accuracy only in certain
conditions. In particular, we expected it to help for problems involving extreme and uncertain
values. We first reanalyzed results from two published studies. Decomposition improved accuracy
for nine problems that involved extreme and uncertain values, but for six problems with target
values that were not extreme and uncertain, decomposition was not more accurate. Next, we
conducted experiments involving 10 problems with 280 subjects making 1078 estimates. As
hypothesized, decomposition improved accuracy when the problem involved the estimation of
extreme and uncertain values. Otherwise, decomposition often produced less accurate predictions.
Keywords: Decision Analysis; Estimation; Extreme Values; Forecasting; Multiplicative
Decomposition; Uncertainty
1. Introduction
Consider the following question: What is the estimated yearly circulation of a proposed new magazine on
raising exotic animals? People are likely to respond that they have no idea. But do they? What are they likely to say
if asked whether the number was greater than 100 million? Would they say that it is less than 1000? Most likely,
people would say that the true value is somewhere between these two values. Obviously, they know more than they
think they do when first asked.
How well a person is able to forecast a quantity is related to the relevant information that they have at their
disposal, either from information sources or from experts. It is also a function of whether they can break the problem
into parts so that they can use their information effectively. Forecasters frequently break a problem into parts, make
forecasts from each part, then recombine the separate forecasts to make a forecast of the target value. Howard
Raiffa (1968) claimed that such a procedure, decomposition, is `the spirit of decision analysis.' Since then,
research has seemed to support the view that decomposition is a useful strategy with wide applicability and little
risk.
Prior literature on judgmental decomposition (Armstrong et al., 1975, and MacGregor et al., 1988)
concluded that decomposition would be especially effective for problems involving uncertain values. However, we
do not know much about the conditions under which judgmental decomposition is most useful. Armstrong et al.
(1975) had suggested that the scale of the problem might make further study worthwhile, and our paper addresses
that issue. In examining the problem, we reanalyzed results from two studies. In addition, we conducted
experiments with new subjects. We also examined alternative approaches for assessing uncertainty to determine
whether they would yield different recommendations about when decomposition is appropriate.
2. Hypotheses
The basic idea behind decomposition is simple. Given a target quantity that is difficult to estimate, one
breaks the problem down into subproblems that are easier to estimate. The difficulty lies in translating this idea into
practice. For decomposition to be done successfully, certain conditions are desirable. First, the target value should
be one that is difficult to estimate. Second, estimation errors for each part should be less, relatively speaking, than
the errors for estimating the target value. Third, estimation errors for the parts should not have strong positive
correlations between one another. Negatively correlated errors are desirable so that one has offsetting errors. These
conditions are not easy to specify in operational terms.
Traditionally, the term decomposition has been used to refer to the practice of breaking a problem into
multiplicative elements. An additive breakdown is usually referred to as disaggregation or segmentation. Our paper
is restricted to multiplicative decomposition and we use the term decomposition to refer to this.
Decomposition is often viewed as a safe strategy. Rather than putting all of one's eggs into a single basket,
estimates are provided separately. Errors in one element may compensate for errors in another. However, when
errors are positively correlated, they can be explosive. For example, if two component errors are in the same
direction and each equal to 20%, they translate into an error of 44% in the target value (1.2 x 1.2 = 1.44).
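To make the compounding concrete, the sketch below (our illustration, not part of the original studies) multiplies hypothetical component error factors to show how same-direction errors inflate the recomposed target while offsetting errors largely cancel.

```python
# Illustration of how component errors combine in a multiplicative decomposition.
# The error factors below are hypothetical: 1.20 means a component was
# overestimated by 20%, 0.80 means it was underestimated by 20%.

def combined_error_factor(component_factors):
    """Multiply component error factors to get the error factor of the target."""
    result = 1.0
    for f in component_factors:
        result *= f
    return result

same_direction = combined_error_factor([1.20, 1.20])  # errors compound: 1.44 (+44%)
offsetting = combined_error_factor([1.20, 0.80])      # errors partly cancel: 0.96 (-4%)

print(f"Two +20% errors -> factor {same_direction:.2f}")
print(f"A +20% and a -20% error -> factor {offsetting:.2f}")
```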
Extreme target values are likely to create difficulties for subjects unless these numbers are well
known. For very large numbers, people might make estimates that are too small. Lacking good intuition, an
estimator might assign a `more reasonable number' to a quantity in question. We would expect the converse for very
small numbers, such as `one in 10 million.'
We hypothesized that decomposition would improve accuracy for problems with extreme values when
subjects were highly uncertain about the target value. The reasoning is simply that large numbers are confusing to
many people. With decomposition, the analyst might be able to avoid the extreme numbers associated with high
uncertainty. Uncertainty is an important aspect of this hypothesis. Thus, we do not expect that decomposition would
help to estimate well known numbers, such as the distance from the Earth to the sun (when most of the experts
believe that the distance is about 93 million miles).
The operational definition of an extreme value is difficult to determine. To provide a simple measure of an
extreme value, we initially defined it as any number having more than seven digits (equal to or greater than 10
million). Certainly, many people have difficulty grasping numbers of this magnitude. For example, a book has been
written with the sole purpose of helping people to understand the magnitude of one million. It consists of one
million dots with comparisons at various points where examples are given (Hertzberg, 1970).1 Psychologists also
refer to the ability of the human mind to handle only seven things (plus or minus two).
The selection of the unit of measure causes problems. For example, one could change the units from miles
to inches when asking someone to estimate the distance from New York to San Francisco. However, some important
quantities are not amenable, either conceptually or computationally, to changes in scale.
We were also concerned about how best to assess uncertainty. In particular, would different approaches
lead to different conclusions about when to use decomposition?
3. Reanalysis of prior studies
In an early study of judgmental decomposition, Armstrong et al. (1975) concluded that multiplicative
decomposition typically improves accuracy and is unlikely to reduce accuracy. The study involved such problems as
estimating the number of packs of Polaroid film that were used in the United States in 1970. The results also
supported the hypothesis that decomposition is especially useful for problems where the estimator's perceived
uncertainty about the true value is high. A subsequent study by MacGregor et al. (1988) also found that judgmental
decomposition improves accuracy. That study used similar problems, for example, estimating the value of imported
passenger cars sold in the U.S. the previous year.
1 As an example of how difficult it is to think about extreme numbers, consider the following. A typographical error
was made in Armstrong et al. (1975). The number of cards saying "Carefree Sugarless Gum" that were sent to a
Philadelphia radio station was reported as 66.5 billion rather than the correct value, which was 66.5 million. We
missed this in proofreading, and the number has subsequently been cited in other papers without any questions being
raised.
Armstrong et al. (1975) examined uncertainty by asking 151 subjects to rank problems according to the
confidence that they had in their ability to provide accurate answers. MacGregor et al. (1988) addressed the same
issue by using the variability among 45 subjects in their estimates for each target value. Specifically, they focused
on the interquartile range. The interquartile range represents the middle 50% of a distribution and is calculated as the
difference between the point at the 75th percentile of the distribution (Q3) and the point at the 25th percentile (Q1);
the median of the distribution is at the 50th percentile (Q2). We expected that problems with extreme unknown
values would create uncertainty among estimators and would therefore show up in the interquartile range. We
examined this hypothesis by comparing the number of digits in each of the 16 problems in MacGregor et al. with the
interquartile range of error ratios. As expected, the number of digits was related to uncertainty. The correlation
between the number of digits in the actual values for each problem and the corresponding interquartile range was
about +0.75.
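As a sketch of this check (with made-up numbers standing in for the actual problems, since the raw data are not reproduced here), one can correlate the digit count of each actual value with the interquartile range of the log error ratios:

```python
import math

# Hypothetical stand-ins: digit counts of the actual values and the interquartile
# ranges (in log10 units) of subjects' estimates for a handful of problems.
digits = [4, 5, 7, 8, 8, 9]
iqr_log = [0.3, 0.5, 1.2, 1.6, 1.4, 2.0]

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"correlation between digits and uncertainty: {pearson(digits, iqr_log):+.2f}")
```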
To examine whether decomposition improved accuracy for problems involving extreme unknown numbers,
we split the MacGregor et al. data according to magnitude and disagreement. This yielded five problems where scale
was not extreme (using seven or fewer digits gave a roughly equal breakdown of the problems) and where assessors
were in agreement (we used an interquartile range with a log10 of 1.3 or less, which means that the ratio between
the lowest quartile and the highest quartile is less than two). The five problems were the numbers of physicians,
marriages, alcoholics, university employees and hospital employees. Six problems had extreme magnitude (over
seven digits) and high disagreement among estimators (interquartile range of 1.75 or more, implying a ratio of 5.6 of
the largest to smallest quartile). These problems involved the numbers of welfare cases, imported cars, alcohol
dollars, mail handled by post offices, gasoline and cigarettes.
We estimated the average improvement for decomposition in MacGregor et al. in two steps. First,
geometric mean estimates were calculated for the group of subjects who used the decomposed version (this being
the computed full algorithm from Table 6 in MacGregor et al.) and for those who used the global version. These
estimates were then compared with the actual values for each problem.
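The two-step reanalysis can be sketched as follows; the subject estimates here are invented placeholders, not the MacGregor et al. data.

```python
import math

def geometric_mean(values):
    """Geometric mean, appropriate for positive quantities spanning magnitudes."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def group_error_ratio(estimates, actual):
    """Error ratio of the group's geometric-mean estimate (always >= 1)."""
    g = geometric_mean(estimates)
    return max(g / actual, actual / g)

# Hypothetical estimates from a global group and a decomposition group for one problem.
global_estimates = [2.0e6, 8.0e6, 5.0e7, 1.2e5]
decomposed_estimates = [9.0e6, 2.5e7, 1.4e7, 3.0e7]
actual = 1.69e7

print(f"global group error ratio:     {group_error_ratio(global_estimates, actual):.1f}")
print(f"decomposed group error ratio: {group_error_ratio(decomposed_estimates, actual):.1f}")
```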
Table 1
Decomposition versus global errors: reanalysis of prior studies

                                   Number of     Median error ratios
Conditions                         problems      Global    Decomposition    Error reduction
Not extreme, low uncertainty
  MacGregor et al.                 5             1.8       2.3              -0.5
  Armstrong et al.                 1             5.4       2.3               2.1
Extreme, high uncertainty
  MacGregor et al.                 6             99.3      3.0              96.3
  Armstrong et al.                 3             18.0      5.7              12.3
Decomposition errors were smaller than global errors for each of the six problems where disagreement
(interquartile range) was high and the actual values were extreme. Subjects who made global estimates were in error
by a factor of 99.3 (9930%) on average. In contrast, the error ratio for the decomposed version for the same six
problems was 3.0, or 300%.2 Thus, the median error was reduced by a factor of 96.3 (see bottom part of Table 1).
For problems without extreme values and where disagreement was low, decomposition yielded less accurate
estimates, as its error was 50% higher than that for the global approach. Table 1 summarizes these results.
We did a similar analysis for the problems in Armstrong et al. (1975). Here, the analysis was based on
individuals rather than groups. Error ratios were calculated for each subject's estimate for each problem by
comparing their estimates with the actual values. The median error ratio was then obtained for each problem.
Decomposition produced substantial gains (1230% error reduction) for the three extreme problems with the highest
uncertainty.3 Decomposition also provided a lesser improvement for the one problem that did not involve an extreme
number. Table 1 summarizes these results as well.

2 We calculated the geometric means of the two error ratios in the middle of the distribution for the global and
decompositional conditions.
Averaging across the two studies (weighting according to the number of questions), decomposition reduced
error by a ratio of 68.3 for the nine problems involving extreme uncertain values. However, decomposition had no
overall effect for the other six problems.
4. An experiment on the effects of extreme uncertain values
We conducted an experiment to provide further evidence on the effects of multiplicative decomposition when
applied to problems with extreme uncertain numbers. This section describes the problems and the subjects.
4.1. Problems
We selected problems in which the magnitude of unknown numbers to be estimated varied. Our extreme
problems had seven or more digits, ranging in value from 3,540,940 to 4,243,000,000. As noted earlier, this
definition of extreme is somewhat arbitrary.4 Not extreme numbers in this set of problems had four digits or less, in
order to provide a marked distinction from extreme numbers. Table 2 provides the 10 problems, along with the
correct answers taken from almanacs and fact books.
Table 2
Problems and magnitudes: actual versus estimated

                                                                       Number of digits
Problem                                          Correct answer        Actual    Upper quartile estimate
Not extreme magnitude
  Circumference of a $.50 coin in inches         3.71                  1         1
  U.S. Presidents                                41                    2         2
  Argentine immigrants to U.S. (annually)        2,800                 4         5
  Bank failures in 1933                          4,004                 4         4
Extreme magnitude
  Area of U.S. in square miles                   3,540,940             7         7
  Circulation of TV Guide                        16,900,000            8         6
  Pairs of athletic shoes made per year          23,400,000            8         7
  Auto accidents per year                        24,100,000            8         8
  Pairs of men's pants made per year             124,000,000           9         9
  Bushels of wheat produced in world per year    4,243,000,000         10        10

All questions relate to the U.S. unless stated otherwise.
Because actual values would not be known to the subjects, we first determined whether it would be
possible to identify problems that might involve extreme values. We reasoned that typical subjects would not do
well at such estimates. Thus, we used the geometric mean of the upper quartile (top 25%) of the estimates. That is, if
the upper quartile of subjects expected this to be an extreme number, then it was treated as such. By this measure,
the expected number of digits was a good match of the actual number of digits, as shown in Table 2. The largest
estimate for the small group was that Argentine immigrants would be a five-digit number, and the smallest estimate
for the extreme problems was that the Circulation of TV Guide would be a six-digit number, so the classification of
the problems was the same.

3 We used the median error ratios across the new groups of subjects that were tested for the film and tobacco
problems. Only two groups did the Contest problem, and here we used the geometric mean.

4 After analyzing the prior research (Table 1), we revised our definition of extreme for this study from `more than
seven digits' to `seven or more digits.' Extremity could also be defined in terms of small numbers. An example
would be, `What is the chance that a person in the U.S. will die next year because of botulism?' (The answer is
1/100,000,000.)
To determine whether the large target values were uncertain, we examined the interquartile ranges. The
smallest of these ranges for the group of problems having extreme values indicated that the upper quartile mean was
more than 10 times as large as the lower quartile mean.
For each problem we constructed a global version and a decomposed version. Table 3 summarizes the full
set of 10 decomposed algorithms. For the sake of brevity, only the algorithm steps requiring subjects to make
component estimates are provided; intermediate arithmetic steps are omitted. We also asked subjects to rate their
knowledge about each target value, their expected accuracy and the probability that their answer would be within
10% of the true value.
For some of the problems, such as Athletic shoes, one of the components involved an extreme value.
However, we were reasonably confident that subjects would know this value. Also, data on these values are readily
available so that one could insert the known value.
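The actual component steps are given in the authors' Table 3 (not reproduced here); the sketch below is only a hypothetical decomposition of the Athletic shoes problem, with component values chosen for illustration, to show the multiplicative form and where a known value could be inserted.

```python
# Hypothetical multiplicative decomposition for "pairs of athletic shoes made per year".
# The component values are illustrative guesses a subject might make, not data from
# the study; in practice a known value (e.g., U.S. population) could be inserted.

us_population = 250_000_000     # people; the extreme component a subject might treat as known
share_buying_per_year = 0.30    # fraction of people buying athletic shoes in a year
pairs_per_buyer = 1.5           # average pairs bought by each buyer per year

recomposed_estimate = us_population * share_buying_per_year * pairs_per_buyer
print(f"recomposed estimate: {recomposed_estimate:,.0f} pairs per year")
```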
4.2. Subjects
Subjects for the experiment were individuals who answered advertisements in the University of Oregon
daily newspaper. The advertisements called for participation in judgment and decision-making tasks. Two hundred
and eighty individuals participated in the experiment, which was conducted in two sessions. Subjects were randomly
assigned to either the global or the decomposition treatment. In the first session, the problems $.50 coin, U.S.
presidents, Argentine immigrants, Bank failures, Circulation of TV Guide and Bushels of wheat were administered.
Those subjects assigned to the global treatment received all six problems. Because of time constraints, subjects
assigned to the decomposition condition received half of the problems. In the second session, the remaining four
problems were administered. Again, subjects in the global condition received all four of the remaining problems,
while decomposition subjects received half of the problems.
5. Results
As had been done in previous studies of judgmental forecasting (Armstrong et al., 1975, and MacGregor et
al., 1988), we used the error ratio as an index of accuracy. The error ratio is computed as the ratio of the individual's
estimated value to the correct answer, or the reverse, such that the result is greater than or equal to 1.0. Estimates for
a given problem were summarized across subjects by computing the geometric mean of the error ratios.
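In code, this accuracy index can be written as below (a minimal sketch with invented numbers); note that, unlike the group-level reanalysis of Section 3, the geometric mean here is taken over each individual's error ratio.

```python
import math

def error_ratio(estimate, actual):
    """Ratio of estimate to truth or its reciprocal, whichever is >= 1."""
    return max(estimate / actual, actual / estimate)

def summarize(estimates, actual):
    """Geometric mean of individual error ratios for one problem."""
    ratios = [error_ratio(e, actual) for e in estimates]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical individual estimates for a problem whose true value is 24,100,000.
estimates = [5.0e5, 2.0e6, 3.0e7, 8.0e8]
print(f"geometric-mean error ratio: {summarize(estimates, 24_100_000):.1f}")
```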
We had hypothesized that decomposition would improve accuracy for problems having extreme uncertain
values. The results, shown in Table 4, were consistent with this hypothesis. We summarized the problems into two
groups: extreme problems (correct answer greater than 3,540,940) and not extreme problems (correct answer 4,004
or less). Accuracy was superior for decomposition in five of the six extreme problems, with an error reduction that
ranged from a factor of 4.10 (Athletic shoes) to 91.47 (Auto accidents). Only the Circulation of TV Guide problem
suffered a decrease in accuracy with decomposition. This decrease was modest compared to the gains in accuracy
for the other five extreme problems, and this decrease was not statistically significant. Across all six problems, the
median error was reduced by a factor of 19.78, approximately a 20-fold improvement in accuracy. Following
Winer's method of adding ts (as described in Rosenthal, 1978), these results were statistically significant at p <
0.001 using a one-tail test.5
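A sketch of how independent tests can be pooled, assuming the adding-ts formula described in Rosenthal (1978): the t statistics are summed and divided by the square root of the sum of df/(df - 2), giving an approximately standard normal Z. The inputs below are hypothetical and are not intended to reproduce the values in Table 4.

```python
import math

def winer_adding_ts(t_values, dfs):
    """Combine independent t statistics into one Z (Winer's adding-ts method,
    as described in Rosenthal, 1978). Each df must exceed 2, since the variance
    of a t variate with df degrees of freedom is df / (df - 2)."""
    variance_sum = sum(df / (df - 2) for df in dfs)
    return sum(t_values) / math.sqrt(variance_sum)

# Hypothetical per-problem t statistics and degrees of freedom.
t_stats = [2.1, 0.4, 1.7]
dfs = [40, 58, 61]
print(f"combined Z = {winer_adding_ts(t_stats, dfs):.2f}")
```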
Table 4
Error ratios for global versus decomposed estimates (for individuals)

                                   Sample size          Error ratios (geometric means)
Problems                           Global    Decomp     Global    Decomp     Error reduction    t-test
Not extreme
  $.50 coin                        64        62         1.82      1.41         0.41              4.07**
  U.S. presidents                  64        63         1.23      1.35        -0.12             -1.55
  Argentine immigrants             65        54         4.89      46.77       -41.88            -5.85**
  Bank failures                    64        57         10.45     19.50       -9.03             -1.69
  Median                                                                      -4.58
  Combined experiments (z-test)                                                                 -2.49*
Extreme
  Area of U.S.                     30        30         33.88     1.70         32.18             6.00**
  Circulation of TV Guide          64        60         7.76      10.96       -3.20             -1.11
  Athletic shoes                   31        32         19.95     15.85        4.10              0.47
  Auto accidents                   31        30         93.33     1.86         91.47             8.07**
  Men's pants                      31        31         17.38     10.00        7.38              1.01
  Bushels of wheat                 61        62         45.71     6.92         38.79             4.57**
  Median                                                                       19.78
  Combined experiments (z-test)                                                                  4.37**

*Significant at p < 0.05
**Significant at p < 0.001
By contrast, accuracy for not extreme problems was reduced with decomposition. Error reduction values for
three of the four not extreme problems were negative, indicating a superiority of global estimation over
decomposition. Decomposition increased the median error for these problems by 458%, an increase that was
statistically significant at p < 0.05. The test for the not extreme values was two-tailed because we had no directional
hypothesis. Our analysis overstates the statistical significance; the various estimates are not completely independent
of one another.

5 Because the ratios involved some extreme values, the t-tests were done on the logs of the error ratios rather than on
the ratios themselves.
5.1. Uncertainty of estimation
Whether decomposition is appropriate depends on some measure of uncertainty. We propose that analysts
first determine whether the problem is subject to much uncertainty. If so, decomposition may be appropriate,
especially if one can structure the problem to avoid extreme uncertain values.
Otherwise, global estimates should be used. Decomposition reduces uncertainty to the extent that estimates from
various assessors exhibit a lower variance or a reduced range. Table 5 shows the interquartile ranges for the global
and decomposed estimates. The entries consist of the logs of Q1 and Q3 as well as their differences. Q1 corresponds
to the 25th percentile of the distribution, while Q3 corresponds to the 75th percentile. If decomposition reduces
uncertainty, then a lower Q3-Q1 difference should result. Computed in this way, the differences in Table 5 can be
interpreted as the number of digits by which the estimates of Q1 and Q3 differed.
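A sketch of this computation (with invented estimates; percentile conventions may differ slightly from the authors'): take log10 of each estimate, find the 25th and 75th percentiles, and report their difference in digits.

```python
import math

def log_iqr(estimates):
    """Difference between the 75th and 25th percentiles of log10(estimates),
    i.e., the spread of the middle half of the estimates expressed in digits.
    Uses simple linear interpolation between order statistics."""
    logs = sorted(math.log10(e) for e in estimates)

    def percentile(p):
        pos = p * (len(logs) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(logs) - 1)
        return logs[lo] + (pos - lo) * (logs[hi] - logs[lo])

    return percentile(0.75) - percentile(0.25)

# Hypothetical global and decomposed estimates for one problem.
global_est = [1e5, 8e5, 3e6, 2e7, 9e7, 4e8]
decomposed_est = [6e6, 1e7, 2e7, 3e7, 5e7, 8e7]
print(f"global IQR (digits):     {log_iqr(global_est):.2f}")
print(f"decomposed IQR (digits): {log_iqr(decomposed_est):.2f}")
```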
Table 5
Analysis of interquartile ranges

                              Global                                Decomposed
Problems                      Log Q3    Log Q1    Q3 - Q1           Log Q3    Log Q1    Q3 - Q1
Not extreme
  $.50 coin                   0.48      0.20      0.28              0.63      0.42      0.21
  U.S. presidents             1.71      1.59      0.12              1.70      1.53      0.17
  Argentine immigrants        4.30      3.30      1.00              5.74      3.60      2.14
  Bank failures               3.70      2.08      1.62              4.92      3.18      1.74
Extreme
  Area of U.S.                6.30      4.00      2.30              6.76      6.39      0.37
  Circulation of TV Guide     7.54      6.18      1.36              7.80      5.95      1.85
  Athletic shoes              8.00      6.00      2.00              8.84      7.75      1.09
  Auto accidents              6.18      5.00      1.18              7.80      6.08      1.72
  Men's pants                 8.00      6.00      2.00              9.53      8.07      1.46
  Bushels of wheat            10.48     7.18      3.30              10.54     9.65      0.89
For not extreme problems, the interquartile ranges are higher for the decomposed estimates than the global
estimates for three of the four problems. For one problem, Argentine immigrants, the interquartile range for the
decomposed version was higher than that for the global (2.14 versus 1.00). This occurred even though each part had
the same interquartile range as the target value. This problem did not, then, meet the condition that the parts are
easier to forecast than the target value, nor were the errors independent. Thus, it is not surprising that decomposition
was not helpful for this problem.
For extreme problems, the range for the decomposed estimate was less than that for the global, except for
the Auto accidents and Circulation of TV Guide problems. In other words, decomposition often improved confidence
for difficult problems when the agreement among assessors' estimates was used to gauge confidence. Furthermore,
the differences between the global and decomposed ranges for the four problems with improvements were
substantial, being typically greater than one digit. Although the number of problems is not sufficient to assess the
relationship between the interquartile ranges and errors, this result is consistent with that found in the seven
problems examined by Aschenbrenner and Kasubek (1978).
A tenet of decomposition states that the parts of a problem are more tractable than the whole. This means
that uncertainty in the estimates of a problem's components should be lower than that for the global estimate. We
computed the interquartile ranges for each of the components of the six problems in Table 6. The parts were easier
to estimate than the target value for three problems: $.50 coin, U.S. presidents and Bushels of wheat. The first two of
these had target values that were easy to assess directly, whereas Bushels of wheat had an extreme value that was
difficult to measure. The Bushels of wheat problem met all conditions for decomposition. As expected,
decomposition was successful for this problem. Conversely, decomposition was less accurate for four of the other
five questions.
Table 6
Assessments of subjective confidence

                              Mean knowledge              Mean accuracy               Mean probability ratings that
                              ratings (a)                 ratings (a)                 estimate is within 10% of true answer
Problems                      Global    Decomposition     Global    Decomposition     Global    Decomposition
U.S. Presidents               6.16      5.46              6.02      5.38              64.4      35.2
$.50 coin                     5.50      4.45              5.61      4.45              54.9      55.8
Circulation of TV Guide       3.40      2.35              3.46      2.58              32.1      24.6
Bank failures                 3.19      2.17              3.06      2.20              28.0      18.9
Argentine immigrants          2.15      1.81              2.38      2.32              24.4      16.8
Bushels of wheat              2.24      2.02              2.16      2.27              18.9      19.9

a High scores imply greater knowledge and greater perceived accuracy (scale from 1 to 10).
5.2. Subjective confidence ratings
A second source of uncertainty estimates is the subjective confidence that forecasters have in their
knowledge about a problem. We addressed three questions with respect to subjective uncertainty. (1) Do alternative
measures of uncertainty yield similar recommendations? If yes, then we could use the least expensive approach to
assessing uncertainty. (2) Are judges more confident when they make decomposed estimates or global estimates? (3)
Does decomposition lead subjects to become better calibrated about their confidence?
As the simplest and least expensive approach, we asked subjects to provide judgments of their knowledge
about each target value, and the degree to which they thought their estimate would be accurate. Self-ratings of
knowledge and accuracy were obtained from the subjects before they made their estimates by using the following
scales.
"Before you begin, indicate on the scale below how much you think you know about the topic"
(1 = know very little; 10 = know a great deal).
"How accurately do you think you will be able to estimate this quantity?"
(1 = low accuracy; 10 = high accuracy).
Judgments were obtained for a subset of six problems. Table 6 shows alternative assessments of accuracy for these
problems.
After subjects had estimated the value for each of the six problems, we asked them to indicate the
probability that their estimate was within 10% of the correct answer. These results are also presented in Table 6.
Finally, we calculated the interquartile ranges of the global estimate for each problem, shown in the last column of
Table 6.
With the exception of the interquartile range, the different approaches to subjective confidence produced
similar results. The intercorrelations among the three measures across the six problems were all over 0.99. Given the
close correspondence among the three measures, they were expected to be of roughly equal value in deciding when
to use decomposition.
We applied the same procedures to subjects who received the decomposed versions of the problems.
Across all six problems, subjects had higher self-ratings of problem knowledge in the global condition than in the
decomposition condition. Because subjects in the decomposition condition received more than one estimation
problem, their self-ratings of problem knowledge may have been influenced by the difficulties they experienced
with the complexity of the problem. This was also the case for self-ratings of accuracy, except for the Bushels of
wheat problem. Similar results were obtained when we asked the questions about confidence after subjects had
completed their estimates. In other words, the different assessments each led to the conclusion that subjects in the
decomposition condition thought that the problems were more difficult than did subjects in the global estimation
condition. These results agree with the findings of Sniezek et al. (1990), who had concluded that the increased
processing (for decomposed problems) leads to a reduction in confidence. In retrospect, it might have been better for
us to have asked for estimates of the difficulty for each of the parts. Henrion et al. (1993) did this, and their subjects
reported that the components were easier to estimate than the global value.
Are subjects better calibrated when they use decomposition? Probability assessments are said to be
externally calibrated if, for a given probability assessment (e.g., 0.6), exactly that proportion (e.g., 60%) turn out to
be correct. We summarized the calibration results for global and decomposed estimates, across all ten problems.
Mean probability assessments were generally higher than the proportion correct for both approaches, indicating
overconfidence. On average, those making global estimates expected 38.9% of their answers to be within 10% of the
true value, but only 10.9% were that accurate. Those using the decomposed approach expected 32.6% of their
estimates to be within 10% of the true value, but only 9.0% were that accurate. In effect, decomposition reduced
overconfidence from 28.0% in the global case to 23.6% for decomposition, with the largest reduction occurring in
those situations where subjects felt most confident, as shown in Fig. 1.
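The overconfidence measure reported above can be sketched as follows (with hypothetical responses; the study's own figures are summarized in the text and in Fig. 1):

```python
def overconfidence(assessed_probs, within_10pct_flags):
    """Mean assessed probability of being within 10% of the truth, minus the
    proportion of estimates that actually were within 10%. Positive values
    indicate overconfidence."""
    mean_assessed = sum(assessed_probs) / len(assessed_probs)
    hit_rate = sum(within_10pct_flags) / len(within_10pct_flags)
    return mean_assessed - hit_rate

# Hypothetical data: each subject's stated probability (0-1) and whether the
# estimate actually landed within 10% of the true value.
probs = [0.60, 0.40, 0.25, 0.10, 0.50]
hits = [False, True, False, False, False]
print(f"overconfidence: {overconfidence(probs, hits):+.2f}")
```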
5.3. Limitations
Two of the four problems in the not extreme version (Argentine immigrants and Bank failures) involved
elements with extreme values. Because each of the components had an element dealing with the U.S. population, we
assumed that the subjects would be familiar with these values. To examine this assumption, we analyzed the
population estimates for each of the problems. The median population estimate for the Argentine immigrants problem
was in error by a factor of 1.97 from the actual, while for Bank failures it was in error by a factor of 1.42. For both
problems, errors for the U.S. population component were less than errors for the global quantities. Nevertheless, we
were surprised at the difficulty individuals had with estimating this value. In practical problems, of course, one
could simply use the actual value. In their study of decomposition, Henrion et al. (1993) gave the U.S. population
value to the subjects.
Fig. 1. Calibration of probability assessments that estimated answer is within 10% of true answer.
The issue of `how extreme is extreme' has not been resolved. We proposed a definition based on the
number of digits (six or seven), but we did not examine alternatives. Nor did we resolve the issue of how to specify
the unit of measure.
We expect that other conditions might affect decisions on when to use decomposition. For example,
question type may have some importance. We do not know the extent to which our problem selection may have
affected findings.
6. Discussion
Despite the improved accuracy it afforded, decomposition did not increase subjects' confidence in the
accuracy of their estimates. However, the interquartile ranges were smaller for the decomposed estimates, and
confidence in the accuracy of estimates was slightly better calibrated.
Perceived uncertainty measures are easy to obtain. As shown in Table 6, self-assessments of uncertainty
provided similar rankings of the relative uncertainty for the problems. The interquartile ranges provided somewhat
different information than the self-assessments. Interquartile ranges of the estimates are not expensive, but they do
require a pretest.
The present study addresses the issue of whether estimates by individuals can be improved when no other
data are available. However, we expect that other situational characteristics or estimation-aiding strategies would
also affect the usefulness of decomposition. For example, a forecaster could decompose a problem to use different
sources of information or different experts. For some parts of the problem, known values may exist. Alternative
decomposition methods could be used to produce an estimate, and resulting values for a quantity could be resolved
in light of one another. MacGregor and Lichtenstein (1991) attempted such an approach and found that subjects
tended to resolve estimates by applying an averaging model. Revised estimates generally fell between two estimates
of a target quantity, where each judgmental estimate was produced by a different method.
Although our approach to decomposition was harmful for problems that did not involve extreme, uncertain
numbers, there might be alternative approaches that are successful. For example, decomposition might restructure a
problem so that it is easier for subjects to think about.
Decomposition tended to reduce estimators' confidence levels, perhaps because of the increased processing
involved. This reduction in overconfidence and the improvements in accuracy produced modest gains in calibration.
7. Conclusions
The theory behind decomposition is simple. What is difficult is how to translate the theory into operational
terms. We examined some operational procedures for identifying conditions under which decomposition should
improve accuracy.
Extreme uncertain values are difficult for subjects to estimate. We hypothesized that decomposition to
remove extreme values would improve estimation accuracy. This study examined nine “extreme value-high
uncertainty” problems from two prior studies. Decomposition proved useful for each of these nine problems, and the
typical gain in accuracy was substantial (error ratio was reduced by 96.3 for the study with six problems, and by
12.3 for the study with three problems). In the present study, involving six problems with extreme values, the error
ratio was reduced by a factor of almost 20.6 Decomposition failed for one extreme problem because it was not
successful in producing more accurate estimates of the parts.
6 The results from Hora et al. (1993) also are consistent with our hypothesis. They found that decomposition was
more accurate than global estimates for three quantities whose true values had at least eight digits (e.g., What were
the sales for Long’s Drug Stores in Hawaii in 1986?).
Decomposition was risky for problems that did not involve extreme and uncertain values. For six such
problems from two prior studies, decomposition had little overall effect on accuracy. However, for four such
problems in the current study, decomposition yielded less accurate estimates by an average error ratio of 458%.
Based on the limited evidence to date, we suggest the following procedure for judgmental decomposition.
First, assess whether the target value is subject to much uncertainty by using either a knowledge rating or an
accuracy rating. If the problem is an important one, obtain interquartile ranges. For those items rated above the
midpoint on uncertainty (or above 10 on the interquartile range), conduct a pretest with 20 subjects to determine
whether the target quantity is likely to be extreme. If the upper quartile geometric mean has seven or more digits,
decomposition should be considered. For these problems, compare the interquartile ranges for the target value against
those for the components and for the recomposed value. If the ranges are less for the global approach, use the global
approach. Otherwise use decomposition.
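As a rough paraphrase of this procedure (our reading, with the thresholds named in the text treated as parameters), the decision logic could be sketched as:

```python
import math

def choose_method(uncertainty_rating, pretest_upper_quartile_gm,
                  target_iqr_digits, recomposed_iqr_digits,
                  midpoint=5.5, extreme_digits=7):
    """Hedged paraphrase of the suggested procedure: use decomposition only when
    the target is rated uncertain, a pretest suggests an extreme value, and the
    recomposed estimates show less spread than the global ones."""
    if uncertainty_rating <= midpoint:
        return "global"        # little perceived uncertainty: estimate directly
    digits = int(math.log10(pretest_upper_quartile_gm)) + 1
    if digits < extreme_digits:
        return "global"        # target not expected to be extreme
    if target_iqr_digits <= recomposed_iqr_digits:
        return "global"        # decomposition does not reduce the spread
    return "decomposition"

# Example: a highly uncertain problem whose pretest upper-quartile geometric mean
# has eight digits and whose recomposed estimates show a narrower spread.
print(choose_method(uncertainty_rating=8, pretest_upper_quartile_gm=2.4e7,
                    target_iqr_digits=2.0, recomposed_iqr_digits=0.9))
```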
The current study suggests that decomposition has more limited value than previously thought. It improved
accuracy only when the situation involved uncertain and extreme quantities. Furthermore, decomposed elements
needed to be easier to estimate than the global. For problems that did not concern extreme values with high
uncertainty or where estimates of the parts were not more accurate than that of the target value, decomposition
produced less accurate estimates.
Acknowledgements
This research was supported in part by the National Science Foundation under Contract SES-9013069 to
Decision Research. Fred Collopy, George Loewenstein, Robin Hogarth and unidentified referees provided helpful
comments on early drafts. Jennifer L. Armstrong, Suzanne Berman, Gina Bloom, Vanessa Lacoss, Phan
Lam and Leisha Mullican provided editorial assistance.
References
Armstrong, J.S., W.B. Denniston and M.M. Gordon (1975), "The use of the decomposition principle in making
judgments," Organizational Behavior and Human Performance, 14, 257-263.
Aschenbrenner, K.M. and W. Kasubek (1978), "Challenging the Cushing syndrome: Multiattribute evaluation of
cortisone drugs," Organizational Behavior and Human Performance, 22, 216-234.
Henrion, M., G.W. Fischer and T. Mullin (1993), “Divide and conquer? Effects of decomposition on the accuracy
and calibration of subjective probability distributions," Organizational Behavior and Human Decision
Processes, 55, 207-227.
Hertzberg, H. (1970), One Million. Simon and Schuster, New York.
Hora, S.C., N.G. Dodd and J.A. Hora (1993), "The use of decomposition in probability assessments of continuous
variables," Journal of Behavioral Decision Making, 6, 133-147.
MacGregor, D.G. and S. Lichtenstein (1991), "Problem structuring aids for quantitative estimation," Journal of
Behavioral Decision Making, 4, 101-116.
MacGregor, D.G., S. Lichtenstein and P. Slovic (1988), "Structuring knowledge retrieval: An analysis of
decomposed quantitative judgments," Organizational Behavior and Human Decision Processes, 42,
303-323.
Raiffa, H. (1968), Decision Analysis. Princeton University Press, Princeton, New Jersey.
Rosenthal, R. (1978), "Combining results of independent studies," Psychological Bulletin, 85, 185-193.
Sniezek, J.A., P.W. Paese and F.S. Switzer, III (1990), "The effect of choosing on confidence in choice,"
Organizational Behavior and Human Decision Processes, 46, 264-282.