Improving forecasts using equally weighted predictors
Forthcoming (with changes) in the Journal of Business Research
Andreas Graefe
LMU Research Fellow
Center for Advanced Studies
Department of Communication Science and Media Research
LMU Munich, Germany
Abstract. The usual procedure for developing linear models to predict any kind of
target variable is to identify a subset of most important predictors and to estimate weights that
provide the best possible solution for a given sample. The resulting “optimally” weighted
linear composite is then used when predicting new data. This approach is useful in situations
with large and reliable datasets and few predictor variables. However, a large body of
analytical and empirical evidence since the 1970s shows that the weighting of variables is of
little, if any, value in situations with small and noisy datasets and a large number of predictor
variables. In such situations, including all relevant variables is more important than their
weighting. These findings have yet to impact many fields. This study uses data from nine
established U.S. election-forecasting models whose forecasts are regularly published in
academic journals to demonstrate the value of weighting all predictors equally and including
all relevant variables in the model. Across the ten elections from 1976 to 2012, equally
weighted predictors reduced the forecast error of the original regression models on average by
four percent. An equal-weights model that includes all variables provided well-calibrated
forecasts that reduced the error of the most accurate regression model by 29 percent.
Keywords: equal weights, index method, econometric models, presidential election
Acknowledgements: J. Scott Armstrong and Alfred Cuzán provided helpful comments.
1. Introduction
People and organizations commonly make decisions by combining information from
multiple inputs. For example, one usually weighs the pros and cons before deciding on
whether or not to launch a marketing campaign, which new product to develop, or where to
open a branch office. Almost 250 years ago, Benjamin Franklin suggested an approach for
how to solve such problems. Franklin’s friend Joseph Priestley asked for advice on whether
or not to accept a job offer that would have involved moving with his family from Leeds to
Wiltshire. In his response letter, written on September 19, 1772, Franklin avoided advising
Priestley on what to decide. Instead, he proposed a method for how to decide. Franklin’s
recommendation was to list all important variables, decide which decision is favored by each
variable, weight each variable by importance, and then add up the variable scores to see
which decision is ultimately favored. Franklin labeled this approach “Moral Algebra, or
Method of deciding doubtful Matters” (Sparks, 1844, p. 20). About half a century later,
Franklin’s method had another famous proponent. In 1838, Charles Darwin used it to help
him answer a question of utmost importance: whether or not to get married (Darwin,
Burkhardt, & Smith, 1986).
Franklin’s Moral Algebra gave way to multiple regression analysis, which has
become popular for solving many kinds of problems in various fields. Multiple regression
analysis produces variable weights that yield the “optimal” (in terms of least squares) solution
for a given data set. The estimated regression coefficients are then commonly used to weight
the composite when predicting new (out-of-sample) data. The problem with this data fitting
approach is that it does not necessarily yield accurate forecasts. A large body of empirical and
theoretical evidence since the 1970s shows that regression weights often provide less accurate
out-of-sample forecasts than simply assigning equal weights to each variable in a linear
model (Dawes, 1979; Dawes & Corrigan, 1974; Einhorn & Hogarth, 1975). These results
have yet to impact many fields, including business research. Researchers rarely evaluate the
quality of their models by predicting holdout data and most JBR submissions report the model
fit as the only indication of a good model (Woodside, 2013).
I review the literature on the relative predictive performance of equal and regression
weights and provide new evidence from the field of U.S. presidential election forecasting, a
field that is dominated by the application of multiple regression analysis. The results conform
to prior research, showing that equal weights perform at least as well as regression weights
when forecasting new data. In addition, I show that including all relevant variables in an
equal-weights model yields large gains in accuracy.
2. Equal and regression weights in linear models
This section reviews prior research on the relative performance of equal and
regression weights and discusses the conditions under which either approach is expected to
work best.
2.1 Multiple regression models
As mentioned above, multiple regression analysis is the dominant method to develop
forecasting models in many fields. Once theory is used to select the k relevant predictor
variables, multiple regression analysis estimates their relative impact on the target criterion.
The general equation of the multiple regression model reads as:

y = a + b1·x1 + b2·x2 + … + bk·xk + e (1)

where y is the target criterion, the xi are the k predictor variables, and e is the error term. The estimated constant a and the k “optimal” (in terms of minimized squared error) regression coefficients bi are then used when predicting new data.
2.2 Equal-weights models
An alternative to using multiple regression is to assign equal weights to each variable.
That is, one also relies on theory to select the variables. However, one does not let the data
decide about the variables’ weights. Instead, one uses prior knowledge to assess the
directional effects of the variables and then transforms all variables so that they positively
correlate with the target variable. In the final step, the values of all variables are added up to
calculate the single predictor variable in a simple linear regression model, hereafter, the
equal-weights model:

y = d + g·(x1 + x2 + … + xk) + v (2)

where d is the estimated constant, g is the estimated coefficient of the summed predictor variable, and v is the error term.
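As an illustration, a minimal sketch of this procedure in Python (with made-up data; `equal_weights_fit` is a hypothetical helper written for this sketch, not code from the paper):

```python
import numpy as np

def equal_weights_fit(X, y):
    """Equal-weights model: z-score each column, orient it so it correlates
    positively with y, sum the columns, and regress y on the sum."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)               # standardize predictors
    signs = np.sign([np.corrcoef(Z[:, j], y)[0, 1] for j in range(Z.shape[1])])
    s = (Z * signs).sum(axis=1)                            # equal-weights score
    g, d = np.polyfit(s, y, 1)                             # slope g, intercept d
    return d, g, signs

# toy example with made-up data (not the election data analyzed in this paper)
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 3))
y = X @ np.array([0.5, -0.4, 0.3]) + rng.normal(scale=0.5, size=15)
d, g, signs = equal_weights_fit(X, y)
```

Only the intercept d and the single slope g are estimated from the data; the sign orientation comes from prior knowledge (here approximated by in-sample correlations for the toy data).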
2.3 Differences between multiple regression and equal-weights models
The multiple regression and the equal-weights model differ in the number of
parameters to be estimated. The multiple regression model estimates k+1 parameters: the
constant a and each variable’s coefficient bi. The equal-weights model is a special case of the
multiple regression model with all bi’s = g. That is, the equal-weight method only needs to
estimate two parameters (d and g).
The number of estimated parameters is crucial to a model’s predictive performance,
since their estimation inevitably creates error. While adding more variables will generally
improve a model’s fit to existing data, the danger of overfitting increases. Overfitted models
tend to exaggerate minor fluctuations in the data by interpreting noise as information. As a
result, the models’ performance for predicting new data decreases. The equal-weights model
uses as few degrees of freedom as possible and thus minimizes estimation error. The relative
performance of multiple regression and equal-weights models for the same data then depends
on the accuracy of the estimated coefficients. See Einhorn and Hogarth (1975) for a more
detailed discussion.
2.4 Empirical evidence on the relative performance of multiple regression and
equal-weights models
Starting at least as early as Schmidt (1971), a number of studies have tested the
relative predictive accuracy of equal and regression weights when applied to the same data.
Many of these studies analyzed unit weights, which are a special case of equal weights in
which each variable is assigned a value of plus or minus one.
An early review of the literature finds multiple regression to be slightly more accurate
than equal weights in three studies but less accurate in five (Armstrong, 1985, p. 208). Since
then, evidence has accumulated. Czerlinski, Gigerenzer, and Goldstein (1999) test the
predictive performance of regression and equal weights for twenty real-world problems in
areas such as psychology, economics, biology, and medicine. Most of these tasks were
collected from statistics textbooks where they were used to demonstrate the application of
multiple regression analysis. Ironically, equal weights provided more accurate predictions
than multiple regression. Cuzán and Bundrick (2009) analyze the relative performance of
equal and regression weights for forecasting U.S. presidential elections. The authors find that
equal-weights versions of the Fair (2009) model and of two variations of the fiscal model
(Cuzán & Heggen, 1984) outperformed two of the three regression models and did equally
well as the third when making out-of-sample predictions.
Such findings have led researchers to conclude that the weighting of variables is
secondary for the accuracy of forecasts. Once the relevant variables are included and their
directional impact on the criterion is specified, the magnitudes of effects are not very
important (Armstrong, 1985, p. 210; Dawes, 1979). As Dawes and Corrigan (1974, p. 105)
put it in their seminal work on that topic: “The whole trick is to decide which variables to
look at and then to know how to add.”
2.5 Conditions for the relative performance of multiple regression and equal-
weights models
The relative performance of equal and regression weights depends on the conditions
of the forecasting problem. Analytical solutions to the problem derived several conditions for
when equal weights can outperform regression weights when predicting new data (Davis-
Stober, Dana, & Budescu, 2010; Einhorn & Hogarth, 1975). These conditions are common
for many problems in the social sciences. In general, the relative performance of equal
weights increases if
1. the regression model fits the data poorly
(i.e., the squared multiple correlation R2 is low),
2. the ratio of observations per predictor variable is low
(i.e., in situations with small samples and a large number of predictor variables),
3. predictor variables are highly correlated, and
4. there is measurement error in the predictor variables.
Empirical studies yield similar conclusions. Dana and Dawes (2004) analyze the
relative predictive performance of regression and equal weights for five real non-
experimental social science datasets and a large number of synthetic datasets. They find that
regression weights do not yield more accurate forecasts than equal weights unless sample size
is larger than one hundred observations per predictor. Only when prediction error was likely to be very small (adjusted R2 > .9) did the authors find regression to outperform equal weights in samples with five observations per predictor.
3. Models for forecasting U.S. presidential elections
The development of quantitative models to predict the outcome of elections is a well-
established sub-discipline of political science. Since the late 1970s, scholars have developed
various versions of election forecasting models. Table 1 shows the specifications of nine
models, including the variables used, their first election forecasted, the sample period, and the
model fit. The figures are based on data up to but not including the 2012 election. That is,
the model specifications show the situation that the forecasters faced prior to the 2012 election.
Eight of the nine models are described in PS: Political Science & Politics 45(4). The
latest specification of the Fair model is described in Fair (2009). Also, note that the
specification of the Abramowitz model differs from the author’s description in PS: Political
Science & Politics 45(4). In his article, Abramowitz proposed a revised model with one
additional variable. However, at the SPSA 2013 meeting in Orlando, Abramowitz indicated
that he would likely return to his old model in the future. Therefore, the present analysis stays
with the established “time-for-change” model.
Each of these models is estimated using multiple regression analysis, with the
incumbent popular two-party vote as the dependent variable and two or more independent
variables derived from theory. For example, it is well known that elections can be viewed as
referenda about the government’s performance or, more narrowly defined, its ability to handle
the economy. That is, voters reward the government for good performance and punish the
incumbent party otherwise. Most models incorporate this information by using one or more
economic variables (e.g., GDP growth or job creation) to measure economic performance.
Other popular measures are presidential popularity, which is commonly seen as a proxy
variable for measuring the incumbent’s overall performance, and the time the incumbent
party has held the White House. The models are then used to test theories of voting, to
estimate the relative effects of specific variables on the aggregate vote, and, of course, to
forecast the election outcome.
The conditions for forecasting U.S. presidential elections suggest that the equal-
weights method should perform well compared to multiple regression. Although most of the
nine models listed in Table 1 fit the data fairly well, the number of observations per predictor
variable is low, the predictors are likely correlated, and forecasters have to deal with
measurement errors in the predictor variables.
Model fit. Models for forecasting U.S. presidential elections are able to explain much
of the variance in the two-party popular vote shares. Table 1 shows the fit of the nine models,
estimated using data up to and including the 2012 election. Seven of the nine models explain
more than 80% of the variance; one model (Cu) achieves an adjusted R2 above .9.
Ratio of observations per predictor variable. Seven of the nine models listed in
Table 1 use only data post-World War II. This means that these models were limited to
around fifteen observations when estimating the vote equation to predict the 2012 election
results. The two exceptions are the models by Fair and Cuzán, which start collecting data in
1916, and thus drew on a sample of twenty-four observations when calculating the forecast of
the 2012 election. The number of predictor variables differs across models. While four
models are based on two variables, the Fair model uses seven variables. Thus, when
calculating forecasts of the 2012 election, the ratio of observations per predictor ranged from
3.4 (F) to 8.0 (C). These ratios are far below what Dana and Dawes (2004) recommended
when using multiple regression. In addition, these ratios were of course lower for forecasts of
earlier elections.
Correlation among predictors. In most real-world forecasting problems, predictor
variables are likely correlated. This also holds for election forecasting. An example is the
combined use of economic indicators and public opinion polls (e.g., presidential popularity)
as predictor variables in the same model. Since presidential popularity is expected to serve as
a proxy for incumbent performance, the measure likely also captures the public’s perceptions
of how the president is handling the economy. Prior research supports this. Ostrom and Simon
(1985) find that presidential popularity is a function of both economic and non-economic
factors. Similarly, Lewis-Beck and Rice (1992, p. 46) show that GNP growth is correlated
with incumbent performance (r = .48). Five of the nine models listed in Table 1 use both
economic indicators and public opinion polls (A, C, EW, Ho, and LT).
Measurement error in independent variables. As shown in Table 1, economic
indicators and public opinion polls are major predictors in election forecasting models; both
measures are subject to measurement error.
First, the state of the economy is difficult to measure. Often, there are substantial
differences between initial and revised estimates of economic figures. For example, on
January 30, 2009, the Bureau of Economic Analysis at the U.S. Department of Commerce
initially estimated a real GDP decrease of 3.8 percent for the fourth quarter of 2008. One
month later, the figure was revised to 6.2 percent, and, at the time of writing, the latest
estimate showed a decrease of 8.9 percent. Revisions of this size are not exceptional. Runkle
(1998) analyzes deviations between initial and revised estimates of quarterly GDP growth
from 1961 to 1996. Revisions were common. There were upward revisions by as much as 7.5
percentage points and downward revisions by as much as 6.2 percentage points. Such
measurement errors are even more critical when different estimates are used for building a
model and calculating the forecast. For example, forecasters commonly use revised economic
figures to estimate the model. However, when making the forecast shortly before the election,
the forecasters have to draw on the initial estimates, since the revised figures are not yet available.
Second, polls conducted by reputable survey organizations at about the same time
often reveal considerable variation in results. Errors caused by sampling problems, non-
responses, inaccurate measurement, and faulty processing diminish the accuracy of polls and
the quality of surveys more generally (Erikson & Wlezien, 1999). Such measurement errors
can have a large impact on the validity of the estimated model coefficients and thus on the
accuracy of forecasting models.
4. Evidence on the accuracy of multiple regression and equal-weights models in
forecasting U.S. presidential elections
The following analysis extends prior work by Cuzán and Bundrick (2009) and tests
the predictive performance of equal and regression weights for the models listed in Table 1.
4.1 Method
All data and calculations are available at:
4.1.1 Models and data
The present study analyzes forecasts from the nine models listed in Table 1. Data for
six models (A, Ca, EW, F, Hi, LT) were obtained from Montgomery, Hollenbach, and Ward
(2012) and enhanced with the variable values from the 2012 election. The data for the model
by Lockerbie were derived from Lockerbie (2012). Thanks to Alfred Cuzán and Thomas
Holbrook who shared their data.
For the purpose of this study it was necessary to perform some transformations on the original data. Without loss of generality, the data were transformed into standardized (z-score) format such that each predictor correlates positively with the dependent variable. The
dependent variable was the two-party popular vote received by the candidate of the
incumbent party.
4.1.2 Forecast calculations
All forecasts analyzed in the present study can be considered pseudo ex ante,
calculated as one-election-ahead predictions. That is, only data that would have been available
at the time of the particular election being forecast was used to estimate the model. For
example, to calculate a forecast for the 2004 election, only data up to the 2000 election was
used to estimate the model. To calculate a forecast of the 1984 election, only data up to 1980
was used, and so on.
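This rolling scheme can be sketched as follows (`one_election_ahead` is a hypothetical helper, and the arrays are fabricated toy data, not the actual model variables from Table 1):

```python
import numpy as np

def one_election_ahead(years, X, y, first_forecast_year):
    """For each election from first_forecast_year on, fit the model on all
    strictly earlier elections and predict the held-out election."""
    forecasts = {}
    for i, yr in enumerate(years):
        if yr < first_forecast_year:
            continue
        A = np.column_stack([np.ones(i), X[:i]])       # data up to previous election
        b = np.linalg.lstsq(A, y[:i], rcond=None)[0]   # refit the regression
        forecasts[yr] = np.concatenate([[1.0], X[i]]) @ b
    return forecasts

# toy illustration: two predictors, elections 1952-1976, fabricated numbers
years = np.arange(1952, 1980, 4)
rng = np.random.default_rng(2)
X = rng.normal(size=(len(years), 2))
y = 50 + X @ np.array([2.0, 1.0]) + rng.normal(scale=1.0, size=len(years))
fc = one_election_ahead(years, X, y, first_forecast_year=1968)
```

The same loop applies to the equal-weights variant, with the summed score replacing the matrix of predictors.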
The term “pseudo” reveals that these forecasts cannot be considered truly ex ante.
The reason is that all calculations are based on the models’ specifications that were used for
predicting the 2012 elections. In reality, however, the 2012 versions were often quite different
from the original specifications that were used to predict a particular election. Most models
have been revised at least once since their first publication, usually as a reaction to poor
performance in forecasting the previous election. Such revisions usually improve the fit of the
regression model to historical data. As a result, the pseudo ex ante forecasts tend to be more
accurate than what one would have obtained with the original model specifications that were
used in the actual elections. The only exception are the forecasts of the 2012 election, which
can be considered as “truly” ex ante, since they are only based on information that was
actually available at the time of making the forecast. The interested reader can track how most
of the model specifications have changed over time by referring to the forecasters’
manuscripts, which were published prior to each election since 1992 in special symposiums
of Political Methodologist 5(2), American Politics Research 24(4) and PS: Political Science
and Politics 34(1), 37(4), 41(4), and 45(4).
The multiple regression model forecasts represent the forecasts of the 2012 model
specifications. That is, multiple regression analysis was used to regress the incumbent party’s
popular two-party vote share on the set of independent variables included in each model (cf.
equation 1). The equal-weights model forecasts represent the forecasts of the equal-weights
variant of each model. This was done by summing up the scores of the predictor variables
incorporated in each model. The resulting equal-weights score was then used as the single
predictor variable in a simple linear regression model (cf. equation 2).
4.1.3 Forecast horizon and error measure
Forecast accuracy was analyzed across the ten U.S. presidential elections from 1976
to 2012. The absolute error was used to measure the absolute deviation of the forecast from
the actual election result. The error reduction was used to compare the relative performance
of forecasts based on equal and regression weights. The error reduction is simply the
difference between the absolute errors of the multiple regression and the equal-weights
forecasts. Negative values mean that regression weights provided more accurate forecasts
than equal weights. Positive values mean that equal weights outperformed regression weights.
4.2 Results
Table 2 shows the mean absolute error (MAE) of the multiple regression and the
equal-weights variants of each model, as well as their relative accuracy, measured as the error
reduction in percentage points (and in percent).
Across all ten elections from 1976 to 2012, there was little difference in the relative accuracy of equal and regression weights. The equal-weights models were more accurate than
the multiple regression models in six cases (A, Cu, EW, F, Ho, L) and less accurate in three
cases (Ca, LT, Hi). On average across the nine models, the error of the equal-weights models
was 0.1 percentage points lower than the corresponding error of the regression models.
However, the multiple regression models have an advantage in this comparison. None
of the nine models was around in 1976, and some were not developed until the late 1990s
(e.g., Cu, Hi, L, LT). Therefore, forecasts for elections held before a model was developed
cannot even be considered pseudo ex ante. The reason is that information from subsequent
elections was used to select the variables when building the model. In order to somewhat
account for this problem, Figure 1 shows the MAE of the forecasts from the nine multiple
regression models and the corresponding equal-weights variants, both for elections before
each model was first developed and since its first application. The results suggest that
multiple regression models benefit from data fitting. When “predicting” elections that were
held before the model was created, regression weights were slightly more accurate than equal
weights. However, for elections held since the models were first created, the results are
reversed. The accuracy loss of regression models when predicting data that were unknown to
the forecasters at the time they decided about the model specifications is a sign for overfitting
(illustrated by the steep slope of the black line in Figure 1). In comparison, the slope for the
equal-weights models (grey line) is flatter. Since the equal-weights method only estimates
two parameters, the risk of overfitting is smaller. Equal-weights models are more robust and
do not suffer from large losses in accuracy when predicting new data.
4.3 Discussion
Equal-weights versions of established multiple regression models for forecasting U.S. presidential elections were found to predict as well as or better than the original models. It is important to emphasize that these results likely underestimate the gains in accuracy that can be achieved by using equal instead of regression weights, since the analysis was based on the 2012 specifications of the models (see Section 4.1.2). Thus, the results should be regarded as a lower bound. The actual gains that one would have obtained by using equal instead of regression weights at the time the forecast was made are likely higher than the gains reported in Table 2.
These results may be surprising given that all of the models that are established in the
political science community (i.e., the models that are regularly published in special issues of
political science journals) are regression models.1 However, the results conform to a large
body of research since the 1970s that provides empirical and analytical evidence in favor of
equal weights when making out-of-sample forecasts for social science problems. This work
concluded that, once the relevant variables are identified, the issue of how to weight variables
is not critical for forecast accuracy. Much of this research was done years before researchers
developed the first U.S. presidential election forecasting models. Since then, evidence
showing that equal weights often outperform regression for social science problems has
accumulated, also for the domain of election forecasting (Cuzán & Bundrick, 2009); the
present study adds more evidence.
Unfortunately, these findings have had little impact on election forecasting thus far.
Although researchers increasingly demonstrate the usefulness and accuracy of equal-weights
models for election forecasting (Armstrong & Graefe, 2011; Graefe & Armstrong, 2013;
Lichtman, 2008), these findings are rarely published in political science journals.
1 This is not to say that there are no equal-weights models for forecasting U.S. presidential elections.
The most popular equal-weights model is Lichtman’s “Keys to the White House”, which is based on
thirteen variables. This model has a perfect record in predicting the winners of all 39 elections since
1860, the last eight elections since 1984 prospectively (Lichtman, 2008). Others used the index method
to develop models that predict the election outcome based on candidates’ biographical information
(Armstrong & Graefe, 2011) and voters’ perceptions of the candidates’ ability to handle the issues
(Graefe & Armstrong, 2013).
The present study does not mean to suggest that regression cannot be useful for
forecasting. Regression analysis is an important forecasting method and there is much
evidence that it can provide accurate predictions if used under appropriate conditions (Allen
& Fildes, 2001; Armstrong, 1985). The method is particularly useful in situations with large
reliable datasets, few variables that are based on well-established causal relationships with the
criterion and do not highly correlate with each other, and when the expected changes are large
and predictable (Armstrong, 2012).
The problem is that these conditions are rarely met when predicting social science
problems. Rather, the conditions often favor equally weighted predictors. Given its
demonstrated accuracy and obvious simplicity, it is surprising that the equal-weights method
has been widely overlooked, not to say ignored.
A common objection to the equal-weights method is that the use of equal weights is
considered unscientific or atheoretical. The likely reason for this objection is that the outputs
of equal-weights models do not conform to what users of regression analysis expect to see. In
particular, the equal-weights method does not estimate effect sizes and therefore cannot
provide answers to questions such as whether a variable has a statistically significant impact,
how large this effect is, or whether one variable is more important than another. Given that
users of regression analysis often argue that the main purpose of their model is not to forecast
but to test theory and to estimate the size of effects, this appears to be a major limitation of
the equal-weights method. But should we have faith in the validity of effect sizes that are
little or no more accurate than equal weights when predicting new data? And is not, after all,
the best test of a model’s validity its predictive accuracy?
Furthermore, users of regression analysis commonly assume that they can control for
the relative impact of variables that they put in the equation. However, this assumption only
holds for experimental data. For non-experimental data, variables often correlate with
(combinations of) other variables, a problem that gets worse when the number of variables
increases. Armstrong (2012) refers to this as the “illusion of control” in regression analysis.
He recommends that one should not estimate effect sizes from non-experimental data.
Instead, one should rely on experiments to estimate effect sizes and then incorporate this
information in the model.
Finally, it is a misperception that the equal-weights method prevents analysts from
using and testing theory. Rather, analysts need to draw on theory and prior knowledge when
they select and code variables. To some extent, the equal-weights method is more useful to
test theory than multiple regression, since it can include an unlimited number of variables and
does not let the data decide about the variables’ (directional) effect on the target criterion. The
possibility to include all relevant variables in a model is thus one of the major benefits of the
equal-weights method, and will be discussed in the following.
5. Including all available information
The above analyses are similar to prior research on the relative performance of equal
and regression weights in that they compare both methods by using the same data. However,
this comparison conceals a major advantage of the equal-weights method. While multiple
regression analysis is limited in the number of variables that can be included in a model (cf.
Section 2.5), the number of parameters that need to be estimated in an equal-weights model is
independent of the number of predictor variables (cf. equation 2). That is, with the equal-
weights method one can follow Benjamin Franklin’s advice and use all relevant variables.
The use of all relevant variables is also one of the guidelines in the Golden Rule of
Forecasting Checklist (Armstrong, Green, & Graefe, in press).
5.1 Method
To test this approach while holding the data set constant, the independent variables
were restricted to those that were used by the nine models analyzed above. As shown in Table
1, the nine models use a total of 30 variables. However, two models (F and Cu) use three
identical variables, which reduces the number of unique variables to 27. The sum of these 27
variable values was used as the single predictor variable in a simple linear regression model,
hereafter referred to as the index model, with the incumbent’s popular two-party vote share as
the dependent variable. The index model was estimated based on data starting in 1952, since
this is the first election for which data on all variables were available. Pseudo ex ante
forecasts were again calculated as one-election-ahead predictions.
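The procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: it assumes the 27 predictors have already been coded so that higher values favor the incumbent, sums them into a single index, and fits a two-parameter simple regression that produces pseudo ex ante one-election-ahead forecasts.

```python
import numpy as np

def fit_index_model(X, y):
    """Equal-weights index model: sum all directionally coded predictors
    into one index, then regress the incumbent's two-party vote share on
    that single index (two parameters, however many predictors)."""
    index = X.sum(axis=1)  # equal weights: a plain unweighted sum
    A = np.column_stack([np.ones_like(index), index])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

def one_election_ahead(X, y, min_obs=6):
    """Pseudo ex ante forecasts: for each election t, estimate the model
    on all elections before t and predict election t."""
    forecasts = {}
    for t in range(min_obs, len(y)):
        a, b = fit_index_model(X[:t], y[:t])
        forecasts[t] = a + b * X[t].sum()
    return forecasts
```

Whatever the number of predictor columns in `X`, only the intercept and one slope are estimated, which is why adding variables does not inflate estimation error here.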
5.2 Results
As shown in Table 3, the index model provided highly accurate forecasts. The index
model’s mean absolute error across the ten elections from 1976 to 2012 was 1.3 percentage
points. Compared to the individual models, error reductions ranged from 0.5 to 2.7 percentage
points. That is, the index model reduced the error of the most accurate individual model (Ca)
by 29%; compared to the least accurate model (L), error reduction reached 67%. Compared to
the typical model, the index model reduced the error by 48%.
Figure 2 shows the calibration of the forecasts from the index model and eight
individual models.2 The marker shows each model’s point forecast, the vertical lines show
their 95% prediction intervals, and the dashed horizontal line shows the actual election result.3
The index model is well calibrated. For nine of the ten elections, the election result falls
within the 95% prediction interval. The exception is the 2000 election, in which the index
model over-predicted Gore’s vote share by a small margin. In addition, the prediction
intervals provided by the index model are narrow, which is an important quality criterion for a
forecast model. Across the ten elections, the index model’s average prediction interval spans
little more than five percentage points, which is the narrowest of all models. That is, if the index model
predicts the incumbent to gain 53 percent of the vote, there is a 95% chance that the actual
election result will be between 50.5 and 55.5 percent. In comparison, the prediction interval
for the second most accurate model (Ca) spans a range of almost eight percentage points,
which makes its forecasts more vague and thus less valuable. Prediction intervals for other
models are even wider.
2 The data to calculate the prediction intervals for the Hibbs model were not available.
3 The 95% prediction intervals were calculated as twice the standard error.
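The interval construction from footnote 3 (point forecast plus or minus twice the regression's standard error) is easy to reproduce. A hedged sketch, assuming a simple one-predictor regression like the index model; the function and variable names are illustrative:

```python
import numpy as np

def forecast_with_interval(index_hist, vote_hist, new_index):
    """Point forecast plus an approximate 95% prediction interval,
    computed as the point forecast +/- 2 * the regression's standard
    error (the rule used in the paper's footnote 3)."""
    A = np.column_stack([np.ones_like(index_hist), index_hist])
    coef, *_ = np.linalg.lstsq(A, vote_hist, rcond=None)
    resid = vote_hist - A @ coef
    dof = len(vote_hist) - 2  # two estimated parameters
    se = np.sqrt((resid ** 2).sum() / dof)
    point = coef[0] + coef[1] * new_index
    return point, (point - 2 * se, point + 2 * se)
```

With a standard error of about 1.25 points, this rule yields the five-point-wide interval from the text: a 53 percent forecast spanning roughly 50.5 to 55.5 percent.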
5.3 Discussion
The index model reduced the error of the most accurate individual model by 29% and
cut the error of the typical model nearly in half. In addition, the index model forecasts were
well calibrated. These large gains in accuracy were achieved by aggregating all information
included in the individual models in a single index variable.
These results are consistent with prior research. Researchers have concluded from
comparative studies that having all relevant variables in a model is more important than the
“optimal” weighting of a set of variables (Dawes & Corrigan, 1974; Einhorn & Hogarth,
1975). The equal-weights method enables analysts to include an unlimited number of
variables in a model. This is the method’s most important feature in situations with many
important variables, which are common for social science problems.
However, one might object to the equal-weights method because it can incorporate a
large number of variables. Parsimony is commonly regarded as an important quality criterion of
a forecasting model (Lewis-Beck, 2005). However, parsimony is crucial only for methods
that need to estimate many parameters and thus bear the risk of overfitting, such as regression
analysis. In contrast, the number of variables is of no concern for equal-weights models,
since the index method does not estimate multiple variable weights (cf. equation 2). In fact, as
demonstrated above, it is one of the major benefits of the equal-weights method to be able to
include all relevant knowledge.4
Finally, one might be concerned about the unequal distribution of variables across categories. For example,
the index model incorporates thirteen economic variables, six political variables, and eight
measures of public opinion. Thus, one might think that economic variables are
overrepresented, whereas public opinion polls are underrepresented. Here, it helps to think of
the index model as an index of indexes. For example, before calculating the single index
variable, one could sum up all economic variables in an economic index, all poll variables in
a poll index, all political variables in a political index, and so on. How one aggregates the
variable values does not matter; mathematically, the results are the same.
4 Note that the index model incorporates aspects of retrospective and prospective voting, the influence
of incumbency, the time-for-change effect, and military losses.
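The "index of indexes" equivalence is simply the associativity of addition. A toy check with a hypothetical -1/0/+1 coding; the group sizes match the paper (13 economic, 6 political, 8 public-opinion variables), but the values are illustrative, not the paper's data:

```python
import numpy as np

# Hypothetical directionally coded values for one election.
rng = np.random.default_rng(42)
economic = rng.integers(-1, 2, size=13)   # each coded -1, 0, or +1
political = rng.integers(-1, 2, size=6)
polls = rng.integers(-1, 2, size=8)

# Single index: sum all 27 variables in one step.
single_index = np.concatenate([economic, political, polls]).sum()

# Index of indexes: sum each group into a sub-index, then sum those.
index_of_indexes = economic.sum() + political.sum() + polls.sum()

# Addition is associative and commutative, so both routes agree.
assert single_index == index_of_indexes
```

The same argument extends to any partition of the variables, which is why the apparent overrepresentation of one category does not change the final index value.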
The performance of the index model is heartening, and the findings are relevant for
many applications, including in the field of business research. The index method is particularly
useful in situations with many important variables and prior knowledge about the
directional impact of the variables on the target criterion. Prior knowledge can be obtained
from empirical evidence, expert knowledge, or, ideally, experimental studies (Graefe &
Armstrong, 2011). For example, one study develops an index model to predict the
effectiveness of advertisements from 195 evidence-based persuasion principles. When using
this model, advertising novices provided more accurate predictions of ad effectiveness than
experts’ unaided judgment (Armstrong, Rui, Graefe, Green, & House, 2012).
6. Concluding remarks
Benjamin Franklin’s advice was to identify all variables that are considered important
for the problem at hand. And, although he suggested weighting variables by importance, his
advice was pragmatic and simple: use intuition. In contrast to many contemporary
researchers, Franklin seemed to be little concerned about how to estimate optimal variable
weights. With all due respect for Franklin’s “Moral Algebra”, however, the use of intuition is
likely to harm the accuracy of results, because such an informal weighting
approach allows people to assign weights in ways that suit their biases (Graefe, Armstrong,
Jones Jr., & Cuzán, 2013).
Time proved Franklin right. A large body of analytical and empirical evidence from
various fields shows that the precise weighting of variables in linear models matters little for forecast
accuracy; what matters most is to include all relevant variables. Therefore, a good rule of
thumb for weighting composites in linear models is to keep things simple and use equal
weights, although differential weights may be useful under certain conditions.
7. References
Allen, P. Geoffrey, & Fildes, Robert. (2001). Econometric forecasting. In J. S.
Armstrong (Ed.), Forecasting Principles: A Handbook for Researchers and
Practitioners (pp. 301-362). New York: Springer.
Armstrong, J. Scott. (1985). Long-range Forecasting: From Crystal Ball to
Computer. New York: Wiley.
Armstrong, J. Scott. (2012). Illusions in regression analysis. International Journal of
Forecasting, 28(3), 689-694.
Armstrong, J. Scott, & Graefe, Andreas. (2011). Predicting elections from
biographical information about candidates: A test of the index method.
Journal of Business Research, 64(7), 699-706.
Armstrong, J. Scott, Green, Kesten C., & Graefe, Andreas. (in press). Golden Rule of
Forecasting. Journal of Business Research.
Armstrong, J. Scott, Rui, Du, Graefe, Andreas, Green, Kesten C., & House,
Alexandra. (2012). Predictive validity of evidence-based advertising
principles. Working paper.
Cuzán, Alfred G., & Bundrick, Charles M. (2009). Predicting Presidential Elections
with Equally Weighted Regressors in Fair's Equation and the Fiscal Model.
Political Analysis, 17(3), 333-340.
Cuzán, Alfred G., & Heggen, Richard J. (1984). A fiscal model of presidential
elections in the United States, 1880-1980. Presidential Studies Quarterly,
14(1), 98-108.
Czerlinski, J., Gigerenzer, G., & Goldstein, D. G. (1999). How good are simple
heuristics? In G. Gigerenzer & P. M. Todd (Eds.), Simple heuristics that make
us smart (pp. 97-118). Oxford University Press.
Dana, Jason, & Dawes, Robyn M. (2004). The superiority of simple alternatives to
regression for social science predictions. Journal of Educational and
Behavioral Statistics, 29(3), 317-331.
Darwin, Charles, Burkhardt, Frederick, & Smith, Sydney. (1986). The
correspondence of Charles Darwin: Volume 2, 1837-1843. Cambridge:
Cambridge University Press.
Davis-Stober, Clintin P., Dana, Jason, & Budescu, David V. (2010). A constrained
linear estimator for multiple regression. Psychometrika, 75(3), 521-541.
Dawes, Robyn M. (1979). The robust beauty of improper linear models in decision
making. American psychologist, 34(7), 571-582.
Dawes, Robyn M., & Corrigan, Bernard. (1974). Linear models in decision making.
Psychological Bulletin, 81(2), 95-106.
Einhorn, Hillel J., & Hogarth, Robin M. (1975). Unit weighting schemes for decision
making. Organizational Behavior and Human Performance, 13(2), 171-192.
doi: 10.1016/0030-5073(75)90044-6
Erikson, Robert S., & Wlezien, Christopher. (1999). Presidential polls as a time
series: the case of 1996. Public opinion quarterly, 63(2), 163-177.
Fair, Ray C. (2009). Presidential and congressional vote-share equations. American
Journal of Political Science, 53(1), 55-72.
Graefe, Andreas, & Armstrong, J. Scott. (2011). Conditions under which index
models are useful: Reply to bio-index commentaries. Journal of Business
Research, 64(7), 693-695.
Graefe, Andreas, & Armstrong, J. Scott. (2013). Forecasting elections from voters'
perceptions of candidates' ability to handle issues. Journal of Behavioral
Decision Making, 26(3), 295-303.
Graefe, Andreas, Armstrong, J. Scott, Jones Jr., Randall J., & Cuzán, Alfred G.
(2013). Combining forecasts: An application to elections. Forthcoming in the
International Journal of Forecasting.
Lewis-Beck, Michael S. (2005). Election forecasting: principles and practice. The
British Journal of Politics & International Relations, 7(2), 145-164.
Lewis-Beck, Michael S., & Rice, Tom W. (1992). Forecasting Elections.
Washington, DC: Congressional Quarterly Press.
Lichtman, Allan J. (2008). The keys to the white house: An index forecast for 2008.
International Journal of Forecasting, 24(2), 301-309.
Lockerbie, Brad. (2012). Economic expectations and election outcomes: The
Presidency and the House in 2012. PS: Political Science & Politics, 45(4).
Montgomery, Jacob M., Hollenbach, Florian, & Ward, Michael D. (2012). Improving
predictions using ensemble Bayesian model averaging. Political Analysis,
20(3), 271-291.
Ostrom, Charles W. Jr., & Simon, Dennis M. (1985). Promise and performance: A
dynamic model of presidential popularity. American Political Science Review,
79(2), 334-358.
Runkle, David E. (1998). Revisionist history: how data revisions distort economic
policy research. Federal Reserve Bank of Minneapolis Quarterly Review,
22(4), 3-12.
Schmidt, Frank L. (1971). The relative efficiency of regression and simple unit
predictor weights in applied differential psychology. Educational and
Psychological Measurement, 31(3), 699-714.
Sparks, Jared. (1844). The Works of Benjamin Franklin (Vol. 8). Boston: Charles
Tappan Publisher.
Woodside, Arch G. (2013). Moving beyond multiple regression analysis to
algorithms: Calling for adoption of a paradigm shift from symmetric to
asymmetric thinking in data analysis and crafting theory. Journal of Business
Research, 66(4), 463-472.
Table 1: Overview of nine U.S. presidential election-forecasting models
[Table not recoverable from the extracted text. Columns list the nine models and their abbreviations in the present study; model names visible in the extraction include a fiscal model, a bread and peace model, and a jobs model. Rows report: total no. of variables, thereof economic indicators and public opinion polls; first election since model creation; sample period; model fit (adjusted R²); no. of observations / elections; and ratio of observations to predictors.]
The model specifications and data reflect the situation faced by the forecasters to predict the 2012 election. An exception is the model by Abramowitz, which used four variables to predict the 2012 election. Here, the original version of the “trial-heat model” is used (see also footnote 2).
* The Hibbs model differs from a traditional multiple linear regression model in that it estimates more parameters. Therefore, its ratio of observations to estimated parameters is lower than 7.5.
Table 2: Forecast error of nine multiple regression and equal-weights models (1976-2012)
[Table not recoverable from the extracted text; it reports, for each model, the forecast error under multiple regression analysis and under the equal-weights method, along with the resulting error reduction.]
Figures in italics show error reduction in %. Individual models are ordered by ascending accuracy (MAE across the ten elections) from left to right.
Table 3: Error reduction achieved through the index model compared to the individual models
[Table not recoverable from the extracted text; it reports the index model’s error and the error reduction due to the index model for each individual model.]
Figures in italics show error reduction in %.
Figure 1: Average forecast accuracy of the nine multiple regression models and their equal-weights variants for elections before and since model creation
[Chart not recoverable from the extracted text; it compares the mean absolute error (MAE) of the multiple regression analysis and the equal-weights method, separately for elections before and since model creation.]
Figure 2: Calibration of the index model and eight regression models (1976-2012)
[Chart not recoverable from the extracted text.] Horizontal axis: model; vertical axis: two-party popular vote share of the incumbent party’s candidate. Markers: point forecast of each model; solid vertical lines: prediction interval for each model forecast; dashed horizontal line: actual election result.