RESEARCH ARTICLE

Accuracy gains from conservative forecasting: Tests using variations of 19 econometric models to predict 154 elections in 10 countries

Andreas Graefe¹☯*, Kesten C. Green²,³☯, J. Scott Armstrong³,⁴☯

¹ Macromedia University, Munich, Germany, ² University of South Australia Business School, Adelaide, Australia, ³ The Ehrenberg-Bass Institute, University of South Australia, Adelaide, Australia, ⁴ Wharton School, University of Pennsylvania, Philadelphia, PA, United States of America

☯These authors contributed equally to this work.

*graefe.andreas@gmail.com

Abstract

Problem

Do conservative econometric models that comply with the Golden Rule of Forecasting pro-

vide more accurate forecasts?

Methods

To test the effects on forecast accuracy, we applied three evidence-based guidelines to 19

published regression models used for forecasting 154 elections in Australia, Canada, Italy,

Japan, Netherlands, Portugal, Spain, Turkey, U.K., and the U.S. The guidelines direct fore-

casters using causal models to be conservative to account for uncertainty by (I) modifying

effect estimates to reflect uncertainty either by damping coefficients towards no effect or

equalizing coefficients, (II) combining forecasts from diverse models, and (III) incorporating

more knowledge by including more variables with known important effects.

Findings

Modifying the econometric models to make them more conservative reduced forecast errors

compared to forecasts from the original models: (I) Damping coefficients by 10% reduced

error by 2% on average, although further damping generally harmed accuracy; modifying

coefficients by equalizing coefficients consistently reduced errors with average error reduc-

tions between 2% and 8% depending on the level of equalizing. Averaging the original

regression model forecast with an equal-weights model forecast reduced error by 7%. (II)

Combining forecasts from two Australian models and from eight U.S. models reduced error

by 14% and 36%, respectively. (III) Using more knowledge by including all six unique vari-

ables from the Australian models and all 24 unique variables from the U.S. models in equal-

weight “knowledge models” reduced error by 10% and 43%, respectively.

PLOS ONE | https://doi.org/10.1371/journal.pone.0209850 January 10, 2019 1 / 14


OPEN ACCESS

Citation: Graefe A, Green KC, Armstrong JS (2019) Accuracy gains from conservative forecasting: Tests using variations of 19 econometric models to predict 154 elections in 10 countries. PLoS ONE 14(1): e0209850. https://doi.org/10.1371/journal.pone.0209850

Editor: Nicola Lacetera, University of Toronto, Rotman School, CANADA

Received: July 21, 2018

Accepted: December 12, 2018

Published: January 10, 2019

Copyright: © 2019 Graefe et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: The data used in this study are freely accessible from Harvard Dataverse at the following DOI: https://doi.org/10.7910/DVN/OI9IA3.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Originality

This paper provides the first test of applying guidelines for conservative forecasting to estab-

lished election forecasting models.

Usefulness

Election forecasters can substantially improve the accuracy of forecasts from econometric

models by following simple guidelines for conservative forecasting. Decision-makers can

make better decisions when they are provided with models that are more realistic and fore-

casts that are more accurate.

Introduction

The evidence-based forecasting principle known as the Golden Rule of Forecasting advises

forecasters to adhere closely to cumulative prior knowledge about the situation. We test

whether following this principle of conservatism can help to improve the accuracy of econo-

metric models’ ex ante forecasts. To help forecasters to apply the Golden Rule, Armstrong,

Green and Graefe provided 28 guidelines for conservative forecasting, such as how to formu-

late a forecasting problem; how to forecast with judgmental, extrapolative, and causal methods;

how to combine forecasts from different methods; and how to adjust forecasts. They then

assessed the effects of each guideline on out-of-sample forecast accuracy by reviewing pub-

lished studies that compared the accuracy of forecasts from conservative and non-conservative

forecasting methods. Of the 105 studies they identified, 102 supported the guidelines. On aver-

age, ignoring a guideline increased forecast error by more than 40% [1]. Further research on

the Golden Rule produced additional evidence and a revision of the guidelines [2]. Among the

changes was a suggestion to use knowledge models as an alternative to regression analysis. The

aim of knowledge models is to include all variables that are known to have important causal

relationships with the subject of the forecast, based on the domain knowledge of experts and

evidence from experimental studies. The latest version of the Golden Rule is available at For-

Prin.com.

This paper tests the effect of following conservative guidelines on the accuracy of forecasts

from published models originally estimated using multiple regression analysis. In particular,

we tested three of the guidelines on 19 regression models used to forecast vote shares in 154

elections in ten countries.

Econometric models for forecasting elections

The development of causal models for forecasting voting in elections has become an important

sub-discipline of political science. As of September 2018, about 2,000 results were identified by

a Google Scholar search for the two terms “election forecasting” and “model.” Evidence on the

models’ predictive validity should be of interest to researchers whose theories of voting behav-

ior are represented by the models, and to decision-makers whose plans vary depending on

their expectations of who will win an election.

Causal theories to which the modelers subscribe identify influences on voting behavior; election forecasting models include variables that represent these influences. Most election forecasting models represent the theory of retrospective voting, which views an election as a referendum on the incumbent government’s performance, often based on the country’s economic performance. Thus, retrospective voting theory assumes that voters reward the incumbent party for good performance and punish it otherwise. Causal models typically represent this theory by using changes in one or more macroeconomic variables (such as GDP, unemployment, or prices) to measure performance. The models often include popularity poll-based variables as proxies for voters’ satisfaction with the government’s handling of both economic and non-economic issues.

Many of the models include variables that represent aspects of the country’s electoral sys-

tem affecting voting behavior or historical patterns of voting behavior. For example, the time

the incumbent party has held power can be used to allow for the observation that, historically,

leaders have often enjoyed a “honeymoon” period of popularity following their first election,

with the effect fading through a leader’s tenure as the electorate’s desire for change increases.

In the U.S., political economy models have been established in presidential election seasons

since the late 1970s [3]. For the seven elections from 1992, political scientists and economists

have published their models and forecasts prior to the election in special sections of scientific

journals including Political Methodologist 5(2), American Politics Research 24(4) and PS:Politi-

cal Science and Politics 34(1), 37(4), 41(4), 45(4), and 49(4). That work also spearheaded the

development of election forecasting models in other countries, many of which featured in two

special issues of the International Journal of Forecasting 26(1) and 28(4). In particular,

researchers have developed models for France, Germany, the U.K., Portugal, Spain, Turkey,

Australia, and Japan. The models have been used to test theories of voting and to estimate the

relative effects of individual variables on the aggregate popular vote. Most importantly for this

paper, they have been used to provide ex ante forecasts of election outcomes, typically many

months before the election is held.

The dominant method for estimating political economy models is multiple regression anal-

ysis. Multiple regression analysis estimates variable weights that provide the least-squared-

error fit to a given sample of data. The resulting variable weights are then applied to new values

of the causal variables to make forecasts.

We used three criteria for including a model in our analysis. The model (1) was estimated

with multiple linear ordinary least squares (OLS) regression analysis, (2) predicted national

election results, and (3) was published in an academic journal. However, the forecasters of

some models did not publish their data and did not respond to, or declined, our request for

their data; these models were excluded from analysis.

Nineteen models from ten countries met our criteria. While those models are not exhaus-

tive of the election forecasting literature, we believe that they do provide a representative sam-

ple of the models that have been developed for different countries. Table 1 provides an

overview of the 19 models’ key features: the dependent variable, the number of elections

(observations) in the estimation sample, and the number of economic and political variables

in the model. The median ratio of observations to variables was five.

Given the attention that election forecasting attracts in the U.S., models for forecasting U.S.

presidential elections form the largest group; a total of eight models. Australian and Canadian

general elections have two models each, while there is only one model each for Italy, Japan, the

Netherlands, Portugal, Spain, Turkey and the U.K.

In general, the models can be written as:

$$V = a + \sum_{i=1}^{k} b_i x_i$$

where $V$ is the party’s expected share of the national two-party popular vote, $a$ is the vote that the party would get if all the causal variables were zero (the intercept), and the $b_i$’s are the coefficients, all estimated from historical data, of the $k$ causal variables, $x_1$ to $x_k$.


Conservative guidelines for causal models

When estimating variable weights, multiple regression analysis cannot account for uncertainty

arising from sources including biases in the data, use of proxy variables, omission of important

variables, inclusion of irrelevant variables, lack of variation in variable values in the estimation

sample, and error in predicting or controlling causal variables in the future. As a result, multi-

ple regression models are insufficiently conservative for forecasting as they tend to overfit an

incomplete model specification to inadequate estimation data [4].

The Golden Rule of Forecasting provides four conservative guidelines for causal models

[1]. We test three: (I) modify effect estimates to reflect uncertainty through either damping or

equalizing, (II) combine forecasts from dissimilar models, and (III) include in one single

model all of the causal variables used in the various available models. We hypothesized that fol-

lowing these guidelines would result in forecasts that were more accurate than those from

models estimated using multiple regression analysis.

Table 1. Key features of the 19 models analyzed in this study.

                                                                              Number of
Country / Election / Model                      Dependent variable      Elections   Variables
Australia (general)
  Cameron & Crosby [5]                          Incumbent vote                 40           5
  Jackman [6]                                   Incumbent vote                 22           3
Canada (general)
  Bélanger & Godbout [7]                        Incumbent vote                 19           4
  Nadeau & Blais [8]                            Liberal vote                   13           4
Italy (national, European, and local)
  Bellucci [9]                                  Incumbent vote                  9           3
Japan (general)
  Lewis-Beck & Tien [10]                        LDP (percent seats)            17           3
Netherlands (legislative)
  Dassonneville, Lewis-Beck & Mongrain [11]     Incumbent vote                 20           3
Portugal (general)
  Magalhães & Aguiar-Conraria [12]              Incumbent vote                 11           3
Spain (general)
  Magalhães, Aguiar-Conraria & Lewis-Beck [13]  Liberal vote                   14           4
Turkey (general)
  Toros [14]                                    Incumbent vote change          11           3
U.K. (general)
  Lewis-Beck, Nadeau & Bélanger [15]            Incumbent vote                 12           3
U.S. (presidential)
  Fair [3]                                      Incumbent vote                 25           7
  Cuzán [16]                                    Incumbent vote                 25           5
  Abramowitz [17]                               Incumbent vote                 17           3
  Campbell [18]                                 Incumbent vote                 17           2
  Lewis-Beck & Tien [19]                        Incumbent vote                 16           4
  Holbrook [20]                                 Incumbent vote                 16           3
  Erikson & Wlezien [21]                        Incumbent vote                 16           2
  Lockerbie [22]                                Incumbent vote                 15           2
Median across all 19 models                                                    16           3

https://doi.org/10.1371/journal.pone.0209850.t001


Modify effect estimates to reflect uncertainty

Regression reduces the estimated effect of a variable in response to unexplained variation in

the estimation data. It does not, however, compensate for all sources of uncertainty. Damping

and equalizing causal variable coefficient estimates are conservative strategies that can be used

to compensate for some of the residual uncertainty.

Damp coefficients. Damping refers to the general idea of reducing the size of an esti-

mated effect toward having no effect. Damping has been used with extrapolation models by

reducing the magnitude of an estimated trend resulting in reductions in forecast errors of

about 12% [1]. The authors of that paper suggested that damping might also be useful for

causal models. Following the same rationale as for extrapolation models, they concluded that

the actual causal effects are weaker than those estimated from the data by regression analysis.

Hence, forecasts should stay closer to the regression model’s constant. Unlike extrapolation,

however, regression analysis already adjusts for uncertainty. As a result, damping is likely to be

less useful when applied to regression coefficients.

Moreover, damping is a conditional guideline. It is not expected to work if the estimated

coefficient is lower than what one would expect based on prior knowledge. If, on the other

hand, the forecaster is uncertain about whether future causal variable values will be more

extreme than those in the estimation data, the case for damping would seem stronger.

In contrast to the case of extrapolation, Armstrong, Green and Graefe were unable to find evidence on

whether damping regression coefficients towards no effect improves the accuracy of ex ante

forecasts [1]. This paper addresses the question of whether and when damping can be produc-

tively applied to multiple regression model coefficients.

Damping coefficients is not a new idea. For example, an early study tested “ridge regres-

sion”—a sophisticated approach to damping—using simulated data. Ridge regression model

forecasts were more accurate than OLS model forecasts, which in turn were more accurate

than equal-weights model forecasts [23]. We are not aware of any tests of the accuracy of ex

ante ridge regression model forecasts using real data.

A simple strategy for damping is to multiply the estimated weights by a factor $(1 - d)$. The “damped” version of the original regression model can be written as:

$$V = a + (1 - d)\sum_{i=1}^{k} b_i x_i$$

The factor $d$ can range from 0 to 1. For $d = 0$, the original regression model would remain unchanged, which means no damping. For $d = 1$, the model coefficients are in effect zero and the model forecast is simply the value of the intercept $a$, that is, the incumbent’s vote share that would be obtained if the predictor variables were equal to their historical mean. The bigger the factor $d$, the greater is the shrinking toward the historical average incumbent vote share.
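As an illustration only, the damped vote equation can be sketched in a few lines of Python. The function name and all numbers below are hypothetical and are not taken from any of the 19 models; the predictor values are assumed to be standardized.

```python
def damped_forecast(intercept, coefs, xs, d):
    """Return V = a + (1 - d) * sum(b_i * x_i) for a damping factor d in [0, 1]."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie between 0 and 1")
    if len(coefs) != len(xs):
        raise ValueError("coefs and xs must have the same length")
    return intercept + (1.0 - d) * sum(b * x for b, x in zip(coefs, xs))


# Hypothetical intercept, coefficients, and standardized predictor values.
a = 50.0
b = [2.0, 1.0, 0.5]
x = [1.0, -0.5, 2.0]

# d = 0 reproduces the regression forecast; d = 1 returns the intercept.
for d in (0.0, 0.1, 0.5, 1.0):
    print(f"d = {d:.1f}: forecast = {damped_forecast(a, b, x, d):.3f}")
```

With these made-up inputs the forecast moves linearly from the regression forecast toward the intercept as d increases.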

Equalize coefficients. Equalizing is useful if there is uncertainty about the relative impor-

tance of the causal variables; the greater the uncertainty, the more one should adjust the coeffi-

cients towards equality. When relative effect sizes are highly uncertain, one should consider

the most extreme case of equalizing and assign equal-weights to all variables expressed as dif-

ferences from their mean divided by their standard deviation (i.e., standardized).

To equalize, standardize the variables, estimate the model using multiple regression analy-

sis, and adjust the estimated coefficients toward equality. The adjusted vote equation can be

written as:

$$V = a + (1 - e)\sum_{i=1}^{k} b_i x_i + \frac{e}{k}\left(\sum_{i=1}^{k} b_i\right)\left(\sum_{i=1}^{k} x_i\right)$$

where $e$ is the equalizing factor, which can range from 0 to 1. The greater the equalizing factor $e$, the greater the amount of equalizing. An equalizing factor $e = 0$ yields the equivalent of the original multiple regression model in standardized variables. At the other extreme, when $e = 1$, all model coefficients are assigned equal weights.
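The equalized vote equation can be sketched in the same way. This is a minimal illustration under stated assumptions: the function name and the numbers are hypothetical, the variables are assumed to be standardized, and all are assumed to be coded so that they relate positively to the vote.

```python
def equalized_forecast(intercept, coefs, xs, e):
    """Return V = a + (1 - e) * sum(b_i * x_i) + (e / k) * sum(b_i) * sum(x_i)."""
    k = len(coefs)
    if k == 0 or k != len(xs):
        raise ValueError("coefs and xs must be non-empty and of equal length")
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must lie between 0 and 1")
    regression_part = sum(b * x for b, x in zip(coefs, xs))
    # Equal-weights part: every variable receives the mean coefficient.
    equal_weights_part = (sum(coefs) / k) * sum(xs)
    return intercept + (1.0 - e) * regression_part + e * equal_weights_part


# Hypothetical intercept, coefficients, and standardized predictor values.
a = 50.0
b = [3.0, 2.0, 1.0]
x = [1.0, -0.5, 2.0]

regression = equalized_forecast(a, b, x, 0.0)     # original regression weights
equal_weights = equalized_forecast(a, b, x, 1.0)  # all weights equal
halfway = equalized_forecast(a, b, x, 0.5)        # 50% equalizing
print(regression, equal_weights, halfway)
```

Because the equation is linear in e, 50% equalizing gives the same number as the simple average of the regression forecast and the equal-weights forecast.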

One review looked at comparative studies on equal-weights published since the 1970s in a

variety of areas, and concluded that equal-weights models often provide ex ante forecasts that are

more accurate than those from regression models [24]. For example, one of those studies analyzed

the relative predictive accuracy of forecasts from regression and equal-weights models by making

out-of-sample forecasts using five real non-experimental social science datasets and a large num-

ber of synthetic datasets. Regression weights were inferior to equal-weights where there were

fewer than 100 observations per predictor variable available for estimating the model [25]. Yet,

many practical problems—including election forecasting—involve limited sample sizes.

For election forecasting, one study found that equal-weights versions of two published regres-

sion models provided out-of-sample election forecasts that were at least as accurate as those from

the original regression models [26]. Another study showed that equal-weights versions of six of

nine established regression models for election forecasting yielded more accurate forecasts than

the original models. On average across the ten elections from 1976 to 2012, the equal-weights

models reduced the original regression models’ ex ante absolute forecast errors by 5% [24].

Combine forecasts from alternative models

Hundreds of studies have shown that combining forecasts that incorporate diverse data and information is an effective method for using additional knowledge and thereby improving forecast accuracy [27].

Reviews of studies on combining forecasts conclude that simple unweighted averages pro-

vide the most accurate forecasts, except in rare situations where strong evidence suggests that

some models consistently provide more accurate forecasts than others [28]. That paper also

found that the error of simple unweighted averages of forecasts from six election-forecasting

models was 25% lower than the corresponding error of the forecasts from a much more com-

plex combining method. In light of the evidence, we calculated simple unweighted averages of

the forecasts from all models with the same dependent variable to generate combined forecasts

for this study.
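A simple unweighted average of this kind can be sketched as follows. The model names and forecast values are hypothetical and serve only to show the calculation; the sketch assumes that every model provides a forecast of the same dependent variable for the same elections, listed in the same order.

```python
from statistics import mean


def combine_forecasts(forecasts_by_model):
    """Average the forecasts of several models, election by election."""
    lengths = {len(values) for values in forecasts_by_model.values()}
    if len(lengths) != 1:
        raise ValueError("every model must forecast the same set of elections")
    return [mean(values) for values in zip(*forecasts_by_model.values())]


# Hypothetical forecasts of incumbent vote share (%) for two elections.
forecasts = {
    "model_a": [52.0, 48.0],
    "model_b": [50.0, 47.0],
    "model_c": [51.0, 49.0],
}

print(combine_forecasts(forecasts))
```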

Use all important variables

Include all known important variables in a model. The guideline is difficult to implement with

multiple regression analyses because the practical limit of the method is a handful of variables

at best [29]. Researchers typically confront the problem by using only some of the variables

that are known to be important.

One way to avoid the practical limits that regression places on the number of variables in a

model is to use prior knowledge instead of statistical methods to select causal variables and

determine the direction and size of their effects. This necessitates a review of the cumulative knowledge from prior research. Knowledge models can be traced back to a letter from Benjamin

Franklin, in which he described “Moral Algebra, or Method of Deciding Doubtful Matters,”

his method for choosing between alternatives [30]. In short, Franklin recommended identify-

ing all important variables and whether they add to or subtract from the likelihood or value of

the alternative. Next, weight each variable by the strength of its effect. Finally, apply the model

you have just developed to each alternative by ascertaining the values of the variables, multiply-

ing by the model’s assigned weights, and adding to obtain the score for the alternative model.

A higher scoring alternative is more likely, or better.


The major advantage of this approach is that variables are included on the basis of prior

knowledge about their importance (i.e., substantive effect) and direction, and not on the basis

of a given set of data alone. Consequently, one does not need to estimate a coefficient for each

variable from the data and the number of variables that can be included in a model is

unlimited.

Franklin suggested differential weighting of variables. Forecasters, however, often lack ade-

quate prior knowledge about the relative importance of important variables. Given the evi-

dence on the relative accuracy of equal and regression weights outlined above, equal variable

weights are a reasonable starting point for causal models. As the number of variables in a

model increases, the magnitudes of individual variable effects become less important for pre-

dictive validity, as an early paper showed mathematically [31].

Franklin’s approach was intended for rating alternatives, but when the dependent variable

is a scalar and data are available, the scores for alternatives can be used as the independent var-

iable in single regression analysis. One study tested that approach by assigning equal-weights

to all 27 (standardized) variables that were included in nine established models for forecasting

U.S. presidential elections. The resulting model was used to generate ex ante forecasts of the

ten elections from 1976 to 2012 with an average error of 1.3 percentage points. That error was

48% smaller than the typical model’s error and 29% smaller than the most accurate model’s

error [24].

The present study uses a similar approach and sums the standardized values of all variables

that are used in different models that predict the same target variable in order to calculate an

index variable. The resulting vote equation is:

$$V = a + b\sum_{i=1}^{N} x_i$$

where the $x_i$’s are the standardized values of the $N$ unique variables used in different models.
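The construction of the index variable and the single regression on it can be sketched as below. This is an illustration under assumptions, not the calculation used in the study: the data are invented, the sample standard deviation is assumed for standardization, and every variable is assumed to be coded already so that larger values favor the incumbent.

```python
from statistics import mean, stdev


def standardize(values):
    """Express values as differences from their mean divided by their standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def knowledge_index(columns):
    """Sum the standardized values of all variables, election by election."""
    standardized = [standardize(column) for column in columns]
    return [sum(row) for row in zip(*standardized)]


def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b of y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


# Hypothetical data for four elections: two causal variables and the vote share.
growth = [1.0, 2.0, 3.0, 4.0]
approval = [40.0, 50.0, 45.0, 55.0]
vote = [48.0, 51.0, 50.0, 54.0]

index = knowledge_index([growth, approval])
a, b = fit_simple_ols(index, vote)
print("index:", [round(value, 3) for value in index])
print("a =", round(a, 3), "b =", round(b, 3))
```

Only one coefficient is estimated no matter how many variables enter the index, which is why the number of variables is not constrained by the number of elections.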

Method

All data and calculations are available at the Harvard Dataverse: https://doi.org/10.7910/DVN/OI9IA3.

Model estimation and forecast generation

For each of the 19 models, we standardized the original data and transformed variables to

ensure that all predictor variables correlated positively with the dependent variable. Standardi-

zation of variable values was performed by calculating the differences from their mean and

dividing by their standard deviation. Transformation for variables that are correlated nega-

tively with the dependent variable was done by multiplying the variable values by -1.
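The sign transformation can be sketched as follows. The helper name and the data are hypothetical; the sketch uses the sign of the sample covariance, which always has the same sign as the correlation.

```python
from statistics import mean


def orient_positive(values, target):
    """Multiply values by -1 if they covary negatively with the target."""
    mv, mt = mean(values), mean(target)
    covariance_sum = sum((v - mv) * (t - mt) for v, t in zip(values, target))
    if covariance_sum < 0:
        return [-v for v in values]
    return list(values)


# Hypothetical data: unemployment tends to hurt the incumbent's vote share.
unemployment = [5.0, 7.0, 6.0, 9.0]
vote = [54.0, 50.0, 51.0, 47.0]

print(orient_positive(unemployment, vote))
```

After this step the variable can be standardized as described above, so that all predictors share a common scale and direction.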

We analyzed the accuracy of forecasts across all observations available for each model. All

forecasts were out-of-sample using an N-1 cross-validation procedure, an approach that is also

known as jackknifing. In other words, to forecast an election outcome we estimated models

using the data on all other elections in the data set. This method allows for a powerful test of

predictive validity because it maximizes both the size of the estimation sample and the number

of out-of-sample forecasts.
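The N-1 cross-validation procedure can be sketched as below. To keep the example free of dependencies it uses a single predictor and a closed-form least-squares fit, whereas the models in the study are multiple regressions; the data and function names are hypothetical.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def jackknife_forecasts(x, y):
    """Forecast each election from a model estimated on all other elections."""
    forecasts = []
    for i in range(len(y)):
        x_train = x[:i] + x[i + 1:]  # leave election i out of the estimation sample
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_simple_ols(x_train, y_train)
        forecasts.append(intercept + slope * x[i])
    return forecasts


# Hypothetical data: one predictor and the incumbent vote share in five elections.
growth = [1.0, 2.5, -0.5, 3.0, 0.5]
vote = [50.0, 53.0, 47.5, 54.5, 49.0]

for actual, forecast in zip(vote, jackknife_forecasts(growth, vote)):
    print(f"actual {actual:.1f}  out-of-sample forecast {forecast:.2f}")
```

Each election is forecast exactly once, and each forecast uses N-1 observations for estimation.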

All data and calculations are based on the models’ specifications published in the respective

journal publications. Often, however, these versions were different from the original specifica-

tions that were used to predict a particular election. For example, Ray Fair changed his model

equation in 1992 and has kept it constant since [3]. Most models have been revised at least once


since their first publication, usually as a reaction to poor performance in forecasting the previ-

ous election. Such revisions usually improve model fit, because the model developer has access

to historical data when selecting the variables and building the model. One study showed that

model accuracy drops substantially for observations not available at the time of model develop-

ment [24].

In sum, N-1 cross-validation favors regression analysis in producing forecasts that use

more information than one would have had available at the time of making the prediction.

Hence, any accuracy gains from applying the conservative guidelines obtained in the present

study should be regarded as a lower bound.

Error measure

We report the relative absolute error (RAE) of the forecasts that result from the application of

each guideline [32]. The RAE is calculated as the mean absolute error (MAE) of forecasts from

a model that follows the guideline, divided by the corresponding MAE of the original model.

Values of RAE greater than 1 mean that following a guideline yielded forecasts that were less

accurate than those from the original model, whereas values less than 1 mean that following

the guidelines yielded more accurate forecasts.
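The error measure can be sketched as follows. The forecasts and outcomes are hypothetical and serve only to illustrate the ratio of the two MAEs.

```python
def mean_absolute_error(forecasts, actuals):
    """Mean of the absolute differences between forecasts and outcomes."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("forecasts and actuals must be non-empty and of equal length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def relative_absolute_error(guideline_forecasts, original_forecasts, actuals):
    """MAE of the guideline-based forecasts divided by the MAE of the original forecasts."""
    return (mean_absolute_error(guideline_forecasts, actuals)
            / mean_absolute_error(original_forecasts, actuals))


# Hypothetical vote shares (%) and forecasts for three elections.
actual = [52.0, 48.0, 55.0]
original = [50.0, 51.0, 53.0]   # absolute errors 2, 3, 2
guideline = [51.0, 50.0, 54.0]  # absolute errors 1, 2, 1

rae = relative_absolute_error(guideline, original, actual)
print(f"RAE = {rae:.2f}, error reduction = {1 - rae:.0%}")
```

An RAE below 1 in this sketch corresponds to an error reduction of one minus the RAE.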

Accuracy gains from following Golden Rule guidelines

Modification of estimated effects

Damping. Across all 19 models, only damping of 20% or less reduced errors for most

models and on average, and the error reductions were small. For example, damping model

coefficients by 10% reduced error for 14 of the 19 models (74%), with an average error reduc-

tion of 2% (= 1–0.98). Heavier damping than 20% harmed accuracy. Table 2 shows the mean

RAEs of the forecasts across all 19 models with coefficients damped from 10% to 100% in

intervals of 10%, while S1 Table in the supporting information shows the RAEs for forecasts

from each individual model for each of the ten levels of damping.

Equalizing. All levels of equalizing reduced forecast error on average. Error reductions ranged from 3% to 8%. Moreover, equalizing reduced the errors of forecasts from at least 15 of the 19 models for all levels of equalizing. The most extreme equalizing, in which all predictor variables are assigned equal weights in the models, provided forecasts with a mean RAE of 0.94. In other words, equal-weights models reduced forecast error compared to forecasts from the original models by, on average, 6%. Table 2 shows the mean RAEs of the forecasts across all 19 models with equalizing from 10% to 100% in intervals of 10%, while S2 Table in the supporting information shows the RAEs for forecasts from each individual model for each of the 10 levels of equalizing.

Table 2. Effect of damping and equalizing on forecast errors relative to original forecast errors.

Level of equalizing /         Damping                       Equalizing
damping (%)                   Mean RAE   % Mean RAEs <1     Mean RAE   % Mean RAEs <1
10                            0.98       74                 0.97       100
20                            0.99       63                 0.96       95
30                            1.02       47                 0.95       89
40                            1.07       37                 0.94       89
50                            1.15       32                 0.93       89
60                            1.23       32                 0.92       89
70                            1.32       32                 0.92       89
80                            1.41       26                 0.93       89
90                            1.52       26                 0.93       84
100                           1.62       26                 0.94       79
Mean                          1.23       39                 0.94       89

https://doi.org/10.1371/journal.pone.0209850.t002

Error reductions were close to their maximum with equalizing of 50%: both the mean RAEs and the percentage of models with RAEs of less than one improved little, and then deteriorated, with more equalizing. In sum, the results suggest that, by providing an efficient

trade-off between average error reduction (RAE) and the chance of error reduction (% Mean

RAEs <1), 50% equalizing is a sensible compromise. Moreover, this 50–50 rule is easy to

understand and easy to apply: simply average the forecast from the original regression model

and the forecast from an equal-weights version of the model.

Forecast combinations

The benefits of combining forecasts can be tested for elections for which (a) more than one

model is available and (b) the models predict the same dependent variable. This was the case

for the eight models that forecast U.S. presidential elections and the two models that forecast

Australian general elections. (Note that although two models were available for predicting

Canadian federal elections, those models predict a different outcome—incumbent party vote

for one, and Liberal party vote for the other—and thus their forecasts could not be combined.)

Table 3 shows the results.

For Australian elections, model forecasts were combined across the 22 elections from 1951

to 2004 for which forecasts from both models were available. The MAE of the combined fore-

cast was 2.26 percentage points, which was more accurate than the forecasts from both of the

individual models. Compared to the average model forecast (with an error of 2.61 percentage

points), combining reduced error by 14%.

For U.S. elections, model forecasts were combined across the 15 elections from 1956 to 2012 for which forecasts from all eight models were available. The MAE of the combined forecast was 1.48 percentage points and was thus smaller than the average errors of each of the eight individual models, which ranged from 1.76 to 2.73 percentage points. Compared to the error of the typical model, which was 2.30 percentage points, combining reduced error by 36% (Table 3). The larger error reduction in the U.S. compared to Australia was expected as the combination included four times as many models (eight versus two).

Table 3. Effect of combining on forecast errors relative to original forecast errors.

                                        MAE (original)      RAE (combined)
Australia, 22 elections from 1951 to 2004
  MAE of combined forecast              2.26
  Cameron & Crosby [5]                  2.68                0.84
  Jackman [6]                           2.54                0.89
  Mean (typical model) error            2.61                0.86
U.S., 15 elections from 1956 to 2012
  MAE of combined forecast              1.48
  Abramowitz [17]                       1.76                0.84
  Campbell [18]                         1.99                0.74
  Cuzán [16]                            2.07                0.72
  Erikson & Wlezien [21]                2.54                0.58
  Fair [3]                              2.49                0.60
  Holbrook [20]                         2.55                0.58
  Lewis-Beck & Tien [19]                2.29                0.65
  Lockerbie [22]                        2.73                0.54
  Mean (typical model) error            2.30                0.64

https://doi.org/10.1371/journal.pone.0209850.t003

Compared to the error of forecasts from Abramowitz’s model, the RAE of the combined

forecast was 0.84, which means that forecast combining reduced error by 16% compared to

the single model that performed best in retrospect. Thus, even if one knew what would be the

best model, it was better to use the combined forecast.

Use more of the important variables: Knowledge models

Similar to the tests of combining forecasts, the benefits from using more important variables

in one model could be tested only for U.S. and Australian elections. While the conservative

guideline is to include all important variables in the forecasting model (a “knowledge model”),

it is important to note that our test was limited to the variables from the respective countries’

election models. We would expect larger gains in accuracy when more of the relevant causal

variables are included.

Table 4 shows the error reductions achieved by using all of the variables used by the experts in models that weight each of the variables equally. In the Australian case, the model included a total of six variables: the five variables used by Cameron & Crosby [5], plus one additional variable (a different measure of unemployment) used by Jackman [6]. The other two variables in the Jackman model, inflation and “honeymoon”, are also in the Cameron and Crosby model. Across the 22 elections, the “all-variables” model forecasts had an average error of 2.35 percentage points, which is lower than the error of each of the individual model forecasts. Compared to the typical model, the all-variables model reduced error by 10%; compared to the best individual model, it reduced error by 8%.

Table 4. Effect of using all variables in an equal-weights knowledge model on forecast errors relative to original forecast errors.

| | MAE (original model) | RAE (knowledge model) |
|---|---|---|
| Australia (6-variable knowledge model), 22 elections from 1951 to 2004 | | |
| MAE of knowledge model forecast: 2.35 | | |
| Cameron & Crosby [5] | 2.68 | 0.88 |
| Jackman [6] | 2.54 | 0.92 |
| Mean (typical model) error | 2.61 | 0.90 |
| U.S. (24-variable knowledge model), 15 elections from 1956 to 2012 | | |
| MAE of knowledge model forecast: 1.32 | | |
| Abramowitz [17] | 1.76 | 0.75 |
| Campbell [18] | 1.99 | 0.66 |
| Cuzán [16] | 2.07 | 0.64 |
| Erikson & Wlezien [21] | 2.54 | 0.52 |
| Fair [3] | 2.49 | 0.53 |
| Holbrook [20] | 2.55 | 0.52 |
| Lewis-Beck & Tien [19] | 2.29 | 0.58 |
| Lockerbie [22] | 2.73 | 0.48 |
| Mean (typical model) error | 2.30 | 0.57 |

https://doi.org/10.1371/journal.pone.0209850.t004

In the U.S. case, the all-variables model included 24 variables. While the total number of variables used in the eight models is 28, four variables were excluded: the models of Fair [3] and Cuzán [16] share three identical variables, which were counted only once, and Fair’s WWII dummy variable is unnecessary for our all-variables model since we only examine elections for which data for all eight models are available, from 1956 onwards. Across those 15 elections, the MAE of the all-variables model forecasts was 1.32 percentage points, which is lower than the errors of each of the individual models. Compared to forecasts from the typical model, the all-variables model reduced error by 43%. Compared to forecasts from the best individual model, the all-variables model reduced forecast error by 25%.
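One way to build an equal-weights model of this kind is sketched below. It is a minimal illustration, not the procedure used for Table 4: it assumes that each variable is standardized, multiplied by the expected direction of its effect (+1 or -1), and summed into a single index, and that the index is then scaled to vote shares with a one-variable least-squares fit. The variable names, signs, and data are hypothetical. For brevity everything is computed in-sample, whereas an ex ante test would estimate all quantities using only elections before the one being forecast.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def equal_weights_index(columns, signs):
    """Sum the signed z-scores of all variables, one index value per election."""
    z_columns = [standardize(col) for col in columns]
    n_elections = len(columns[0])
    return [
        sum(sign * z[i] for sign, z in zip(signs, z_columns))
        for i in range(n_elections)
    ]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    return y_bar - slope * x_bar, slope


# Hypothetical data for five elections (not taken from the paper).
growth = [2.1, -0.5, 3.0, 1.2, 0.4]        # assumed to help the incumbent (+1)
unemployment = [5.0, 7.5, 4.2, 6.1, 6.8]   # assumed to hurt the incumbent (-1)
vote = [53.0, 46.5, 55.2, 50.8, 48.9]      # incumbent vote share, percent

index = equal_weights_index([growth, unemployment], signs=[+1, -1])
intercept, slope = fit_line(index, vote)
fitted = [intercept + slope * value for value in index]
```

Only two quantities are estimated from the data, the intercept and one slope, no matter how many variables enter the index. That is why adding variables to such a model does not add regression coefficients.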

Discussion

In this paper, we applied three conservative forecasting guidelines to 19 published regression

models for forecasting election results. The guidelines were: (I) modify effect estimates to

reflect uncertainty, (II) combine forecasts from dissimilar models, and (III) include all variables

that are important in the model.

For the first guideline, we tested two approaches to modifying effect estimates to make

them more conservative: damping and equalizing. Small levels of damping yielded 2% ex ante

forecast error reductions, but higher levels harmed accuracy. Equalizing the regression coefficients

almost always improved forecast accuracy and reduced ex ante forecast error by between

3% and 8% in comparison to the typical original model forecasts.

Armstrong, Green and Graefe suggested that the “optimal approach most likely lies in

between. . . statistically optimal and equal, and so averaging the forecasts from an equal-

weights model and a regression model is a sensible strategy” [1]. The evidence from the present

paper supports that contention. Equalizing of 50%, which is equivalent to the suggested

approach, reduced error for nine out of ten forecasts, with an average error reduction of

7%. In addition to the improved accuracy of the resulting forecasts, the 50–50 rule has

other benefits: it is easy to understand, remember, and apply; simply average the forecast

from the original regression model with the forecasts from an equal-weights version of the

model.
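The equivalence between 50% equalizing and averaging two forecasts follows from linearity, and can be checked with a short Python sketch. The coefficients and predictor values are hypothetical, and the sketch assumes that both models are linear in the same (standardized) predictors and that the intercepts are averaged along with the slopes.

```python
def predict(intercept, coefs, x):
    """Forecast from a linear model: intercept + sum of coefficient * value."""
    return intercept + sum(b * v for b, v in zip(coefs, x))


# Hypothetical models on the same three standardized predictors.
regression = {"intercept": 48.0, "coefs": [0.9, -0.4, 1.6]}
equal_weights = {"intercept": 49.0, "coefs": [0.7, -0.7, 0.7]}  # equal magnitudes

x_new = [2.0, 5.5, 1.0]  # hypothetical predictor values for one election

# Route 1: average the two forecasts (the 50-50 rule).
forecast_regression = predict(regression["intercept"], regression["coefs"], x_new)
forecast_equal = predict(equal_weights["intercept"], equal_weights["coefs"], x_new)
forecast_50_50 = 0.5 * (forecast_regression + forecast_equal)

# Route 2: move every coefficient halfway toward its equal-weights value,
# then forecast once from the blended model.
blended_intercept = 0.5 * (regression["intercept"] + equal_weights["intercept"])
blended_coefs = [
    0.5 * (b_reg + b_eq)
    for b_reg, b_eq in zip(regression["coefs"], equal_weights["coefs"])
]
forecast_from_blend = predict(blended_intercept, blended_coefs, x_new)
```

Both routes give the same number for any predictor values, so a forecaster can apply the rule without re-estimating anything: compute the two forecasts and average them.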

Applying the second guideline—combining forecasts—to eight U.S. models, and to two

Australian models, produced forecasts that were more accurate than those from the individual

model that provided the most accurate forecasts in each case. Compared to the typical individual model forecast, error was reduced by 36% in the U.S. case and 14% in the Australian case. The results are thus consistent with the average of 22% error reduction for five comparative studies from different areas (including forecasts of economic variables) that examined combining across dissimilar causal models [1]. The results are also consistent with the guideline

that forecasters should aim to include all important information in the forecast, rather than

seeking to estimate statistically optimal effect sizes from historical data for a small set of

selected variables. The “combine forecasts from dissimilar models” guideline is an established

strategy for incorporating more information.

The third guideline recommends an alternative approach to incorporating more information into a forecast: to use all important variables in a single “knowledge model”. As with combining, knowledge models provided forecasts that were more accurate than even the best individual model. Compared to the typical forecast, a knowledge model that assigned equal weights to all unique variables from the original published models reduced forecast error by 10% in the case of the six-variable Australian model and 43% in the case of the 24-variable U.S. model. As expected, including more variables that have an important causal relationship with the variable being forecast improved forecast accuracy.

Our tests found that the strongest implementation of the conservative guidelines, in the

form of knowledge models, provided the greatest improvement in ex ante forecast accuracy.

That the knowledge models simply applied equal weights to standardized causal variables

suggests that regression-estimated weights contribute less to a model’s descriptive power,

or realism, in practice than does including more of the variables that are known to be

important.

Implementing the conservative guidelines offers more than simply improved ex ante forecast accuracy, as practically useful as that is. Knowledge models, for example, which include all important variables, also offer greater validity. First, the models are consistent with theory and knowledge and produce smaller forecast errors than competing models. Second, the models include more causal variables and thereby provide a more complete representation of domain knowledge. Forecasters who use knowledge models must have extensive domain knowledge in order to select all relevant variables and code the direction (and potentially the relative strengths) of their effects. Hence, they need to (i) study prior theories to identify which variables likely have an effect and (ii) rely on findings from experimental research. They should also (iii) consult other experts to ensure that important knowledge has not been overlooked.

The gains from combining forecasts and from using more of the important variables were

achieved for election forecasting models that, for the most part, used similar variables. We

expect that further gains in accuracy and model realism could be achieved by incorporating

variables that measure other important effects on voting, such as candidates’ prior experience

[33] and their issue-handling competence and leadership skills [4].

Many forecasters are wary of incorporating a large number of variables into a model,

regarding parsimony as an important quality of a forecasting model [34]. Models that use

fewer variables likely put fewer demands on the forecaster than identifying and using all relevant knowledge and information. But is parsimony in the use of knowledge and information a good strategy in developing a forecasting model? Our findings suggest otherwise. Moreover, by assigning equal weights to variables, knowledge models are arguably more parsimonious than MRA models, because equal-weights models need meet none of the many and onerous statistical assumptions that must be met, but rarely are, for regression analysis.

Conclusions

The strict assumptions of regression analysis are seldom met in practice. As a consequence, the

question of which method should be used for developing a forecasting model cannot be settled

by asserting the superior statistical properties of an optimal regression model. Damping—for

which the results were mixed—aside, the error reductions of between 3% and 43% found in

the study reported in this paper support the contention that for practical forecasting problems,

models developed by following conservative forecasting guidelines are likely to provide forecasts

that are more accurate than those from the original econometric models.

Forecasters who value forecast accuracy should endeavor to include all important variables

in a model. The variables should be assumed to be equally important in the absence of prior

experimental evidence.

The gains in accuracy reported in this paper were achieved for election forecasting, a problem

that involves little uncertainty and only modest complexity. Larger gains in forecast accuracy

might be possible when the Golden Rule of Forecasting guidelines are applied to complex

problems that involve much uncertainty. Such problems include forecasting election outcomes

in more volatile political jurisdictions, but also less-structured problems, such as forecasting

the onset of political conflicts, the costs and benefits of government policies, and the long-term

economic growth of nations. Further empirical studies on the value of applying the Golden

Rule of Forecasting to such problems would help to assess the conditions under which the

guidelines improve accuracy.

Supporting information

S1 Table. Relative absolute error (RAE) of forecasts from damping compared to forecasts

from the original regression models.

(DOCX)

S2 Table. Relative absolute error (RAE) of forecasts from equalizing compared to forecasts

from the original regression models.

(DOCX)

Acknowledgments

We thank Paul Goodwin, Randy Jones, and Keith Ord for helpful reviews. Amy Dai, Hester

Green, and Lynn Selhat edited the paper. We also received helpful suggestions when presenting

an early version of the paper at the 2014 APSA Annual Meeting in Washington, DC.

In producing this paper, we endeavored to conform with the Criteria for Science Checklist

at GuidelinesforScience.com. At least one of the authors read each of the papers we cited. We

were able to contact the authors of 20 of the 24 papers that we cite to ask if we had correctly

represented their work. We received replies from the authors of 13 of those papers, which led

to changes to our descriptions in two instances. Each of the references in this paper is linked to

a full-text version, thus making it easy to confirm that the description of findings in our paper

agrees with that provided in the original version.

Author Contributions

Conceptualization: Andreas Graefe, Kesten C. Green, J. Scott Armstrong.

Data curation: Andreas Graefe.

Formal analysis: Andreas Graefe.

Methodology: Andreas Graefe, J. Scott Armstrong.

Project administration: Andreas Graefe.

Supervision: Kesten C. Green, J. Scott Armstrong.

Validation: Andreas Graefe, Kesten C. Green, J. Scott Armstrong.

Writing – original draft: Andreas Graefe, Kesten C. Green, J. Scott Armstrong.

Writing – review & editing: Andreas Graefe, Kesten C. Green, J. Scott Armstrong.

References

1. Armstrong JS, Green KC, Graefe A. Golden rule of forecasting: Be conservative. Journal of Business Research. 2015; 68(8):1717–31.

2. Armstrong JS, Green KC. Forecasting methods and principles: Evidence-based checklists. Journal of Global Scholars of Marketing Science. 2018; 28(2):103–59.

3. Fair RC. Presidential and congressional vote-share equations. American Journal of Political Science. 2009; 53(1):55–72.

4. Graefe A. Issue and leader voting in US presidential elections. Electoral Studies. 2013; 32(4):644–57.

5. Cameron L, Crosby M. It’s the economy stupid: Macroeconomics and federal elections in Australia. Economic Record. 2000; 76(235):354–64.

6. Jackman S. Some more of all that: A reply to Charnock. Australian Journal of Political Science. 1995; 30(2):347–55.

7. Bélanger É, Godbout J-F. Forecasting Canadian Federal Elections. PS: Political Science & Politics. 2010; 43(4):691–9. Epub 2010/10/01. https://doi.org/10.1017/S1049096510001113

8. Nadeau R, Blais A. Explaining election outcomes in Canada: economy and politics. Canadian Journal of Political Science/Revue canadienne de science politique. 1993; 26(4):775–90.

9. Bellucci P. Election cycles and electoral forecasting in Italy, 1994–2008. International Journal of Forecasting. 2010; 26(1):54–67.

10. Lewis-Beck MS, Tien C. Japanese election forecasting: Classic tests of a hard case. International Journal of Forecasting. 2012; 28(4):797–803.

11. Dassonneville R, Lewis-Beck MS, Mongrain P. Forecasting Dutch elections: An initial model from the March 2017 legislative contests. Research & Politics. 2017; 4(3):1–7. doi: 2053168017720023

12. Magalhães PC, Aguiar-Conraria L. Growth, centrism and semi-presidentialism: Forecasting the Portuguese general elections. Electoral Studies. 2009; 28(2):314–21.

13. Magalhães PC, Aguiar-Conraria L, Lewis-Beck MS. Forecasting Spanish elections. International Journal of Forecasting. 2012; 28(4):769–76.

14. Toros E. Forecasting elections in Turkey. International Journal of Forecasting. 2011; 27(4):1248–58.

15. Lewis-Beck MS, Nadeau R, Bélanger É. General election forecasts in the United Kingdom: a political economy model. Electoral Studies. 2004; 23(2):279–90.

16. Cuzán AG. Forecasting the 2012 presidential election with the fiscal model. PS: Political Science & Politics. 2012; 45(4):648–50.

17. Abramowitz A. Forecasting in a polarized era: The time for change model and the 2012 presidential election. PS: Political Science & Politics. 2012; 45(4):618–9.

18. Campbell JE. Forecasting the presidential and congressional elections of 2012: The trial-heat and the seats-in-trouble models. PS: Political Science & Politics. 2012; 45(4):630–4.

19. Lewis-Beck MS, Tien C. Election forecasting for turbulent times. PS: Political Science & Politics. 2012; 45(4):625–9.

20. Holbrook TM. Incumbency, national conditions, and the 2012 presidential election. PS: Political Science & Politics. 2012; 45(4):640–3.

21. Erikson RS, Wlezien C. The objective and subjective economy and the presidential vote. PS: Political Science & Politics. 2012; 45(4):620–4.

22. Lockerbie B. Economic expectations and election outcomes: The Presidency and the House in 2012. PS: Political Science & Politics. 2012; 45(4):644–7.

23. Keren G, Newman JR. Additional considerations with regard to multiple regression and equal weighting. Organizational Behavior and Human Performance. 1978; 22(2):143–64.

24. Graefe A. Improving forecasts using equally weighted predictors. Journal of Business Research. 2015; 68(8):1792–9.

25. Dana J, Dawes RM. The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics. 2004; 29(3):317–31.

26. Cuzán AG, Bundrick CM. Predicting presidential elections with equally weighted regressors in Fair’s equation and the fiscal model. Political Analysis. 2009; 17(3):333–40.

27. Graefe A, Armstrong JS, Jones RJ Jr, Cuzán AG. Combining forecasts: An application to elections. International Journal of Forecasting. 2014; 30(1):43–54.

28. Graefe A, Küchenhoff H, Stierle V, Riedl B. Limitations of Ensemble Bayesian Model Averaging for forecasting social science problems. International Journal of Forecasting. 2015; 31(3):943–51.

29. Armstrong JS. Illusions in regression analysis. International Journal of Forecasting. 2012; 28(3):689–94. https://doi.org/10.1016/j.ijforecast.2012.02.001

30. Sparks J. The works of Benjamin Franklin. Cambridge: Harvard University; 1844.

31. Wilks SS. Weighting systems for linear functions of correlated variables when there is no dependent variable. Psychometrika. 1938; 3(1):23–40.

32. Armstrong JS, Collopy F. Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting. 1992; 8(1):69–80.

33. Armstrong JS, Graefe A. Predicting elections from biographical information about candidates: A test of the index method. Journal of Business Research. 2011; 64(7):699–706.

34. Lewis-Beck MS. Election forecasting: principles and practice. The British Journal of Politics and International Relations. 2005; 7(2):145–64.
