Combined Forecasts of the 2008 Election: The Pollyvote

ANDREAS GRAEFE, J. SCOTT ARMSTRONG, ALFRED G. CUZÁN, AND RANDALL J. JONES, JR.

PREVIEW

At PoliticalForecasting.com, better known as the Pollyvote, the authors combine forecasts from four sources: election polls, a panel of American political experts, the Iowa Electronic Market, and quantitative models. The day before the election, Polly predicted that the Republican ticket’s share of the two-party vote would be 47.0%. The outcome was close at 46.6% (as of the end of November). In his Hot New Research column in this issue, Paul Goodwin discusses the benefits of combining forecasts. The success of the Pollyvote should further enhance interest in this approach to forecasting.

INTRODUCTION
In this year’s presidential election, as in 2004, the Pollyvote applied the evidence-based principle of combining all credible forecasts (Armstrong, 2001) to predict the election outcome. The Pollyvote is calculated by averaging within and across four components, all weighted equally, to forecast the incumbent party’s share of the two-party vote. The components, which were updated daily or whenever new data became available, were the following (a sketch of the two-level averaging appears after the list):

- Combined trial-heat polls (using the RCP poll average from realclearpolitics.com)
- A seven-day rolling average of the vote-share contract prices on the Iowa Electronic Market (IEM)
- 16 quantitative models
- A survey of experts on American politics
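
To make the combining procedure concrete, here is a minimal Python sketch of the equal-weighted, two-level averaging described above. This is our illustration, not the Pollyvote code itself, and all component values are invented placeholders:

    # Equal-weighted combining: average within each component,
    # then average the component means across components.
    # All numbers below are illustrative, not actual 2008 data.

    def mean(values):
        return sum(values) / len(values)

    def pollyvote(components):
        within = [mean(forecasts) for forecasts in components.values()]
        return mean(within)  # each component carries equal weight

    components = {
        "polls":   [46.8],              # RCP average (already combined)
        "iem":     [47.1, 46.9],        # rolling-average inputs (invented)
        "models":  [46.0, 47.5, 48.2],  # a few of the 16 models (invented)
        "experts": [47.0],              # mean response of the expert panel
    }

    # Forecast of the incumbent party's share of the two-party vote
    print(round(pollyvote(components), 1))
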
PERFORMANCE OF THE POLLYVOTE
Polly’s performance was impressive. From August 2007 through the eve of the election, the Pollyvote consistently predicted that Barack Obama would win the White House – even just following the conventions, when combined polls, poll projections (such as fivethirtyeight.com), and prediction markets at times indicated that John McCain was ahead.

The same was true in 2004, when Polly consistently predicted George Bush as the winner, despite John Kerry’s short-term lead in polls and markets. This year’s final Polly forecast, issued on the day before the election, missed the actual outcome by 0.4 percentage points. Across the entire forecast horizon, the mean absolute error (MAE) was 1.6 percentage points. By comparison, the corresponding errors in 2004 were 0.3 and 0.5 percentage points, respectively.
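
For reference, the MAE over the forecast horizon is simply the average absolute deviation of the daily forecasts from the final vote share. With $\hat{y}_t$ denoting the forecast issued on day $t$, $y$ the actual two-party vote share, and $T$ the number of days in the horizon:

$$\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \left| \hat{y}_t - y \right|$$
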
Comparing the Pollyvote with two other closely followed indicators: Real Clear Politics’ average was off by 0.5 percentage points on election eve, and by 1.8 percentage points across the entire forecast horizon. The ‘original’ IEM (without the 7-day rolling average) was off by 0.2 and 1.7, respectively. The RCP average wrongly predicted John McCain as the winner on 41 days, and the IEM did so on 10 days.

Interestingly, the performance of the Pollyvote components in 2008 differed from that in 2004. Ranked from most to least accurate across the entire forecast horizon, the 2004 ordering was the IEM, followed by the polls, the experts, and the quantitative models. This year, again over the entire forecast horizon, the models led in accuracy, followed by the experts, the IEM, and the polls. The finding that the combined Pollyvote forecasts for the two elections were almost equally accurate supports the decision to weight the components equally, rather than differentially.
FORECAST ACCURACY MEASUREMENT
In a change from the previous presidential election, this year the Pollyvote incorporated damping to reduce measurement error in polls. This technique makes forecasts more conservative in situations involving high uncertainty. Applying it in 2008 seemed appropriate, because polls have been found to overestimate support for the front-runner, especially early in the campaign (Campbell, 1996). Campbell provides a damping formula, which we used to discount the polls’ spread between the candidates in proportion to the time remaining until election day: the longer the time until the election, the larger the discount applied to the front-runner’s margin.

Measured over the entire forecast horizon, the MAE for the damped polls was 2.7 percentage points vs. 1.8 for the original RCP average. The overall Pollyvote MAE increased from 1.3 to 1.6. From this result, which ran contrary to expectations, we conclude that further analysis is needed to apply damping more effectively in election forecasting.
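
The article does not reproduce Campbell’s formula, so the following sketch only illustrates the general idea, under an assumed linear discount schedule of our own: the front-runner’s margin over 50% of the two-party vote is shrunk by a factor that grows with the number of days remaining.

    # Illustrative damping of a poll's front-runner margin.
    # The linear schedule and 300-day horizon are assumptions made
    # for illustration; Campbell (1996) gives the formula actually used.

    def damp_poll(front_runner_share, days_to_election, horizon=300):
        margin = front_runner_share - 50.0
        discount = min(days_to_election / horizon, 1.0)  # ~0 near election day
        return 50.0 + margin * (1.0 - discount)

    print(damp_poll(55.0, days_to_election=240))  # heavy discount: 51.0
    print(damp_poll(55.0, days_to_election=3))    # barely touched: 54.95
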
THE POWER OF COMBINING
The number of quantitative models utilized in the Pollyvote increased in 2008 to 16, from the 10 used in 2004. Some of the new models brought new methods and data into the mix. For example, Polly added three models that use a segmentation approach, aggregating state-level polls, and two others that employ an index method. One of the latter, the PollyIssues model, represents an innovation: it assumes that voters choose the candidate they believe will better handle the country’s problems (Graefe & Armstrong, 2008). A hypothetical sketch of such an index follows.
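
The sketch below shows one simple form an issue index can take; the issues, percentages, and scoring rule are invented for illustration and are not the PollyIssues specification:

    # Hypothetical issue-index sketch: score each candidate by the
    # number of issues on which voters rate them the better handler.
    # All issues and percentages below are invented.

    issue_ratings = {  # share naming each candidate as better on the issue
        "economy":        {"A": 0.52, "B": 0.48},
        "foreign policy": {"A": 0.45, "B": 0.55},
        "health care":    {"A": 0.57, "B": 0.43},
    }

    scores = {"A": 0, "B": 0}
    for ratings in issue_ratings.values():
        leader = max(ratings, key=ratings.get)
        scores[leader] += 1  # one point per issue "won"

    predicted_winner = max(scores, key=scores.get)
    print(scores, predicted_winner)  # {'A': 2, 'B': 1} A
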
Adding models constructed by different methods may have been responsible for the superior performance of the quantitative-model component this year. As Armstrong (2001) has shown, combining forecasts is particularly valuable when the methods used differ substantially and draw on different sources of information.

The Pollyvote was designed to demonstrate the power of combining forecasts. Combining yields a forecast error that is never larger than, and normally substantially smaller than, the error of the typical component forecast. Still, many forecasters overlook the combining principle, even though more than thirty studies have shown that it greatly improves forecast accuracy. A large part of the problem could be that combining defies intuition. As demonstrated by Larrick and Soll (2006) in a clever series of experiments, a majority of highly intelligent people did not understand the value of combining. As a result, combining is not used nearly as much as it should be in forecasting. People simply think that they can forecast better on their own.
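
A short numeric check shows why averaging cannot do worse than the typical judge on absolute error, and does strictly better whenever the estimates bracket the truth, which is the mechanism Larrick and Soll studied. The numbers are invented:

    # Bracketing example with invented numbers: the truth is 46.6 and
    # two forecasts fall on opposite sides of it.

    truth = 46.6
    a, b = 45.0, 49.0

    avg_judge_error = (abs(a - truth) + abs(b - truth)) / 2   # 2.0
    combined_error = abs((a + b) / 2 - truth)                 # ~0.4

    print(avg_judge_error, combined_error)  # averaging wins under bracketing
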
REFERENCES
Armstrong, J.S. (2001). Combining forecasts. In J.S. Armstrong (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners (pp. 417-439). Norwell, MA: Kluwer Academic Publishers.

Campbell, J.E. (1996). Polls and votes: The trial-heat presidential election forecasting model, certainty, and political campaigns, American Politics Quarterly, 24, 408-433.

Graefe, A. & Armstrong, J.S. (2008). Forecasting elections from voters’ perceptions of candidates’ ability to handle issues. Available at http://www.forecastingprinciples.com/PollyVote/images/articles/index_us.pdf

Larrick, R.P. & Soll, J.B. (2006). Intuitions about combining opinions: Misappreciation of the averaging principle, Management Science, 52, 111-127.
CONTACT

Andreas Graefe
graefe@itas.fzk.de