
KOALA: A new paradigm for election coverage in multi-party electoral systems

Authors:
Alexander Bauer¹, Andreas Bender¹, André Klima¹, Helmut Küchenhoff¹
¹ Statistical Consulting Unit StaBLab, Department of Statistics, LMU Munich, Germany
Alexander.Bauer@stat.uni-muenchen.de
Motivation
Election poll reporting
What’s the status quo?
Typical election poll reporting:
[Figure: reported party shares, in %]
is based on reported party shares
sets the focus on individual party achievements
conveys sample uncertainty insufficiently
Typical headline:
“The two parties jointly obtain 48% of all votes.”
Real-world Example
Reporting on whether Union and FDP would jointly obtain a majority before the 2013 German federal election
What do we propose?
Proposed type of reporting:
[Figure: density of the joint vote share (x-axis: 46% to 52%; y-axis: density)]
focuses on specific events (e.g. potential majorities)
naturally imparts sample uncertainty using probabilities
prevents misunderstandings by using a holistic approach
Proposed headline:
“The two parties have a probability of 32% to jointly obtain a majority.”
We aim to shift the focus from incomprehensive reported party shares to uncertainty-based probabilities of events (POEs).
Last pre-election opinion poll (source: Forsa, 20.09.2013):
Party   Union   SPD   Greens   FDP   The Left   AfD   Others
Share   40%     26%   10%      5%    9%         4%    6%
After redistributing the votes of parties below 5% (i.e. the minimum vote share to enter the German parliament), Union and FDP jointly obtain exactly 50%.
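The 50% figure follows from simple arithmetic, sketched below in Python. The helper name `redistribute` is hypothetical, and the sketch assumes that the pooled "Others" category does not enter parliament, since it aggregates several small parties. It works on vote shares only and ignores the details of the actual seat allocation.

```python
# Forsa poll of 20.09.2013, shares in percent (taken from the poll above).
POLL = {"Union": 40, "SPD": 26, "Greens": 10, "FDP": 5,
        "The Left": 9, "AfD": 4, "Others": 6}


def redistribute(shares, threshold=5):
    """Drop parties below the threshold and rescale the rest to sum to 1.

    ASSUMPTION: 'Others' is always dropped, because it pools several
    small parties that individually stay below the threshold.
    """
    kept = {party: share for party, share in shares.items()
            if party != "Others" and share >= threshold}
    total = sum(kept.values())
    return {party: share / total for party, share in kept.items()}


redistributed = redistribute(POLL)
joint_share = redistributed["Union"] + redistributed["FDP"]
print(f"Union-FDP joint share: {joint_share:.1%}")  # (40 + 5) / 90
```

The five remaining parties hold 90% of the reported votes, so the joint 45% of Union and FDP rescales to 45/90.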
Typical media headline:
“Union-FDP loses its majority”
Source: FAZ.net (2017). Umfrage zur Bundestagswahl: Schwarz-Gelb verliert
die Mehrheit. http://archive.is/I76o3. Accessed 16 July 2018.
Major flaws of this type of reporting:
Misleading conclusions are drawn
A mean share of 50% only means that missing a majority is slightly
more probable than obtaining one
Sample uncertainty is ignored
E.g., with a mean voter share of 5%, FDP will enter
the parliament with a probability of only 50%
Foundations of POE-based reporting:
Use event probabilities instead of voter shares
Probabilities incorporate sample uncertainty in a natural way
and are less prone to misinterpretation
Focus on the main events
Focusing on the main events allows for easily grasping the
relevant information
KOALA headline:
“Union-FDP gains a seat majority with a probability of 26%,
FDP enters parliament with a probability of 51%”
(if the election were held today)
Methods
Estimating POEs
1. Multinomial-Dirichlet model for the true party shares $\theta_p$ (Gelman et al., 2013):
   $(\theta_1, \ldots, \theta_P)^T \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_P)$, with $\alpha_1 = \ldots = \alpha_P = \frac{1}{2}$
2. Given one survey, we obtain a Dirichlet posterior with $\alpha_p = x_p + \frac{1}{2}$ for each party $p = 1, \ldots, P$ and its observed vote count $x_p$.
3. Using Monte Carlo simulations of election outcomes, we obtain specific POEs by calculating the event's relative frequency of occurrence.
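A minimal Python sketch of these three steps, using only the standard library. Several things here are assumptions made for illustration and are not taken from the poster: the sample size of 2000 respondents, the function names, the random seed, and the use of redistributed vote shares as a stand-in for the actual seat allocation. The numbers it produces are therefore not expected to reproduce the poster's headline figures.

```python
import random

# Forsa poll of 20.09.2013 (shares from the poster).
SHARES = {"Union": 0.40, "SPD": 0.26, "Greens": 0.10, "FDP": 0.05,
          "The Left": 0.09, "AfD": 0.04, "Others": 0.06}
N_RESPONDENTS = 2000  # ASSUMPTION: hypothetical sample size
THRESHOLD = 0.05      # minimum vote share to enter parliament


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [draw / total for draw in draws]


def estimate_poes(shares, n, n_sim=10_000, seed=1):
    """Estimate two POEs as relative frequencies over simulated elections."""
    rng = random.Random(seed)
    parties = list(shares)
    # Posterior parameters alpha_p = x_p + 1/2, with observed counts x_p.
    alphas = [shares[party] * n + 0.5 for party in parties]
    fdp_enters = 0
    majority = 0
    for _ in range(n_sim):
        theta = dict(zip(parties, sample_dirichlet(alphas, rng)))
        # ASSUMPTION: parties below the threshold and the pooled 'Others'
        # category are dropped; remaining shares are rescaled to sum to 1.
        kept = {party: share for party, share in theta.items()
                if party != "Others" and share >= THRESHOLD}
        total = sum(kept.values())
        joint = (kept.get("Union", 0.0) + kept.get("FDP", 0.0)) / total
        fdp_enters += "FDP" in kept
        majority += joint > 0.5
    return {"fdp_enters": fdp_enters / n_sim, "majority": majority / n_sim}


poes = estimate_poes(SHARES, N_RESPONDENTS)
print(poes)
```

Each simulated election is one draw from the Dirichlet posterior, and a POE is simply the fraction of draws in which the event occurs. Any other event, such as a different coalition reaching a majority, can be counted inside the same loop.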
Pooling multiple polls
We aggregate the latest polls within a specific time window (e.g. 14 days) to reduce
sample uncertainty. We adjust the uncertainty of the multinomially distributed summed
number of votes per party by using an effective sample size (Hanley et al., 2003).
As polls from different polling agencies are correlated, party-specific correlations were
estimated based on 20 surveys of polling agencies Emnid and Forsa, using
$\mathrm{Cov}(X_{Ap}, X_{Bp}) = \frac{1}{2} \cdot \left(\mathrm{Var}(X_{Ap}) + \mathrm{Var}(X_{Bp}) - \mathrm{Var}(X_{Ap} - X_{Bp})\right)$,
with
$X_{Ap}, X_{Bp}$ the observed vote counts for party $p$ in surveys $A$ and $B$,
$\mathrm{Var}(X_{Ap})$, $\mathrm{Var}(X_{Bp})$ the theoretical variances of binomial distributions,
$\mathrm{Var}(X_{Ap} - X_{Bp})$ estimated from the party share differences.
For simplicity, we set the correlation to a fixed value of 0.5.
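The covariance identity can be turned into a correlation estimate along the following lines. This is a sketch under stated assumptions, not the authors' procedure: it works on the share scale rather than on counts, uses a single common sample size per agency, takes the pooled mean share as the binomial probability, and estimates the variance of the difference as the mean squared difference (that is, it assumes no systematic offset between the two agencies). The function name and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import mean


def estimate_correlation(shares_a, shares_b, n_a, n_b):
    """Correlation of one party's shares between agencies A and B.

    shares_a, shares_b: paired shares (0..1) from surveys run at
    comparable times; n_a, n_b: assumed common sample sizes.
    """
    p = mean(list(shares_a) + list(shares_b))  # pooled share estimate
    var_a = p * (1 - p) / n_a                  # theoretical binomial variance
    var_b = p * (1 - p) / n_b
    diffs = [a - b for a, b in zip(shares_a, shares_b)]
    # ASSUMPTION: mean difference is zero, so Var(A - B) = E[(A - B)^2].
    var_diff = mean([d * d for d in diffs])
    cov = 0.5 * (var_a + var_b - var_diff)
    return cov / sqrt(var_a * var_b)


# Hypothetical paired shares for one party, for illustration only.
rho = estimate_correlation([0.40, 0.41, 0.39], [0.41, 0.40, 0.40],
                           n_a=2000, n_b=2000)
print(round(rho, 3))
```

Identical series give a correlation of one when both sample sizes are equal, and differences as large as independent sampling would produce give a correlation near zero.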
The effective sample size $n_{\text{eff}}$ is then defined as the ratio between the estimated
variance for the pooled sample and the theoretical variance for a sample of size one:
$n_{\text{eff}} = \frac{\mathrm{Var}(\text{pooled})}{\mathrm{Var}(\text{sample of size one})}$
Visualization & Implementation
Selected visualizations
[Figure: density plot of the share of parliament seats (x-axis: 40% to 60%), shaded by seat majority (no/yes)]
Density plots are used to visualize POEs,
highlighting the area associated with simulations where
the event of interest occurred.
Moreover, such plots highlight the uncertainty underlying the event of interest
and the range of possible outcomes in a natural and intuitive way.
[Figure: ridgeline plot of the share of parliament seats (x-axis: 40% to 60%) for Jan 2013, Apr 2013, Jul 2013 and election day, shaded by seat majority (no/yes)]
Ridgeline plots (Wilke, 2017) are used to depict
the development of POEs over time, again
visualizing the uncertainty underlying the event of
interest in a natural way.
Implementation
koala.stat.uni-muenchen.de @KOALA LMU
The R package coalitions (Bender and Bauer, 2018) includes all methods and allows for
their application to any multi-party electoral system.
Our dedicated website and Twitter channel make current POEs for selected
elections accessible to the general public.
References
Bauer, A. et al. (2018). KOALA: A new paradigm for election coverage. arXiv.org, http://arxiv.org/abs/1807.09665.
Bender, A. and Bauer, A. (2018). coalitions: Coalition probabilities in multi-party democracies. Journal of Open Source Software, 3(23), 606, https://doi.org/10.21105/joss.00606.
Gelman, A. et al. (2013). Bayesian Data Analysis, 3rd edition. Boca Raton, FL: CRC Press.
Hanley, J. A. et al. (2003). Statistical analysis of correlated data using generalized estimating equations: an orientation. American Journal of Epidemiology, 157(4), 364–375.
Wilke, C. O. (2017). ggridges: Ridgeline Plots in 'ggplot2'. R package version 0.4.1. https://CRAN.R-project.org/package=ggridges.
IWSM Bristol 2018