On disaggregation of voter results

Abstract

Disaggregation of data will not produce accurate results on the individual level in every instance, but aggregate-level relationships can be quite useful for the context of lower-level phenomena. By adding postal code information to regression analysis a particular interpolation of voting station results is possible. Using empirical data and referring to a proposed logical mechanism of voter transition, a case can be made for low level background analysis. The novelty is in the application of the proposed mechanism, and in demonstrating that voter mobility has little influence on voting patterns.
PAPER AT THE OCCASION OF THE 13TH DUTCH-BELGIAN POLITICAL SCIENCE CONFERENCE (POLITICOLOGENETMAAL), 12-13 JUNE 2014
Joost Smits, Political Academy, Amsterdam
14 June 2014 1
INTRODUCTION | ONE
A logical mechanism of change and stability in multiparty elections can be formulated (Smits 2014c). It links different
distribution patterns, and thus provides options for calculating transitions, geographical distributions, and the
competitors or counterparts of political parties.
For researchers it is a challenge to find underlying descriptive characteristics. Political parties are interested, since those
characteristics, and their distribution over neighbourhoods, streets, and postal codes, are instrumental for micro-targeting.
If the message to voters can be differentiated, election campaigns will be more effective, and the available budget can
be allocated more efficiently.
However, when voting results in voting stations are disaggregated to streets or postal codes, obvious problems of
ecological fallacy need to be addressed. We have one advantage: we do not need to estimate the real proportion of voters
in a neighbourhood or street, but only to rank neighbourhoods or streets. Where statements on single streets would be
difficult, or would need a lot of extra calculations and precautions, we would be happy with a fitting pattern that covers
as much of the original (unobserved) distribution as possible. When micro-targeting 50,000 households, it would not be
problematic if a certain number are wrongly addressed, as long as most messages arrive at households that are susceptible
to the message.
We can inventory which forms of ecological fallacy exist, and see which assumptions are needed to disaggregate
voting station results to lower levels.
There are two matters that need attention prior to that. The first is so-called voter mobility. In the Netherlands voters are
free to vote in any station in their municipality. If that disturbs voting patterns, disaggregation is out of the question
anyway.
The second matter is linked to ecological inference, but concerns a recent event, for which a data analysis can be presented.
VOTER MOBILITY | TWO
Before dealing with disaggregation of voter results we need to treat the question that is always raised in discussions
about statistical inferences on voters in the Netherlands: what about voter mobility? It is an understandable question,
because since 2010 voters have not been required to go to a specific voting station. They receive a voting pass, on which
one voting station is indicated, usually based on the neighbourhood they live in according to Statistics Netherlands' (CBS)
mapping.
1 Presented at the 13th Dutch-Belgian Political Science Conference (Politicologenetmaal) at Maastricht University, 12-13 June 2014,
Workshop 4: Dealigned Electorates - Short-Term Vote Choice Determinants. Chaired by Ruth Dassonneville and Joost van Spanje,
discussant of the paper was Marc Hooghe. This 14 June version contains minor improvements, and one extra chapter suggested by
discussant. Figure 5 added on 23 June 2014, to clarify position on linearity.
Van Driel and De Jong (2014) write: "There must be sufficient polling stations where voters can vote, and therefore the
street pattern, population density, and the turnout in previous elections are important starting points". A rule of thumb is that
a maximum of about 1,000 voters per polling station can be processed with manual vote counting, and around 2,000
when counting electronically. After the problematic elections held in March 2010 in Rotterdam, the maximum number of
voters per station was reduced there to 800 (Hofstra 2014).
For individual voters, there may be other voting stations closer to their home. Voters need to vote in their own
municipality, but they can vote in a mall while shopping, or at a train station when commuting. The city of Nijmegen
registered the number of voting passes that were collected in "foreign" stations, and noted it was as high as 29.5%
(Feldkamp 2010, 18).
If 29.5% of all voters vote in another neighbourhood, how could one draw any conclusions from the pattern?
We can answer this in two ways: first with an example, then with a computer simulation.
Let's assume a city has only three voting stations. A researcher investigates Party X. Unobserved by the researcher,
400 voters of Party X live near Station A, 200 near Station B and 100 near Station C. They have those
stations printed on their voting passes. Let's say the enormous proportion of 50% votes in a different station, but for the
same party: Party X. We assume for this example that the movers distribute equally over the other stations. So 50% of the
voters of Station A, 200 voters, distribute over Stations B and C; 50% of the voters of Station B, 100 voters, distribute
over Stations A and C; and the same holds for Station C. In the end Station A will have 275 votes for Party X, Station B
225 and Station C 200. Although the pattern has levelled out, that does not matter for covariance applications, since
Station A still holds the most voters for Party X, Station B is in the middle, and Station C is lowest.
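The arithmetic of this three-station example can be checked with a minimal Python sketch. The function name `redistribute` is only an illustration; the station numbers and the 50% share are those of the example.

```python
def redistribute(home_voters, mobile_share):
    """Move a share of each station's voters evenly to all other stations."""
    n = len(home_voters)
    observed = [0.0] * n
    for i, voters in enumerate(home_voters):
        movers = voters * mobile_share
        observed[i] += voters - movers            # voters who stay at their own station
        for j in range(n):
            if j != i:
                observed[j] += movers / (n - 1)   # movers split equally over the others
    return observed


home = [400, 200, 100]            # Party X voters living near Stations A, B and C
observed = redistribute(home, 0.5)
print(observed)                   # [275.0, 225.0, 200.0]
```

The observed totals are flatter than the unobserved ones, but the ranking A > B > C is unchanged.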
In a simulation we can have, for example, 200 voting stations, with a very simple pattern as in Figure 1. In an algorithm
they should be seen as a stack. A percentage of the stack is chosen to cast the same vote (Party X) in a random
other station (which is different from the previous example). Figures 2 to 4 show that the pattern holds.
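Such a simulation could be sketched as follows. This is not the original algorithm, only a minimal reconstruction under stated assumptions: the unobserved pattern is taken to be a linear ramp over 200 stations, each mobile voter picks one other station uniformly at random, and the Pearson correlation between the unobserved and the observed distribution is used as a simple measure of whether the pattern holds. The names `simulate_mobility` and `pearson` and the seed are illustrative.

```python
import random


def simulate_mobility(home_votes, mobile_share, rng):
    """Let a share of each station's Party X voters vote in a random other station."""
    n = len(home_votes)
    observed = [0] * n
    for i, votes in enumerate(home_votes):
        movers = round(votes * mobile_share)
        observed[i] += votes - movers
        for _ in range(movers):
            dest = rng.randrange(n - 1)   # pick among the n - 1 other stations
            if dest >= i:
                dest += 1                 # skip the voter's own station
            observed[dest] += 1
    return observed


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(2014)                    # fixed seed, so the run is repeatable
home = [6 * (i + 1) for i in range(200)]     # assumed simple pattern, 6 to 1200 votes
results = {}
for share in (0.05, 0.25, 0.50):
    observed = simulate_mobility(home, share, rng)
    results[share] = pearson(home, observed)
    print(f"{share:.0%} mobility: correlation with unobserved pattern {results[share]:.3f}")
```

Under these assumptions the range of the observed votes shrinks as mobility rises, while the ordering of stations stays almost intact.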
FIGURE 1: SIMPLE DISTRIBUTION PATTERN VOTERS PARTY X (UNOBSERVED)
FIGURE 2: 5% MOBILITY OF VOTERS TO OTHER STATIONS (OBSERVED OUTCOME)
FIGURE 3: 25% MOBILITY OF VOTERS TO OTHER STATIONS (OBSERVED OUTCOME)
FIGURE 4: 50% MOBILITY OF VOTERS TO OTHER STATIONS (OBSERVED OUTCOME)
(Figures 1 to 4 plot the votes for Party X per voting station, for stations 1 to 200.)
The question about voter mobility is understandable, since it is counterintuitive that voting patterns hold despite voter
mobility. However, it is not merely an assumption that this pattern holds. The only general assumption may be that the
mobility itself does not have a certain pattern, for example that voters in low-income neighbourhoods are less mobile
than voters in high-income neighbourhoods, or that young voters are more mobile than older voters. An investigation of the
Nijmegen data could clarify that, although there is also the possibility that Nijmegen distributed its stations poorly. The
turnout fluctuates severely between stations: from 6.50% to 152.96% at the municipal elections of 2010, and from 27%
to 479% at the parliamentary elections in 2012.
Also note that the voter numbers near stations A-C are unobserved. What is observed is the outcome in stations A-C, the
pattern ("most voters for Party X live near station A"), and the researcher can count "foreign" voting passes in stations.
GEENPEIL | THREE
European elections were held from 22 to 25 May this year. In the Netherlands they took place on 22 May but, contrary to
2004 and 2009, publication of the Dutch outcome was prohibited until the last voting booths had closed (in Italy, on 25
May).
An influential weblog, GeenStijl (No Style, motto "tendentious, unfounded, and gratuitously offensive"), proposed on 30
April to break that embargo in an event baptised No Style Vote Counting (Stijlloos Stemmentellen, later GeenPeil, No
Poll).2 The plan was to ask blog readers to visit a voting station, and make note of the outcome of that station. From all
reports a national result would be constructed, to be published in draft on election night, with a final one on the day
after the elections (Friday).
EU Commissioner Reding (Justice, Fundamental Rights and Citizenship) sent a letter to the Dutch Minister for the Interior
Plasterk to ask about the legality, since it would violate the European embargo rule. Weeks earlier Plasterk had sent a
letter to all municipalities reminding them not to publish any results before 25 May 23:00.
But, as GeenStijl was well prepared, Minister Plasterk had to reply to Commissioner Reding that Dutch vote law obliged
voting stations to read out loud all outcome results after counting the votes.3 The letter from Reding generated 100 extra
volunteers.4 The city of Rotterdam had already issued instructions to voting stations to co-operate with GeenPeil vote
counters.5 On 22 May 2245 volunteers registered.
On election night 1378 voting station results were reported. GeenStijl had procedures in place to prevent trolling
(reporting false results), and although Dutch law prevents the photographing of official records on election night,
many voting station chairpersons allowed direct access to the official results, easily captured with mobile phone cameras.
1287 results were declared valid.6
Still, the big challenge lay in calculating the percentage outcome per party from those obviously non-random station
results, not evenly spread across the Netherlands. The GeenPeil method, developed by Bram Fokke and Anna
Grebenchtchikova7, did not, in its final form, involve previous results, which is interesting.8 They matched the outcome of a
voting station to voting stations with a similar demographic profile of the neighbourhood where they were located.
The predictions and final results are below, in EP seats, with the official distribution in the last column, and the cumulative
error in the last row:9
2 Original appeal on www.geenstijl.nl/mt/archieven/2014/04/dit_is_geen_hoax_we_menen_het_bloedserieus.html, 30 April 2014.
English explanation in GeenStijl-style on
www.geenstijl.nl/mt/archieven/2014/05/en_niet_zeiken_over_het_engels_stelletje_betweters.html, 23 May 2014
3 Dutch vote law Kieswet, art. N9 lid 1, www.st-ab.nl/wetten/0172_Kieswet_KW.htm, also see
www.geenstijl.nl/mt/archieven/2014/05/minister_plasterk_zegt_stijllo.html, 15 May 2014. Letter sent to Commission:
www.rijksoverheid.nl/ministeries/bzk/documenten-en-publicaties/brieven/2014/05/21/afschriftbrief-over-het-openbaar-maken-van-de-uitslagen-van-de-europese-verkiezingen.html, 21 May 2014
4 "Serious: thanks Viv!", update below
www.geenstijl.nl/mt/archieven/2014/05/europese_zorgen_om_geenstijl_is_de_allerbeste_headline_ooit.html, 21 May 2014
5 www.rotterdam.nl/Clusters/Dienstverlening/Documenten%202014/Verkiezingen%202014/ep%20verkiezingen%2022%20mei/nieuwsbrieven/Nieuwsbrief2EVdef.pdf, 15 May 2014, p. 2
6 http://www.geenstijl.nl/mt/archieven/2014/05/stijlloos_nederland_meet_en_weet.html, 23 May 2014
7 www.geenstijl.nl/mt/archieven/2014/05/even_voorstellen_het_team_van_stijlloze_experts.html, 20 May 2014. I am mentioned as
"postal code king", and later thanked on the hilarious election night broadcast
www.youtube.com/watch?v=c_qGDzrsSYU&feature=share&t=3h32m15s
8 According to GeenStijl they did check modelling on previous elections.
www.geenstijl.nl/mt/archieven/2014/05/geenpeil_totalen_bronbestanden.html
9 www.geenstijl.nl/mt/archieven/2014/05/dag_van_de_waarheid.html, 25 May 2014, Database elections
www.verkiezingsuitslagen.nl/Na1918/Verkiezingsuitslagen.aspx?VerkiezingsTypeId=5. Ipsos was the agency employed by public
television to predict the outcome on 22 May. De Hond is a famous Dutch pollster, with regular updates of election polls via www.peil.nl.
TABLE 1: PREDICTIONS EP2014, SEATS
As can be seen, GeenPeil got quite close to the final seat distribution, closer than the official bureaus Ipsos and De Hond,
who used more traditional forms of analysis and experimental set-up. NRC Handelsblad, one of the leading quality
newspapers, said as much on Monday 26 May.10
But when judged by percentages, they were far removed from reality:
TABLE 2: PREDICTIONS EP2014, PERCENTAGES
The chief editor of GeenStijl11, Rob Goossens, acknowledged on Twitter that they probably were quite lucky to get the
seats right.12 A data analyst of RTL-Nieuws (commercial television) had also entered his prediction, and got closer than
any of the other bureaus.13 Although that was, according to himself, a "totally unscientific guess", it was based on a
simple model.14
This leaves us quite unsatisfied, which is the right emotion for investigating what happened. We must nevertheless be
thankful to GeenStijl for providing a freely available, very large sample of voting station data, which is useful for future
research.
ECOLOGICAL FALLACIES | FOUR
Robinson broke the ground for "ecological fallacy" considerations to warn that strong aggregate-level relationships are
not necessarily reproduced at the individual level. There was no escape: although it might be theoretically possible that
ecological and individual correlations can validly be substituted "the conditions under which this can happen are far
10 "GeenStijl dichtst bij uitslag", NRC Handelsblad, 26 May 2014, p. 7
11 Pun intended, editor refers here to Roman "editor", the sponsor of gladiators. Goossens is an important contributor to GeenStijl, and
editor for DasKapital, www.daskapital.nl
12 Tweet of Rob Goossens, 26 May 2014, 01:27, www.twitter.com/GoosR/status/470843547042983937
13 Frank Tiesken, https://twitter.com/franktieskens. When judged by preliminary figures on 26 May he scored best, see
https://twitter.com/GoosR/status/470842395677167617. With final results he scored a cumulative difference of 3.6%, just worse
than Ipsos. Assuming he focused on the main parties, the cumulative error of Ipsos is 2.8%, and Tiesken scores 2.6% (excluding the
"rest" category).
14 Tweet of Frank Tiesken, 23 May 2014, 02:40, www.twitter.com/franktieskens/status/469774774227697664 . Comment on simple
model by Bram Fokke by email on 9 June 2014.
Table 1 data (EP seats):
             GeenPeil      De Hond   Ipsos   Final
D66          5             4         4       5
CDA          3 (bias: 4)   5         4       4
PVV          4 (bias: 3)   3         3       4
VVD          3             3         3       3
SP           3 (bias: 2)   3         3       2
PvdA         3             3         3       3
CU/SGP       2             2         2       2
GroenLinks   2             2         2       2
PvdD         1             1         1       1
50+          0 (bias: 1)   0         1       0
rest         0             0         0       0
total        26            26        26      26
cum.error    2             4         4
Table 2 data (percentages):
             GeenPeil   De Hond   Ipsos    Final
D66          17.3%      15.1%     15.6%    15.5%
CDA          12.6%      14.8%     15.2%    15.2%
PVV          12.6%      12.9%     12.2%    13.3%
VVD          11.8%      12.4%     12.3%    12.0%
SP           9.6%       9.9%      10.0%    9.6%
PvdA         9.9%       10.3%     9.4%     9.4%
CU/SGP       6.6%       8.0%      7.8%     7.7%
GroenLinks   8.7%       7.5%      7.3%     7.0%
PvdD         4.6%       4.2%      4.2%     4.2%
50+          3.3%       3.7%      4.2%     3.7%
rest         3.0%       1.2%      1.8%     2.4%
total        100.0%     100.0%    100.0%   100.0%
cum.error    10.0%      4.8%      3.4%
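The cumulative error in the last row appears to be the sum of the absolute differences between prediction and final result over all rows. A small Python sketch, using the percentages of Table 2, reproduces the reported figures under that reading (the function name is illustrative):

```python
final = {"D66": 15.5, "CDA": 15.2, "PVV": 13.3, "VVD": 12.0, "SP": 9.6, "PvdA": 9.4,
         "CU/SGP": 7.7, "GroenLinks": 7.0, "PvdD": 4.2, "50+": 3.7, "rest": 2.4}

predictions = {
    "GeenPeil": {"D66": 17.3, "CDA": 12.6, "PVV": 12.6, "VVD": 11.8, "SP": 9.6, "PvdA": 9.9,
                 "CU/SGP": 6.6, "GroenLinks": 8.7, "PvdD": 4.6, "50+": 3.3, "rest": 3.0},
    "De Hond": {"D66": 15.1, "CDA": 14.8, "PVV": 12.9, "VVD": 12.4, "SP": 9.9, "PvdA": 10.3,
                "CU/SGP": 8.0, "GroenLinks": 7.5, "PvdD": 4.2, "50+": 3.7, "rest": 1.2},
    "Ipsos": {"D66": 15.6, "CDA": 15.2, "PVV": 12.2, "VVD": 12.3, "SP": 10.0, "PvdA": 9.4,
              "CU/SGP": 7.8, "GroenLinks": 7.3, "PvdD": 4.2, "50+": 4.2, "rest": 1.8},
}


def cumulative_error(predicted, actual):
    """Sum of absolute differences between predicted and actual percentages."""
    return sum(abs(predicted[party] - actual[party]) for party in actual)


errors = {name: round(cumulative_error(pred, final), 1) for name, pred in predictions.items()}
print(errors)   # {'GeenPeil': 10.0, 'De Hond': 4.8, 'Ipsos': 3.4}
```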
removed from those ordinarily encountered in data" (Robinson 1950, 357). "Ordinarily" being the operative word when
considering disaggregation.
Taylor and Johnston were much less absolute. Inferences about individuals from ecological correlations should not be
made, unless based in logical hypotheses and backed up by other information. And one should realise that any
ecological correlation is relevant only to that particular set of population aggregates. They illustrate that with a
regression analysis for 10 New Hampshire counties, and a regression analysis for 9 census divisions, which has a
substantially different coefficient. Conclusion: "the trends within New Hampshire are not the same as those across the census
divisions." Then they proceed to show that different regressions can be calculated, on the same data, when only the
design is changed (Taylor and Johnston 1979, 87-88).
According to King (1997, xv) ecological inferences are simply required in political science research when individual-level
surveys are unavailable, unreliable, insufficient or unfeasible. That was confirmed by Voss (2004) who found that if one
attempts to use survey data, the amount and selection of survey respondents produce insufficient variation in residential
environments. The "ecological inference problem" has held back research agendas (King 1997, xv). King jokes about the
mantra "Thou shalt not draw conclusions about individual behaviour from aggregate data" (1997, 6). Wu (2007, 122)
states that Robinson's article misled scholars, to believe "individual-level models are always better specified and more
accurate than aggregate-level models, aggregate-level relationships are always intended as substitutes of individual-level
relationships, and aggregate-level variables have no relevance to causal relationships and mechanistic explanations of
individual-level activities". In fact, according to Wu, aggregate-level relationships can be quite useful for defining the
context, generating potential hypotheses, and identifying the relevance for studying individual-level phenomena.
King's contribution was to add statistical validity to estimates of unknown data elements, and to suggest adding known
data in order to improve those estimates. The purpose is not to "produce precisely accurate results in every instance"
(1997, xv), but to get better results than "115% of blacks voting for Democrats", or "-4% of foreign born Americans being
illiterate" (1997, 7).
Alker (1969) created a typology of ecological fallacies, and distinguishes ecological fallacy, individualistic fallacy,
cross-level fallacy, universal fallacy, selective fallacy, contextual fallacy, cross-sectional fallacy and longitudinal fallacy
according to the level of analysis and the direction of inferences between different levels of analysis.
According to Wu (2007, 122) that typology contained a lot of redundancy. The "individualistic" fallacy is a kind of
inverse "ecological" fallacy, and both are in fact cross-level inference fallacies. The others are not more than "related to
different kinds of sampling and conceptual errors in statistical inferences". Many can be described as problems of scale.
Cressie (1993, 285) reports that the higher the aggregation, the further the sample correlation between the
aggregated variables is from zero. Although we are not dealing with a sample, the "modifiable areal unit problem"
(MAUP) nonetheless remains. By adding a variable to a table, a previously positive statistical dependence can change to
negative. This new variable can also be just the zoom factor (downscaling/upscaling) (Cressie and Wikle 2011, 12).
Famous is the example of two hospital procedures by Charig et al. (1986).15 They compared open surgery to an
ultrasound treatment of kidney stones. Open surgery is reported with a 78% success rate, and ultrasound with 83%. But,
when concentrating on the different success of small stones (<2 cm) compared to larger ones (>2 cm), small stones had
93% success in open surgery, and only 87% in ultrasound. One might expect that on large stones open surgery would do
worse, to come into line with the lower overall success. But Charig et al. show that open surgery did also better on large
stones (73%), and ultrasound worse (69%).
The key element is that most open procedures were for stones greater than 2 cm in diameter, whereas most of the closed
procedures were performed for smaller stones (Charig et al. 1986, 880). This reversal is known as Simpson's Paradox,
and the different scales are part of the "modifiable areal unit" problem. As explained above, voting stations are designed
on purpose to a certain "optimal" scale, which helps in controlling the issue.
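The reversal can be reproduced with a short Python sketch. The counts below are the ones commonly cited from Table II of Charig et al. (1986); they are an assumption here and should be verified against the source, although they do match the percentages quoted above. The label "ultrasound" follows the wording of this paper.

```python
# (successes, treated) per treatment and stone size; counts as commonly cited
# from Charig et al. (1986), table II. Verify against the source before reuse.
outcomes = {
    "open surgery": {"small": (81, 87), "large": (192, 263)},
    "ultrasound": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, treated):
    """Success rate as a percentage."""
    return 100 * successes / treated


for treatment, by_size in outcomes.items():
    total_success = sum(s for s, _ in by_size.values())
    total_treated = sum(t for _, t in by_size.values())
    per_size = ", ".join(f"{size} {rate(*counts):.0f}%" for size, counts in by_size.items())
    print(f"{treatment}: {per_size}, overall {rate(total_success, total_treated):.0f}%")

overall = {t: rate(sum(s for s, _ in b.values()), sum(n for _, n in b.values()))
           for t, b in outcomes.items()}
```

Open surgery does better within each stone size, yet worse overall, because it was mostly applied to the harder (large stone) cases.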
Yang (2009) mentions that the only real resolution to the MAUP is to use individual-level data that are geocoded to a
specific (usually residential) location. That is difficult because of privacy issues (but not impossible, and to be
investigated in further papers). Alternatively, local parameter modelling could be applied, which is then the path to
pursue.
That does not annul the problem as such, but makes it workable to a certain degree. Given the MAUP, we should also
add population density variables to the regression analysis. Leckie (2013), in his overview of cross-level modelling, shows
how levels of data can be layered and nested, to prevent underestimating or overstating the importance of sources of
variation.
15 See table II of Charig.
That being said about the relations between data: as we have a multiparty system in the Netherlands, applying King's
method of bounds becomes increasingly difficult as the number of parties rises (Park 2008, 50-53). King even resorted to
software solutions (Markov Chain Monte Carlo diagnostics) to extend his 2x2 inference strategies to RxC contingency
tables (Rosen et al. 2001). These are not only computationally intensive (Rosen et al. 2001, 134), but the non-linearity
also comes at a price. Park: "identification is hard to achieve, estimated parameters cannot be straightforwardly
interpreted, and the results may be unstable depending on the assumptions" (Park 2008, 31).
Which is quite fine when we know what we are doing, but do we? We can throw impressive statistical techniques and
software at the problem, which may lead to answers that may not lead to knowledge.
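For reference, the deterministic bounds that underlie the method of bounds in the simple 2x2 case can be sketched in a few lines of Python. This is a textbook illustration under stated assumptions, not King's full estimator: in one district a fraction `x` of voters belongs to group 1, a fraction `t` of all voters supports the party, and the unknown support rates within group 1 and group 2 are bounded by the accounting identity t = x * b1 + (1 - x) * b2. The function name and the example numbers are hypothetical.

```python
def support_bounds(x, t):
    """Deterministic bounds on within-group support rates in a 2x2 table.

    x: fraction of voters in group 1 (0 < x < 1)
    t: fraction of all voters supporting the party (0 <= t <= 1)
    Returns ((low1, high1), (low2, high2)) for group 1 and group 2.
    """
    if not 0 < x < 1 or not 0 <= t <= 1:
        raise ValueError("need 0 < x < 1 and 0 <= t <= 1")
    group1 = (max(0.0, (t - (1 - x)) / x), min(1.0, t / x))
    group2 = (max(0.0, (t - x) / (1 - x)), min(1.0, t / (1 - x)))
    return group1, group2


# Hypothetical district: 60% of voters are in group 1, the party gets 80% of the vote.
bounds_group1, bounds_group2 = support_bounds(0.6, 0.8)
print(bounds_group1)   # roughly (0.667, 1.0)
print(bounds_group2)   # roughly (0.5, 1.0)
```

Estimates outside such bounds, like the "115%" quoted above, are impossible by construction. With many parties each cell of the RxC table needs its own bounds, and these tend to become wide, which is part of the difficulty described here.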
REDUCED EXPECTATIONS | FIVE
When taking a step back at Dutch elections we can see that voter mobility does not change the pattern much, but the
observed party percentage success is affected. The original unobserved party affiliation of voters is distorted by turnout,
mobility, and local (neighbourhood) effects.
David and Van Hamme (2011) quoted authors from Agnew to Vandermotten in their treatment of the spatial voting
patterns and electoral behaviour. Spatial voting patterns have been shown to be relatively stable over time. They recall
also that even strong class cleavages are affected at the local level given certain demographic conditions.
Bochsler (2011) published on territorial determinants. A key to success for new parties in post-communist countries
seeking to enter parliament lies in an almost homogeneous level of support across the country. He assumes that studies
that solely focus on the effect of electoral systems at the national level produce very convincing results for most Western
democracies. But one can argue whether this is really true for the Netherlands.
The first publication on the collected voting booth results of the 9 June 2010 parliamentary elections was front-page
news: the coalition under deliberation at the time of publication (16 July, it was a lengthy job) of Dutch liberal-
conservatives (VVD), social-democrats (PvdA), social-liberals (D66) and Greens (GroenLinks) mainly had its support in the
western urban regions, particularly in the inner cities (Poort and Verkade 2010).
Also, when Dutch voters get the choice, and choice is influential on voter support (Aarts, van der Kolk, and Rosema 2007),
they vote for local parties. The share of local parties rose from 23.4% in 2010 to 27.8% in 2014,16 making local parties
bigger than the biggest national party.
Therefore Bochsler's assumption of homogenous support for national parties in the Netherlands may not hold.
Investigations on sub-national levels are in order.
And even if we can control all these factors in some software or statistical procedure, we are left with variabilities in
voting station results. A station may lie close to another station, or even several others. It may attract a
non-representative voter population, for example a shopping mall in a dense city neighbourhood which is favoured over
other stations close by, or the opposite, a single station in a vast rural area in a municipality that also has more
urban neighbourhoods, disturbing the pattern. Or for some other reason a station may have a low or high turnout,
disturbing percentages. There may be a popular local candidate for some party, influencing party share. There may
have been counting problems or other procedural issues (Smits 2014a; Smits 2010).
These stations must be flagged as non-representative, but that can only be done from a certain baseline, which is
affected by the aforementioned matters. It may even be argued that voting station results are fine as aggregate
information, but that even with a high number of stations, randomly or non-randomly selected, it is doubtful whether
they will ever lead to a realistic national forecast. This view is contrary to a good analysis on this subject (Bethlehem 2014).
Therefore, expectations should be reduced. We should not reach for a national pattern in voting station outcomes with
whichever statistical variables or methods, but go for a patchwork of local and regional patterns. Park (2008, 29 ff.)
argued that we should not expect "better data", but at least we could work on a framework of data and method that
gives insight into voting patterns.
LOW LEVEL ANALYSIS | SIX
The proposed logical mechanism (Smits 2014c) describes that voter transition basically has two forms: parties gaining or
losing vote share within the existing pattern of popularity (within their strongholds), and parties gaining or losing votes
while breaking that pattern. In the first case the coefficient of determination R² is 1 or close to 1, in the latter case it is
far from 1. Parties usually lose voters with apparently similar characteristics to competitors, or win them from competitors:
R² is normally close to 1.
16 Data from Database election results (Databank Verkiezingsuitslagen) by the Electoral Council, www.verkiezingsuitslagen.nl, checked
on 6 June 2014
An R² far from 1 would mean that parties exchange voters with parties that are very different, called "counterparts",
whose voters usually have very different characteristics.
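The distinction between the two forms can be illustrated with a minimal least-squares sketch in Python. The neighbourhood shares below are invented purely for illustration and are not taken from the election data; `linear_fit` is a hypothetical helper name.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical party shares per neighbourhood in the earlier election.
share_before = [0.10, 0.20, 0.30, 0.40, 0.50]

# Case 1: the party loses half its share everywhere, within the existing pattern.
within_pattern = [0.5 * s for s in share_before]

# Case 2: the party wins and loses in places unrelated to its old strongholds.
breaking_pattern = [0.30, 0.05, 0.25, 0.10, 0.20]

slope_w, intercept_w, r2_within = linear_fit(share_before, within_pattern)
slope_b, intercept_b, r2_breaking = linear_fit(share_before, breaking_pattern)
print(f"within pattern:   slope {slope_w:.2f}, R squared {r2_within:.2f}")
print(f"breaking pattern: slope {slope_b:.2f}, R squared {r2_breaking:.2f}")
```

In the first case the slope carries the size of the loss while R² stays at 1; in the second case R² collapses, which is the signal the mechanism reads as an exchange with counterparts.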
It also brings up the question why, as Park stated, multi-partism "causes" non-linearity (Park 2008, 28). Indeed, when two
parties are projected on the same X-axis, and their correlation is not 1 or -1, the second (third, fourth, etc.) party appears
"bent". See Figure 5; the equation is that of the PvdA line of least differences. This leads King to an iterative process of
rounds of estimation of binary choices (Park 2008, 51; Rosen et al. 2001) when calculating with multiple parties.
FIGURE 5: CURVES OF LEEFBAAR ROTTERDAM (LR, LIVABLE
ROTTERDAM), AND PVDA (SOCIAL DEMOCRATS ROTTERDAM),
MUNICIPAL ELECTION 2014
Is the logical mechanism too simple in explaining that when a party loses votes to another party away from the linear
relation, it is simply because the influence of that second party apparently works over different characteristics (different
dimensions, a different vector)? R² drops, but what knowledge can be gained from assuming a non-linear relation? There
may be a linear explanation (vector of regression coefficients) somewhere, whose existence is denied by resorting to
better curve fitting.
The linear correlation in the logical mechanism, borrowed from Johnston (1983) and furthered by Aarts and Horstman
(1991), can be applied on empirical data. Voting station results were aggregated by neighbourhood (Statistics
Netherlands classification) to compare between 2010 and 2014, since numbers and locations change.
FIGURE 6: PVDA ROTTERDAM (SOCIAL DEMOCRAT) VOTE
COMPARED 2014 TO 2010
PvdA used to be the biggest party in city council, except for 2002-
2006, after the Pim Fortuyn revolt (most prominent party leader of
Leefbaar Rotterdam).
FIGURE 7: LEEFBAAR ROTTERDAM (LOCAL PARTY LIVABLE
ROTTERDAM) VOTE COMPARED 2014 TO 2010
Leefbaar Rotterdam is now the biggest party in city council.
[Figure 5 plot: percentage results per voting station, normalised and sorted on LR; series LR and PvdA; fitted PvdA curve y = 3E-05x² - 0.0041x - 0.3895]
[Figure 6 plot: percentage results in 2010 and in 2014; fitted line y = 0.5x + 0.0154, R² = 0.9065]
[Figure 7 plot: percentage results in 2010 and in 2014; fitted line y = 1.0005x - 0.0119, R² = 0.9074]
FIGURE 8: VVD ROTTERDAM (LIBERAL-CONSERVATIVE) VOTE
COMPARED 2014 TO 2010
FIGURE 9: CDA ROTTERDAM (CHRISTIAN DEMOCRAT) VOTE
COMPARED 2014 TO 2010
Evaluation shows most change is within supporters' habitats. CDA in Figure 9 stands out because R² is only 0.55. Closer
investigation suggests that this is probably due to party policy. Patterns of CDA in 1998 and 2006 are far apart, but 2014 is
closer to 1998. CDA appears to put non-traditional candidates on the ballot, who attract voters in neighbourhoods where
the party was popular before the Fortuyn revolt in 2002. But it also seems to come at a price:17
FIGURE 10: VOTE SHARE CDA ROTTERDAM
National elections can be compared in the same way. Voting station results for parliamentary elections were (roughly)
aggregated by neighbourhood (Statistics Netherlands classification, n=5544).
FIGURE 11: PVDA (SOCIAL DEMOCRATS) VOTE COMPARED 2012 TO
2010
FIGURE 12: PVV (FREEDOM PARTY) VOTE COMPARED 2012 TO 2010
17 Also see (Smits 2014b)
[Figure 8 plot: percentage results in 2010 and in 2014; fitted line y = 0.7678x + 0.0024, R² = 0.9092]
[Figure 9 plot: percentage results in 2010 and in 2014; fitted line y = 0.7276x + 0.0102, R² = 0.5502]
[Figure 10 plot: CDA vote share (scale 0% to 14%) for 1998, 2002, 2006, 2010 and 2014]
[Figure 11 plot: both axes labelled "Percentage results in 2012" in the source; fitted line y = 1.101x + 0.0346, R² = 0.9026]
[Figure 12 plot: both axes labelled "Percentage results in 2012" in the source; fitted line y = 0.7128x - 0.0088, R² = 0.9122]
FIGURE 13: VVD (LIBERAL-CONSERVATIVE) VOTE COMPARED 2012
TO 2010
FIGURE 14: CDA (CHRISTIAN DEMOCRAT) VOTE COMPARED 2012
TO 2010
FIGURE 15: SP (SOCIALIST) VOTE COMPARED 2012 TO 2010
FIGURE 16: GROENLINKS (GREENS) VOTE COMPARED 2012 TO 2010
The matching of voting stations to neighbourhoods was quite rough and preliminary, which may have some influence on
these graphs and their R². Further statistical statements about the validity are therefore not yet in order. Nevertheless,
there is some suggestion that the Freedom Party (Figure 12) lost many votes, but within their "own" neighbourhoods, to
competitors, whereas the Greens (Figure 16) got squashed and also lost beyond their "own" neighbourhoods.
If parties usually lose and win voters to and from competitors with apparently similar characteristics, within the logical
mechanism, this similarity needs to be geographic as well. Given the stability of these patterns over time, the underlying
characteristics also need to be stable over time.
From the above one can conclude that background variables, and notably demographic variables, should not
be underestimated in determining the location of certain votes when describing vote patterns. Other variables are not
overlooked, since explanations of voting behaviour, of individual behaviour, are of great importance. But we can refer to
Park, Wu and King (see above), who argue that aggregate-level relationships can be quite useful for the context of
individual-level phenomena.
We could use disaggregation to interpolate voting station results to postal codes, of which there are 430.000 (on
average 1 postal code per 40 inhabitants). The advantage over other interpolation techniques such as kriging is that it
accounts for the Dutch situation, probably resulting from urban planning, in which neighbourhoods of different
character are divided by main roads (Mey 1994). We know that voting stations in one neighbourhood have different
outcomes from voting stations on the other side of the main road. By adding postal code information to a regression
analysis, we get the expected abrupt dividing lines, where kriging techniques show a more gradual flow. This is also
supposed to help against the problem that socially viable units do not correspond completely to formal units
(Przeworski 1974, 34-35).
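The contrast between abrupt and gradual estimates can be illustrated with a toy street that crosses a main road. This is a minimal sketch under stated assumptions: all numbers are hypothetical, the single "characteristic" stands in for the postal code information, and inverse-distance weighting is used as a simple stand-in for distance-based interpolation (it is not kriging itself).

```python
# Hypothetical one-dimensional street; the main road lies at position 5.
stations = [  # (position, party share, aggregated characteristic)
    (0.0, 0.40, 0.8),
    (10.0, 0.10, 0.2),
]
postal_codes = [  # (position, characteristic of the postal code)
    (2.0, 0.8), (4.0, 0.8), (6.0, 0.2), (8.0, 0.2),
]

def regression_estimate(characteristic):
    """Line through the two station observations (an exact fit for two points)."""
    (_, y0, c0), (_, y1, c1) = stations
    slope = (y1 - y0) / (c1 - c0)
    intercept = y0 - slope * c0
    return slope * characteristic + intercept

def distance_estimate(position):
    """Inverse-distance weighting of the station shares."""
    weights = [1.0 / abs(position - p) for p, _, _ in stations]
    shares = [s for _, s, _ in stations]
    return sum(w * s for w, s in zip(weights, shares)) / sum(weights)

by_regression = [round(regression_estimate(c), 3) for _, c in postal_codes]
by_distance = [round(distance_estimate(p), 3) for p, _ in postal_codes]
print(by_regression)  # jumps at the main road
print(by_distance)    # declines gradually along the street
```

The regression estimate follows the postal code characteristic and therefore changes at the road, while the distance-based estimate only depends on how far each postal code is from the two stations.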
[Plots for Figures 13-16: scatter plots with both axes labelled "Percentage results in 2012"; fitted lines, in order
of appearance: y = 1,139x + 0,0361 (R² = 0,8596); y = 0,6452x - 0,0021 (R² = 0,9012); y = 0,9958x - 0,0017
(R² = 0,8302); y = 0,4277x - 0,0052 (R² = 0,8089).]
Following guidelines from Pyle (1999), amongst others¹⁸, a combination of databases can be constructed from land
register data (size of land, location of postal codes), basic postal code characteristics (freely available from
Statistics Netherlands on age, income, household composition, ethnicity, density, etc.), neighbourhood characteristics
(same), and voting results (parliamentary elections of June 2010 and September 2012, gathered in 2010 and 2012
respectively; results were acquired from municipalities by contacting them one by one¹⁹), together with the pinpointed
geographic location of voting stations. The numeric data were prepared for correlation analysis by converting them to
the same denominator as voting percentages (referring to Pyle). Postal codes are linked to voting stations based on
proximity.
18 This will be explored in further papers.
19 With thanks to Sebastiaan van Niele of the Political Academy, and thanks to the municipalities who dug up "old"
election results and voting booth addresses.
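The two preparation steps described here, expressing counts as shares and linking each postal code to the nearest voting station, might look as follows. This is a sketch only: the field names, coordinates and the choice of residents as denominator are assumptions, not taken from the paper.

```python
import math

# Hypothetical postal codes with coordinates in metres and resident counts.
postal_codes = {
    "1234AB": {"xy": (100.0, 200.0), "residents": 40, "aged_65_plus": 10},
    "1234AC": {"xy": (900.0, 250.0), "residents": 50, "aged_65_plus": 5},
}
# Hypothetical voting stations with coordinates in metres.
stations = {"school": (0.0, 0.0), "library": (1000.0, 0.0)}

def to_share(count, total):
    """Express a count as a fraction of the total, like a voting percentage."""
    return count / total if total else 0.0

def nearest_station(xy):
    """Name of the station with the smallest straight-line distance."""
    return min(stations, key=lambda name: math.dist(xy, stations[name]))

for code, record in postal_codes.items():
    record["share_65_plus"] = to_share(record["aged_65_plus"], record["residents"])
    record["station"] = nearest_station(record["xy"])
    print(code, record["share_65_plus"], record["station"])
```

Straight-line distance is the simplest reading of "proximity"; walking distance or barriers such as main roads could be substituted in `nearest_station`.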
The level of analysis is now brought down to municipalities. Since at that level any correlation can be found as long as
enough variables are added, the number of variables is limited. So far this procedure has been carried out for
over 35 municipalities, for local and parliamentary elections, which results in a series of around 70 analyses (each an
independent investigation on separate data). Since often two or more parties are evaluated, the number grows above
100. It has been found that 6 variables are enough to describe the pattern for 99% of the parties (so far), and
sometimes fewer than 6 suffice.
We borrow the iterative approach from King. Software finds the 6 (or fewer) most significant variables in the
covariance of a linear correlation with the voting outcome. This procedure also deals with inter-dependence of
variables: only the variables that stand out in determining the variance are kept. It also judges whether, and which,
voting stations are too far off. In the end it shows on a map which voting stations were left out, and why. It returns
a coefficient of determination R² and an F-value, which tests the fit against the hypothesis of a random correlation.
If F is below a threshold, if R² is too far from 1, if too many voting stations are left out, or if the variables do
not all reach significance, then the correlation is not valid. Since local political parties are the users of these
analyses, invalid correlations are reported to them as such.
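The paper does not specify the selection algorithm. As one possible reading, the sketch below uses plain forward stepwise selection: it adds the variable that raises R² most, stops at a maximum of 6 variables or when the gain becomes negligible, and returns R² with the overall F-value. Variable names, data and the `min_gain` threshold are hypothetical, and the removal of outlying voting stations and the per-variable significance tests are left out.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_select(candidates, y, max_vars=6, min_gain=0.01):
    """Greedy forward selection; returns (chosen names, R^2, overall F-value)."""
    chosen, best_r2 = [], 0.0
    while len(chosen) < max_vars:
        trials = {name: r_squared([candidates[c] for c in chosen + [name]], y)
                  for name in candidates if name not in chosen}
        if not trials:
            break
        name = max(trials, key=trials.get)
        if trials[name] - best_r2 < min_gain:
            break  # the remaining variables add too little
        chosen.append(name)
        best_r2 = trials[name]
    n, k = len(y), len(chosen)
    if k == 0 or best_r2 >= 1.0:
        return chosen, best_r2, float("inf") if k else 0.0
    return chosen, best_r2, (best_r2 / k) / ((1.0 - best_r2) / (n - k - 1))

# Hypothetical voting-station data: shares of residents and a party's vote share.
candidates = {
    "low_income":   [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "aged_65_plus": [0.5, 0.1, 0.4, 0.2, 0.6, 0.3, 0.7, 0.2],
    "density":      [0.3, 0.7, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8],
}
y = [0.055, 0.175, 0.17, 0.265, 0.225, 0.34, 0.315, 0.455]

chosen, r2, f_value = forward_select(candidates, y)
print(chosen, round(r2, 4), round(f_value, 1))
```

In this toy data the third candidate is strongly correlated with the second, so it adds almost nothing once the second is in the model and is dropped. That is one simple way in which inter-dependent variables can be handled.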
With the resulting equation (an intermediate product) a map of coloured squares on postal codes can be created. The
colour of a square corresponds to the rank of the postal code score in five groups of two deciles each: the lowest two
deciles (very low chance of finding votes for that party), the two just above (low chance), etc. The scores are not
informative in themselves; they only correlate in variance with the voting station results on the aggregate level.
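The ranking into five groups of two deciles each can be sketched as below. The scores are hypothetical, and the labels beyond "very low" and "low" are assumed names for the groups that the text abbreviates with "etc.".

```python
def decile_groups(scores):
    """Map each score to a group 0..4; every group spans two deciles of the ranking."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, index in enumerate(order):
        groups[index] = min(4, rank * 5 // len(scores))
    return groups

# Group labels; only the first two are named in the text, the rest are assumed.
LABELS = ["very low", "low", "middle", "high", "very high"]

# Hypothetical postal code scores from the regression equation.
scores = [0.12, 0.31, 0.05, 0.27, 0.44, 0.19, 0.38, 0.09, 0.23, 0.50]
groups = decile_groups(scores)
for score, group in zip(scores, groups):
    print(score, LABELS[group])
```

Only the rank of a score matters here, which matches the remark that the scores are not informative in themselves.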
Since it can be deduced from the logical mechanism that those postal code patterns also cover areas where voters have
thus far voted for competitors, this opens possibilities to point political parties to regions where they have
unrealised potential.²⁰
After all maps have been supplied, together with an address list of street names with number ranges (e.g.
Hoofdstraat 1-11, 1234 AB Hoofddorp) for micro-targeting, the political party that is the end user is contacted to see
whether it agrees with the plausibility of the outcome. So far those conversations have been quite delightful.
PRACTICAL USE | SEVEN²¹
a. Postal code map of voters
The most popular application of the disaggregation is the interactive online map of postal codes. These maps are
enhanced for colour-blind people (8% of all men) and divided into five colour groups: dark blue for the highest two
deciles, light blue for the two deciles after that, white for the middle, light red for the second lowest two deciles,
and red for the lowest two. Grey indicates that there were not enough data to make the calculation. Purple and light
purple replace blue and light blue where there is potential to find new voters.
A working demo of such a map (for Rotterdam Delfshaven) can be found here:
www.politiekeacademie.eu/verborgen-potentieel-delfshaven-sp/
20 "Purple is the magic colour", according to Jan van Loenen of the current affairs news show "Nieuwsuur". See
www.nieuwsuur.nl/onderwerp/604530-partijen-op-de-huid-van-de-kiezer.html, 31 January 2014
21 This chapter was added after discussion at the conference.
FIGURE 17: POSTAL CODE INFERENCE OF PVDA (SOCIAL DEMOCRAT) IN DELFSHAVEN, ROTTERDAM, 2010
b. Micro-targeting
The postal code map can be translated into address lists, which allow selective addressing of voter subgroups, for
example addresses where competitors are active or addresses selected by location.
FIGURE 18: EXAMPLE OF ADDRESS LISTS WITH SCORES
c. Grid maps
The information by postal code can also be re-aggregated to grids of 500 x 500 metres. This gives a quick overview of
the fingerprint of party allegiance.
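Re-aggregation to a 500 x 500 metre grid amounts to assigning each postal code to a cell and averaging within the cell. The sketch below assumes coordinates in metres and weights by the number of residents; both the weighting and the sample points are assumptions.

```python
from collections import defaultdict

CELL = 500  # grid size in metres

def grid_cell(x, y):
    """Column and row index of the grid cell containing the point (x, y)."""
    return (int(x // CELL), int(y // CELL))

def aggregate(points):
    """points: (x, y, residents, score). Returns the resident-weighted mean score per cell."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for x, y, residents, score in points:
        cell = grid_cell(x, y)
        totals[cell][0] += residents * score
        totals[cell][1] += residents
    return {cell: ws / w for cell, (ws, w) in totals.items() if w > 0}

# Hypothetical postal codes: x, y in metres, residents, estimated share.
points = [(120, 480, 40, 0.2), (450, 30, 60, 0.4), (510, 20, 50, 0.1)]
grid = aggregate(points)
print(grid)
```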
FIGURE 19: GRID MAP OF INVALID VOTES IN MUNICIPAL ELECTIONS IN ROTTERDAM, 2014
d. Comparison
Comparative grid maps quickly show where a party gained or lost votes.
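A comparative map reduces to the difference per grid cell between two elections. A minimal sketch, with hypothetical shares per cell and restricted to cells that have a value in both elections:

```python
def grid_change(before, after):
    """Change in share per grid cell, for cells present in both elections."""
    return {cell: after[cell] - before[cell] for cell in before.keys() & after.keys()}

# Hypothetical shares per grid cell (column, row); not data from the paper.
shares_2010 = {(0, 0): 0.20, (1, 0): 0.10, (2, 0): 0.05}
shares_2012 = {(0, 0): 0.15, (1, 0): 0.12}

change = grid_change(shares_2010, shares_2012)
gained = sorted(cell for cell, delta in change.items() if delta > 0)
lost = sorted(cell for cell, delta in change.items() if delta < 0)
print("gained:", gained, "lost:", lost)
```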
FIGURE 20: COMPARISON
LITERATURE
Aarts, K., and R. Horstman. 1991. ‘Political Change And The Electoral Geography Of The Netherlands’. In ECPR Joint
Sessions of Workshops, Essex.
Aarts, K., H. van der Kolk, and M. Rosema. 2007. ‘Een Verdeeld Electoraat?’ In Een Verdeeld Electoraat: De Tweede
Kamerverkiezingen van 2006, edited by K. Aarts, H. van der Kolk, and M. Rosema, 235-47. Utrecht: Spectrum.
http://doc.utwente.nl/61279/.
Alker, H.R. 1969. ‘A Typology of Ecological Fallacies’. In Quantitative Ecological Analysis in the Social Sciences, edited by
M. Dogan and S. Rokkan, 69-86. Cambridge: MIT Press.
Bethlehem, J. 2014. ‘Europese Verkiezingen 2014: De Drie Prognoses van de Einduitslag Nader Beschouwd’. Stuk Rood
Vlees. http://stukroodvlees.nl/peilingen/europese-verkiezingen-2014-de-drie-prognoses-van-de-einduitslag-
nader-beschouwd/.
Bochsler, D. 2011. ‘It Is Not How Many Votes You Get, but Also Where You Get Them. Territorial Determinants and
Institutional Hurdles for the Success of Ethnic Minority Parties in Post-Communist Countries’. Acta Politica 46 (3):
217-38. doi:10.1057/ap.2010.26.
Charig, C., D. Webb, S. Payne, and J. Wickham. 1986. ‘Comparison of Treatment of Renal Calculi by Open Surgery,
Percutaneous Nephrolithotomy, and Extracorporeal Shockwave Lithotripsy.’ British Medical Journal (Clinical
Research Ed.) 292 (6524): 879.
Cressie, N. 1993. Statistics for Spatial Data. Rev. ed. Wiley Series in Probability and Mathematical Statistics. New York:
Wiley. http://books.google.nl/books?id=4SdRAAAAMAAJ.
Cressie, N., and C.K. Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley. http://books.google.nl/books?id=-
kOC6D0DiNYC.
David, Q., and G. van Hamme. 2011. ‘Pillars and Electoral Behavior in Belgium: The Neighborhood Effect Revisited’.
Political Geography 30 (5): 250-62. doi:10.1016/j.polgeo.2011.04.009.
Feldkamp, R. 2010. Verkiezing Gemeenteraad 3 Maart 2010 - Uitslagen Gemeente Nijmegen. Nijmegen: afdeling
Onderzoek & Statistiek, Directie Wijk en Stad, gemeente Nijmegen.
http://www.nijmegen.nl/rapportenzoeker/Docs/Gemeenteraad_3 maart 2010.pdf.
Hofstra, P. 2014. ‘Van de Rekenkamer Rotterdam Een Brief Inzake Het Onderzoek Gemeenteraadsverkiezingen’, March
20. http://www.ris.rotterdam.nl/cgi-
bin/showdoc.cgi/action=view/id=165191/14gr710_Van_de_Rekenkamer_Rotterdam_een_brief_inzake_het_
onderzoek_gemeenteraadsverkiezingen..pdf.
Johnston, R.J. 1983. ‘Spatial Continuity and Individual Variability: A Review of Recent Work on the Geography of
Electoral Change’. Electoral Studies 2 (1): 53-68. doi:10.1016/0261-3794(83)90106-3.
King, G. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data.
Princeton, NJ [etc.]: Princeton University Press. http://gking.harvard.edu/eicamera/kinroot.html.
Leckie, G. 2013. ‘Cross-Classified Multilevel Models’. LEMMA VLE, Module 12 (Concepts): 1-60.
http://www.bristol.ac.uk/cmm/learning/course.html.
Mey, M.G. 1994. ‘Het stedelijke mozaïek: een vertaling van de voorkeuren van stedelijke bevolkingscategorieën naar
ruimtelijke milieus’. Delft: Publikatieburo Bouwkunde, Faculteit der Bouwkunde, Technische Universiteit Delft.
Park, W. 2008. Ecological Inference and Aggregate Analysis of Elections. Ann Arbor, MI: ProQuest.
http://deepblue.lib.umich.edu/handle/2027.42/58525.
Poort, A., and Th. Verkade. 2010. ‘Paars Plus Komt Uit de Grote Stad’. NRC Handelsblad, July 16.
http://archief.nrc.nl/index.php/2010/Juli/16/Voorpagina/01/Paars+Plus+komt+uit+de+grote+stad.
Przeworski, A. 1974. ‘Contextual Models of Political Behavior’. Political Methodology 1 (1): 27-61.
http://www.jstor.org/stable/25791366.
Pyle, D. 1999. Data Preparation for Data Mining. San Francisco, Calif: Morgan Kaufmann Publishers.
Robinson, W.S. 1950. ‘Ecological Correlations and the Behavior of Individuals’. American Sociological Review 15 (3): 351.
doi:10.2307/2087176.
Rosen, O., W. Jiang, G. King, and M. Tanner. 2001. ‘Bayesian and Frequentist Inference for Ecological Inference: The
R×C Case’. Statistica Neerlandica 55 (2): 134-56.
Smits, J.H.F. 2010. Twee Maal Drie Is Vier, Wiedewiedewiet En Twee Is Negen - Stemmen Met Het Rode Potlood, de
Rotterdam-Case. Berkel en Rodenrijs. http://www.politiekactief.net/files/rdamtweemaaldrie1006.pdf.
———. 2014a. Spookstemmen - Analyse Aan de Hand van de Rotterdamse Uitslagen. Berkel en Rodenrijs: Stichting
Politieke Academie. http://www.politiekactief.net/files/Spookstemmen1403.pdf.
———. 2014b. De Onvermoede Stabiliteit van de Rotterdamse Verkiezingen - Observaties. Berkel en Rodenrijs: Stichting
Politieke Academie. http://www.politiekactief.net/files/Observaties%20Rotterdam%20GR2014.pdf.
———. 2014c. On the Mechanism of Change and Stability in Multiparty Elections. Amsterdam/Berkel en Rodenrijs:
Stichting Politieke Academie.
https://www.researchgate.net/publication/263089859_On_the_mechanism_of_change_and_stability_in_multi
party_elections.
Taylor, P.J., and R.J. Johnston. 1979. Geography of Elections. New York: Holmes & Meier Publishers.
http://books.google.nl/books?id=pUOFAAAAMAAJ.
Van Driel, C., and R. de Jong. 2014. De Tweede Kamerverkiezingen in vijftig stappen. Amsterdam: Boom.
Van Gent, W.P.C., E.F. Jansen, and J.H.F. Smits. 2014. ‘Right-Wing Radical Populism in City and Suburbs: An Electoral
Geography of the Partij Voor de Vrijheid in the Netherlands’. Urban Studies 51 (9): 1775-94.
doi:10.1177/0042098013505889.
Voss, S. 2004. ‘Using Ecological Inference for Contextual Research’. In Ecological Inference - New Methodological
Strategies, edited by G. King, O. Rosen, and M. Tanner, 69-96. Cambridge: Cambridge University Press.
Wu, J. 2007. ‘Scale and Scaling: A Cross-Disciplinary Perspective’. In Key Topics in Landscape Ecology, edited by J. Wu
and R. Hobbs, 115-42. Cambridge: Cambridge University Press.
http://ebooks.cambridge.org/ref/id/CBO9780511618581A060.
Yang, T. 2009. ‘Modifiable Areal Unit Problem’. PennState University - GIA Resources - GIA Tips.
http://help.pop.psu.edu/gia-resources/giatips/MAUP.pdf.
BIOGRAPHY
Joost Smits (1965), MA in Public Administration (University of Twente), is a researcher at the Political Academy
(Amsterdam). He is working on a PhD (at the University of Twente in 2011-2013, now at the Political Academy).
Recent publication: (van Gent, Jansen, and Smits 2014)
joost@politiekeacademie.eu
www.politiekeacademie.eu