A Comparison of Language Processing Models in
Political Analysis: Evidence from Sweden
Annika Fredén, Karlstad University
Moa Johansson, Chalmers University of Technology
Pasko Kisic Merino, Karlstad University
Denitsa Saynova, Chalmers University of Technology
Paper prepared for the APSA general meeting
Seattle/online, 30 September-2 October 2021
Abstract
This study evaluates two different natural language processing tech-
niques: the normalised co-occurrence (PMI) versus neural networks. In
contrast to most previous studies, the focus is on the context of a propor-
tional representation system – Sweden –, where the parties in parliament
tend to form coalitions. We test the models by collecting data from the
national parliament (Swedish Riksdag) and using the Swedish language for
training sets and dictionaries. The tests focus primarily on the meaning
and attention that the party representatives confer to important terms,
as well as left-right ideological positioning, with an emphasis on the di-
mension of “security”. The analysis covers parliamentary motions from
the two main competitor parties (Moderates and Social Democrats) over
two time spans, from 1988-2009 and 2010-2020. The two models delivered
different foci of keywords, and we found that balancing pre-training and
fine-tuning was crucial to obtaining differences between parties in the neu-
ral network approach. The PMI model benefited from a larger context
window, and the neural network model (word2vec) from a smaller one.
We discuss the results in relation to future opportunities to learn about
political vocabularies and the nature of conflict in parliamentary politics.
1 Introduction
Quantitative text analysis is a growing field in the social sciences, and recently
scholars have advanced the analysis to include neural networks and other more
complex estimation techniques (Rheault and Cochrane, 2020; Rodman, 2020;
Rodriguez and Spirling, 2021). These analyses of text have a strong tendency
to use examples from political systems with clear party cleavages, such as the
US congress (Rodriguez and Spirling, 2021), or more recently, mixed electoral
systems, such as in Germany (Lewandovsky et al., 2021, forthcoming). However,
if the dataset is small and involves less conceptual, discursive or ideological
conflict, the language processing should look somewhat different, and a
combination of simpler and more complex estimation techniques is potentially
more fruitful for attaining a deeper understanding of the nature of the data. The aim
of this study is to explore how different machine learning techniques for natural
language processing (NLP) perform on the analyses of parties’ vocabularies
and positions in a proportional representation (PR) context, which tends to
produce parliamentary coalitions. We use materials from the PR system of
Sweden, where minority governments are the rule, and party identification is
weakening (Bäck and Bergman, 2016). The Swedish PR system is similar to
many others in Europe, such as the Netherlands, Norway, and Denmark. Despite
their prevalence, these cases have so far been neglected in the growing field of
machine learning-oriented political science.
We began this study by assessing techniques which convert text data to
numeric vectors (word embeddings) based on their semantic similarity and co-
occurrence patterns (Landauer and Dumais, 1997; Sutskever et al., 2014; Rod-
man, 2020; Rodriguez and Spirling, 2021). We evaluated two main types of com-
putational techniques: co-occurrence matrices normalised by point-wise mutual
information (PMI), which are regularly applied to psychological research; and
neural networks, which recently have been employed more frequently in political
science. We use the word2vec algorithm for the neural network analyses.
Given that the dataset is sufficiently large, complex natural language pro-
cessing (NLP) techniques relying on neural networks seem to perform well in
plurality system contexts, such as the United Kingdom and the United States
(Rheault and Cochrane, 2020). Rodman (2020) suggests that word vectors can
perform rather well even on ideological concept analyses over time based on
relatively small data. In the English language, there are a number of estab-
lished training sets that scholars tend to rely on (Rodriguez and Spirling, 2021).
Rodriguez and Spirling found small differences between pre-trained and locally
trained datasets when analysing data from the US Congress. Using a so-called
question-based computer language assessment approach, where citizens
describe their political leaders in keywords, Fredén and Sikström (2021b) found
that locally trained datasets can be beneficial when the dataset contains a lot
of variation and context-specific words related to political qualities.
When the data come from the parties themselves via more official sources
and contain more bureaucratic and/or rhetorical language, as well as longer
sentences, the impact of the training set probably increases compared with
texts with fewer words, such as Twitter feeds or question-based approaches. For
example, Rodriguez and Spirling find that GloVe, which is a computer algorithm
for neural network word embeddings similar to word2vec, tends to overweight
uncommon terms in the vocabulary. None of the above-mentioned studies,
however, provides a deeper understanding of the importance of a process called
pre-training, which is something that we will elaborate on further in our study.
Since previous political science-oriented quantitative text analyses usually
involve English-speaking countries (Rheault and Cochrane, 2020; Osnabrügge
et al., 2021), training datasets are almost exclusively in English, and
comparisons between contexts are seldom equivalent (Rodriguez and Spirling,
2021). One solution is to translate original languages into English (Fredén and
Sikström, 2021b). However, these translations can be insufficient for revealing
nuances when the text materials extend beyond a few words. Oftentimes, the
standardised dictionaries or training sets used are unsuitable for, or non-transferable
from, a different context. We believe that we can generate more in-depth knowl-
edge about the pitfalls and opportunities offered by different computational
techniques because we investigate the data in its original language. We do this
by using relevant and context-specific text materials to evaluate and train the
data, and by possessing a nuanced understanding of the meaning of concepts in
the Swedish language.
In the present case, we have chosen to study seven concepts and ten keywords
which reflect the conflict and character of the PR system of Sweden: drugs
[droger], taxes [skatt/er], security [säkerhet, trygghet], crime [brott, brottslighet,
kriminalitet], equality [jämlikhet], solidarity [solidaritet], and justice [rättvisa].
To a large extent, the key concepts replicate the words that were used by
Rodriguez and Spirling (forthcoming) to assess the performance of different word
embeddings in the US context. These are concept words such as "justice" and
"equality", and words that reverberate in the left-right ideological positioning
such as "taxes". We focus on embeddings related to the broad and increasingly
important concept of "security", which, in the Swedish context, is related
to two different dimensions: state security, referring to systems and programs
("säkerhet"); and social security, referring to human feelings of safety and
security ("trygghet"). Besides "solidarity" and "justice", Swedish voters themselves
tend to mention words related to safety and security when describing parties that
they would vote for following the COVID-19 crisis (Fredén and Sikström, 2021a).
The concept of "security" thus provides a context-specific and relevant example
in the framework of our study that has different implications and meanings in
the Swedish language. We performed analyses over two periods: first, an analysis
of recent years (2010-2020), with eight parties in parliament including
a radical right-wing populist party (Sweden Democrats, Sverigedemokraterna);
and second, an analysis of an earlier period (1988-2009), with
six to eight different parties in parliament.
We found that the neural network approach was sensitive to the balance
between pre-training and fine-tuning (developed below), which led to the model
being more or less able to deliver substantive differences between the parties’
vocabularies and foci. The PMI approach, on the other hand, required more
interpretation from the researchers to be reasonable, as it produced many less
informative words (e.g., "also", "therefore", "be", "all", "thin", "four") alongside
the more relevant ones. Furthermore, the models produced different types of
associations. The neural network tended to favor more abstract concept words
such as "justice" and "governance", whereas the PMI model referred to
organisations (e.g., NATO, the EU) and specific attributes of the topic (e.g., "juvenile
delinquency"). The differences between models were, to a large extent, greater
than the differences between parties and over time. The standard deviation was
greater for the PMI models, which benefited from larger context windows in our
exercise. It appears that previous studies have underestimated the value of
pre-training when using word2vec. With word2vec, it also seems that
smaller context windows are preferable to larger ones in the studied context. We
discuss the results in relation to supervised learning and future research paths.
2 Materials and Methods
2.1 Materials
The goal of this study is to compare language use from politicians represent-
ing two major Swedish parties over two time periods: Social Democrats (So-
cialdemokraterna) and Moderates (Moderaterna). We chose these parties be-
cause, first, they constitute the main electoral choices for voters on different
sides of the political ideology spectrum, and second, they are the two potential
leaders of the Swedish government. The material should thus be representative
of a more general political agenda in the system. The materials also contain
motions that are co-written between the leading parties and their support or
coalition partners, since a key characteristic of the Swedish political system is
that the major parties do not govern alone. Since the early 2000s, Swedish
parties have been more oriented toward coalitions, and since 2006, the country
has been governed by coalitions (Bäck and Bergman, 2016).
There are some practical language-related problems when comparing text
data. Most NLP analyses in political science so far have used English word
references, such as n-grams. One of the techniques used for surveys that have
collected a small(er) number of keywords from each respondent is to translate
words from different languages into English using Google Translate in order to
obtain a pooled dataset (Fredén and Sikström, 2021b). In our study, where
the text corpora are longer and more complex, we have chosen to conduct the
analyses in the original language. We believe that the interpretations will benefit
greatly from understanding what certain concepts and example words mean in
the Swedish language. This form of analysis will facilitate the comparison over
models and parties.
The starting point for the NLP analyses is text corpora from motions (proposals)
from the main parties to the National Swedish Parliament (Riksdag)¹.
We choose motions in the Swedish case because policy positions are often more
explicitly expressed in these instances than in the parliamentary debates.
The latter tend to be low-profile during regular terms (Osnabrügge et al., 2021).
In these motions, parliamentarians, normally from the opposition, express their
views on topics that are particularly relevant to them. Motions can thus be
seen as emblematic frames of the politicians’ discourses and party agendas dur-
ing a specific period in time. These materials are relevant representations of
parliamentarians’ vocabularies and discourses, and are suitable for comparing
the usability of different approaches in the NLP analysis. Most importantly, we
can distinguish if and how word meanings tend to differ between parliamentarians
from the main parties in Sweden, and how the different NLP models reveal
these differences.
¹ http://data.riksdagen.se
Two time frames are considered in the analysis: 1988-2009 and 2010-2020. In
addition, both motions written solely by Social Democrats (S) or Moderates (M)
and motions co-written by multiple parties are considered. During the period
1988-2009, 6-8 parties were represented in parliament, with the following shifts in
office: Social Democrats 1988-1991, Moderates 1991-1994, Social Democrats
1994-2006, and Moderates again from 2006 until 2010. The second period
(2010-2020) begins with the entrance of the radical right-wing Sweden Democrats (2010);
the Moderates were in office from 2010 until 2014, when the government
shifted back to the Social Democrats.
The resulting number of documents and words are summarised in Table 1,
where a "document" refers to a single motion.

                  1988-2009                      2010-2020
            Moderates   Social Democrats   Moderates   Social Democrats
Documents      18,319             22,370      12,553              9,448
Words      15,042,389         11,835,351   7,371,382          5,030,519

Table 1: Size statistics of the data used in the analysis.
For one of the models involved in the analysis, additional datasets are used
for pre-training (explained in the next section). In order to obtain models that
cover the terms we are interested in, and similarity patterns close to the ones
in the studied data, the dataset is chosen from the same domain as the text we
are interested in: Riksdag documents from Interpellations, Propositions, and
Parliamentary letters (Riksdagsskrivelse). After removing some of the available
material due to quality issues, such as too many typos and misspellings, the
final material batch contains 21,886 documents with 160,198,694 words.
2.2 Methods
Machine learning tools need to encode words as numeric vectors, called word
embeddings, which should capture the meaning of a word.
In this study we train separate instances of each model on data from
the respective parties, resulting in several different word embeddings. We are
interested in exploring how different methods for computing these embeddings
behave in terms of which words get similar encodings and can be assumed to
have related usage, and which words get different encodings, indicating different
nuances in the use of language between members of different political parties.
We apply two unsupervised machine learning methods for computing differ-
ent embeddings. The first, PMI, uses deterministic matrix summary statistics
of co-occurrence counts. The second, word2vec, is based on online stochastic
updates: the parameters are adjusted as the model processes more examples
from the data.
Point-wise mutual information (PMI): This method relies on a matrix
of co-occurrence counts, which is a large, sparse matrix with dimensions cor-
responding to the size of the vocabulary (one row per word). Entries indicate
how many times words co-occur within a given distance from each other in the
training dataset. This distance (the window size) can be adjusted in order to
expand or contract the context considered for a word.
The entries in the co-occurrence matrix are then normalised by the total
number of occurrences of both words for each co-occurrence count. This helps
avoid frequently occurring words getting too much weight. These can be stop-words
(e.g. "the", "and", "or") or words used frequently by both parties (e.g.
"motion", "parliament", "more", "important").
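The normalisation described above can be illustrated with a minimal sketch. It assumes the standard definition PMI(a, b) = log2(p(a, b) / (p(a) p(b))), which may differ in detail from the normalisation used in this study; the function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    pair_counts = Counter()
    for i, word in enumerate(tokens):
        # Look ahead only, so that each pair of positions is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            pair_counts[tuple(sorted((word, other)))] += 1
    return pair_counts


def pmi_scores(tokens, window):
    """Return log2(p(a, b) / (p(a) * p(b))) for every observed word pair."""
    word_counts = Counter(tokens)
    pair_counts = cooccurrence_counts(tokens, window)
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs
        p_a = word_counts[a] / total_words
        p_b = word_counts[b] / total_words
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy corpus (hypothetical): the frequent word "the" is down-weighted
# relative to the pair ("nato", "security").
tokens = "nato security the school the tax the nato security".split()
scores = pmi_scores(tokens, window=1)
```

In this toy example the pair ("nato", "security") receives a higher score than ("nato", "the"), even though "the" is the most frequent word. In a full PMI model, each word's row of such scores would serve as its embedding vector.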
In this study, the vocabulary is limited to about 10,000 words for computa-
tional reasons. Words are selected to be a part of the vocabulary only if they
occur more times than 0.0005 percent of the total words in either party’s data.
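The vocabulary rule above can be sketched as follows. This is one possible reading of the threshold, not the authors' code: a word is kept if its count exceeds the given share of the total tokens in at least one party's data. The function name and the toy data are hypothetical.

```python
from collections import Counter


def select_vocabulary(party_tokens, min_share=0.0005 / 100):
    """Keep words whose count exceeds `min_share` of the tokens in any party's data."""
    vocabulary = set()
    for tokens in party_tokens.values():
        counts = Counter(tokens)
        threshold = min_share * len(tokens)
        vocabulary.update(word for word, n in counts.items() if n > threshold)
    return vocabulary


# Toy data with an exaggerated threshold (20 percent) so the effect is visible.
party_tokens = {
    "M": ["nato"] * 3 + ["skatt"] * 7,
    "S": ["skatt"] * 9 + ["fn"] * 1,
}
vocabulary = select_vocabulary(party_tokens, min_share=0.2)
```

Here "fn" is dropped because it is rare in the only dataset where it appears, while "nato" is kept because it is frequent enough in one party's data.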
Neural Networks (word2vec): We experiment with one of the most common
neural network-based methods, word2vec (Mikolov et al., 2013), specifically
with its variant CBOW (Continuous Bag of Words). This model is a
neural network trained to predict a word, given its context window. We use
the Gensim² implementation available in Python and keep most of the default
hyper-parameter values (embedding vector length is set to 100, number of train-
ing epochs is set to 5). The large number of hyper-parameters that need to be
set represents one disadvantage of word2vec in this unsupervised setting. The
optimal values for these are probably sensitive to the type of data and language,
among other factors. However, we could not perform a hyper-parameter search in
this setting, so we relied on the default/recommended values.
A known issue with neural network models is that they require more data
to be trained, in contrast to smaller, less complex models. After initial tests
showing that the Swedish motions dataset size was insufficient for training, a
pre-training approach was applied. Pre-training is the process of, first, training
a model on an external corpus of data, and second, iterating over the dataset
of interest for further training of the model (called fine-tuning). The dataset
chosen for pre-training will influence the results, specifically in terms of specify-
ing the vocabulary. The main reason why a large publicly available pre-trained
model is unsuitable for this task is that political agendas use a specific set of
terms that might be absent or underrepresented in the datasets used for pre-
training those models. While vocabularies can be extended, absent words in
the original vocabulary are typically initialised randomly, which would make it
harder to learn the connections for those terms. A pre-training set with a similar
distribution to the studied dataset is preferable in order to obtain vocabularies
and learned relationships closer to the ones from the fine-tuning data.
² https://radimrehurek.com/gensim/
The PMI model showed sensitivity to non-lemmatized data, so data for this
model was lemmatized using the spaCy³ Python library. Lemmatisation is the
process of transforming inflected forms of a word to their dictionary form (or
lemma). Word2vec did not show similar issues, and due to the computational
effort required and the typical use case of these models, word2vec is trained
on non-lemmatized data in our study, as in other research. PMI was also more
susceptible to stop-words, which showed high similarity scores with some
of the explored terms. Therefore, stop-words were removed when producing the
"word clouds" described in section 2.2.1.
For both models, we explore context window sizes of 5, 50, and 300 to test
whether larger context windows capture broader meanings, and whether the context
window plays the same role in both models. We choose the shortest window, as it is a commonly
used size (Antoniak and Mimno, 2018) and is a default value in the Gensim
library. The second window size is chosen, as that covers around two sentences,
which is in line with a typical span for deep language models like BERT. The
second window size would also cover a typical tweet, which is a widely used
unit of analysis in NLP and political science. The longest context window size
corresponds to the median length of motion text in the studied dataset.
2.2.1 Evaluation
Once word embeddings have been obtained, there are different ways to evaluate
the results. The main idea behind all methods for evaluation lies in the concept
of distance. Embedding vectors can be seen as positions in a high-dimensional
space, and the distance between them can be seen as indicative of the magnitude
of the difference in meaning (i.e., two words that have similar meanings will have
vectors that are close to each other). In this study, we use the cosine similarity,
which is one of the most commonly used measures. The cosine similarity
is defined as the cosine of the angle between the two vectors, which results in a
score between -1 and 1 (between 0 and 1 when all vector components are
non-negative); higher scores indicate greater similarity between
words.
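The measure described above, the cosine of the angle between two embedding vectors, can be computed as in the following minimal sketch. The function name and the toy vectors are hypothetical. By a common convention, the corresponding distance is obtained as one minus this value.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Vectors pointing in the same direction score (approximately) 1,
# orthogonal vectors score 0, and opposite vectors score -1.
same_direction = cosine_similarity([1, 2], [2, 4])
orthogonal = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 0], [-1, 0])
```

Because the measure depends only on the angle, it ignores vector length, which is one reason it is widely used for comparing word embeddings.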
To evaluate the learned concepts we produce "word clouds", which are
lists of the top 20 most similar words to a particular term that is being studied.
These lists can be compared between political parties and between models. The
overlaps (i.e., how many of the top 20 words are common between the two
parties) and differences should reflect the nuances in the speakers’ use of the
term that each model manages to disambiguate. In this study we produce these
lists based on the mean cosine similarity across bootstrapped iterations, and
calculate the standard deviation as a guide for the reliability of the results.
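The overlap between two parties' lists can be sketched as below. The similarity scores and English words are invented for illustration, and k is reduced from 20 to 3 to keep the example small; the function names are hypothetical.

```python
def top_k_neighbours(similarities, k=20):
    """Return the k words with the highest similarity to the studied term."""
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    return ranked[:k]


def overlap(neighbours_a, neighbours_b):
    """Number of words shared by two neighbour lists."""
    return len(set(neighbours_a) & set(neighbours_b))


# Invented similarity scores to a studied term for two parties.
sims_m = {"nato": 0.9, "defence": 0.8, "police": 0.7, "school": 0.1}
sims_s = {"un": 0.9, "defence": 0.85, "police": 0.6, "school": 0.2}

top_m = top_k_neighbours(sims_m, k=3)
top_s = top_k_neighbours(sims_s, k=3)
shared = overlap(top_m, top_s)
```

In this toy case two of the three top words are shared, so the lists differ in only one word per party.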
We also use bootstrapping with 20 iterations for each model to assess the
stability of the results. The average of the mean standard deviation (std) per
term is calculated for the models. To make these comparable between models,
the standard deviation is normalized by the score.
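A minimal sketch of the bootstrapping step follows, under several assumptions that the text does not settle: documents are resampled with replacement, the population standard deviation is used, and normalisation means dividing by the mean score. The scoring function is a hypothetical stand-in for training a model on each resample and reading off one similarity score.

```python
import random
import statistics


def bootstrap_scores(documents, score_fn, iterations=20, seed=0):
    """Score `iterations` resamples of the corpus, drawn with replacement."""
    rng = random.Random(seed)
    return [
        score_fn(rng.choices(documents, k=len(documents)))
        for _ in range(iterations)
    ]


def normalised_std(scores):
    """Standard deviation of the scores divided by their mean (assumed non-zero)."""
    return statistics.pstdev(scores) / statistics.mean(scores)


# Hypothetical stand-in for "train a model and return one similarity score":
# here simply the share of resampled documents that mention "nato".
def toy_score(sample):
    return sum("nato" in doc for doc in sample) / len(sample)


documents = [["nato", "security"], ["school", "safety"], ["nato", "eu"], ["tax"]]
scores = bootstrap_scores(documents, toy_score)
stability = normalised_std(scores)
```

Dividing by the mean makes the spread comparable between models whose raw scores live on different scales, which is the motivation given above for normalising.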
In the present application we focus on ten Swedish terms: droger (drugs),
skatter (taxes), säkerhet (security), trygghet (security/safety), brott (crime),
brottslighet (crime/delinquency), kriminalitet (crime), jämlikhet (equality),
solidaritet (solidarity), and rättvisa (justice). These key concepts are similar to
those in previous studies. For instance, "equality" has been scrutinised in the
US context (Rodman, 2020; Rodriguez and Spirling, 2021), and "taxes" is a
clear ideological left-right divider in Sweden, as it was found to be in the US
(Rodriguez and Spirling, 2021). These terms vary regarding the number of
mentions between time periods and between parties (appendix A).
³ https://spacy.io
3 Results
Both models proved able to learn words related to the main term, within which
some nuances can be observed. There are more differences in the learned words
between models than between parties. Word2vec learns more example-based
and policy terms (e.g., types of drugs and taxes), while PMI produces
more general terms. PMI also shows higher mean standard deviations, and the
results are improved by larger context windows. Word2vec shows lower average
standard deviation and is not influenced as much by the context window in terms
of stability. However, word2vec seems to perform worse with larger windows.
A clear consequence of this weaker performance is that more data quality issues
manifest in the results.
One possible explanation for the low standard deviation and high overlap
between parties in the word2vec model is the over-representation of the
pre-training data. In typical fine-tuning scenarios, long fine-tuning is avoided in
order to mitigate an issue referred to as "catastrophic forgetting", in which
the model forgets the general language patterns it has learned and
amplifies data artifacts from the fine-tuning dataset. As this work is interested
in those nuances (the language patterns specific to the fine-tuning data), this
process might in fact be beneficial rather than detrimental. Increasing the number
of fine-tuning epochs from 5 to 50 for the context window 5 word2vec model
significantly decreased the overlap and only slightly increased the standard
deviation. This adjustment suggests that the original results are greatly
influenced by the pre-training data.
3.1 Stability of models
As a measure of the stability of the word representations learned by the different
models with varying context window size, we have computed the mean standard
deviation for the ten terms listed in section 2.2.1 (see Table 2). The stability
of the PMI models increases when we increase the context window size, which
is consistent over both time periods. On the other hand, for word2vec, the
mean standard deviation is generally lower, and increases with larger context
windows. However, the difference is smaller than for PMI.
As we are interested in nuances, the standard deviation should not directly
be used as a measurement of the model quality, as low standard deviations can
also point to low complexity in the learned patterns. The std should be used
more as a guide to flag unreliable results.
model      time period   window size   mean std
PMI        1988-2009               5      0.344
                                  50      0.096
                                 300      0.069
           2010-2020               5      0.462
                                  50      0.215
                                 300      0.111
word2vec   1988-2009               5      0.024
                                  50      0.042
                                 300      0.075
           2010-2020               5      0.019
                                  50      0.035
                                 300      0.063

Table 2: Mean standard deviation per model, measured from the ten terms
used in the analysis in 20 bootstrapped iterations. The standard deviations are
normalized by the cosine similarity score to make them comparable between
models.
3.2 Security: A topic with different meanings
To illustrate the differences and similarities between models and parties
over time, we focus on the "security" dimension. "Security" is a topic that has
recently become dominant both among voters, when they describe parties they
would vote for (Fredén and Sikström, 2021a), and in leading Swedish political
broadcast media. Security is a wide concept, and we explore two aspects which
translate to different words in Swedish. First, trygghet refers to the more subjective
feeling of being safe and secure; second, säkerhet refers to objectively
measurable security, e.g., the secure and safe operation of machines and traffic, and
is associated with state organisations such as the police.
"Trygghet" (social security and safety dimension). In the PMI models,
we notice some differences and similarities between the two time periods, and
between the parties. The word "school" occurs for both parties in almost all
PMI models in both time periods, which is a sign that this word is highly
relevant to both parties. The motions in the earlier period (1988-2009) appear to
focus more on economic security. Moderates’ words lean more toward individual
capacities and responsibility ("be able to"), while Social Democrats emphasise
the social dimension and government (see Figure 1, which represents significant
example words). In general, Social Democrats were more group-oriented in their
use of "trygghet", whereas the Moderates emphasised the individual. This
diverging approach embodies a traditional difference between these parties.
The motions in the later period (2010-2020) appear to focus more on sources
of unsafety (otrygghet), following a general trend in society. In this context,
Social Democrats mention topics such as "youth crime" and "weekend". The
results for the Moderates during this period are dominated by non-informative
and more general words such as "work". A deeper analysis of the
context of the term "trygghet" for the Moderates during the later period is
needed to ascertain why we lack more meaningful results.
Figure 1: Example words from the PMI model (context window 300) related to
the term trygghet
In the first word2vec models, it appears that the results are largely dominated
by the pre-training dataset, as many words occur for both
S and M. In addition, many of the same words appear for both time periods.
These words tend to be related to the welfare system and social security. While
these words certainly are relevant for the topic, they do not reveal many
differences between the parties. When passing the fine-tuning data (the motions from
S and M) through more epochs during training, thus encouraging the learned
embeddings to adapt more to the respective data from S and M, differences
between parties became much clearer. We extended the fine-tuning to 50 epochs
in the smallest model (context window 5), and it reduced the overlap by 2 in
the early period and by 8 in the later period, supporting our hypothesis. For
instance, in the 1988-2009 period, the results for the Moderates leaned toward
more focus on "stability", whereas the Social Democrats put forth "justice"
and "trust" (Figure 2). Furthermore, in the 2010-2020 period we see many
more words related to the "individual" for M (e.g., responsiveness), while S still
talk primarily about the social security system (e.g., health insurance). Our
interpretation is that, for the 2010-2020 period, the fine-tuned word2vec model
represents the data and the parties’ positioning to a greater extent than the
PMI model.
Figure 2: Example words from the word2vec model (context window 5) related
to the term trygghet
"Säkerhet" (system security dimension). In the 1988-2009 period, the
PMI models reveal a very different meaning of the two dimensions "trygghet"
and "säkerhet". Under the latter meaning (Figure 3), common words involve
primarily issues related to international relations and conflict, such as "peace",
"military", etc. An interesting aspect is that the Moderates mention NATO
(which they are in favour of joining) and the EU, while the Social Democrats
do not. In the 2010-2020 period, commonalities still emphasise issues related to
international relations and conflicts. Moderates once again mention "NATO"
and "Europe", while Social Democrats instead mention the "UN" and "conflict
prevention". Thus, when studying the more general concept "säkerhet" with
the PMI model, we obtain concrete subjects such as organisations.
Figure 3: Example words from the PMI model (context window 300) related to
the term säkerhet
Turning to the first word2vec model, we once again see many words occurring
for both S and M, and few clear distinctions between these parties. Many
words are related to the "security" topic in general. These include compound
words involving security ("security risk", "security requirements", etc.) as well
as some specific areas of security ("nuclear security", "cyber security"). Words
relating to defence and the military also occur, but are a lot less pronounced
than in the PMI results. Again, this finding is likely due to the pre-training data
set dominating the embeddings. In the model that was further fine-tuned for
additional epochs, we see that the differences between the parties’ focus appear
clearer, in particular for the 2010-2020 period (Figure 4). The military and
international relations dimensions become slightly less pronounced for the Social
Democrats, who emphasise the "governance" dimension, while the Moderates
focus on "sovereignty". This model thus better reflects the parties’ positions
slightly to the left (Social Democrats) and to the right (Moderates). Once
again we find that fine-tuning the pre-trained model is crucial for obtaining
a fair representation of the motions text materials from the neural network
approach.
Figure 4: Example words from the word2vec model (context window 5) related
to the term säkerhet
Summary The PMI and word2vec models seemed to highlight different as-
pects of the topics. We saw signs that when using a pre-trained word2vec model
for our purposes, it can be beneficial to run more epochs of fine-tuning as we wish
to make the models highlight differences in language patterns of the respective
parties, rather than commonalities. Our preliminary experiment is encouraging.
Part of the reason we see differences between PMI and word2vec can also
be attributed to the differences in the vocabularies that the models use and
learn. While the PMI vocabulary is chosen as the most commonly used words
for either party, the word2vec vocabulary is defined by the pre-training
dataset. For example, in the PMI model, for the word ”security”, we saw
the Moderates frequently mentioning NATO, which was not the case for the
word2vec model. In fact, the word NATO occurs exclusively in the Moderate
dataset, and was potentially not present in the pre-training data; hence, it was
ignored by the word2vec model. This is also the reason why the pre-training set
should be as similar as possible to the training data of interest. A model
pre-trained on general text might miss many domain-specific words.
Users should be aware of this subtlety in model vocabularies when choosing
machine learning models and pre-training sets.
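
The vocabulary effect described above can be checked before any fine-tuning is
run. The following is a minimal sketch, not the procedure used in this study: it
assumes the motions are already tokenised per party and that the words known to
the pre-trained model are available as a set. The function name, the toy tokens
and the toy vocabulary are all hypothetical.

```python
from collections import Counter


def missing_from_pretrained(tokens, pretrained_vocab, min_count=2):
    """Return (word, count) pairs for frequent corpus words that are absent
    from the pre-trained vocabulary, most frequent first."""
    counts = Counter(t.lower() for t in tokens)
    missing = [(w, c) for w, c in counts.items()
               if c >= min_count and w not in pretrained_vocab]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(missing, key=lambda wc: (-wc[1], wc[0]))


# Toy data (hypothetical): tokens from one party's motions and a
# pre-trained vocabulary that happens to lack "nato".
moderate_tokens = ["säkerhet", "nato", "försvar", "nato", "säkerhet", "eu"]
pretrained_vocab = {"säkerhet", "försvar", "eu"}

oov = missing_from_pretrained(moderate_tokens, pretrained_vocab)
print(oov)
```

Words returned by such a check are candidates for being ignored by the
fine-tuned model, so a long list would suggest that the pre-training set is too
far from the material of interest.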
4 Discussion
This study demonstrated that, when applied to a proportional representation
context, the choice of method and training set influences the picture of the
political landscape. Applying a neural network model to the motions material
required a more careful set-up for the pre-training. Without additional iterations
of fine-tuning, party differences were obscured. On the other hand, the PMI
model produced some meaningful differences related to objects such as specific
organisations (e.g., ”NATO”, ”EU”, ”UN”), but also more general words (e.g.
”all”, ”therefore” and ”also”). It was thus necessary to interpret the output
from the PMI model to a greater extent, whereas the neural network produced
more consistent, although abstract and concept-based, results.
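
To make the PMI side of this comparison concrete, the sketch below computes
pointwise mutual information, log2 of p(x, y) / (p(x) p(y)), between a target
word and the words found within a fixed context window. It is an illustration
under stated assumptions only: the probability estimates, the function name and
the toy token list are ours for this example, the normalisation used in the
study is not reproduced, and the window is far smaller than the 300 tokens used
for Figure 3.

```python
import math
from collections import Counter


def pmi_scores(tokens, target, window=2):
    """PMI between `target` and each word seen within `window` tokens of it.

    The joint probability is estimated from ordered token pairs that lie at
    most `window` positions apart; marginals come from unigram counts.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    total_pairs = 0
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            total_pairs += 1
            if w == target:
                pair_counts[tokens[j]] += 1
    n = len(tokens)
    p_target = word_counts[target] / n
    scores = {}
    for ctx, c in pair_counts.items():
        p_joint = c / total_pairs
        p_ctx = word_counts[ctx] / n
        scores[ctx] = math.log2(p_joint / (p_target * p_ctx))
    return scores


# Toy, hypothetical token sequence.
tokens = ["säkerhet", "nato", "och", "fred", "och",
          "säkerhet", "nato", "och", "eu"]
scores = pmi_scores(tokens, "säkerhet", window=1)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, round(score, 3))
```

In this toy sequence "nato" receives a positive score and "och" a negative one,
because "nato" appears next to the target more often than its overall frequency
would predict.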
In contrast to the general trend in analyses of text-as-data in political science
(Rodman, 2020; Rheault and Cochrane, 2020; Rodriguez and Spirling, 2021),
our study concerned a rather limited time period and focused on a proportional
representation context with bipolar political tendencies. The PMI model was
valuable for picking up more ordinary vocabulary and concrete examples of the
politics of specific time periods. The neural network model, on the other hand,
was more valuable for characterising words and concepts that are significant
in political discourse more generally. Longer fine-tuning of the model revealed
differences between the parties to a greater extent.
We focused on dimensions believed to be indicative of differences and posi-
tioning: security, taxes, and equality. From this approach we could characterise
the respective party’s vocabulary, focus of attention, and use of words during
the two time periods. For example, on ”security”, Social Democrats focused on
groups and communities, whereas Moderates focused on the individual.
In the framework of securitisation, Bubnovskaia et al. (2019) find that security
and safety are more strongly associated with, or prioritised in relation to,
issues of social and political institutions than behavioural or personality
features. They argue that the dominance of the institutional approach to
“security” and “safety” can be linked to the swift advancement of the
“securitisation of human security” and the “safety discourse” (Bubnovskaia
et al., 2019, p. 9) in our societies. This finding could further help explain
the tendency toward institutionalised words and concepts that we found with
the neural network approach.
This institutional dominance could add another interesting analytical layer
for the comparison between our NLP techniques and the two Swedish parties.
It should be noted that Bubnovskaia et al. (2019) and most previous studies
focus on conceptions of “security” stemming from English-language
sources and databases. Along these lines, future research on
security and safety in comparative political speech could focus on developing
dictionaries and conceptual associations in other languages, as we did in
this study with Swedish. A deeper study covering additional materials
could certainly inform future research on how political speeches carry
securitisation practices and biases into other spheres of public life.
Another perspective is that of Goet (2019), who argues that to better learn
specific differences between party word use, it could be beneficial to instead
train word embeddings in conjunction with a supervised proxy classification
task, such as predicting the party label of a text. This way, the model can be
tuned to emphasise differences in language use which are more indicative of the
respective parties. In a supervised learning scenario (unlike in our work) there
is a clear criterion for what constitutes a ”better” model: namely the one
that is better at guessing the correct party label. Learning word
embeddings in a supervised setting is left for future work. A classifier could
also learn to position words-in-context from left to right, rather than by party
label, which is relevant since some parties may change positions over time.
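
The evaluation criterion in such a supervised setting can be illustrated with a
small sketch. Note the assumptions: the example swaps in a multinomial Naive
Bayes classifier with add-one smoothing on word counts, which does not learn
word embeddings and is neither Goet's (2019) method nor part of this study. It
only shows how "better at guessing the correct party label" becomes a
measurable score. The toy motions and all function names are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenised documents."""
    word_counts = {}              # label -> Counter of word frequencies
    doc_counts = Counter(labels)  # label -> number of documents
    for tokens, label in zip(docs, labels):
        word_counts.setdefault(label, Counter()).update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log posterior for one document."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][w] + 1)
                                  / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Share of documents whose party label is predicted correctly."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Toy, hypothetical motions (already tokenised) with party labels.
train_docs = [["nato", "försvar"], ["nato", "eu"],
              ["fn", "konflikt"], ["fn", "styrning"]]
train_labels = ["M", "M", "S", "S"]
model = train_naive_bayes(train_docs, train_labels)

held_out_docs = [["nato", "säkerhet"], ["fn", "säkerhet"]]
held_out_labels = ["M", "S"]
print(accuracy(model, held_out_docs, held_out_labels))
```

Held-out accuracy of this kind gives a single number by which two candidate
models can be compared, which the unsupervised set-up in this study lacks.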
Potentially, an additional development is to combine these two types of
unsupervised approaches (and materials), in order to get a more accurate picture
of the political actors’ vocabulary and stance. A more applied research path
in the future would be to examine in more depth to what extent the parties’
use of words correlates with disagreement and shifts in actual behaviour, such as
parliamentary voting.
Acknowledgements
This work was supported by the Wallenberg AI, Autonomous Systems and Software
Program - Humanities and Society (WASP-HS) funded by the Marianne
and Marcus Wallenberg Foundation and the Marcus and Amalia Wallenberg
Foundation.
References
M. Antoniak and D. Mimno. Evaluating the stability of embedding-based word
similarities. Transactions of the Association for Computational Linguistics,
6:107–119, 2018. doi: 10.1162/tacl_a_00008. URL
https://aclanthology.org/Q18-1008.
O. V. Bubnovskaia, V. V. Leonidova, and A. V. Lysova. Security or safety:
Quantitative and comparative analysis of usage in research works published
in 2004–2019. Behavioral Sciences, 9(12):146, 2019.
H. Bäck and T. Bergman. The Parties in Government Formation. In J. Pierre,
editor, The Oxford Handbook of Swedish Politics, pages 206–226. Oxford Uni-
versity Press, Oxford, 2016.
A. Fredén and S. Sikström. Voters’ sympathies and antipathies studied by
quantitative text analysis: Evidence from a two-wave panel experiment in
Sweden during covid-19. In Annual Midwest Political Science Association
Conference, MPSA, Chicago, April 2021a.
A. Fredén and S. Sikström. Voters’ view of leaders during the covid-19 crisis:
Quantitative analysis of keyword descriptions provides strength and direction
of evaluations. Social Science Quarterly, n/a(n/a), 2021b.
N. D. Goet. Measuring polarization with text analysis: Evidence from the UK
house of commons, 1811–2015. Political Analysis, 27(4):518–539, 2019. doi:
10.1017/pan.2019.2.
T. K. Landauer and S. T. Dumais. A Solution to Plato’s Problem: The
Latent Semantic Analysis Theory of Acquisition, Induction, and Represen-
tation of Knowledge. Psychological Review, 104(2):211–240, 1997. doi:
10.1037/0033-295X.104.2.211.
M. Lewandowsky, J. Schwanholz, C. Leonhardt, and A. Blätte. New Parties,
Populism, and Parliamentary Polarization: Evidence from Plenary Debates in
the German Bundestag. In M. Oswald and E. Broda, editors, The Palgrave
Handbook of Populism. Palgrave Macmillan, Basingstoke, New York, 2021,
forthcoming.
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed
representations of words and phrases and their compositionality. In C. J. C.
Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, edi-
tors, Advances in Neural Information Processing Systems, volume 26. Curran
Associates, Inc., 2013.
M. Osnabrügge, S. B. Hobolt, and T. Rodon. Playing to the gallery: Emotive
rhetoric in parliaments. American Political Science Review, 115(3):885–899,
2021. doi: 10.1017/S0003055421000356.
L. Rheault and C. Cochrane. Word Embeddings for the Analysis of Ideological
Placement in Parliamentary Corpora. Political Analysis, 28(1):112–133, 2020.
ISSN 1047-1987, 1476-4989. doi: 10.1017/pan.2019.26.
E. Rodman. A Timely Intervention: Tracking the Changing Meanings of Po-
litical Concepts with Word Vectors. Political Analysis, 28(1):87–111, 2020.
ISSN 1047-1987, 1476-4989. doi: 10.1017/pan.2019.23.
P. Rodriguez and A. Spirling. Word embeddings: What works, what doesn’t,
and how to tell the difference for applied research. Journal of Politics, 2021.
doi: 10.1086/715162.
I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to Sequence Learning with
Neural Networks. In Proceedings of the 27th International Conference on
Neural Information Processing Systems (Online), pages 3104–3112, Dec. 2014.
Appendix
A Frequencies of terms used in the analysis
                1988-2009                   2010-2020
                   M             S             M             S
Term            docs   freq   docs   freq   docs   freq   docs   freq
droger           204    541    221    382    124    346    105    209
skatt           2720  13167   1023   3103   1199   4389    470   1760
säkerhet        1370   2912   1362   2609   1004   2316    564   1312
trygghet        1277   3145   1254   2116    962   2027    672   1154
brott           1561   7182   1106   3069   1320   6148    513   1687
brottslighet     712   2594    553   2113    735   2209    303    950
kriminalitet     309    558    234    375    229    490    128    261
jämlikhet         98    140    214    328     59     87    168    281
solidaritet      109    159    269    396     53     71     85    151
rättvisa         493    681    591    886    235    297    171    282