Analyzing Polarization of Social Media Users and News Sites during Political Campaigns

Social Network Analysis and Mining manuscript No.
(will be inserted by the editor)
Fabrizio Marozzo ·Alessandro Bessi
Abstract Social media analysis is a fast-growing research area aimed at extracting useful information from social networks. Recent years have seen great interest from the academic and business worlds in using social media to measure public opinion. This paper presents a methodology aimed at discovering the behavior of social network users and how news sites are used during political campaigns characterized by the rivalry of different factions. As a case study, we present an analysis of the constitutional referendum that was held in Italy on 4th December 2016. A first goal of the analysis was to study how Twitter users expressed their voting intentions about the referendum in the weeks before the voting day, so as to understand how the voting trends evolved before the vote, e.g., whether there were changes in the voting intentions. According to our study, 48% of Twitter users were polarized towards no, 25% towards yes, and 27% had a neutral behavior. A second goal was to understand the effects of news sites on the referendum campaign. The analysis has shown that some news sites had a strong polarization towards yes (unita.tv, ilsole24ore.it and linkiesta.it), some others had a neutral position (lastampa.it, corriere.it, huffingtonpost.it and repubblica.it) and others were oriented towards no (ilfattoquotidiano.it, ilgiornale.it and beppegrillo.it).
Keywords Social media analysis · Public opinion · Online information · News sites · Users’ polarization · Social networks · Political events
1 Introduction
In recent years, the production rate of digital data has increased exponentially, with a great contribution from social networks such as Facebook, Twitter, Qzone and Instagram. The large volumes of data generated and gathered by social media platforms can be used to extract valuable information regarding human dynamics and behaviors [4]. Social media analysis is a fast-growing research area aimed at extracting useful information from this large amount of data [31]. For example, it is used for the analysis of collective sentiments [30], for understanding the behavior of groups of people [9][10], and to improve the communication between companies and customers [1].
Recently, there has been great interest from the academic and business worlds in using social media to measure public opinion [2]. Several researchers have used social media data for predicting election results [14], measuring how public opinion changes after important political debates [13] or studying the effects of social media during important recent historical events (e.g., Arab Spring [20]). Other researchers have examined the impact of social media spaces on news consumption [18] and on how information spreads through social networks [24].
Fabrizio Marozzo
DIMES, University of Calabria
via P. Bucci, 41C, 87036, Rende, Italy
E-mail: fmarozzo@dimes.unical.it
Alessandro Bessi
Information Sciences Institute, University of Southern California
Marina del Rey, CA, USA
E-mail: bessi@isi.edu
This paper presents a methodology aimed at discovering the behavior of social network users and how news sites are used during political campaigns characterized by the rivalry of different factions. The methodology is composed of five steps: i) definition of the factions and collection of the keywords associated with a political event; ii) collection of all the posts generated by social network users containing one or more keywords defined in the first step; iii) pre-processing of the posts and creation of the input dataset; iv) data analysis and mining; and v) results visualization. On the one hand, the methodology makes it possible to study the users’ polarization before a political event, what arguments they used to support their voting intentions, and whether such intentions change in the weeks preceding the vote. On the other hand, it makes it possible to study the effects of news sites on a political event, e.g., how many users used information from news sites to support their voting intentions and which news sites can be considered in favor of, against or neutral to a given faction.
Unlike works in the literature that classify a post manually [17] or with text mining techniques [8,7,33], the methodology exploits the keywords contained in a post to classify it in favor of a faction. In this way, a post is classified in favor of a faction only if it shows a clear voting indication for that faction; otherwise we consider the post as neutral. With regard to studying the polarization of news sites, different works in the literature use a direct approach that analyzes the contents of articles published by such news sites to understand their political orientation [35,12]. Our methodology instead takes a novel approach that analyzes how users referred to these news sites in their posts to support their voting intentions. Other novel aspects of the methodology are some of the analyses we propose, such as the statistical significance of collected data, mobility flows and polarization prediction.
Although the methodology is able to analyze political events characterized by n factions, in this paper we focus on a subset of political events distinguished by the rivalry of only two factions (i.e., two-faction political events). This subset includes salient political events, such as referenda that see the opposition of two factions (e.g., in favor of yes or no) or ballots (run-off voting) that see the opposition of two candidates competing for the final victory. These events have some features that make them interesting to study: i) people show special attention and sensitivity to these events as they are very important for a nation; ii) people present a strong polarization in favor of one of the two factions, which allows separating them into two distinct groups; iii) accurate analysis can be done since each user can choose only between two values.
As a case study, we present an analysis of the constitutional referendum that was held in Italy on 4th December 2016. Italian voters were asked whether they approved a constitutional law that amends the Italian Constitution to reform the composition and powers of the Parliament of Italy. The main supporter of yes was the Democratic Party and its leader, Italian Prime Minister Matteo Renzi, whereas the main opposition parties and several citizen committees were in favor of no. The referendum saw a high voter turnout (approximately 65% of voters) and a clear victory of no (59% of the expressed preferences).
In the weeks before the referendum, we identified a number of keywords (i.e., hashtags) that were used on Twitter to publish neutral posts on the referendum or to support either yes or no. We collected 338,592 tweets (1,165,176 if we also consider retweets) that contained those hashtags from 23rd October (5 weeks before the voting day) to 3rd December 2016 (one day before). The number of Twitter users under analysis is 50,717 (139,066 considering also those who published a retweet).
A first goal of the analysis was to study how Twitter users expressed their voting intentions about the referendum in the weeks before the voting day, so as to understand how the voting trends evolved before the vote, e.g., whether there were changes in the voting intentions. According to our study, 48% of Twitter users were polarized towards no, 25% towards yes, and 27% had a neutral behavior. Regarding changes of opinion, the majority of users categorized as supporters of no never changed position during the weeks preceding the vote, while a substantial part of the neutral users moved towards no. A second goal was to understand the effects of news sites on the referendum campaign: 22% of tweets contained URLs to news related to the referendum. The analysis has shown that some news sites had a strong polarization towards yes (unita.tv, ilsole24ore.it and linkiesta.it), some others had a neutral position (lastampa.it, corriere.it, huffingtonpost.it and repubblica.it) and others were oriented towards no (ilfattoquotidiano.it, ilgiornale.it and beppegrillo.it).
The structure of the paper is as follows. Section 2 describes the methodology proposed
in this paper. Section 3 and Section 4 describe respectively how the methodology has been
exploited on the Italian constitutional referendum and which results have been achieved.
Section 5 discusses related work. Finally, Section 6 concludes the paper.
2 Methodology
Given a political event $P$ to be analyzed, the proposed methodology consists of five main steps:
1. Definition of the factions $F$ and collection of the keywords $K$ associated with $P$;
2. Collection of all the posts $P$ generated by social network users containing one or more keywords in $K$;
3. Pre-processing of $P$ and creation of the input dataset $D$;
4. Data analysis and mining of dataset $D$;
5. Results visualization.
2.1 Definition of the factions $F$ and collection of the keywords $K$ associated with $P$
The political event $P$ is characterized by the rivalry of different factions $F = \{f_1, f_2, \dots, f_n\}$. Examples of political events and their factions are: i) municipal election, in which a faction supports a mayoral candidate; ii) political election, in which a faction supports a party; iii) presidential election, in which a faction supports a presidential candidate. In this step, we collect the main keywords $K$ used by social network users to write posts associated with $P$. The keywords can be divided into different subsets, e.g., $K = K_{neutral} \cup K_{f_1} \cup \dots \cup K_{f_n}$, as described below:
- The general keywords that can be associated with $P$ but cannot be associated with any faction in $F$ (i.e., are neutral) are assigned to $K_{neutral}$.
- For each faction $f_i \in F$, $K_{f_i}$ contains the keywords used to support $f_i$.
In this paper we focus on a subset of political events characterized by the rivalry of only two factions, $F = \{f_1, f_2\}$. Examples of two-faction events are: i) referendum, in which a faction supports a position (e.g., in favor of yes or no); ii) ballot (or run-off voting), in which a faction supports one of two candidates competing for the final victory. For these events, the keywords are divided into three subsets, $K = K_{neutral} \cup K_{f_1} \cup K_{f_2}$, where $K_{neutral}$ contains the neutral keywords, and $K_{f_1}$ and $K_{f_2}$ contain the keywords associated with $f_1$ and $f_2$, respectively.
2.2 Collection of all the posts $P$ generated by social network users containing one or more keywords in $K$
Through the APIs provided by social networks, we download all the posts containing one or more keywords in $K$. The posts are not collected in real time, but downloaded a given time after their publication (e.g., 24 hours). In this way, we are able to get some statistics related to the popularity of a post, for example: i) the number of shares, which indicates how many users shared a post with their friends; ii) the number of likes, which indicates how many users found a post useful. Each collected post has at least one keyword in $K$, but may also have other keywords (co-keywords) that are useful to understand the arguments used to support the voting intentions.
2.3 Pre-processing of $P$ and creation of the input dataset $D$
The goal of this phase is to pre-process the posts in $P$ to make them ready for the subsequent analysis. Specifically, after pre-processing, each post $p \in P$ is structured as a tuple $\langle user, text, timestamp, keywords, statistics, URLs, domains, class \rangle$ where
- user is the identifier of the user who published $p$;
- text is the text of the post;
- timestamp is the timestamp indicating when $p$ was published;
- statistics contains some statistical data about $p$;
- keywords contains the keywords of $p$;
- URLs contains all the URLs present in $p$;
- domains contains, for each $u \in URLs$, the corresponding domain name;
- class is a label that indicates how a post is classified.
The following operations are performed to pre-process the keywords, URLs and domains fields: i) all the keywords are converted to lowercase and stripped of accents (e.g., IOVOTOSI or iovotosí → iovotosi); ii) all the short URLs are expanded into the corresponding long URLs (for example, larep.it → repubblica.it); iii) all the alias domains are mapped to a single domain (e.g., beppegrillo.it and beppegrillo.com → beppegrillo.it).
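As an illustration only, operations i) and iii) could be sketched in Python as follows. The function names and the alias table are hypothetical (only the beppegrillo entry comes from the text), and short-URL expansion (operation ii) is left out because it requires network access.

```python
import unicodedata
from urllib.parse import urlparse

# Hypothetical alias table; only the beppegrillo entry is taken from the text.
DOMAIN_ALIASES = {"beppegrillo.com": "beppegrillo.it"}


def normalize_keyword(keyword: str) -> str:
    """Lowercase a keyword and strip accents (e.g. 'iovotosí' -> 'iovotosi')."""
    # NFKD splits an accented letter into base letter + combining mark,
    # so dropping the combining marks removes the accent.
    decomposed = unicodedata.normalize("NFKD", keyword.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def canonical_domain(url: str) -> str:
    """Return the alias-resolved domain of an already expanded URL."""
    # urlparse only fills netloc when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return DOMAIN_ALIASES.get(host, host)
```

Dropping a leading "www." is an assumption of this sketch; the text does not say how subdomains are handled.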
The class label is computed by analyzing the keywords of a post. A post may be labeled as one of the classes $\{neutral, f_1, f_2, \dots, f_n\}$. Considering the keywords $\{K_{neutral}, K_{f_1}, \dots, K_{f_n}\}$, Table 1 reports how a post $p$ is associated with one of the factions $f_1, \dots, f_n$ or classified as neutral. A post is classified as $f_i$ if it contains at least one keyword in $K_{f_i}$, possibly some keywords in $K_{neutral}$, but none in the keyword sets of the other factions $\{K_{f_1}, \dots, K_{f_{i-1}}, K_{f_{i+1}}, \dots, K_{f_n}\}$. A post is categorized as neutral if it has keywords in $K_{neutral}$ and/or keywords of two or more factions in $\{K_{f_1}, \dots, K_{f_n}\}$. Although there are other approaches in the literature for classifying a post (e.g., manually or with text mining techniques), in our approach a post is classified in favor of a given faction by analyzing the keywords contained in it, i.e., only if it shows a clear voting indication for that faction; otherwise we consider the post as neutral.
In the case of two-faction events, a post may be labeled as one of three classes $\{neutral, f_1, f_2\}$. Considering the keywords $\{K_{neutral}, K_{f_1}, K_{f_2}\}$, Table 2 reports how a post $p$ is associated with one of the two factions $f_1$, $f_2$ or classified as neutral. A post is classified as $f_1$ if it contains at least one keyword in $K_{f_1}$, possibly some keywords in $K_{neutral}$, and none in $K_{f_2}$. Similarly, a post is classified as $f_2$ if it contains at least one keyword in $K_{f_2}$, possibly some keywords in $K_{neutral}$, and none in $K_{f_1}$. A post is categorized as neutral if it has keywords in $K_{neutral}$ and/or keywords of both factions $\{K_{f_1}, K_{f_2}\}$.
Table 1: Classification of a post by analyzing its keywords in an n-faction event.
$K_{neutral}$  $K_{f_1}$  ...  $K_{f_n}$  Class
-              X          -    -          f1
X              X          -    -          f1
-              -          -    X          fn
X              -          -    X          fn
X              -          -    -          neutral
-              X          -    X          neutral
X              X          -    X          neutral
Table 2: Classification of a post by analyzing its keywords in a two-faction event.
$K_{neutral}$  $K_{f_1}$  $K_{f_2}$  Class
-              X          -          f1
X              X          -          f1
-              -          X          f2
X              -          X          f2
X              -          -          neutral
-              X          X          neutral
X              X          X          neutral
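The rules in Tables 1 and 2 can be expressed compactly in code. The following is a minimal Python sketch, not the authors' implementation: the function name and signature are hypothetical, and returning None for a post without any tracked keyword is an assumption (by construction, every collected post contains at least one keyword in $K$).

```python
def classify_post(keywords, neutral_keywords, faction_keywords):
    """Label a post from its keywords, following the rules of Tables 1 and 2.

    faction_keywords maps each faction name to its set of keywords.
    """
    found = set(keywords)
    # Factions for which the post contains at least one supporting keyword.
    matched = [name for name, kws in faction_keywords.items() if found & kws]
    if len(matched) == 1:
        return matched[0]      # clear voting indication for a single faction
    if matched or found & neutral_keywords:
        return "neutral"       # several factions, or neutral keywords only
    return None                # assumption: no tracked keyword at all


# Hashtag sets listed for the case study in Section 3.
K_NEUTRAL = {"#referendumcostituzionale", "#siono", "#riformacostituzionale",
             "#referendum", "#4dicembre", "#referendum4dicembre"}
K_FACTIONS = {
    "yes": {"#bastaunsi", "#iovotosi", "#italiachedicesi", "#iodicosi",
            "#leragionidelsi"},
    "no": {"#iovotono", "#iodicono", "#bastaunno", "#famiglieperilno",
           "#leragionidelno"},
}

label = classify_post(["#referendum4dicembre", "#iovotono", "#democrazia"],
                      K_NEUTRAL, K_FACTIONS)
print(label)
```

Co-keywords such as #democrazia do not belong to any set and therefore do not affect the label.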
2.4 Data analysis and mining of dataset D
After the input dataset $D$ has been built, it is analyzed through algorithms and techniques for discovering the polarization of social network users and news sites during political campaigns characterized by the rivalry of different factions. In particular, the main goals of this step are as follows.
1. Analysis of aggregate data. $D$ is analyzed to derive statistics about the data and to discover the main arguments used by the different factions whose posts are present in $P$.
2. Statistical significance of collected data. The goal is to assess the significance of $D$.
3. Temporal analysis. The goal is to analyze how the number of posts supporting the different factions varies over time.
4. Polarization of users. Collected data are analyzed to discover how users are polarized towards the different factions.
5. Mobility flows. The evolution of users’ polarization is studied in the weeks preceding the political event.
6. Polarization prediction. The goal is to predict the polarization of users before the political event.
7. Polarization of news sites. Collected data are analyzed to discover how news sites are polarized towards the different factions.
2.5 Results visualization
Results visualization is performed through the creation of infographics aimed at presenting the results in a way that is easy for the general public to understand, without complex statistical details that may be hard for the intended audience to follow. The graphic project is grounded on some of the most widely acknowledged principles underpinning a 'good' infographic. In particular, we follow three main design guidelines: i) preferring a visual representation of quantitative information to a written one; ii) minimizing the cognitive effort needed to decode each system of signs; iii) structuring all the proposed elements into graphic hierarchies [11].
Displaying quantitative information by visual means instead of just using numeric symbols (or at least combining the two approaches) has proven extremely useful in providing a kind of sensory evidence to the inherent abstraction of numbers, because it allows everybody to instantly grasp similarities and differences among values. In fact, basic visual metaphors (e.g., the largest is the greatest, the thickest is the highest) enable more natural ways of understanding and relating sets of quantities [32].
3 Case study: Italian constitutional referendum, 2016
We applied the methodology described in the previous section to the constitutional referendum that was held in Italy on 4th December 2016. Italian voters were asked whether they approved a constitutional law that amends the Italian Constitution to reform the composition and powers of the Parliament of Italy, as well as the division of powers between the State, regions, and administrative entities¹. The main supporter of the referendum (i.e., in favor of yes) was the Democratic Party (in Italian, Partito Democratico, PD) and its leader, Italian Prime Minister Matteo Renzi. In favor of no, on the other hand, were the main opposition parties (e.g., Movimento 5 Stelle, Forza Italia) and various citizen committees. The referendum saw a high voter turnout (approximately 65% of voters) and a majority of the votes opposed to the reform (i.e., voting no), which exceeded 59% of the expressed preferences. A political effect of the referendum’s result was the resignation of the Italian prime minister.
The political event under analysis $P$ is a two-faction event with $F = \{yes, no\}$. We collected the main keywords $K$ used as hashtags in tweets related to $P$. Such keywords have been grouped as follows:
- $K_{neutral}$ = {#referendumcostituzionale, #siono, #riformacostituzionale, #referendum, #4dicembre, #referendum4dicembre}
- $K_{yes}$ = {#bastaunsi, #iovotosi, #italiachedicesi, #iodicosi, #leragionidelsi}
- $K_{no}$ = {#iovotono, #iodicono, #bastaunno, #famiglieperilno, #leragionidelno}
Given the keywords $K = K_{neutral} \cup K_{yes} \cup K_{no}$, we collected 338,592 tweets containing at least one of these keywords posted from 23rd October (5 weeks before the voting day) to 3rd December 2016 (one day before). The tweets were not collected in real time, but with a delay of 24 hours after their publication so as to capture: i) the number of retweets, which indicates how many users shared a tweet with their friends; ii) the number of favorites, which indicates how many users found a tweet useful.
Collected tweets were pre-processed as described in Section 2.3. For instance, Table 3 shows three tweets published by a user $u_i$ before the voting day (translated into English for the reader’s convenience). For each tweet, the main fields are reported in the table.
¹ http://www.interno.gov.it/it/italiani-voto-referendum-costituzionale
Table 3: Examples of tweets on the Italian constitutional referendum.
Text | Timestamp | Keywords | URLs | Class
Why is important to be well informed on #ReferendumCostituzionale | 25 Oct 2016 08:00:00 | #ReferendumCostituzionale | youtube.com/... | neutral
#IoVotoNO: all the reasons to vote against this reform | 15 Nov 2016 09:00:00 | #iovotono | ilfattoquotidiano.it/..., ilgiornale.it/... | no
Now, wait the results! #referendum4dicembre #iovotoNO #democrazia | 3 Dec 2016 10:00:00 | #referendum4dicembre, #iovotono, #democrazia | - | no
In the first tweet, $u_i$ expresses the importance of going to vote by using a neutral hashtag (#ReferendumCostituzionale) and including a YouTube URL. This tweet is classified as neutral. In the second tweet, $u_i$ shows his/her dissatisfaction with the reform by using a hashtag supporting no (#iovotono) and linking to two news sites to motivate his/her voting intention. It is classified as in favor of no. The third tweet contains a neutral hashtag (#referendum4dicembre), a hashtag supporting no (#iovotono) and a co-hashtag (#democrazia). This tweet is also classified as in favor of no.
4 Analysis and results
4.1 Analysis of aggregate data
Table 4 reports some statistics about the collected tweets: 338,592 tweets, 826,584 retweets and 987,010 favorites. Filtering the data, we discovered that 43% of tweets contain co-hashtags (e.g., #democrazia, #renzi) and 22% contain URLs. Co-hashtags are useful to understand the arguments used in favor of one position or another. The URLs make it possible to understand which news sites were used by users to support their voting intentions. The number of users under analysis is 50,717 (139,066 considering also the retweets). Figure 1 shows that more than half (54%) of the users published only one tweet on the referendum, 14% two tweets, 7% three, 4% four and 21% five or more tweets.
Table 4: Statistics about collected tweets.
Filter #Tweets #Retweets #Favorites Total
None 338,592 826,584 987,010 2,152,186
Contains co-hashtags 146,687 449,198 518,088 1,113,973
Contains URL 74,973 139,417 148,888 363,278
[Figure 1 plot: x-axis "No. of tweets" (1, 2, 3, 4, >=5), y-axis "No. of users".]
Fig. 1: No. of tweets posted by users.
Table 5 reports some statistics about the main hashtags used for collecting tweets, grouped into yes, neutral and no. Next to each hashtag, the numbers of tweets, retweets and favorites containing that hashtag are reported. The percentages of tweets published with yes or neutral hashtags are similar (23% and 24%, respectively), and each is roughly half of the percentage in favor of no (53%). We also studied how users combined these hashtags in their tweets: 88% of tweets contain hashtags of only one group (yes, neutral or no), 11% contain hashtags of two groups (yes/neutral, no/neutral or yes/no), and 1% contain hashtags of all three groups (yes/neutral/no).
Table 5: Main hashtags related to yes, neutral and no.
Hashtag #Tweets #Retweets #Favorites Total
#bastaunsi 37,268 94,730 133,774 265,773
#iovotosi 38,373 64,419 95,479 198,273
[All hashtags supporting yes] 76,257 161,306 231,875 469,445
#referendumcostituzionale 36,283 61,940 68,967 167,191
#siono 14,678 28,958 44,460 88,096
#riformacostituzionale 12,233 29,232 30,248 71,715
#referendum 9,727 26,440 27,241 63,409
#4dicembre 7,028 24,715 29,889 61,633
[All neutral hashtags] 81,764 175,123 205,157 462,050
#iovotono 152,638 379,988 430,268 962,895
#iodicono 26,574 107,669 117,233 251,476
[All hashtags supporting no] 180,562 490,147 549,972 1,220,684
Table 6 shows the ten main co-hashtags used by Twitter users, divided into yes, neutral and no. We note that, in many cases, users who supported yes did so by posting tweets reporting the Prime Minister’s statements (e.g., #matteorisponde), the opportunity to improve the political system (e.g., #avanti), or information propagated by opponents (e.g., #lebufaledelno). On the other hand, users supporting no posted tweets reporting positions from the political opposition (e.g., #m5s, #salvini), expressing the will to leave the constitution as it is (e.g., #costituzione), or hoping to send the prime minister home (e.g., #renziacasa). The neutral co-hashtags highlight topics that were discussed during the referendum campaign.
Table 6: Main co-hashtags related to yes, neutral and no tweets.
Category | Co-hashtags
yes | #renzi, #sivainpiazza, #matteorisponde, #leopolda7, #avanti, #midiconoche, #m5s, #matteorenzi, #pd, #lebufaledelno
neutral | #agcom, #serracchiani, #renzi, #pd, #mafiacapitale, #mafia, #accozzagliachi, #bufale, #bastapocochecevo, #themancettacandidate
no | #renzi, #salvini, #m5s, #movimentonesti, #trenotour, #costituzione, #nonrubo, #pd, #renziacasa, #deluca
4.2 Statistical significance of collected data
The goal is to assess the statistical significance of the input dataset. Specifically, we studied whether the Twitter users captured in our analysis were actual voters in the referendum, i.e., whether they were Italian citizens eligible to vote (at least 18 years old).
From the metadata present in the tweets used in our analysis, we extracted aggregate information on the language used to write them and on the location of the users who wrote them. Specifically, from the tweet metadata we analyzed the lang field², which is a language identifier corresponding to the machine-detected language of the tweet text (e.g., “en” for English, “it” for Italian, “und” if no language could be detected). In addition, from the user metadata we analyzed the location field³, which indicates the user-defined location for the account’s profile (e.g., San Francisco, CA).
By analyzing the metadata described above, we can say that:
- All the tweets under analysis have the lang field equal to “it” (Italian). The Italian language is mainly used by Italians who reside in Italy (60 million) or abroad (about 4 million). Outside Italy, Italian is used as a first language⁴ only by a small part of the Swiss population (about 640,000 people), and a very small part of Croats and Slovenes (about 22,000 people).
- 98% of users who have defined the location in their profile live in Italy.
² Twitter API, https://dev.twitter.com/overview/api/tweets
³ Twitter API, https://dev.twitter.com/overview/api/users
To further show the statistical value of user locations, in Table 7 we compare the number of Twitter users captured in our analysis with the total number of citizens, grouped by Italian region. There is a strong correlation (Pearson coefficient 0.9) between these two sets of data. Similar results are obtained by comparing the number of users and the total number of citizens grouped by Italian city. Also in this case, as shown in Table 8, there is a very strong correlation between the two sets of data (Pearson coefficient 0.96).
These statistics give us strong indications about the users analyzed in our case study: it is highly likely that they are voters in the referendum, that is, adult Italian citizens. Regarding the last point, statistics show that 96% of Italian Twitter users are adults⁵.
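The regional correlation can be recomputed from the values in Table 7. The sketch below uses the standard Pearson formula in plain Python; the variable and function names are illustrative, and the data are copied from the table. The resulting coefficient should land close to the 0.9 reported above.

```python
import math

# (number of users, number of citizens) per region, copied from Table 7.
REGION_DATA = [
    (4169, 5893935), (4129, 10014304), (1739, 5840219), (1628, 3743370),
    (1621, 4447419), (1431, 5055838), (1331, 4907284), (1186, 4394580),
    (1174, 4066819), (675, 1654587), (671, 1565566), (565, 1966819),
    (449, 1218068), (396, 1539316), (380, 1322585), (310, 889817),
    (255, 1061318), (202, 571133), (76, 126732), (73, 310685),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


users = [u for u, _ in REGION_DATA]
citizens = [c for _, c in REGION_DATA]
r_regions = pearson(users, citizens)
print(f"Pearson coefficient (regions): {r_regions:.2f}")
```

The same function can be applied to the city-level values of Table 8.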
Table 7: Comparison of the number of users and the total number of citizens grouped by region.
Region N. of users N. of citizens
Lazio 4,169 5,893,935
Lombardy 4,129 10,014,304
Campania 1,739 5,840,219
Tuscany 1,628 3,743,370
Emilia-Romagna 1,621 4,447,419
Sicily 1,431 5,055,838
Veneto 1,331 4,907,284
Piedmont 1,186 4,394,580
Apulia 1,174 4,066,819
Sardinia 675 1,654,587
Liguria 671 1,565,566
Calabria 565 1,966,819
Friuli-Venezia G. 449 1,218,068
Marches 396 1,539,316
Abruzzo 380 1,322,585
Umbria 310 889,817
Trentino-S. Tyrol 255 1,061,318
Basilicata 202 571,133
Aosta Valley 76 126,732
Molise 73 310,685
Table 8: Comparison of the number of users and the total number of citizens grouped by city (only 20 of the major Italian cities).
City N. of users N. of citizens
Rome 3,499 2,874,529
Milan 2,221 1,353,467
Naples 747 969,456
Turin 548 885,651
Florence 486 382,346
Bologna 452 388,567
Palermo 348 672,398
Genoa 313 582,870
Bari 215 323,503
Catania 198 312,895
Cagliari 188 154,194
Padua 185 209,475
Venice 177 261,496
Verona 172 257,815
Bergamo 168 120,358
Brescia 159 196,205
Modena 127 184,642
Trieste 125 203,974
Udine 123 99,245
Salerno 119 134,857
4.3 Temporal Analysis
Figure 2 shows the time series of the number of tweets published during the five weeks
preceding the referendum. The tweets in the figure are classified as supporting yes (solid
blue line), neutral (black dashed line), or no (solid red line). A fourth time series on all the
tweets is represented as a solid black line.
⁴ Italian language, https://it.wikipedia.org/wiki/Lingua_italiana
⁵ Digital in 2017: Italy, http://www.assocom.org/wp-content/uploads/2017/02/digital-in-2017-italy-we-are-social-and-hootsuite.pdf
[Figure 2 plot: x-axis "Day" (23/10 to 03/12), y-axis "No. of tweets"; series: All, Yes, Neutral, No.]
Fig. 2: Time series of tweets published from 23rd October to 3rd December 2016.
All four time series show a similar growing trend (the Pearson coefficients of the yes, neutral and no series versus the all series range from 0.87 to 0.97) and show peaks on the following dates:
- 29th October: the day after a major television debate between Matteo Renzi, in favor of yes, and the former PM Ciriaco De Mita, in favor of no⁶;
- 12th and 23rd November: debates and discussions in different cities of Italy in favor of yes or no;
- 2nd December: the last day of campaigning before the election silence day (3rd December).
We observe that, during the whole observation period, tweets supporting no outnumbered those supporting yes or neutral. On a typical day, the numbers of tweets supporting neutral and yes are similar, and each is about half of the number of tweets supporting no.
Figure 3 shows the number of tweets aggregated by day of the week. Interest in the referendum increases from Monday to Friday, and then decreases during the weekend.
[Figure 3 plot: x-axis "Day of the week" (Mon. to Sun.), y-axis "No. of tweets".]
Fig. 3: No. of tweets per week day.
4.4 Polarization of users
The polarization of a user, $\rho_u \in [-1, 1]$, is defined as
$$\rho_u = 2 \times \frac{|yes_u|}{|yes_u| + |no_u|} - 1,$$
where $|yes_u|$ and $|no_u|$ represent, respectively, the number of tweets published by $u$ classified as yes and no [5]. A value of $\rho_u$ close to 1 means that user $u$ tends to be polarized towards yes, while a value close to $-1$ means that user $u$ is polarized towards no. In all the analyses of our paper, we focused on users who showed a strong polarization towards a given faction. For this reason, we chose a high threshold (0.9) to select users with strong polarization in favor of yes or no. Specifically, we consider users with $\rho_u > 0.9$ as polarized towards yes, users with $\rho_u < -0.9$ as polarized towards no, and all other users as neutral. Figure 4 shows the probability density function of the users’ polarization. We observe a trimodal distribution, indicating that one group of users is polarized towards yes, another has a neutral polarization, and another is polarized towards no. Specifically, 48% of the users under analysis have a strong polarization towards no, 25% towards yes, and 27% are neutral.
⁶ http://www.ilgiornale.it/news/politica/de-mita-attacca-renzi-tv-io-cambio-partito-tu-amici-1324745.html
Fig. 4: Probability density function of the users’ polarization.
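The definition and the 0.9 threshold can be sketched in a few lines. This is only an illustration under stated assumptions: each tweet is already labeled "yes", "no" or "neutral", the function names are hypothetical, and treating a user with no yes or no tweets as neutral is an assumption, since the formula is undefined in that case.

```python
from collections import Counter

def user_polarization(labels):
    """Return rho_u = 2 * |yes_u| / (|yes_u| + |no_u|) - 1, or None if undefined."""
    counts = Counter(labels)
    yes, no = counts["yes"], counts["no"]
    if yes + no == 0:
        return None  # the user posted only neutral tweets
    return 2 * yes / (yes + no) - 1

def classify_user(labels, threshold=0.9):
    """Map a user's tweet labels to 'yes', 'no' or 'neutral'."""
    rho = user_polarization(labels)
    if rho is None:
        return "neutral"  # assumption: undefined polarization counts as neutral
    if rho > threshold:
        return "yes"
    if rho < -threshold:
        return "no"
    return "neutral"
```

For example, a user with 3 yes tweets and 1 no tweet has a polarization of 0.5 and is therefore neutral under the 0.9 threshold, while a user with 20 no tweets and 1 yes tweet falls just below -0.9 and is classified as no.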
Figure 5 illustrates the production patterns of polarized users. In particular, the figure
shows the complementary cumulative distribution function (CCDF) of the number of tweets
published by users polarized towards yes and towards no. The two curves point to very similar
production patterns between users polarized towards yes and users polarized towards no [25].
The number of tweets posted by a user does not depend on the user's polarization: a similar
number of users have published at least x tweets among users polarized towards no and among
users polarized towards yes.
Fig. 5: Complementary cumulative distribution function of the number of tweets published
by users polarized towards yes and towards no.
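An empirical CCDF of the kind plotted in Figure 5 can be computed as in the following sketch. It assumes the input is one tweet count per user; the function name `ccdf` is hypothetical.

```python
from collections import Counter

def ccdf(tweet_counts):
    """Return sorted (x, P(X >= x)) pairs for each observed tweet count x."""
    n = len(tweet_counts)
    freq = Counter(tweet_counts)
    points = []
    remaining = n  # number of users with a count >= the current x
    for x in sorted(freq):
        points.append((x, remaining / n))
        remaining -= freq[x]
    return points

# Illustrative input only: four users with 1, 1, 2 and 5 tweets.
points = ccdf([1, 1, 2, 5])
```

Computing the function separately for the yes group and for the no group, and plotting both on log-log axes, gives two curves that can be compared as in the text.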
4.5 Mobility flows
Figure 6 represents the evolution of users' polarization in the five weeks preceding the
referendum. To study the mobility flows of users, we restricted our analysis to users who
published at least 5 tweets (i.e., 10,436 users). The figure shows how the number of users
polarized towards yes (blue circles), the number of users polarized towards no (red circles),
and the number of neutral users (gray circles) vary in the five weeks preceding the
referendum. Arrows in the figure show the percentage of users who, after one week, keep the
polarization of the previous week, and the percentage of neutral users who move towards
yes or no. We do not report the flows from yes towards no (and vice versa) because they
are small (less than 3%). Notice that the number of users under analysis increases
from week to week (from 1,520 to 10,436), because by collecting new tweets we are able to
categorize new users.
We observe that, over the five weeks preceding the referendum, the vast majority of users
polarized towards yes and no tend to maintain their polarization. The biggest changes occur
only among users categorized as neutral: 10% of neutral users move towards yes and
20% towards no.
We can conclude that almost all users polarized towards no did not change position
during the weeks preceding the vote, and one fifth of the neutral users moved towards no.
Supporters of yes were also very stable, while a smaller share of neutral users moved
to yes.
[Figure 6: flow diagram with three rows of circles (yes, neutral, no) and one column per
weekly snapshot: 2016-10-29 (-5 weeks), 2016-11-05 (-4 weeks), 2016-11-12 (-3 weeks),
2016-11-19 (-2 weeks), 2016-11-26 (-1 week), 2016-12-03. Arrows are labeled with weekly
percentages. User counts per snapshot, in the three rows as extracted:
428, 687, 1001, 1453, 1997, 2766; 377, 521, 717, 1001, 1386, 1943; 715, 1441, 2380, 3354,
4394, 5727.]
Fig. 6: Evolution of users’ polarization in the five weeks preceding the referendum day: users
polarized in favor of yes (blue circles), in favor of no (red circles), and neutral (gray circles).
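The percentages on the arrows of a diagram like Figure 6 can be derived from two consecutive weekly snapshots. The sketch below assumes each snapshot is a mapping from user id to polarization label, and it only counts users present in both weeks; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def weekly_flows(prev_week, next_week):
    """Return, for each previous label, the percentage of users per next label.

    prev_week, next_week: dicts mapping user id -> 'yes' | 'neutral' | 'no'.
    Users that first appear in next_week are ignored (assumption).
    """
    counts = defaultdict(Counter)
    for user, old_label in prev_week.items():
        if user in next_week:
            counts[old_label][next_week[user]] += 1
    flows = {}
    for old_label, targets in counts.items():
        total = sum(targets.values())
        flows[old_label] = {new: 100 * k / total for new, k in targets.items()}
    return flows

# Illustrative input only: six users in the first week, seven in the second.
week_a = {"a": "yes", "b": "neutral", "c": "neutral",
          "d": "no", "e": "neutral", "f": "neutral"}
week_b = {"a": "yes", "b": "no", "c": "neutral",
          "d": "no", "e": "neutral", "f": "yes", "g": "no"}
flows = weekly_flows(week_a, week_b)
```

In this toy input, half of the neutral users stay neutral, a quarter move to yes and a quarter move to no, while user "g" is new in the second week and does not contribute to any flow.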
4.6 Polarization prediction
The goal of this section is to predict the polarization of users before the referendum day.
Different machine learning techniques have been studied to evaluate their appropriateness in
the considered domain. Several classification algorithms were tested, and the Random Forest
(RF) [6] algorithm was selected as it achieved the best performance in terms of accuracy and
recall, with limited model building time. Other research works have exploited RF for social
media analysis due to its high level of accuracy (e.g., see [36], [15], [26], [23]).
Random Forests have been trained to predict the polarization that a user will have
before the voting day, using information available n weeks before the referendum, where
n varies from 5 to 1. Specifically, we trained five Random Forest models (one for each value
of n), each of them trained on the following information:
The input is composed of aggregate information contained in the tweets posted by a user
at least n weeks before the referendum. This information is: i) number of tweets
containing yes hashtags, ii) number of tweets containing no hashtags, iii) number of tweets
containing neutral hashtags, iv) total number of tweets, and v) number of hashtags used.
The class is a label that indicates the final polarization of a user (yes, no or neutral),
calculated by our methodology using all the information contained in all the tweets
posted by the user (i.e., it does not depend on n).
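The five input features listed above can be assembled as in the following sketch. Several details are assumptions made for illustration: the hashtag sets are hypothetical placeholders rather than the paper's keyword lists, the cutoff date is passed in explicitly, and feature v) is interpreted as the number of distinct hashtags.

```python
from datetime import date

# Hypothetical placeholder hashtag sets, not the lists used in the paper.
YES_TAGS = {"#iovotosi"}
NO_TAGS = {"#iovotono"}
NEUTRAL_TAGS = {"#referendum"}

def user_features(tweets, cutoff):
    """Build the feature vector of one user from tweets posted up to cutoff.

    tweets: list of (posting_date, set_of_hashtags) pairs for one user.
    cutoff: datetime.date; later tweets are ignored.
    """
    early = [tags for day, tags in tweets if day <= cutoff]
    used = set().union(*early)  # distinct hashtags (assumed meaning of feature v)
    return [
        sum(1 for tags in early if tags & YES_TAGS),      # i) tweets with yes hashtags
        sum(1 for tags in early if tags & NO_TAGS),       # ii) tweets with no hashtags
        sum(1 for tags in early if tags & NEUTRAL_TAGS),  # iii) tweets with neutral hashtags
        len(early),                                       # iv) total number of tweets
        len(used),                                        # v) number of hashtags used
    ]

# Illustrative input only.
sample_tweets = [
    (date(2016, 11, 1), {"#iovotono", "#referendum"}),
    (date(2016, 11, 10), {"#iovotono"}),
    (date(2016, 11, 30), {"#iovotosi"}),
]
one_week_before = user_features(sample_tweets, date(2016, 11, 26))
```

The class label is not computed here: as described above, it comes from the polarization over all of the user's tweets, regardless of the cutoff.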
To fine-tune the model, we performed a grid search over the parameter space and
found that the best results are provided by a Random Forest using the entropy criterion
and 128 estimators.
Figure 7 shows the classification performance achieved by RF models at different times.
In particular, we show micro and macro averaging [34] of the area under the curve (AUC)
computed for the model trained with information available at different times. Results are
averaged over 10 Monte Carlo cross validation iterations and indicate that the information
available 5 weeks before the referendum day provides a classification performance of
0.849±0.006 (micro AUC) and 0.83±0.006 (macro AUC). Such a classification performance
increases with the amount of information available, reaching 0.962±0.002 (micro AUC) and
0.949±0.001 (macro AUC) one week before the referendum day.
[Figure 7: plot. x-axis: weekly snapshots from 2016-10-29 (-5 weeks) to 2016-11-26 (-1 week);
y-axis: score (0.0 to 1.0); series: AUC micro, AUC macro.]
Fig. 7: User polarization prediction achieved by a Random Forest model using information
posted by users from 5 to 1 week before the referendum day.
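The difference between micro and macro averaging of the AUC can be made concrete with a small sketch. It is not the evaluation code of the paper: it assumes a one-vs-rest setting over the three classes, takes per-class scores (e.g., predicted probabilities) as input, and uses a simple pairwise AUC that is quadratic in the number of samples. All names are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one positive and
    one negative label.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def micro_macro_auc(y_true, scores, classes=("yes", "neutral", "no")):
    """Return (micro AUC, macro AUC) for one-vs-rest multiclass scores.

    y_true: list of class labels; scores: list of per-class score lists,
    with columns in the same order as classes.
    """
    per_class, pooled_labels, pooled_scores = [], [], []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        column = [row[k] for row in scores]
        per_class.append(binary_auc(labels, column))
        pooled_labels += labels
        pooled_scores += column
    macro = sum(per_class) / len(per_class)          # mean of per-class AUCs
    micro = binary_auc(pooled_labels, pooled_scores)  # one AUC over pooled decisions
    return micro, macro
```

Macro averaging weights every class equally, whereas micro averaging pools all decisions, so the largest class (here, no) weighs more in the micro score.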
4.7 Polarization of news sites
Table 9 reports some statistics about tweets containing URLs from the main Italian news
sites. About three quarters of such tweets contain URLs from five of the major news sites:
beppegrillo.it (36%), ilfattoquotidiano.it (17%), repubblica.it (12%), huffingtonpost.it (8%)
and corriere.it (5%). Since we registered a greater presence of tweets supporting no, a news
site tended to be more popular when it published articles close to the positions of
no.
Table 9: Top 15 news sites used by Twitter users during the referendum campaign.
News site               #Tweets  #Retweets  #Favorites   Total
beppegrillo.it            4,244     12,990      13,575  30,810
ilfattoquotidiano.it      2,027      8,935       7,495  18,457
repubblica.it             1,468      2,537       2,571   6,576
huffingtonpost.it           957      3,150       2,763   6,870
corriere.it                 558      1,083       1,235   2,876
unita.tv                    509      1,992       2,716   5,218
ilgiornale.it               482        873         764   2,120
ansa.it                     269        668         606   1,543
ilsole24ore.it              268        386         349   1,004
formiche.net                216        243         204     664
movimento5stelle.it         206        709         593   1,508
lastampa.it                 189        438         526   1,153
possibile.com               173        804         627   1,604
linkiesta.it                173        428         358     959
affaritaliani.it            143        606         541   1,290
Given a news site s, we compute its polarization as follows:

$$\rho_s = 2 \times \frac{|yes_s|}{|yes_s| + |no_s|} - 1,$$

where $|yes_s|$ and $|no_s|$ represent, respectively, the number of tweets classified as yes
and no that contain a URL linking to the news site s. Figure 8 shows the polarization of the
main Italian news sites for each category (yes, neutral and no). The figure highlights that
some news sites had a strong polarization towards yes (unita.tv, ilsole24ore.it and
linkiesta.it), some others had a neutral position (lastampa.it, corriere.it, huffingtonpost.it
and repubblica.it) and others were oriented towards no (ilfattoquotidiano.it, ilgiornale.it
and beppegrillo.it). This result can be explained in two ways: either news sites supported
the campaign of yes or no as an editorial choice, or the readers of a certain news site are,
for political reasons, closer to a certain position.
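The site-level polarization can be sketched in the same way as the user-level one. The example assumes each tweet is given as a label plus the list of URLs it contains, reduces every URL to its host name, and counts a tweet once per site; the function names and sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

def site_of(url):
    """Reduce a URL to its host name, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def site_polarization(tweets):
    """Return {site: rho_s} from (label, urls) pairs.

    Only tweets labeled 'yes' or 'no' enter the formula, so every site in
    the result has at least one such tweet.
    """
    counts = defaultdict(lambda: {"yes": 0, "no": 0})
    for label, urls in tweets:
        if label not in ("yes", "no"):
            continue
        for site in {site_of(u) for u in urls}:  # one count per tweet and site
            counts[site][label] += 1
    return {s: 2 * c["yes"] / (c["yes"] + c["no"]) - 1 for s, c in counts.items()}

# Illustrative input only: the paths are made up.
sample = [
    ("no", ["http://www.ilgiornale.it/a"]),
    ("no", ["https://ilgiornale.it/b"]),
    ("yes", ["http://www.unita.tv/x"]),
    ("neutral", ["http://www.corriere.it/y"]),
    ("yes", ["http://www.ilgiornale.it/c"]),
    ("no", ["http://www.ilgiornale.it/d"]),
]
rho = site_polarization(sample)
```

In practice, shortened links would have to be expanded before the host name is extracted; that step is left out of this sketch.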
Figure 9 shows the evolution of the polarization of four representative news sites over the
five weeks preceding the referendum day. The figure clearly indicates that the polarization
of news sites does not show relevant changes over time.
[Figure 8: chart of the polarization (x-axis, from -1.0 to 1.0) of each news site: unita.tv,
ilsole24ore.it, linkiesta.it, lastampa.it, corriere.it, huffingtonpost.it, repubblica.it,
ilfattoquotidiano.it, ilgiornale.it, beppegrillo.it.]
Fig. 8: Polarization of the main Italian news sites for each category (yes, neutral and no).
[Figure 9: time series for repubblica.it, unita.tv, ilfattoquotidiano.it and ilgiornale.it,
from Oct 23 2016 to Nov 27 2016. y-axes: polarization (-1.0 to 1.0) and volume (0 to 100).]
Fig. 9: Time series of the polarization of four representative Italian news sites.
5 Related work
In recent years, the use of social media for measuring public opinion has become one of the
hot topics in social network research [2]. In particular, two main areas of research are
related to this paper: i) the use of social media to measure public opinion and predict election
results; ii) the impact of social media on news consumption and on how information spreads
through social networks. The main related work in each of these research areas is described
below.
Murphy et al. [27] examined the potential impact of social media on public opinion
research, as an important way of facilitating and/or replacing traditional survey research
methods. The authors highlighted several problems related to this topic, for example: i) not
every member of the public uses social network platforms; ii) the information published by
social users is incomplete and inaccurate; iii) legal regulations apply to the data collected.
O'Connor et al. [29] correlated Twitter data with several public opinion time series; Anstead
and O'Loughlin [3], by analyzing the 2010 United Kingdom election, suggested the use of social
media as a new way to understand public opinion. Other related work attempted to measure
the public's evolving response to stimuli, examining both short-term events such as TV
political debates [13] and long-term events such as economic downturns [16]. An in-depth
survey on this topic can be found in [21].
Hermida et al. [18] examined the impact of social media on news consumption,
based on an online survey of 1600 Canadians. The study highlights that social networks are
a significant source of news for Canadians: two-fifths of the users under analysis said that
they received news from users they follow, while a fifth got information from news
organizations and individual journalists they follow. Lerman and Ghosh [24] studied the spread
of information on social networks and whether network structure affects how information is
disseminated. Specifically, they extracted the active users and tracked how interest in news
spreads among them. Howard et al. [20] studied the effects of social media during the Arab
Spring (footnote 7). By analyzing users' posts from different social networks, the authors
reached three main conclusions: i) social media played a central role in guiding the political
debates during the event; ii) a spike in online conversations often preceded major events in
the real world; and iii) social media helped to accelerate the spread of news and ideas in the
world.
To highlight the novelty of the proposed methodology, in the following we review the most
closely related research works, discussing differences and similarities with our work.
Ceron et al. [8] used a text analysis approach [19] to study the voting intentions
of French Internet users in both the 2012 Presidential ballot and the subsequent legislative
election. The authors mainly present the results of their analysis by comparing them with
official data and predictions made by survey companies. Few implementation details are
given; e.g., it is not clear how the statistical value of the data was assessed and how tweets
and users were classified. Gruzd and Roy [17] investigated the political polarization of social
network users during the 2011 Canadian Federal Election. A sample of tweets posted by
1,492 Twitter users was manually classified based on their self-declared political views
and affiliations. The methodology we propose, in contrast, classifies a user automatically by
analyzing the posts he/she published, and the keywords he/she used, in the weeks preceding
the vote. Nulty et al. [28] surveyed the European landscape of social media using tweets
originating from and referring to political actors during the 2014 European Parliament
election campaign. With respect to our paper, these authors do not present a methodology
but only a hashtag analysis by language, political party and candidate. Burnap et al. [7]
used Twitter data to forecast the outcome of the 2015 UK General Election. The authors
applied an automated sentiment analysis tool for classifying tweets. Differently from this
methodology, we classified user posts, and consequently the users who wrote them,
by taking advantage of the keywords related to the political event under analysis. Similar
considerations can be made for [33], which analyzed about 100,000 political tweets on the 2009
German federal election using text analysis software. Kagan et al. [22] exploited Twitter
data for predicting the electoral results of the 2013 Pakistani and 2014 Indian elections. The
authors studied how support for (or opposition to) a candidate spread
through the Internet. In fact, the proposed diffusion model classifies a user by also taking
into account the percentage of his/her neighbors (i.e., friends) that have expressed a
positive/negative opinion on a candidate/faction. With respect to this work, our methodology
evaluates only the content of posts published by a user, but it could be extended to consider
the opinions of that user's friends. Wagner [35] studied the 2014 Scottish independence
referendum to understand how local newspapers supported the referendum campaign. Specifically,
the author analyzed the political position of two local Scottish newspapers (i.e., The
Courier and Evening Telegraph) by counting how many stories were neutral, in favor of, and
opposed to Scottish independence. With respect to our work, this is a traditional approach
that analyzes the textual content of articles published by a news site. We instead propose to
evaluate the political position of a news site by analyzing how users referred to that news
site to support their voting intentions. Similar considerations can be made for [12], which
analyzed the behavior of four leading German online newspapers over a timespan of four
years.
7 https://en.wikipedia.org/wiki/Arab_Spring
In summary, this paper presents a methodology aimed at analyzing the polarization of
social network users and news sites during political campaigns characterized by the rivalry
of two factions (e.g., referenda and ballots). Unlike works in the literature that classify a
post manually [17] or with text mining techniques [7, 8, 33], our methodology exploits keywords
(e.g., hashtags) contained in a post to classify it in favor of a faction. In this way, a post is
classified in favor of a faction only if it shows a clear voting indication for that faction;
otherwise we consider the post as neutral. With regard to studying the polarization of news
sites, different works in the literature use a direct approach that analyzes the contents of
articles published by such news sites to understand their political orientation [35, 12]. Our
methodology instead uses a novel approach that analyzes how users referred to these news sites
in their posts to support their voting intentions. Other novel aspects of the methodology
are some of the analyses we have proposed:
Statistical significance of collected data, to study the statistical significance of the data
used in our analysis. It gives strong indications about the users and whether they are voters
in the political event under analysis.
Mobility flows, to analyze the evolution of users' polarization in the weeks preceding the
political event. It allows studying whether users maintained the same polarization or
changed their opinion.
Polarization prediction, to predict the polarization of users before the political event.
This allows understanding how accurately the polarization of a user can be predicted
using information available some weeks before the vote.
The whole methodology and all its analyses have been applied to a real case, the 2016
Italian constitutional referendum. We studied the behavior of 50,717
Twitter users by analyzing the 338,592 tweets they posted on the referendum in the five
weeks preceding the vote. The results demonstrate the applicability of our methodology in
discovering the behavior of social network users and how news sites are used during political
campaigns.
6 Conclusions
Social media analysis is an important research area aimed at extracting useful information
from the large amount of data gathered from social networks. Recent years have seen great
interest from the academic and business worlds in using social media to measure public opinion.
This paper presents a methodology aimed at analyzing the polarization of social network
users and news sites during political campaigns characterized by the rivalry of different
factions. On the one hand, the methodology makes it possible to study the users' polarization
before a political event, what arguments they used to support their voting intentions, and
whether such intentions change in the weeks preceding the vote. On the other hand, the
methodology makes it possible to analyze the effects of news sites on important political
events, that is, how many users used information from news sites and which news sites can be
considered in favor of, against or neutral to a given faction.
The methodology has been validated on an important case study, the 2016 Italian
constitutional referendum. According to our study, 48% of Twitter users were polarized
towards no, 25% towards yes, and 27% had a neutral behavior. Regarding the change of
opinion in the weeks preceding the vote, the majority of users categorized as supporters of
no or yes did not change position during those weeks, while a considerable share
of the neutral users moved towards no (20%) and towards yes (10%). A second goal was to
understand the effects of news sites on the referendum campaign. The analysis has shown
that some news sites had a strong polarization towards yes (unita.tv, ilsole24ore.it and
linkiesta.it), some others had a neutral position (lastampa.it, corriere.it, huffingtonpost.it
and repubblica.it) and others were oriented towards no (ilfattoquotidiano.it, ilgiornale.it and
beppegrillo.it). The polarization of news sites remained almost unchanged in the weeks
preceding the vote.
References
1. We're all connected: The power of the social media ecosystem. Business Horizons 54(3), 265–273 (2011)
2. Anstead, N., O’Loughlin, B.: Social media analysis and public opinion: The 2010 uk general election.
Journal of Computer-Mediated Communication 20(2), 204–220 (2015). DOI 10.1111/jcc4.12102. URL
http://dx.doi.org/10.1111/jcc4.12102
3. Anstead, N., O’Loughlin, B.: Social media analysis and public opinion: The 2010 uk general election.
Journal of Computer-Mediated Communication 20(2), 204–220 (2015)
4. Belcastro, L., Marozzo, F., Talia, D., Trunfio, P.: Big data analysis on clouds. In: S. Sakr, A. Zomaya
(eds.) Handbook of Big Data Technologies, pp. 101–142. Springer (2017). URL
http://dx.doi.org/10.1007/978-3-319-49340-4_4. ISBN: 978-3-319-49339-8
5. Bessi, A., Coletto, M., Davidescu, G.A., Scala, A., Caldarelli, G., Quattrociocchi, W.: Science vs
conspiracy: Collective narratives in the age of misinformation. PloS one 10(2), e0118093 (2015)
6. Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)
7. Burnap, P., Gibson, R., Sloan, L., Southern, R., Williams, M.: 140 characters to victory?: Using twitter
to predict the uk 2015 general election. Electoral Studies 41, 230–233 (2016)
8. Ceron, A., Curini, L., Iacus, S.M., Porro, G.: Every tweet counts? how sentiment analysis of social media
can improve our knowledge of citizens' political preferences with an application to italy and france. New
Media & Society 16(2), 340–358 (2014)
9. Cesario, E., Congedo, C., Marozzo, F., Riotta, G., Spada, A., Talia, D., Trunfio, P., Turri, C.: Following
soccer fans from geotagged tweets at fifa world cup 2014. In: Proc. of the 2nd IEEE Conference on
Spatial Data Mining and Geographical Knowledge Services, pp. 33–38. Fuzhou, China (2015). ISBN
978-1-4799-7748-2
10. Cesario, E., Iannazzo, A.R., Marozzo, F., Morello, F., Riotta, G., Spada, A., Talia, D., Trunfio, P.:
Analyzing social media data to discover mobility patterns at expo 2015: Methodology and results. In:
The 2016 International Conference on High Performance Computing and Simulation (HPCS 2016).
Innsbruck, Austria (2016). To appear
11. Cesario, E., Iannazzo, A.R., Marozzo, F., Morello, F., Riotta, G., Spada, A., Talia, D., Trunfio, P.:
Analyzing social media data to discover mobility patterns at expo 2015: Methodology and results. In:
The 2016 International Conference on High Performance Computing and Simulation (HPCS 2016).
Innsbruck, Austria (2016)
12. Dallmann, A., Lemmerich, F., Zoller, D., Hotho, A.: Media bias in german online newspapers. In:
Proceedings of the 26th ACM Conference on Hypertext & Social Media, pp. 133–137. ACM (2015)
13. Elmer, G.: Live research: Twittering an election debate. New Media & Society 15(1), 18–30 (2013).
DOI 10.1177/1461444812457328. URL http://dx.doi.org/10.1177/1461444812457328
14. Franch, F.: 2010 uk election prediction with social media. Journal of Information Technology & Politics
10(1), 57–71 (2013). DOI 10.1080/19331681.2012.705080
15. Gokulakrishnan, B., Priyanthan, P., Ragavan, T., Prasath, N., Perera, A.: Opinion mining and sentiment
analysis on a twitter data stream. In: Advances in ICT for emerging regions (ICTer), 2012 International
Conference on, pp. 182–188. IEEE (2012)
16. Gonzalez-Bailon, S., Banchs, R.E., Kaltenbrunner, A.: Emotional reactions and the pulse of public
opinion: Measuring the impact of political events on the sentiment of online discussions. CoRR
abs/1009.4019 (2010)
17. Gruzd, A., Roy, J.: Investigating political polarization on twitter: A canadian perspective. Policy &
Internet 6(1), 28–45 (2014)
18. Hermida, A., Fletcher, F., Korell, D., Logan, D.: Share, like, recommend. Journalism Studies 13(5-6),
815–824 (2012). DOI 10.1080/1461670X.2012.664430
19. Hopkins, D.J., King, G.: A method of automated nonparametric content analysis for social science.
American Journal of Political Science 54(1), 229–247 (2010)
20. Howard, P.N., Duffy, A., Freelon, D., Hussain, M.M., Mari, W., Maziad, M.: Opening closed regimes:
What was the role of social media during the arab spring? SSRN (2011). URL
http://dx.doi.org/10.2139/ssrn.2595096
21. Jungherr, A.: Twitter use in election campaigns: A systematic literature review. Journal of Information
Technology & Politics 13(1), 72–91 (2016). DOI 10.1080/19331681.2015.1132401
22. Kagan, V., Stevens, A., Subrahmanian, V.: Using twitter sentiment to forecast the 2013 pakistani election
and the 2014 indian election. IEEE Intelligent Systems 30(1), 2–5 (2015)
23. Kwon, S., Cha, M., Jung, K., Chen, W., Wang, Y.: Prominent features of rumor propagation in online
social media. In: Data Mining (ICDM), 2013 IEEE 13th International Conference on, pp. 1103–1108.
IEEE (2013)
24. Lerman, K., Ghosh, R.: Information contagion: an empirical study of the spread of news on digg and
twitter social networks. CoRR abs/1003.2664 (2010)
25. Lievrouw, L., Gillespie, T., Boczkowski, P., Foot, K.: Materiality and media in communication and
technology studies: An unfinished project. Media technologies: Essays on communication, materiality,
and society pp. 21–51 (2014)
26. Monti, C., Rozza, A., Zappella, G., Zignani, M., Arvidsson, A., Colleoni, E.: Modelling political
disaffection from twitter data. In: Proceedings of the Second International Workshop on Issues of Sentiment
Discovery and Opinion Mining, p. 3. ACM (2013)
27. Murphy, J., Link, M.W., Childs, J.H., Tesfaye, C.L., Dean, E., Stern, M., Pasek, J., Cohen, J., Callegaro,
M., Harwood, P.: Social media in public opinion research: Executive summary of the aapor task force on
emerging technologies in public opinion research. Public Opinion Quarterly 78(4), 788 (2014). DOI
10.1093/poq/nfu053. URL http://dx.doi.org/10.1093/poq/nfu053
28. Nulty, P., Theocharis, Y., Popa, S.A., Parnet, O., Benoit, K.: Social media and political communication
in the 2014 elections to the european parliament. Electoral studies 44, 429–444 (2016)
29. O’Connor, B., Balasubramanyan, R., Routledge, B.R., Smith, N.A.: From tweets to polls: Linking text
sentiment to public opinion time series. ICWSM 11(122-129), 1–2 (2010)
30. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and Trends in Information
Retrieval 2(12), 1–135 (2008). DOI 10.1561/1500000011. URL http://dx.doi.org/10.1561/1500000011
31. Talia, D., Trunfio, P., Marozzo, F.: Data Analysis in the Cloud. Elsevier (2015)
32. Tufte, E.R.: The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT, USA (1986)
33. Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting elections with twitter: What 140
characters reveal about political sentiment. ICWSM 10(1), 178–185 (2010)
34. Van Asch, V.: Macro-and micro-averaged evaluation measures. Tech. Rep. (2013)
35. Wagner, J.P.: The media and national identity: Local newspapers' coverage of scottish independence
during the campaign of the 2014 scottish independence referendum. In: Dealing with the Local (2017)
36. Zhang, K., Cheng, Y., Xie, Y., Honbo, D., Agrawal, A., Palsetia, D., Lee, K., Liao, W.k., Choudhary,
A.: Ses: Sentiment elicitation system for social media data. In: Data Mining Workshops (ICDMW),
2011 IEEE 11th International Conference on, pp. 129–136. IEEE (2011)
... United States of America is the most commonly used country for social media network political polarization case studies [3,7,16,22], however, it is possible to find works that analyze the political scenario of other countries, such as Brazil [6,19,20], Italy [2,15], Mexico [13], South Korea [11], Finland [14], and Egypt [1]. ...
... As future work, we will continue the research by analyzing social networks during the pre-election and election for president of Brazil in 2022, expanding the research to Telegram, and using the Twitter API for real-time and complete data collection. We will also use tools like Botometer 15 and BotSentinel 16 to track social bot activity, and the Google's Perspective API 17 to detect tweets contaning toxic speech. ...
Conference Paper
Installed in April 2021, the COVID-19 Parliamentary Commission of Inquiry (PCI) aimed to investigate omissions and irregularities committed by the federal government during the COVID pandemic in Brazil, which resulted in the death of more than 660,000 Brazilians and placed it among the countries with the most deaths caused by COVID-19. The investigated government was elected in 2018, in one of the most polarized elections in Brazilian history, and social media played a prominent role in this polarization. Not far from that, the PCI also generated a great popular commotion on social media networks. This paper aims to analyze the public debate related to the PCI of COVID on Twitter, identifying groups, examining their characteristics and interactions, and verifying evidence of political polarization in this social network. For this, we collected 3,397,933 tweets over a period of 26 weeks, and analyzed four distinct networks, based on different types of users interactions, to identify the main actors and verify the presence of segregated groups. In addition, we use natural language preprocessing to detect group characteristics and toxic speech. As a result, we identified three users groups, based on their use of hashtags and using a community detection technique. The group against the PCI is made up of conservatives and supporters of the government targeted by the investigations and presents the highest internal homogeneity. The other two groups, moderated users and opposed to the government, are formed by actors from the most varied political spectrum, containing users from the political left, center, and right, in addition to the main media outlets in the country. Moreover, other evidences of political polarization were found even in less segregated networks, where users from different groups interact with each other, but with the presence of toxic speech.
... There was heightened sensitivity to the potentially corrosive political impacts of social media, and early research had ascribed some of the deep challenges to democracy to social media use (Tucker et al., 2017). These challenges can take many forms: polarisation of opinion (Marozzo & Bessi, 2018), microtargeting of political ads (Tromble et al., 2019;Zarouali et al., 2022), and the creation of echo chambers (Garimella et al., 2018). All of these concerns have been investigated, often during election campaigns, and some evidence of social media effects was identified. ...
Article
Full-text available
The role of social media at electoral events is much speculated upon. Wide-ranging effects, and often critical evaluations, are attributed to commentary, discussions, and advertising on Facebook, Twitter, Telegram, and many other platforms. But the specific effects of these social media during campaigns, especially referendum campaigns, remain under-studied. This thematic issue is a very valuable contribution for precisely this reason. Using the 2018 abortion referendum in Ireland as an illustrative case, this commentary argues for greater research on social media at referendum campaigns, more critical evaluation of the claims and counterclaims about social media effects, often aired widely without substantive evidence, and, finally, for robust, coordinated cross-national regulation of all digital platforms in line with global democratic norms.
... Social media plays a significant role in replacing traditional media, facilitating political engagement, strengthening strategic collaboration, and the potential to influence government decisions regarding politics [27]. Social media is currently widely used to influence socio-political situations, such as in the case of general elections, political campaigns, political movements, and protests [28], [29], [30]. Thus the political situation and discussions are relevant on social media. ...
Article
Full-text available
This study aims to show whether Twitter analysis can predict and forecast candidates in the Indonesian presidential election in 2024. This study was conducted long before the election began. To reduce the gap and utopian attitude in analyzing, two forms of analysis were used at once, namely sentiment analysis and text searches on Twitter data. This study uses a quantitative approach with descriptive content analysis. The data was obtained from Twitter social media, with Twitter Search focusing on official accounts and topics surrounding the 2024 presidential election. The search and data collection first adjusted to the trend of poll results spread in online news. The trend resulting from the poll is used to adjust the names of candidates to be searched for on Twitter search. The analysis tool used also utilizes the Nvivo 12 Plus analysis software. This study succeeded in mapping out three potential candidates in the 2024 election, namely Anies Baswedan, Ganjar Pranowo, and Prabowo Subianto. The mapping of potential candidates also has correspondence with the results of opinion polls in newspapers. From these findings, the information and data on Twitter help make predictions and an alternative to using the poll method. The drawback of this study lies in the limited use of time, so it is recommended that further research be carried out to collect and analyze similar data regularly until the election period. This may indicate that Twitter can predict earlier or better than polls.
... Many researchers focused their studies on the development of applications for big data analysis in various application fields, including trend discovery, social media analytics, pattern mining, sentiment analysis, and opinion mining. For example, from the analysis of large amounts of user data, we can understand human dynamics and behaviors, including the following: (i) the main tourist attractions and also the mobility patterns within a city [5]; (ii) the areas of a city where it is necessary to improve the means of transport [6] or where it is more suitable to open new businesses [7]; (iii) the behavior purchase of users while browsing an ecommerce [8]; (iv) the behavior of fans following important sporting events [9]; and (v) the political orientation of citizens and then estimates the outcome of a political event [10]. ...
Article
With the spread of the Internet of Things, large amounts of digital data are generated and collected from different sources, such as sensors, cameras, in-vehicle infotainment, smart meters, mobile devices, applications, and web services [...]
... Over the last decade, social media has impacted public discourse and communication, particularly in the political context (Kushin and Yamamoto 2010;Wattal et al. 2010;Ratkiewicz et al. 2011;Stieglitz and Dang-Xuan 2013;Jensen 2017;Marozzo and Bessi 2018;Badawy, Ferrara, and Lerman 2018;Ferrara et al. 2020;Sharma, Ferrara, and Liu 2021). Social media has a transformative effect on how political candidates interact with potential voters by adapting their messaging to different demographic groups' specific concerns and interests. ...
Preprint
Social media platforms are currently the main channel for political messaging, allowing politicians to target specific demographics and adapt based on their reactions. However, making this communication transparent is challenging, as the messaging is tightly coupled with its intended audience and often echoed by multiple stakeholders interested in advancing specific policies. Our goal in this paper is to take a first step towards understanding these highly decentralized settings. We propose a weakly supervised approach to identify the stance and issue of political ads on Facebook and analyze how political campaigns use some kind of demographic targeting by location, gender, or age. Furthermore, we analyze the temporal dynamics of the political ads on election polls.
Article
The purpose of influence maximization problem is to select a small seed set to maximize the number of nodes influenced by the seed set. For viral marketing, the problem of influence maximization plays a vital role. Current works mainly focus on the unsigned social networks, which include only positive relationship between users. However, the influence maximization in the signed social networks including positive and negative relationships between users is still a challenging issue. Moreover, the existing works pay more attention to the positive influence. Therefore, this paper first analyzes the positive maximization influence in the signed social networks. The purpose of this problem is to select the seed set with the most positive influence in the signed social networks. Afterwards, this paper proposes a model that incorporates the state of node, the preference of individual and polarity relationship, called Independent Cascade with the Negative and Polarity (ICWNP) propagation model. On the basis of the ICWNP model, this paper proposes a Greedy with ICWNP algorithm. Finally, on four real social networks, experimental results manifest that the proposed algorithm has higher accuracy and efficiency than the related methods.
Article
Lifestyles of individuals have changed drastically in the last two decades with the impact of social media platforms which transforms individuals from being users into an asset of social media. The assets now become very precious and seriously attract who can generate useful or harmful values. In this context, studies conducted in the last 5 years are analyzed based on the methodology covering implementation areas, data sources, data size, methods and tools. The studies were classified and summarized under nine main “research fields,” and a “purpose‐based” classification under three main purposes was investigated. The results have shown that even if data obtained from social media platforms are often preferred in the studies, issues such as compliance with legal regulations, data processing, confidentiality and privacy of data also bring difficulties; collection and processing of social big data are a serious obstacle to the realization of many studies; not enough data sources provided by public or private enterprises; most of the studies carried out on text data, and the rest focused on location and image data; mostly machine learning methods are preferred in applications. This study differs from previous literature reviews by revealing comprehensively how social big data can be transformed into practice with a holistic perspective.
Article
In recent years, political polarization saw a significant rise in many political systems. This revived a scientific debate sparked decades ago, with different schools of thought debating the dynamics, factors, and causes of polarization itself. By looking at political elites' polarizing strategy (one of the factors on which various theories seem to converge), this article tackles the question concerning the impact of the COVID-19 pandemic in terms of political communication. More specifically, we look at the case of a highly polarizing leader in Italy (Matteo Salvini, leader of Lega) in two campaigns held in 2020 before and after the first wave of the pandemic. By analyzing his messages on Facebook and Twitter, we build on the literature on the causes of affective polarization to study Salvini's use of partisan identity and divisive issues, also considering other crucial elements, such as the attacks against others, and followers' engagement. The results highlight some changes between the two phases, but also a strong continuity in the polarizing strategy of Salvini's political communication.
Article
One of today’s most controversial and consequential issues is whether the global uptake of digital media is causally related to a decline in democracy. We conducted a systematic review of causal and correlational evidence (N = 496 articles) on the link between digital media use and different political variables. Some associations, such as increasing political participation and information consumption, are likely to be beneficial for democracy and were often observed in autocracies and emerging democracies. Other associations, such as declining political trust, increasing populism and growing polarization, are likely to be detrimental to democracy and were more pronounced in established democracies. While the impact of digital media on political systems depends on the specific variable and system in question, several variables show clear directions of associations. The evidence calls for research efforts and vigilance by governments and civil societies to better understand, design and regulate the interplay of digital media and democracy.
Article
La campaña peruana en Twitter. Análisis de la polarización afectiva durante la segunda vuelta de las elecciones generales 2021 (The Peruvian campaign on Twitter. Analysis of affective polarization during the second round of the 2021 general elections). How to cite: Ponte Torrel, J. (2022). La campaña peruana en Twitter. Análisis de la polarización afectiva durante la segunda vuelta de las elecciones generales 2021. Cuadernos.info, (53), 138-161. https://doi.org/10.7764/cdi.53.49539 Abstract: Twitter is one of the digital spaces that is highly attractive to political candidacies at election time because of its usefulness for disseminating proposals and generating debate on current issues. This research analyzes affective polarization in the sentiments and emotions contained in mentions of the presidential candidates, Pedro Castillo and Keiko Fujimori, during the second round of the 2021 elections in Peru. It draws on the data source compiled by the Monitoreo de Redes Sociales of the Dirección Nacional de Educación y Formación Cívica Ciudadana of the Jurado Nacional de Elecciones. A total of 1,202,297 tweets were analyzed with the statistical software R. The results indicate that affective polarization appears in users' extreme sentiments toward the candidates, and even that the emotional charges tend toward temporal instability of support or rejection.
Chapter
The huge amount of data generated, the speed at which it is produced, and its heterogeneity in terms of format, represent a challenge to the current storage, process and analysis capabilities. Those data volumes, commonly referred as Big Data, can be exploited to extract useful information and to produce helpful knowledge for science, industry, public services and in general for humankind. Big Data analytics refer to advanced mining techniques applied to Big Data sets. In general, the process of knowledge discovery from Big Data is not so easy, mainly due to data characteristics, as size, complexity and variety, that require to address several issues. Cloud computing is a valid and cost-effective solution for supporting Big Data storage and for executing sophisticated data mining applications. Big Data analytics is a continuously growing field, so novel and efficient solutions (i.e., in terms of platforms, programming tools, frameworks, and data mining algorithms) spring up everyday to cope with the growing scope of interest in Big Data. This chapter discusses models, technologies and research trends in Big Data analysis on Clouds. In particular, the chapter presents representative examples of Cloud environments that can be used to implement applications and frameworks for data analysis, and an overview of the leading software tools and technologies that are used for developing scalable data analysis on Clouds.
Conference Paper
Social media posts are often tagged with geographical coordinates or other information that allows identifying user positions, this way enabling mobility pattern analysis using trajectory mining techniques. This paper presents a methodology and discusses results of a study aimed at discovering behavior and mobility patterns of Instagram users who visited EXPO 2015, the Universal Exposition hosted in Milan, Italy, from May to October 2015. We collected and analyzed geotagged posts published by about 238,000 Instagram users who visited EXPO 2015, including more than 570,000 posts published during the visits, and 2.63 million posts published by them from one month before to one month after their visit to EXPO. To cope with this large amount of data, the whole process - from data collection to data mining - was implemented on a high-performance cloud platform that provided the necessary storage and compute resources. The analysis allowed us to discover how the number of visitors changed over time, which were the sets of most frequently visited pavilions, which countries the visitors came from, and the main flows of destination of visitors towards Italian cities and regions in the days after their visit to EXPO. A strong correlation (Pearson coefficient 0.7) was measured between official visitor numbers and the visit trends produced by our analysis, which assessed the effectiveness of the proposed methodology and confirmed the reliability of results.
Article
Social media play an increasingly important part in the communication strategies of political campaigns by reflecting information about the policy preferences and opinions of political actors and their public followers. In addition, the content of the messages provides rich information about the political issues and the framing of those issues during elections, such as whether contested issues concern Europe or rather extend pre-existing national debates. In this study, we survey the European landscape of social media using tweets originating from and referring to political actors during the 2014 European Parliament election campaign. We describe the language and national distribution of the messages, the relative volume of different types of communications, and the factors that determine the adoption and use of social media by the candidates. We also analyze the dynamics of the volume and content of the communications over the duration of the campaign with reference to both the EU integration dimension of the debate and the prominence of the most visible list-leading candidates. Our findings indicate that the lead candidates and their televised debate had a prominent influence on the volume and content of communications, and that the content and emotional tone of communications more reflects preferences along the EU dimension of political contestation rather than classic national issues relating to left-right differences.
Article
Public opinion research is entering a new era, one in which traditional survey research may play a less dominant role. The proliferation of new technologies, such as mobile devices and social-media platforms, is changing the societal landscape across which public opinion researchers operate. As these technologies expand, so does access to users' thoughts, feelings, and actions expressed instantaneously, organically, and often publicly across the platforms they use. The ways in which people both access and share information about opinions, attitudes, and behaviors have gone through a greater transformation in the past decade than perhaps in any previous point in history, and this trend appears likely to continue. The ubiquity of social media and the opinions users express on social media provide researchers with new data-collection tools and alternative sources of qualitative and quantitative information to augment or, in some cases, provide alternatives to more traditional data-collection methods. The reasons to consider social media in public opinion and survey research are no different than those of any alternative method. We are ultimately concerned with answering research questions, and this often requires the collection of data in one form or another. This may involve the analysis of data to obtain qualitative insights or quantitative estimates. The quality of data and the ability to help accurately answer research questions are of paramount concern. Other practical considerations include the cost efficiency of the method and the speed at which the data can be collected, analyzed, and disseminated. If the combination of data quality, cost efficiency, and timeliness required by a study can best be achieved through the use of social media, then there is reason to consider these methods for research. An additional reason to consider social media in public opinion and survey research is its explosion in popularity over the past several years.
Book
Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud. Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on map-reduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis. Introduces data analysis techniques and cloud computing concepts. Describes cloud-based models and systems for Big Data analytics. Provides examples of the state-of-the-art in cloud data analysis. Explains how to develop large-scale data mining applications on clouds. Outlines the main research trends in the area of scalable Big Data analysis.
Conference Paper
Online newspapers have been established as a crucial information source, at least partially replacing traditional media like television or print media. As all other media, online newspapers are potentially affected by media bias. This describes non-neutral reporting of journalists and other news producers, e.g. with respect to specific opinions or political parties. Analysis of media bias has a long tradition in political science. However, traditional techniques rely heavily on manual annotation and are thus often limited to the analysis of small sets of articles. In this paper, we investigate a dataset that covers all political and economical news from four leading German online newspapers over a timespan of four years. In order to analyze this large document set and compare the political orientation of different newspapers, we propose a variety of automatically computable measures that can indicate media bias. As a result, statistically significant differences in the reporting about specific parties can be detected between the analyzed online newspapers.
Article
Twitter has become a pervasive tool in election campaigns. Candidates, parties, journalists, and a steadily increasing share of the public are using Twitter to comment on, interact around, and research public reactions to politics. These uses have met with growing scholarly attention. As of now, this research is fragmented, lacks a common body of evidence, and shared approaches to data collection and selection. This article presents the results of a systematic literature review of 127 studies addressing the use of Twitter in election campaigns. In this systematic review, I will discuss the available research with regard to findings on the use of Twitter by parties, candidates, and publics during election campaigns and during mediated campaign events. Also, I will address prominent research designs and approaches to data collection and selection.
Article
Social media played a central role in shaping political debates in the Arab Spring. A spike in online revolutionary conversations often preceded major events on the ground. Social media helped spread democratic ideas across international borders. No one could have predicted that Mohammed Bouazizi would play a role in unleashing a wave of protest for democracy in the Arab world. Yet, after the young vegetable merchant stepped in front of a municipal building in Tunisia and set himself on fire in protest of the government on December 17, 2010, democratic fervor spread across North Africa and the Middle East. Governments in Tunisia and Egypt soon fell, civil war broke out in Libya, and protestors took to the streets in Algeria, Morocco, Syria, Yemen and elsewhere. The Arab Spring had many causes. One of these sources was social media and its power to put a human face on political oppression. Bouazizi’s self-immolation was one of several stories told and retold on Facebook, Twitter, and YouTube in ways that inspired dissidents to organize protests, criticize their governments, and spread ideas about democracy. Until now, most of what we have known about the role of social media in the Arab Spring has been anecdotal. Focused mainly on Tunisia and Egypt, this research included creating a unique database of information collected from Facebook, Twitter, and YouTube. The research also included creating maps of important Egyptian political websites, examining political conversations in the Tunisian blogosphere, analyzing more than 3 million Tweets based on keywords used, and tracking which countries thousands of individuals tweeted from during the revolutions. The result is that for the first time we have evidence confirming social media’s critical role in the Arab Spring.