Political Polarization on Twitter

M. D. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, A. Flammini, F. Menczer
Center for Complex Networks and Systems Research
School of Informatics and Computing
Indiana University, Bloomington, IN, USA
Abstract
In this study we investigate how social media shape the
networked public sphere and facilitate communication be-
tween communities with different political orientations. We
examine two networks of political communication on Twit-
ter, comprised of more than 250,000 tweets from the six
weeks leading up to the 2010 U.S. congressional midterm
elections. Using a combination of network clustering algo-
rithms and manually-annotated data we demonstrate that the
network of political retweets exhibits a highly segregated par-
tisan structure, with extremely limited connectivity between
left- and right-leaning users. Surprisingly this is not the case
for the user-to-user mention network, which is dominated by
a single politically heterogeneous cluster of users in which
ideologically-opposed individuals interact at a much higher
rate compared to the network of retweets. To explain the dis-
tinct topologies of the retweet and mention networks we con-
jecture that politically motivated individuals provoke inter-
action by injecting partisan content into information streams
whose primary audience consists of ideologically-opposed
users. We conclude with statistical evidence in support of this hypothesis.
1 Introduction
Social media play an important role in shaping political dis-
course in the U.S. and around the world (Bennett 2003;
Benkler 2006; Sunstein 2007; Farrell and Drezner 2008;
Aday et al. 2010; Tumasjan et al. 2010; O’Connor et al.
2010). According to the Pew Internet and American Life
Project, six in ten U.S. internet users, nearly 44% of Amer-
ican adults, went online to get news or information about
politics in 2008. Additionally, Americans are taking an ac-
tive role in online political discourse, with 20% of internet
users contributing comments or questions about the politi-
cal process to social networking sites, blogs or other online
forums (Pew Internet and American Life Project 2008).
Despite this, some empirical evidence suggests that politically active web users tend to organize into insular, homogeneous communities segregated along partisan lines. Adamic and Glance (2005) famously demonstrated that political blogs preferentially link to other blogs of the same political ideology, a finding supported by the work of Hargittai, Gallo, and Kane (2007). Consumers of online political information tend to behave similarly, choosing to read blogs that share their political beliefs, with 26% more users doing so in 2008 than 2004 (Pew Internet and American Life Project 2008).

Copyright © 2011, Association for the Advancement of Artificial Intelligence. All rights reserved.
In its own right, the formation of online communities is
not necessarily a serious problem. The concern is that when
politically active individuals can avoid people and informa-
tion they would not have chosen in advance, their opinions
are likely to become increasingly extreme as a result of being
exposed to more homogeneous viewpoints and fewer credi-
ble opposing opinions. The implications for the political pro-
cess in this case are clear. A deliberative democracy relies on
a broadly informed public and a healthy ecosystem of com-
peting ideas. If individuals are exposed exclusively to people
or facts that reinforce their pre-existing beliefs, democracy
suffers (Sunstein 2002; 2007).
In this study we examine networks of political commu-
nication on the Twitter microblogging service during the
six weeks prior to the 2010 U.S. midterm elections. Sam-
pling data from the Twitter ‘gardenhose’ API, we identi-
fied 250,000 politically relevant messages (tweets) produced
by more than 45,000 users. From these tweets we isolated
two networks of political communication — the retweet
network, in which users are connected if one has rebroad-
cast content produced by another, and the mention network,
where users are connected if one has mentioned another in a
post, including the case of tweet replies.
We demonstrate that the retweet network exhibits a highly
modular structure, segregating users into two homogeneous
communities corresponding to the political left and right. In
contrast, we find that the mention network does not exhibit
this kind of political segregation, resulting in users being ex-
posed to individuals and information they would not have
been likely to choose in advance.
Finally, we provide evidence that these network structures
result in part from politically motivated individuals annotat-
ing tweets with multiple hashtags whose primary audiences
consist of ideologically-opposed users, a behavior also doc-
umented in the work of Yardi and boyd (2010). We argue
that this process results in users being exposed to content
they are not likely to rebroadcast, but to which they may
respond using mentions, and provide statistical evidence in
support of this hypothesis.
The major contributions of this work are:
• Creation and release of a network and text dataset derived from more than 250,000 politically-related Twitter posts authored in the weeks preceding the 2010 U.S. midterm elections (§2).
• Cluster analysis of networks derived from this corpus showing that the network of retweets exhibits clear segregation, while the mention network is dominated by a single large community (§3.1).
• Manual classification of Twitter users by political alignment, demonstrating that the retweet network clusters correspond to the political left and right. These data also show the mention network to be politically heterogeneous, with users of opposing political views interacting at a much higher rate than in the retweet network (§3.3).
• An interpretation of the observed community structures based on injection of partisan content into ideologically opposed hashtag information streams (§4).
2 Data and Methods
2.1 The Twitter Platform
Twitter is a popular social networking and microblogging
site where users can post 140-character messages, or tweets.
Apart from broadcasting tweets to an audience of followers,
Twitter users can interact with one another in two primary
public ways: retweets and mentions. Retweets act as a form
of endorsement, allowing individuals to rebroadcast content
generated by other users, thereby raising the content’s vis-
ibility (boyd, Golder, and Lotan 2008). Mentions function
differently, allowing someone to address a specific user di-
rectly through the public feed, or, to a lesser extent, refer
to an individual in the third person (Honeycutt and Herring
2008). These two means of communication —retweets and
mentions— serve distinct and complementary purposes, to-
gether acting as the primary mechanisms for explicit, public
user-user interaction on Twitter.
Hashtags are another important feature of the Twitter plat-
form. They allow users to annotate tweets with metadata
specifying the topic or intended audience of a communica-
tion. For example, #dadt stands for “Don’t Ask Don’t Tell”
and #jlot for “Jewish Libertarians on Twitter.” Each hash-
tag identifies a stream of content, with users’ tag choices de-
noting participation in different information channels.
The present analysis leverages data collected from the
Twitter ‘gardenhose’ API (
streaming_api) between September 14th and November 1st, 2010, the run-up to the November 2nd U.S.
congressional midterm elections. During the six weeks of
data collection we observed approximately 355 million
tweets. Our analysis utilizes an infrastructure and website
( designed to analyze the spread
of information on Twitter, with special focus on political
content (Ratkiewicz et al. 2011).
2.2 Identifying Political Content
Let us define a political communication as any tweet containing at least one politically relevant hashtag. To identify an appropriate set of political hashtags and to avoid introducing bias into the sample, we performed a simple tag co-occurrence discovery procedure. We began by seeding our sample with the two most popular political hashtags, #p2 (“Progressives 2.0”) and #tcot (“Top Conservatives on Twitter”). For each seed we identified the set of hashtags with which it co-occurred in at least one tweet, and ranked the results using the Jaccard coefficient. For a set of tweets $S$ containing a seed hashtag, and a set of tweets $T$ containing another hashtag, the Jaccard coefficient between $S$ and $T$ is

$$\sigma(S, T) = \frac{|S \cap T|}{|S \cup T|} \qquad (1)$$

Thus, when the tweets in which both seed and hashtag occur make up a large portion of the tweets in which either occurs, the two are deemed to be related. Using a similarity threshold of 0.005 we identified 66 unique hashtags (Table 1), eleven of which we excluded due to overly broad or ambiguous meaning (Table 2). This process resulted in a corpus of 252,300 politically relevant tweets. There is substantial overlap between streams associated with different political hashtags because many tweets contain multiple hashtags. As a result, lowering the similarity threshold leads to only modest increases in the number of political tweets in our sample (which do not substantially affect the results of our analysis) while introducing unrelated hashtags.

Table 1: Hashtags related to #p2, #tcot, or both. Tweets containing any of these were included in our sample.

Just #p2:    #casen #dadt #dc10210 #democrats #du1 #fem2 #gotv #kysen #lgf #ofa #onenation #p2b #pledge #rebelleft #truthout #vote #vote2010 #whyimvotingdemocrat #youcut
Both:        #cspj #dem #dems #desen #gop #hcr #nvsen #obama #ocra #p2 #p21 #phnm #politics #sgp #tcot #teaparty #tlot #topprog #tpp #twisters #votedem
Just #tcot:  #912 #ampat #ftrs #glennbeck #hhrs #iamthemob #ma04 #mapoli #palin #palin12 #spwbt #tsot #tweetcongress #ucot #wethepeople

Table 2: Hashtags excluded from the analysis due to ambiguous or overly broad meaning.

Excluded from #p2:    #economy #gay #glbt #us #wc #lgbt
Excluded from both:   #israel #rs
Excluded from #tcot:  #news #qsn #politicalhumor
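The co-occurrence ranking described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the input format (a mapping from tweet id to its set of hashtags), the function names, and the toy data are all assumptions introduced here.

```python
from collections import defaultdict

def jaccard(s, t):
    """Jaccard coefficient |S & T| / |S | T| of two sets of tweet ids."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0

def related_hashtags(tweets, seed, threshold=0.005):
    """Rank hashtags that co-occur with `seed` by Jaccard similarity.

    `tweets` maps a tweet id to the set of hashtags it contains
    (a hypothetical input format chosen for this sketch).
    """
    tweets_by_tag = defaultdict(set)
    for tweet_id, tags in tweets.items():
        for tag in tags:
            tweets_by_tag[tag].add(tweet_id)
    seed_tweets = tweets_by_tag.get(seed, set())
    # Only tags sharing at least one tweet with the seed are candidates.
    scores = {
        tag: jaccard(seed_tweets, ids)
        for tag, ids in tweets_by_tag.items()
        if tag != seed and seed_tweets & ids
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(tag, score) for tag, score in ranked if score >= threshold]

# Toy data, invented for illustration only.
toy_tweets = {
    1: {"#p2", "#dems"},
    2: {"#p2", "#dems", "#tcot"},
    3: {"#p2"},
    4: {"#tcot", "#teaparty"},
}
print(related_hashtags(toy_tweets, "#p2"))
```

In this toy corpus #teaparty never co-occurs with #p2, so it is not a candidate at all, while #dems ranks above #tcot because a larger share of the combined tweets contain both tags.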
2.3 Political Communication Networks
From the tweets containing any of the politically rele-
vant hashtags we constructed networks representing political
communication among Twitter users. Focusing on the two
primary modes of public user-user interaction, mentions and
retweets, we define communication links in the following
ways. In the retweet network an edge runs from a node representing user $A$ to a node representing user $B$ if $B$ retweets content originally broadcast by $A$, indicating that information has propagated from $A$ to $B$. In the mention network an edge runs from $A$ to $B$ if $A$ mentions $B$ in a tweet, indicating that information may have propagated from $A$ to $B$ (a tweet mentioning $B$ is visible in $B$’s timeline). Both networks therefore represent potential pathways for information to flow between users.
The retweet network consists of 23,766 non-isolated
nodes among a total of 45,365. The largest connected com-
ponent accounts for 18,470 nodes, with 102 nodes in the
next-largest component. The mention network is smaller,
consisting of 10,142 non-isolated nodes out of 17,752 to-
tal. It has 7,175 nodes in its largest connected component,
and 119 in the next-largest. Because of their dominance we
focus on the largest connected components for the rest of our
analysis. We observe that the retweet and mention networks
exhibit very similar scale-free topology (power-law degree
distribution not shown), with a number of users receiving or
spreading a huge amount of information.
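As a rough illustration of the edge definitions above, the following sketch builds both edge sets from a list of tweets and extracts a largest connected component. The tweet record layout (`author`, `retweet_of`, `mentions`) is hypothetical, and treating components as weakly connected (ignoring edge direction) is an assumption; the paper does not state which notion of connectivity was used.

```python
from collections import defaultdict

def build_networks(tweets):
    """Return (retweet_edges, mention_edges) as sets of directed (src, dst) pairs.

    Each tweet is a dict with hypothetical keys: 'author', 'retweet_of'
    (the original author, or None) and 'mentions' (a list of user names).
    """
    retweet_edges, mention_edges = set(), set()
    for tw in tweets:
        if tw.get("retweet_of"):
            # Edge from the original author A to the retweeter B.
            retweet_edges.add((tw["retweet_of"], tw["author"]))
        for mentioned in tw.get("mentions", []):
            # Edge from the mentioning user A to the mentioned user B.
            mention_edges.add((tw["author"], mentioned))
    return retweet_edges, mention_edges

def largest_component(edges):
    """Largest connected component, ignoring edge direction (an assumption)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node.
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Toy tweets, invented for illustration only.
toy = [
    {"author": "bob", "retweet_of": "alice", "mentions": []},
    {"author": "carol", "retweet_of": None, "mentions": ["alice", "dave"]},
    {"author": "erin", "retweet_of": "frank", "mentions": []},
    {"author": "gina", "retweet_of": "alice", "mentions": []},
]
retweets, mentions = build_networks(toy)
print(sorted(largest_component(retweets)))
```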
3 Cluster Analysis
Initial inspection of the retweet network suggested that users preferentially retweet other users with whom they agree politically, while the mention network appeared to form a bridge
between users of different ideologies. We explore this hy-
pothesis in several stages. In §3.1 we use network clustering
algorithms to demonstrate that the retweet network exhibits
two highly segregated communities of users, while the men-
tion network does not. In §3.2 we describe a statistical anal-
ysis of political tweet content, showing that messages pro-
duced by members of the same community are more similar
to each other than messages produced by users in different
communities. Finally, in §3.3, by manually annotating users,
we show that the retweet network is polarized on a partisan
basis, while the mention network is much more politically heterogeneous.
3.1 Community Structure
To establish the large-scale political structure of the retweet
and mention networks we performed community detection
using a label propagation method for two communities.¹
Label propagation (Raghavan, Albert, and Kumara 2007)
works by assigning an initial arbitrary cluster membership
to each node and then iteratively updating each node’s label
according to the label that is shared by most of its neighbors.
Ties are broken randomly when they occur. Label propaga-
tion is a greedy hill-climbing algorithm. As such it is ex-
tremely efficient, but can easily converge to different sub-
optimal clusters dependent on initial label assignments and
random tie breaking. To improve its effectiveness and stabil-
ity, we seeded the algorithm with initial node labels deter-
mined by the leading-eigenvector modularity maximization
method for two clusters (Newman 2006).
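A minimal sketch of the label propagation step may make the update rule concrete. It assumes an undirected adjacency mapping and takes the initial labels as input; the leading-eigenvector seeding used in the paper is not implemented here. One simplification is also an assumption of this sketch: a node keeps its current label whenever that label is among the most frequent ones in its neighborhood, and random tie-breaking applies only otherwise.

```python
import random
from collections import Counter

def label_propagation(neighbors, initial_labels, seed=0, max_sweeps=100):
    """Asynchronous label propagation starting from `initial_labels`.

    `neighbors` maps each node to the set of adjacent nodes.
    """
    rng = random.Random(seed)
    labels = dict(initial_labels)
    nodes = list(neighbors)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # visit nodes in a random order each sweep
        changed = False
        for node in nodes:
            counts = Counter(labels[n] for n in neighbors[node])
            if not counts:
                continue  # isolated node keeps its label
            top = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == top)
            if labels[node] in candidates:
                continue  # already holds a most-frequent label
            labels[node] = rng.choice(candidates)  # random tie-breaking
            changed = True
        if not changed:
            break  # every node agrees with a majority of its neighbors
    return labels

# Two triangles joined by the edge 2-3; node 2 starts with the "wrong" label.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B", 5: "B"}
print(label_propagation(toy_graph, start))
```

On this toy graph node 2 is pulled into the label of its own triangle, so each triangle ends up as one community whatever the visiting order.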
¹ While the partisan nature of U.S. political discourse makes two a natural number of clusters, in §3.3 we describe the effect on our analysis of increasing the target number of communities.

To confirm that we can produce consistent clusters across different runs we executed the algorithm one hundred times for each network and compared the label assignments produced by every run. Table 3 reports the high average agreement between the resulting cluster assignments for each graph, as computed by the Adjusted Rand Index (Hubert and Arabie 1985). Such a high agreement suggests that the clusters are consistent, and therefore, for simplicity, we do not resort to consensus clustering.

Table 3: Minimum, maximum, and average ARI similarities between 4,950 pairs of cluster assignments computed by label propagation on the mention and retweet networks.

Network   Min    Max    Mean
Mention   0.80   1.0    0.89
Retweet   0.94   0.98   0.96
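The run-to-run comparison can be sketched as follows, using the standard pair-counting form of the Adjusted Rand Index. The function names and the toy label vectors are assumptions of this sketch, and it presumes at least two nodes per partition. One hundred runs give C(100, 2) = 4,950 pairs, which matches the count in Table 3.

```python
from collections import Counter
from itertools import combinations
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two label sequences over the same nodes."""
    n = len(labels_a)
    # Pair counts within contingency cells, rows, and columns.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (maximum - expected)

def agreement_summary(runs):
    """Min, max, and mean ARI over all unordered pairs of runs."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]
    return min(scores), max(scores), sum(scores) / len(scores)

# Toy label vectors, invented for illustration; cluster names are arbitrary,
# so a relabeled but otherwise identical partition still scores 1.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
print(agreement_summary(runs))
print(comb(100, 2))  # number of run pairs for one hundred runs
```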
Figure 1 shows the retweet and mention networks, laid
out using a force-directed layout algorithm (Fruchterman
and Reingold 1991), with node colors determined by the
assigned communities. The retweet network exhibits two
distinct communities of users, while the mention network
is dominated by a single massive cluster of interconnected
users. Modularity (Newman and Girvan 2004) resulting
from the cluster assignments offers a first measure of seg-
regation, and reinforces the qualitative finding above. The
modularity induced by the communities in the retweet and
mention networks have values of 0.48 and 0.17, respectively.
A direct comparison of the modularity values is how-
ever problematic because of the different size and overall
connectivity of the two networks. We need a way to com-
pare the ‘goodness’ of cluster assignments across different
graphs. To this end we generate, for both retweet and mention graphs, $N = 1000$ shuffled versions of the graph that preserve the original degree sequence.
Each randomized network is clustered with the method
described above for the original graphs and associated with
the resulting modularity value. We use the distribution of
these values as a baseline against which to compare the qual-
ity of the clusters in the original graph. The intuition be-
hind this approach is that the degree to which the actual
graphs are more modular than the shuffled graphs tells us
how amenable each is to being split into two clusters —
a measure of segregation. The modularities of the shuffled
graphs can be viewed as observed values of a random variable. We can use these values to compute z-scores for the modularities of the original networks; they are $z_r = 11.02$ and $z_m = 2.06$ for the retweet and mention networks, respectively. We conclude that the community structure found in the retweet network is significantly more segregated than that found in the mention network.²
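The pieces of this baseline comparison can be sketched as below. Several choices here are assumptions rather than details taken from the paper: the graph is treated as undirected, the degree-preserving shuffle is done with double-edge swaps, and the z-score uses the sample standard deviation. In the procedure described above each shuffled graph is re-clustered before its modularity is measured; that step is omitted, so the sketch only shows the modularity, shuffling, and z-score building blocks.

```python
import random
from statistics import mean, stdev

def modularity(edges, community):
    """Newman-Girvan modularity of an undirected graph given node -> community."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for a, b in edges:
        for node in (a, b):
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[a] == community[b]:
            internal[community[a]] = internal.get(community[a], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

def shuffle_edges(edges, n_swaps, rng):
    """Degree-preserving randomization by repeated double-edge swaps."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Rewire a-b, c-d into a-d, c-b unless that creates a loop or duplicate.
        if a == d or c == b:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def z_score(observed, baseline):
    """Standard score of `observed` against a list of baseline values."""
    return (observed - mean(baseline)) / stdev(baseline)

# Two triangles joined by one edge (toy data, invented for illustration).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_observed = modularity(edges, community)
shuffled = shuffle_edges(edges, n_swaps=50, rng=random.Random(0))
print(q_observed, shuffled)
```

A full baseline would repeat the shuffle many times, cluster each shuffled graph, collect the resulting modularities, and pass them to `z_score` together with the modularity of the original graph.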
² This discussion assumes that the modularities of the shuffled graph cluster assignments are distributed normally, which is not true in general. See Appendix A for an argument that does not need this assumption, and reaches the same conclusion.

In summary, the retweet network contains two clusters of users who preferentially propagate content within their own communities. However, we do not find such a structure in the network of mentions and replies among politically active Twitter users. This structural difference is of particular importance with respect to political communication, as we now have statistical evidence to suggest that mentions and replies may serve as a conduit through which users are exposed to information and opinions they might not choose in advance. Despite this promising finding, the work of Yardi and boyd (2010) suggests that cross-ideological interactions may reinforce pre-existing in-group/out-group identities, exacerbating the problem of political polarization.

Figure 1: The political retweet (left) and mention (right) networks, laid out using a force-directed algorithm. Node colors reflect cluster assignments (see §3.1). Community structure is evident in the retweet network, but less so in the mention network. We show in §3.3 that in the retweet network, the red cluster A is made of 93% right-leaning users, while the blue cluster B is made of 80% left-leaning users.
3.2 Content Homogeneity
The clustering described above was based only on the net-
work properties of the retweet and mention graphs. An inter-
esting question, therefore, is whether it has any significance
in terms of the actual content of the discussions involved.
To address this issue we associate each user with a profile
vector containing all the hashtags in her tweets, weighted by
their frequencies. We can then compute the cosine similari-
ties between each pair of user profiles, separately for users
in the same cluster and users in different clusters. Figure 2
shows that in the mention network, users placed in the same
cluster are not likely to be much more similar to each other
than users in different clusters. On the other hand, in the
retweet network, users in cluster A are more likely to have
very similar profiles than users in cluster B, and users in dif-
ferent clusters are the least similar to each other. As a result
the average similarity within retweet clusters is higher than
across clusters. Further, we note that in both mention and
retweet networks, one of the clusters is more cohesive than
the other — meaning the tag usage within one community is
more homogeneous.
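The profile comparison described above amounts to a cosine similarity between frequency-weighted hashtag vectors. The sketch below assumes each user's tweets are available as a list of hashtag lists; that layout, the function names, and the toy users are illustrative only.

```python
from collections import Counter
from math import sqrt

def hashtag_profile(tweets):
    """Frequency-weighted hashtag vector for one user's tweets."""
    return Counter(tag for tags in tweets for tag in tags)

def cosine(p, q):
    """Cosine similarity between two sparse hashtag-count vectors."""
    dot = sum(w * q[tag] for tag, w in p.items() if tag in q)
    norm = sqrt(sum(w * w for w in p.values())) * sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0

# Toy users, invented for illustration only.
user_a = hashtag_profile([["#p2", "#dems"], ["#p2"]])
user_b = hashtag_profile([["#p2", "#tcot"], ["#tcot"]])
print(cosine(user_a, user_b))
```

Averaging this value over pairs of users drawn from the same cluster, and over pairs drawn from different clusters, gives the kind of summary reported in Figure 2.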
Figure 2: Cosine similarities among user profiles. The table shows the average similarities in the retweet and mention networks for pairs of users both in cluster A (AA), both in cluster B (BB), and for users in different clusters (AB). All differences are significant at the 95% confidence level. The plot displays the actual distributions of cosine similarities for the retweet network.

Pair   Retweet   Mention
AA     0.31      0.31
BB     0.20      0.22
AB     0.13      0.26

[Plot: P(cos(a, b)) versus cos(a, b) over the range 0 to 1, with one curve each for cluster A, cluster B, and different clusters.]
3.3 Political Polarization
Given the communities of the retweet network identified in
§3.1, their content homogeneity uncovered in §3.2, and
the findings of previous studies, it is natural to investigate
whether the clusters in the retweet network correspond to
groups of users of similar political alignment.
To accomplish this in a systematic, reproducible way we
used a set of techniques from the social sciences known
as qualitative content analysis (Krippendorff 2004; Kolbe
1991). Similar to assigning class labels to training data in su-
pervised machine learning, content analysis defines a set of
practices that enable social scientists to define reproducible
categories for qualitative features of text. Next we outline
our annotation categories, and then explain the procedures
used to establish the rigor of these category definitions.
Our coding goals were simple: for a given user we wanted
to identify whether his tweets express a ‘left’ or ‘right’
political identity, or if his identity is ‘undecidable.’ The
groups primarily associated with a ‘left’ political identity are
democrats and progressives; those primarily associated with
a ‘right’ political identity are republicans, conservatives, lib-
ertarians and the Tea Party. A user coded as ‘undecidable’
may be taking part in a political dialogue, but from the con-
tent of her tweets it is difficult to make a clear determination
about political alignment. Irrelevant non-English and spam
accounts constitute less than 3% of the total corpus and were
excluded from this analysis. We experimented with more de-
tailed categorization rubrics but the simple definitions de-
scribed above yielded the highest inter-annotator agreement
in early trials of the coding process.
Using this coding scheme one author first annotated 1,000
random users who appeared in both the retweet and mention
networks. Annotations were determined solely on the basis
of the tweets present in the six week sample. In line with
the standards of the field, we had a non-author judge with a
broad knowledge of politics annotate 200 random users from
the set of 1,000 to establish the reproducibility of this anno-
tation scheme. The judge was provided a brief overview of
the study and introduced to the coding guidelines described
above, but did not have any other interaction with the authors
during the coding process.
The statistic typically used in the social sciences to measure the extent to which a coder’s annotations agree with an objective judge is Cohen’s Kappa, defined as

$$\kappa = \frac{P(\alpha) - P(\epsilon)}{1 - P(\epsilon)} \qquad (2)$$

where $P(\alpha)$ is the observed rate of agreement between annotators, and $P(\epsilon)$ is the expected rate of random agreement given the relative frequency of each class label (Krippendorff 2004; Kolbe 1991). For agreement between the ‘left’ and ‘right’ categories we report $\kappa = 0.80$ and $\kappa = 0.82$ respectively, both of which fall in the “nearly perfect agreement” range (Landis and Koch 1977). For the undecidable category we found “fair to moderate” agreement ($\kappa = 0.42$), indicating that there are users for whom a political identity might be discernible in the context of specific domain knowledge. To address this issue of context-sensitive ambiguity we had a second author also annotate the entire set of 1,000 users. This allowed us to assign a label to a user when either author was able to determine a political alignment, resolving ambiguity in 15.4% of users.
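For readers unfamiliar with the statistic, here is a small sketch of Cohen's Kappa as defined above. Because the text reports one value per category, the sketch also includes a one-vs-rest variant that collapses labels to "this category" versus "anything else"; whether the authors computed the per-category values exactly this way is an assumption, and the toy annotations are invented.

```python
from collections import Counter

def cohens_kappa(coder_1, coder_2):
    """Cohen's Kappa for two equally long sequences of labels."""
    n = len(coder_1)
    # Observed agreement rate.
    observed = sum(a == b for a, b in zip(coder_1, coder_2)) / n
    # Chance agreement from each coder's label frequencies.
    freq_1, freq_2 = Counter(coder_1), Counter(coder_2)
    expected = sum(freq_1[label] * freq_2[label] for label in freq_1) / n ** 2
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)

def kappa_for_category(coder_1, coder_2, category):
    """One-vs-rest Kappa: collapse labels to `category` vs. everything else."""
    return cohens_kappa([x == category for x in coder_1],
                        [x == category for x in coder_2])

# Toy annotations, invented for illustration only.
author = ["left", "left", "right", "right"]
judge = ["left", "left", "right", "left"]
print(cohens_kappa(author, judge))
print(kappa_for_category(author, judge, "left"))
```

In the toy data the coders agree on three of four users (observed rate 0.75) against a chance rate of 0.5, which gives a Kappa of 0.5.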
For completeness we also report binomial p-values for ob-
served agreement, treating annotation pairs as observations
from a series of Bernoulli trials. Similar to the Kappa statistic results, inter-annotator agreement for the ‘left’ and ‘right’ categories is very high ($p < 10^{-12}$). Agreement on the ‘undecidable’ category is again lower ($p = 0.18$).
Based on this analysis it is clear that a majority of politically active users on Twitter express a political identity in their tweets. For only 8% of users were both annotators unable to determine a political identity. A more conservative approach to label assignment does not change this story much; if we assign a political identity only to users for whom both annotators agree, we report unambiguous political valences for more than 75% of users.

Table 4: Partisan composition and size of network clusters as determined by manual inspection of 1,000 random users.

Network   Cluster   Left    Right   Undec.   Nodes
Retweet   A         1.19%   93.4%   5.36%    7,115
Retweet   B         80.1%   8.71%   11.1%    11,355
Mention   A         39.5%   52.2%   8.18%    7,021
Mention   B         9.52%   85.7%   4.76%    154
Using these annotations we can infer the expected political makeup of the network communities identified in §3.1. As shown in Table 4, the network of political retweets exhibits a highly partisan community structure with two homogeneous clusters of users who tend to share the same political identity. Surprisingly, the mention network does not exhibit a clear partisan community structure. Instead we find that it is dominated by a politically heterogeneous cluster accounting for more than 97% of the users, suggesting that politically active Twitter users may be exposed to views with which they do not agree in the form of cross-ideological mentions.
Increasing the number of target communities in the mention network does not reveal a more fine-grained ideological structure, but instead results in smaller yet politically heterogeneous clusters. Similarly, the retweet network communities are maximally homogeneous in the case of two clusters.
4 Interaction Analysis
The strong segregation evident in the retweet network and
the fact that the two clusters correspond to political ideolo-
gies suggest that, when engaging in political discourse, users
often retweet only other users with whom they agree politically. The dominance of the mention network by a single
heterogeneous cluster of users, however, suggests that indi-
viduals of different political alignments may interact with
one another much more frequently using mentions. Let us
test these conjectures, and propose an explanation based on
selective hashtag use by politically motivated individuals.
4.1 Cross-Ideological Interactions
To investigate cross-ideological mentions, we compare the
observed number of links between manually-annotated
users with the value we would expect in a graph where users
connect to one another without any knowledge of political
alignment. The intuition for the expected number of links is as follows: for a set of users with $k$ directed edges among them, we preserve the source of each edge and assign the target vertex to a random user in the graph, simulating a scenario in which users connected irrespective of political ideology. For example, if there are a total of $k_R$ links originating from right-leaning users, and the numbers of left-leaning and right-leaning users are $U_L$ and $U_R$ respectively, then the expected number of edges going from right-leaning to left-leaning users is given by:

$$E[R \to L] = k_R \cdot \frac{U_L}{U_L + U_R} \qquad (3)$$

We compute the other expected numbers of edges ($R \to R$, $L \to R$, $L \to L$) in the same way.

Table 5: Ratios between observed and expected number of links between users of different political alignments in the mention and retweet networks.

             Mention           Retweet
          Left    Right     Left    Right
Left      1.23    0.68      1.70    0.05
Right     0.77    1.31      0.03    2.32
In Table 5 we report the ratio between the observed and
expected numbers of links between users of each political
alignment. We see that for both means of communication,
users are more likely to engage people with whom they
agree. This effect, however, is far less pronounced in the
mention network, where we observe significant amounts of
cross-ideological interaction.
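The observed-to-expected ratios can be computed along the following lines. The expected count for each (source, target) alignment pair is the number of links leaving users of the source alignment multiplied by the share of users with the target alignment. The edge list, the alignment mapping, and the function name are assumptions of this sketch.

```python
from collections import Counter

def observed_to_expected(edges, alignment):
    """Ratio of observed to expected edge counts per (source, target) alignment.

    `edges` is a list of directed (source, target) pairs among annotated
    users; `alignment` maps each user to 'left' or 'right'.
    """
    users = Counter(alignment.values())
    total_users = sum(users.values())
    observed = Counter((alignment[s], alignment[t]) for s, t in edges)
    out_links = Counter(alignment[s] for s, _ in edges)
    ratios = {}
    for src in users:
        for dst in users:
            # Targets are reassigned uniformly at random over all users.
            expected = out_links[src] * users[dst] / total_users
            if expected:
                ratios[(src, dst)] = observed[(src, dst)] / expected
    return ratios

# Toy data, invented for illustration only.
toy_alignment = {"a": "left", "b": "left", "c": "right", "d": "right"}
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c")]
for pair, ratio in sorted(observed_to_expected(toy_edges, toy_alignment).items()):
    print(pair, round(ratio, 2))
```

Ratios above 1 indicate more links than the alignment-blind model predicts, and ratios below 1 indicate fewer, which is how Table 5 should be read.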
4.2 Content Injection
Any Twitter user can select arbitrary hashtags to annotate
his or her tweets. We observe that users frequently produce
tweets containing hashtags that target multiple politically
opposed audiences, and we propose that this phenomenon
may be responsible in part for the network structures de-
scribed in this study.
As a thought experiment, consider an individual who
prefers to read tweets produced by users from the political
left. This user would frequently see the popular hashtag #p2
(“Progressives 2.0”) in the body of tweets produced by other
left-leaning users, as shown in Table 6. However, if this user
clicked on the #p2 hashtag hyperlink in one of these tweets,
or searched for it explicitly, she would be exposed to content
from users on both sides of the political spectrum. In fact,
because of the disproportionate number of tweets produced
by left- and right-leaning users, nearly 30% of the tweets
in the #p2 search feed would originate from right-leaning users.
A natural question is why a user would annotate tweets
with hashtags strongly associated with ideologically
opposed users. One explanation might be that he seeks
to expose those users to information that reinforces his
political views. Consider the following tweets:
User A: Please follow @Username for
an outstanding progressive voice! #p2
#dems #prog #democrats #tcot
User B: Couple Aborts Twin Boys For
Being Wrong Gender..
#tcot #hhrs #christian #tlot #teaparty
#sgp #p2 #prolife
Table 6: The ten most popular hashtags produced by left- and right-leaning users in the manually annotated set of users, including frequency of use in the two retweet communities and ideological valence.

Rank   Hashtag     Left     Right    Valence
1      #tcot       2,949    13,574    0.384
2      #p2         6,269    3,153    -0.605
3      #teaparty   1,261    5,368     0.350
4      #tlot       725      2,156     0.184
5      #gop        736      1,951     0.128
6      #sgp        226      2,563     0.694
7      #ocra       434      1,649     0.323
8      #dems       953      194      -0.818
9      #twisters   41       990       0.843
10     #palin      200      838       0.343
Total              26,341   53,880
These tweets were selected from the first page of the re-
altime search results for the #tcot (“Top Conservatives on
Twitter”) and #p2 hashtags, respectively, and messages in
this style make up a substantial portion of the results.
This behavior does not go unnoticed by users, as under-
scored by the emergence of the left-leaning hashtag #p21.
According to a crowdsourced hashtag definition site (www., #p21 is a hashtag for “Progressives sans
RWNJs” and “Political progressives w/o all the RWNJ spam
that #p2 has,” where RWNJ is an acronym for “Right Wing
NutJob.” This tag appears to have emerged in response to
the efforts by right-leaning users to inject messages into the
high-profile #p2 content stream, and ostensibly serves as a
place where progressives can once again be exposed only to
content aligned with their views.
We propose that when a user is exposed to ideologically
opposed content in this way, she will be unlikely to rebroad-
cast it, but may choose to respond directly to the origina-
tor in the form of a mention. Consequently, the network of
retweets would exhibit ideologically segregated community
structure, while the network of mentions would not.
4.3 Political Valence
To explore the content injection phenomenon in more detail
let us introduce the notion of political valence, a measure
that encodes the relative prominence of a tag among left- and
right-leaning users. Let $N(t, L)$ and $N(t, R)$ be the numbers of occurrences of tag $t$ in tweets produced by left- and right-leaning users, respectively. Then define the valence of $t$ as

$$V(t) = 2\,\frac{N(t, R)/N(R)}{[N(t, L)/N(L)] + [N(t, R)/N(R)]} - 1 \qquad (4)$$

where $N(R) = \sum_t N(t, R)$ is the total number of occurrences of all tags in tweets by right-leaning users and $N(L)$ is defined analogously for left-leaning users. The translation and scaling constants serve to bound the measure between $-1$ for a tag only used by the left, and $+1$ for a tag only used by the right. Table 7 illustrates the usefulness of this measure by listing hashtags sampled from valence quintiles ranging from the far left to the far right, where valence is computed only for hashtags produced by manually-annotated users.

Table 7: Hashtags in tweets by users across the political spectrum, grouped by valence quintiles.
Far Left Moderate Left Center Moderate Right Far Right
#judaism #hollywood
#capitalism #recession
#security #dreamact
#aarp #women
#banksters #energy
#stopbeck #iraq
#democrats #social
#seniors #dnc
#budget #political
#goproud #christian
#media #nobel
#rangel #waste
#american #gold
#repeal #mexico
#terrorism #gopleader
#912project #twisters
#gop2112 #israel
#foxnews #mediabias
#patriots #rednov

Figure 3: Proportion of mentions a user sends and receives to and from ideologically-opposed users relative to her valence. Points represent binned averages. Error bars denote 95% confidence intervals.

[Plot: percentage of mentions (y-axis, -10% to 60%) versus mean user valence (x-axis, -1 to 1).]
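The valence measure is simple enough to check numerically. The sketch below is a direct transcription of the definition (share of right-leaning usage over the sum of both shares, rescaled to the interval from -1 to +1), evaluated on the #tcot and #p2 counts and the column totals listed in Table 6. The function and argument names are assumptions of this sketch.

```python
def valence(n_t_left, n_left, n_t_right, n_right):
    """Political valence of a tag.

    n_t_left / n_t_right: occurrences of the tag in tweets by left- and
    right-leaning users; n_left / n_right: total tag occurrences per side.
    """
    left_share = n_t_left / n_left
    right_share = n_t_right / n_right
    return 2 * right_share / (left_share + right_share) - 1

# Counts and totals taken from Table 6.
TOTAL_LEFT, TOTAL_RIGHT = 26_341, 53_880
print(valence(2_949, TOTAL_LEFT, 13_574, TOTAL_RIGHT))  # #tcot
print(valence(6_269, TOTAL_LEFT, 3_153, TOTAL_RIGHT))   # #p2
```

Both results land within 0.001 of the tabulated valences (0.384 for #tcot and -0.605 for #p2). Note that the normalization by each side's total matters: #tcot is used more than four times as often by right-leaning users in absolute terms, but right-leaning users also produce roughly twice as many tags overall.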
If hashtag-based content injection is related to the com-
paratively high levels of cross-ideological communication
observed in the mention network, we expect users who use
hashtags in this way to receive proportionally more men-
tions from users with opposing political views. Using com-
munity identities in the retweet network as a proxy for politi-
cal alignment, we plot in Figure 3 the average proportions of
mentions users receive from and direct toward members of
the other community versus the mean valence of all tags pro-
duced by those users. A key finding of this study, these re-
sults indicate that users contributing to a politically balanced
combination of content streams on average receive and pro-
duce more inter-ideological communication than those who
use mostly partisan hashtags. Moreover, Table 6 shows that
the most popular hashtags do not have neutral valence, which
rules out the possibility that neutral-valence users are simply
using the most popular hashtags.
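The binned averages behind a plot such as Figure 3 could be computed along the following lines. This is a sketch under stated assumptions: the helper names, the equal-width bins on [-1, 1], the number of bins, and the normal-approximation confidence interval (1.96 standard errors) are all choices made here for illustration, since the text does not specify how the bins or intervals were constructed. The data points are invented.

```python
import math
from statistics import mean, stdev


def mean_user_valence(user_tags, tag_valence):
    """Average V(t) over every tag occurrence produced by one user."""
    return mean(tag_valence[t] for t in user_tags)


def binned_means(points, n_bins=4):
    """Group (valence, proportion) pairs into equal-width bins on [-1, 1].

    Returns {bin_index: (mean_proportion, ci_half_width)}. The half-width
    uses a normal approximation and is None for bins with a single point.
    """
    width = 2.0 / n_bins
    bins = {}
    for v, p in points:
        # A valence of exactly +1 is placed in the last bin.
        idx = min(int((v + 1.0) / width), n_bins - 1)
        bins.setdefault(idx, []).append(p)
    summary = {}
    for idx, ps in sorted(bins.items()):
        half = 1.96 * stdev(ps) / math.sqrt(len(ps)) if len(ps) > 1 else None
        summary[idx] = (mean(ps), half)
    return summary


# Invented (mean user valence, proportion of cross-community mentions) pairs.
points = [(-0.9, 0.05), (-0.8, 0.07), (0.0, 0.30),
          (0.1, 0.40), (0.9, 0.10), (1.0, 0.06)]
summary = binned_means(points)
```

Each entry of `summary` corresponds to one plotted point and its error bar; with real data, the user's community in the retweet network would determine which mentions count as cross-ideological.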
5 Conclusions
In this study we have demonstrated that the two major mech-
anisms for public political interaction on Twitter — men-
tions and retweets — induce distinct network topologies.
The retweet network is highly polarized, while the mention
network is not. To explain these observations we highlight
the role of hashtags in exposing users to content they would
not likely choose in advance. Specifically, users who apply
hashtags with neutral or mixed valence are more likely to
engage in communication with opposing communities.
Although our findings could be interpreted as encouraging
evidence of cross-ideological political discourse, we empha-
size that these interactions are almost certainly not a panacea
for the problem of political polarization. While we know
for certain that ideologically-opposed users interact with
one another, either through mentions or content injection,
they very rarely share information from across the divide
with other members of their community. It is possible that
these users are unswayed by opposing arguments and facts,
or that the social pressures that lead to group polarization
are too strong for most users to overcome (Sunstein 2002).
Whatever the case, political segregation, as manifested in the
topology of the retweet network, persists in spite of substan-
tial cross-ideological interaction.
Qualitatively speaking, our experience with this body of
data suggests that the content of political discourse on Twitter
remains highly partisan. Many messages contain sentiments
more extreme than one would expect to encounter in
face-to-face interactions, and the content is frequently
disparaging of the identities and views associated with users
across the partisan divide. If Yardi and boyd (2010) are cor-
rect, and our experience suggests this may be the case, these
interactions might actually serve to exacerbate the problem
of polarization by reinforcing pre-existing political biases.
Further study of the content of inter-ideological communi-
cation, including sentiment analysis, as well as studies of
network topology that include the follower network, could
help to illuminate this issue.
The fractured nature of political discourse seems to be
worsening, and understanding the social and technological
dynamics underlying this trend will be essential to atten-
uating its effect on the public sphere. We have released a
public dataset based on the information accumulated dur-
ing the course of this study, in hopes that it will help others
explore the role of technologically-mediated political inter-
action in deliberative democracy. The dataset is available at
Acknowledgments
We are grateful to A. Vespignani, J. Bollen, T. Metaxas and
E. Mustafaraj for helpful discussions, A. Ratkiewicz for pro-
viding annotations to help evaluate our qualitative content
analysis methodology, and S. Patil for assistance in the de-
velopment of the Truthy website. We acknowledge support
from NSF (grant No. IIS-0811994), Lilly Foundation (Data
to Insight Center Research Grant), the Center for Complex
Networks and Systems Research, and the IUB School of In-
formatics and Computing.
A Modularity Distributions
In §3.1 we computed the z-score of our modularity values with re-
spect to the distribution of modularity values that arise from clus-
tering degree-preserving shuffles of the original graph. We used
these z-scores to argue that the retweet network is significantly
more segregated than the mention network. However, this argu-
ment relied on the assumption that the modularities are normally
distributed. By testing the sampled modularities with the omnibus
$K^2$ statistic (D’Agostino 1971), we find that this assumption is not
true: we have $p < 10^{-10}$ that variations between the observed data
and the best-fit normal distribution are due to random chance. This
is the case for the modularities sampled based on both the retweet
and mention networks.
Fortunately we can reach the same statistical conclusion without
relying on the assumption of normality. Let us use Chebyshev’s
inequality:
$$P(|X - \mu| \geq k\sigma) = P(|z| \geq k) \leq \frac{1}{k^2} \qquad (5)$$
which gives us a very conservative bound on the probability that the
random variable $X$, being the modularity of a sampled graph, will
take on a value at least as extreme as the original graph’s modularity.
We therefore compute $z_m$ and $z_r$ for the modularities of clusters in the 1,000
shuffled versions of each of the mention and retweet networks,
respectively. Using Equation 5 and the modularities of the original
graph clusters, we find:
$$P(|z| \geq z_m) \leq \frac{1}{z_m^2} = 0.24 \qquad (6)$$
$$P(|z| \geq z_r) \leq \frac{1}{z_r^2} = 0.008 \qquad (7)$$
for $z_m = 2.06$ and $z_r = 11.02$ (§3.1). Thus, since a network that
can be clustered as well as the retweet network is much less likely
to arise randomly (relative to the mention network), we confirm
that the retweet network is much more segregated.
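The arithmetic behind these two bounds can be checked with a short Python sketch. The function names are hypothetical, and the `z_score` helper assumes the population standard deviation of the shuffled-graph modularities; the text does not say which estimator was used. Only the two z-scores quoted in this appendix are taken from the source.

```python
from statistics import mean, pstdev


def z_score(observed, shuffled):
    """z-score of an observed modularity against shuffled-graph modularities.

    Assumes the population standard deviation; this is an illustrative choice.
    """
    return (observed - mean(shuffled)) / pstdev(shuffled)


def chebyshev_bound(z):
    """Upper bound on P(|Z| >= z) from Chebyshev's inequality, for z > 0.

    The bound is capped at 1 because it is a probability.
    """
    return min(1.0, 1.0 / z ** 2)


bound_mention = chebyshev_bound(2.06)   # z_m from the text, bound of about 0.24
bound_retweet = chebyshev_bound(11.02)  # z_r from the text, bound of about 0.008
```

Because the bound depends only on the z-score, it holds for any distribution with finite variance, which is why it sidesteps the failed normality assumption at the cost of being loose.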
References
Adamic, L., and Glance, N. 2005. The political blogosphere
and the 2004 U.S. election: Divided they blog. In Proc. 3rd
Intl. Workshop on Link Discovery (LinkKDD), 36–43.
Aday, S.; Farrell, H.; Lynch, M.; Sides, J.; Kelly, J.; and
Zuckerman, E. 2010. Blogs and bullets: New media in con-
tentious politics. Technical report, U.S. Institute of Peace.
Benkler, Y. 2006. The Wealth of Networks: How Social Production
Transforms Markets and Freedom. Yale University Press.
Bennett, L. 2003. New media power: The Internet and global
activism. In Couldry, N., and Curran, J., eds., Contesting
media power: Alternative media in a networked world. Row-
man and Littlefield. 17–37.
boyd, d.; Golder, S.; and Lotan, G. 2008. Tweet, tweet,
retweet: Conversational aspects of retweeting on twitter.
Proc. Hawaii Intl. Conf. on Systems Sciences 1–10.
D’Agostino, R. 1971. An omnibus test of normality for
moderate and large size samples. Biometrika 58(2):341–
Farrell, H., and Drezner, D. 2008. The power and politics of
blogs. Public Choice 134(1):15–30.
Fruchterman, T. M. J., and Reingold, E. M. 1991. Graph
drawing by force-directed placement. Software Practice and
Experience 21(11):1129–1164.
Hargittai, E.; Gallo, J.; and Kane, M. 2007. Cross-
ideological discussions among conservative and liberal
bloggers. Public Choice 134(1):67–86.
Honeycutt, C., and Herring, S. C. 2008. Beyond microblog-
ging: Conversation and collaboration via Twitter. In Proc.
42nd Hawaii Intl Conf. on System Sciences.
Hubert, L., and Arabie, P. 1985. Comparing partitions. Jour-
nal of Classification 2(1):193–218.
Kolbe, R. H. 1991. Content analysis research: An examina-
tion of applications with directives for improving research
reliability and objectivity. The Journal of Consumer Re-
search 18(2):243–250.
Krippendorff, K., ed. 2004. Content Analysis: An Introduc-
tion to Its Methodology. Sage Publications.
Landis, J., and Koch, G. 1977. The measurement of observer
agreement for categorical data. Biometrics 33(1):154–165.
Newman, M. E. J., and Girvan, M. 2004. Finding and evaluating
community structure in networks. Physical Review E
Newman, M. E. J. 2006. Finding community structure in
networks using the eigenvectors of matrices. Physical Re-
view E 74(3):036104.
O’Connor, B.; Balasubramanyan, R.; Routledge, B. R.; and
Smith, N. A. 2010. From tweets to polls: Linking text sen-
timent to public opinion time series. In Proc. 4th Intl. AAAI
Conf. on Weblogs and Social Media (ICWSM).
Pew Internet and American Life Project. 2008. Social net-
working and online videos take off: Internet’s broader role
in campaign 2008. Technical report, Pew Research Center.
Raghavan, U. N.; Albert, R.; and Kumara, S. 2007. Near lin-
ear time algorithm to detect community structures in large-
scale networks. Physical Review E 76(3):036106.
Ratkiewicz, J.; Conover, M.; Meiss, M.; Gonçalves, B.;
Patil, S.; Flammini, A.; and Menczer, F. 2011. Truthy: Map-
ping the spread of astroturf in microblog streams. In Proc.
20th Intl. World Wide Web Conf. (WWW).
Sunstein, C. 2002. The law of group polarization. Journal
of Political Philosophy 10(2):175–195.
Sunstein, C. R. 2007. Republic.com 2.0. Princeton University
Press.
Tumasjan, A.; Sprenger, T. O.; Sandner, P. G.; and Welpe,
I. M. 2010. Predicting Elections with Twitter: What 140
Characters Reveal about Political Sentiment. In Proc. 4th
Intl. AAAI Conf. on Weblogs and Social Media (ICWSM).
Yardi, S., and boyd, d. 2010. Dynamic debates: An anal-
ysis of group polarization over time on twitter. Bulletin of
Science, Technology and Society 20:S1–S8.
... Though many studies have shown the existence of polarization on social media, prior studies are limited in two main ways. First, previous research mainly observed polarization on social networks collected for more than a month or years [9,14]. These findings are helpful but too late to create alerts immediately. ...
... Manual partitioning means classifying users based on the analyst's intent, in most cases, political leaning. They are often used by predefined accounts [9,14], use of predefined hashtags [33], or news URLs [2]. The other approach is network-based methods, utilizing graph partitioning methods [13] on retweet or mention networks to obtain groups. ...
... These models assume that users interact more with people of similar opinions. Interactions can be retweets [28,38], followings [4], mentions [9] or multi-relational network (combination of retweets, mentions, likes and follows) [39]. A previous study reported those networkbased text clustering potentially yields better results rather than topic modeling [37]. ...
In recent years, social media has been criticized for yielding polarization. Identifying emerging disagreements and growing polarization is important for journalists to create alerts and provide more balanced coverage. While recent studies have shown the existence of polarization on social media, they primarily focused on limited topics such as politics with a large volume of data collected in the long term, especially over months or years. While these findings are helpful, they are too late to create an alert immediately. To address this gap, we develop a domain-agnostic mining method to identify polarized topics on Twitter in a short-term period, namely 12 hours. As a result, we find that daily Japanese news-related topics in early 2022 were polarized by 31.6\% within a 12-hour range. We also analyzed that they tend to construct information diffusion networks with a relatively high average degree, and half of the tweets are created by a relatively small number of people. However, it is very costly and impractical to collect a large volume of tweets daily on many topics and monitor the polarization due to the limitations of the Twitter API. To make it more cost-efficient, we also develop a prediction method using machine learning techniques to estimate the polarization level using randomly collected tweets leveraging the network information. Extensive experiments show a significant saving in collection costs compared to baseline methods. In particular, our approach achieves F-score of 0.85, requiring 4,000 tweets, 4x savings than the baseline. To the best of our knowledge, our work is the first to predict the polarization level of the topics with low-resource tweets. Our findings have profound implications for the news media, allowing journalists to detect and disseminate polarizing information quickly and efficiently.
... A different line of work with Twitter data, relevant to this study, has addressed political polarization on the social network, but not necessarily electoral outcomes, rather focusing broadly on phenomena such as manipulation, the effects of coordinated behavior and disinformation campaigns [10][11][12] . These studies find that coordinated behavior both for disaster response and disinformation campaigns affects the structural properties of the social network and therefore influences the flow of information [10][11][12][13] . ...
... A different line of work with Twitter data, relevant to this study, has addressed political polarization on the social network, but not necessarily electoral outcomes, rather focusing broadly on phenomena such as manipulation, the effects of coordinated behavior and disinformation campaigns [10][11][12] . These studies find that coordinated behavior both for disaster response and disinformation campaigns affects the structural properties of the social network and therefore influences the flow of information [10][11][12][13] . Importantly, these studies (among others) show that disinformation campaigns exacerbate political polarization, resulting in problems of coordination and even in physical violence between groups with opposing political views [10][11][12][13] . ...
... These studies find that coordinated behavior both for disaster response and disinformation campaigns affects the structural properties of the social network and therefore influences the flow of information [10][11][12][13] . Importantly, these studies (among others) show that disinformation campaigns exacerbate political polarization, resulting in problems of coordination and even in physical violence between groups with opposing political views [10][11][12][13] . Some of these studies have been produced by research groups of multidisciplinary nature, which combine qualitative and quantitative methods to characterize interactions on the social network and compare data from different countries, political processes as well as natural disasters 10-13 . ...
Full-text available
Literature on social networks and elections has focused on predicting electoral outcomes rather than on understanding how the discussions between users evolve over time. As a result, most studies focus on a single election and few comparative studies exist. In this article, a framework to analyze Twitter conversations about the election candidates is proposed. Using DeGroot’s consensus model (an assumption that all users are attempting to persuade others to talk about a candidate), this framework is useful to identify the structure and strength of connections of the mention networks on the months before an election day. It also helps to make comparisons between elections and identify patterns in different contexts. In concrete, it was found that elections in which the incumbent was running have slower convergence (more closed communities with fewer links between them) and that there is no difference between parliamentary and presidential elections. Therefore, there is evidence that the political system and the role of the incumbent in the election influences the way conversations on Twitter occur.
... Twitter is a well explored social media platform when it comes to social media analysis and political polarization studies. M. D. Conover et al. [5] examine two networks of political communication on Twitter, using as a case study the period of the six weeks prior to the 2010 U.S. congressional midterm elections. Even with different contexts and countries, some of the results obtained by the authors are similar to ours. ...
... Since our data collection process started in August 2021, four months after the installation of the PCI, we could not use the Twitter API, as its retrieval time window is more restrictive. Alternatively, we used a scraper of social networking services called snscrape 5 . This Python library allows the collection of Twitter historical data, with no time window limitation. ...
Conference Paper
Installed in April 2021, the COVID-19 Parliamentary Commission of Inquiry (PCI) aimed to investigate omissions and irregularities committed by the federal government during the COVID pandemic in Brazil, which resulted in the death of more than 660,000 Brazilians and placed it among the countries with the most deaths caused by COVID-19. The investigated government was elected in 2018, in one of the most polarized elections in Brazilian history, and social media played a prominent role in this polarization. Not far from that, the PCI also generated a great popular commotion on social media networks. This paper aims to analyze the public debate related to the PCI of COVID on Twitter, identifying groups, examining their characteristics and interactions, and verifying evidence of political polarization in this social network. For this, we collected 3,397,933 tweets over a period of 26 weeks, and analyzed four distinct networks, based on different types of users interactions, to identify the main actors and verify the presence of segregated groups. In addition, we use natural language preprocessing to detect group characteristics and toxic speech. As a result, we identified three users groups, based on their use of hashtags and using a community detection technique. The group against the PCI is made up of conservatives and supporters of the government targeted by the investigations and presents the highest internal homogeneity. The other two groups, moderated users and opposed to the government, are formed by actors from the most varied political spectrum, containing users from the political left, center, and right, in addition to the main media outlets in the country. Moreover, other evidences of political polarization were found even in less segregated networks, where users from different groups interact with each other, but with the presence of toxic speech.
... A major focus of climate-related study of social media has been on ideological polarisation and segregation of differing views on the topic , Cann et al., 2021. Building on earlier work that studied online polarisation and echo chambers in US politics (Adamic and Glance, 2005, Sunstein, 2009, Conover et al., 2011, Bakshy et al., 2015, similar methods were applied to show polarisation affecting online climate discourse, with both environmentalist and climate denial perspectives being put forward in largely isolated echo chambers (Elgesem et al., 2015, Walter et al., 2018, Cann et al., 2021. Such divisions arise out of a long history of contrarian argumentation by climate 'deniers' or 'sceptics' (Hulme, 2009), which has been updated for the digital age with novel aspects such as online bots, artificial amplification and platform manipulation (Treen et al., 2020). ...
Online communication about climate change is central to public discourse around this contested issue. Facebook is a dominant social media platform known to be a major source of information and online influence, yet discussion of climate change on the platform has remained largely unstudied due to difficulties in accessing data. This paper utilises Facebook's repository of social/political ads to study how climate change is framed as an issue in adverts placed by different actors. Sponsored content is a strategic investment and presumably intended to be persuasive, so patterns of who pays for adverts and how those adverts frame the issue can reveal large-scale trends in public discourse. We show that most money spent on climate-related messaging is targeted at users in the US, GB and CA. While the number of advert impressions correlates with total spend by an actor, there is a secondary effect of unpaid social sharing which can substantially affect the number of impressions per dollar spent. Most spend in the US is by political actors, while environmental non-governmental organisations dominate spend in GB. Analysis shows that climate change solutions are well represented in GB, while climate change impacts such as extreme weather events are strongly represented in the US and CA. Different actor types frame the issue of climate change in different ways; political actors position the issue as party political and a point of difference between candidates, whereas environmental NGOs frame climate change as the focus of collective action and social mobilisation. Overall, our study provides a first empirical exploration of climate-related advertising on Facebook. It shows the diversity of actors seeking to use Facebook as a platform for their campaigns and how they utilise different topic frames to persuade users to act.
... While much of the literature on polarisation in social media focuses on political issues [1,2,3], there is no reason why the principles cannot be extended to discourse on disease control. In this paper we will examine the indicators of controversy and polarisation in two agricultural diseases: bovine tuberculosis and bovine viral diarrhoea. ...
Full-text available
Approaches to disease control are influenced by and reflected in public opinion, and the two are intrinsically entwined. Bovine tuberculosis (bTB) in British cattle and badgers is one example where there is a high degree of polarisation in opinion. Bovine viral diarrhoea (BVD), on the other hand, does not have the same controversy. In this paper we examine how language subjectivity on Twitter differs when comparing the discourses surrounding bTB and BVD, using a combination of network analysis and language and sentiment analysis. That data used for this study was collected from the Twitter public API over a two-year period. We investigated the network structure, language content, and user profiles of tweets featuring both diseases. While analysing network structure showed little difference between the two disease topics, elements of the structure allowed us to better investigate the language structure and profile of users. We found distinct differences between the language and sentiment used in tweets about each disease, and in the profile of the users who were doing the tweeting. We hope that this will guide further investigation and potential avenues for surveillance or the control of misinformation.
... In the real world, opinions do not always reach a consensus, as evidenced by political division in numerous countries in the last decade [19,1,26]. Theories from psychology and data from social media platforms both suggest that individuals interact preferentially with people whose opinions are similar to their own [3,4,8,34]. From a mathematical modelling point of view, the first mechanism shown to produce polarization on networks was bounded confidence, in models proposed by Krause [25] and Deffuant et al. [9]. ...
Full-text available
Mean-field equations have been developed recently to approximate the dynamics of the Deffuant model of opinion formation. These equations can describe both fully-mixed populations and the case where individuals interact only along edges of a network. In each case, interactions only occur between individuals whose opinions differ by less than a given parameter, called the confidence bound. The size of the confidence bound parameter is known to strongly affect both the dynamics and the number and location of opinion clusters. In this work we carry out a mathematical analysis of the mean-field equations to investigate the role of the confidence bound and boundaries on these important observables of the model. We consider the limit in which the confidence bound interval is small, and identify the key mechanisms driving opinion evolution. We show that linear stability analysis can predict the number and location of opinion clusters. Comparison with numerical simulations of the model illustrates that the early-time dynamics and the final cluster locations can be accurately approximated for networks composed of two degree classes, as well as for the case of a fully-mixed population.
... On the other hand, our theory suggests that empirical networks inferred on the basis of digital trace data [67] may be inherently biased by the activity of users who learnt that interaction on the media is rewarding. In fact, our model suggests that retrieved interaction patterns such as retweet networks [68][69][70] may render a situation more polarized than it actually is, because public expression is less rewarding for actors who maintain relations across different opinion camps. Research on Twitter has also shown that retweets and replies give rise to very different global patterns of group interaction [61] suggesting that they serve rather different communicative functions. ...
Full-text available
What are the mechanisms by which groups with certain opinions gain public voice and force others holding a different view into silence? Furthermore, how does social media play into this? Drawing on neuroscientific insights into the processing of social feedback, we develop a theoretical model that allows us to address these questions. In repeated interactions, individuals learn whether their opinion meets public approval and refrain from expressing their standpoint if it is socially sanctioned. In a social network sorted around opinions, an agent forms a distorted impression of public opinion enforced by the communicative activity of the different camps. Even strong majorities can be forced into silence if a minority acts as a cohesive whole. On the other hand, the strong social organisation around opinions enabled by digital platforms favours collective regimes in which opposing voices are expressed and compete for primacy in public. This paper highlights the role that the basic mechanisms of social information processing play in massive computer-mediated interactions on opinions.
... According to this principle, a tendency to form a friendship between like-minded people gets higher. Following homophily principle, previous studies asserted that online social media users tend to polarize their opinions and form partisan political communities (Conover et al., 2011). Socio-political systems are complex due to multi-part, multi-dimensional, non-trivial relationship patterns. ...
Full-text available
Large-scale human social network structure is typically inferred from digital trace samples of online social media platforms or mobile communication data. Instead, here we investigate the social network structure of a complete population, where people are connected by high-quality links sourced from administrative registers of family, household, work, school, and next-door neighbors. We examine this multilayer social opportunity structure through three common concepts in network analysis: degree, closure, and distance. Findings present how particular network layers contribute to presumably universal scale-free and small-world properties of networks. Furthermore, we suggest a novel measure of excess closure and apply this in a life-course perspective to show how the social opportunity structure of individuals varies along age, socio-economic status, and education level. Our work provides new entry points to understand individual socio-economic failure and success as well as persistent societal problems of inequality and segregation.
Full-text available
The digital revolution and the widespread use of the internet have changed many realms of empirical social science research. In this paper, we discuss the use of big data in the context of development sociology and highlight its potential as a new source of data. We provide a brief overview of big data and development research, discuss different data types, and review example studies, before introducing our case study on active citizenship in Tanzania which expands on an Oxfam-led impact evaluation. The project aimed at improving community-driven governance and accountability through the use of digital technology. Twitter and other social media platforms were introduced to community animators as a tool to hold national and regional key stakeholders accountable. We retrieve the complete Twitter timelines up to October 2021 from all ~200 community animators and influencers involved in the project (over 1.5 million tweets). We find that animators have started to use Twitter as part of the project, but most have stopped tweeting in the long term. Employing a dynamic difference-in-differences design, we also do not find effects of Oxfam-led training workshops on different aspects of animators' tweeting behavior. While most animators have stopped using Twitter in the long run, a few have continued to use social media to raise local issues and to be part of conversations to this day. Our case study showcases how (big) social media data can be part of an intervention, and we end with recommendations on how to use digital data in development sociology.
Full-text available
The principle of homophily says that people associate with other groups of people who are mostly like themselves. Many online communities are structured around groups of socially similar individuals. On Twitter, however, people are exposed to multiple, diverse points of view through the public timeline. The authors captured 30,000 tweets about the shooting of George Tiller, a late-term abortion doctor, and the subsequent conversations among pro-life and pro-choice advocates. They found that replies between like-minded individuals strengthen group identity, whereas replies between different-minded individuals reinforce in-group and out-group affiliation. Their results show that people are exposed to broader viewpoints than they were before but are limited in their ability to engage in meaningful discussion. They conclude with implications for different kinds of social participation on Twitter more generally.
Full-text available
The rise of bloggers raises the vexing question of why blogs have any influence at all, given their relatively low readership and lack of central organization. We argue that to answer this question we need to focus on two key factors—the unequal distribution of readers across weblogs, and the relatively high readership of blogs among journalists and other political elites. The unequal distribution of readership, combined with internal norms and linking practices allows interesting news and opinions to rise to the “top” of the blogosphere, and thus to the attention of elite actors, whose understanding of politics may be changed by frames adopted from the blogosphere.
Conference Paper
Full-text available
The microblogging service Twitter is in the process of being appropriated for conversational interaction and is starting to be used for collaboration, as well. In an attempt to determine how well Twitter supports user-to-user exchanges, what people are using Twitter for, and what usage or design modifications would make it (more) usable as a tool for collaboration, this study analyzes a corpus of naturally-occurring public Twitter messages (tweets), focusing on the functions and uses of the @ sign and the coherence of exchanges. The findings reveal a surprising degree of conversationality, facilitated especially by the use of @ as a marker of addressivity, and shed light on the limitations of Twitter's current design for collaborative use.
Conference Paper
Full-text available
Twitter - a microblogging service that enables users to post messages ("tweets") of up to 140 characters - supports a variety of communicative practices; participants use Twitter to converse with individuals, groups, and the public at large, so when conversations emerge, they are often experienced by broader audiences than just the interlocutors. This paper examines the practice of retweeting as a way by which participants can be "in a conversation." While retweeting has become a convention inside Twitter, participants retweet using different styles and for diverse reasons. We highlight how authorship, attribution, and communicative fidelity are negotiated in diverse ways. Using a series of case studies and empirical data, this paper maps out retweeting as a conversational practice.
We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling.
How and Why Groups Polarize
We present a test of normality based on a statistic D which is up to a constant the ratio of Downton's linear unbiased estimator of the population standard deviation to the sample standard deviation. For the usual levels of significance Monte Carlo simulations indicate that Cornish-Fisher expansions adequately approximate the null distribution of D if the sample size is 50 or more. The test is an omnibus test, being appropriate to detect deviations from normality due either to skewness or kurtosis. Simulation results of powers for various alternatives when the sample size is 50 indicate that the test compares favourably with the Shapiro-Wilk W test, √b1, b2 and the ratio of range to standard deviation.
In this paper, we study the linking patterns and discussion topics of political bloggers. Our aim is to measure the degree of interaction between liberal and conservative blogs, and to uncover any differences in the structure of the two communities. Specifically, we analyze the posts of 40 "A-list" blogs over the period of two months preceding the U.S. Presidential Election of 2004, to study how often they referred to one another and to quantify the overlap in the topics they discussed, both within the liberal and conservative communities, and also across communities. We also study a single day snapshot of over 1,000 political blogs. This snapshot captures blogrolls (the list of links to other blogs frequently found in sidebars), and presents a more static picture of a broader blogosphere. Most significantly, we find differences in the behavior of liberal and conservative blogs, with conservative blogs linking to each other more frequently and in a denser pattern.
In a striking empirical regularity, deliberation tends to move groups, and the individuals who compose them, toward a more extreme point in the direction indicated by their own predeliberation judgments. For example, people who are opposed to the minimum wage are likely, after talking to each other, to be still more opposed; people who tend to support gun control are likely, after discussion, to support gun control with considerable enthusiasm; people who believe that global warming is a serious problem are likely, after discussion, to insist on severe measures to prevent global warming. This general phenomenon -- group polarization -- has many implications for economic, political, and legal institutions. It helps to explain extremism, "radicalization," cultural shifts, and the behavior of political parties and religious organizations; it is closely connected to current concerns about the consequences of the Internet; it also helps account for feuds, ethnic antagonism, and tribalism. Group polarization bears on the conduct of government institutions, including juries, legislatures, courts, and regulatory commissions. There are interesting relationships between group polarization and social cascades, both informational and reputational. Normative implications are discussed, with special attention to political and legal institutions.