Microblogging is a new form of communication in which users describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. We present our observations of the microblogging phenomena by studying the topological and geographical properties of the social network in Twitter, one of the most popular microblogging systems. We find that people use microblogging primarily to talk about their daily activities and to seek or share information. We present a taxonomy characterizing the underlying intentions users have in making microblogging posts. By aggregating the apparent intentions of users in implicit communities extracted from the data, we show that users with similar intentions connect with each other.
Why We Twitter: Understanding Microblogging
Usage and Communities
Akshay Java
University of Maryland Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250, USA
Xiaodan Song
NEC Laboratories America
10080 N. Wolfe Road, SW3-350
Cupertino, CA 95014, USA
Tim Finin
University of Maryland Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250, USA
Belle Tseng
NEC Laboratories America
10080 N. Wolfe Road, SW3-350
Cupertino, CA 95014, USA
Microblogging is a new form of communication in which
users can describe their current status in short posts
distributed by instant messages, mobile phones, email or the
Web. Twitter, a popular microblogging tool, has grown
rapidly since it launched in October 2006. In this paper,
we present our observations of the microblogging phenomena
by studying the topological and geographical properties
of Twitter’s social network. We find that people use
microblogging to talk about their daily activities and to seek
or share information. Finally, we analyze user intentions
at a community level and show how users with
similar intentions connect with each other.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information
Search and Retrieval - Information Filtering; J.4 [Computer
Applications]: Social and Behavioral Sciences - Economics
General Terms
Social Network Analysis, User Intent, Microblogging, Social
Microblogging is a relatively new phenomenon defined as “a
form of blogging that lets you write brief text updates
(usually less than 200 characters) about your life on the go
and send them to friends and interested observers via text
messaging, instant messaging (IM), email or the web.” It is
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
Joint 9th WEBKDD and 1st SNA-KDD Workshop ’07 , August 12, 2007 ,
San Jose, California , USA . Copyright 2007 ACM 1-59593-444-8...$5.00.
provided by several services including Twitter, Jaiku and,
more recently, Pownce. These tools provide a light-weight,
easy form of communication that enables users to broadcast
and share information about their activities, opinions and
status. One of the popular microblogging platforms is Twit-
ter [29]. According to ComScore, within eight months of its
launch, Twitter had about 94,000 users as of April, 2007 [9].
Figure 1 shows a snapshot of the first author’s Twitter home-
page. Updates or posts are made by succinctly describing
one’s current status within a limit of 140 characters. Top-
ics range from daily life to current events, news stories, and
other interests. IM tools including Gtalk, Yahoo and MSN
have features that allow users to share their current status
with friends on their buddy lists. Microblogging tools facili-
tate easily sharing status messages either publicly or within
Figure 1: An example Twitter homepage with up-
dates talking about daily experiences and personal
Compared to regular blogging, microblogging fulfills a need
for an even faster mode of communication. By encouraging
shorter posts, it lowers the time and thought that users
must invest in generating content. This is also one
of its main differentiating factors from blogging in general.
The second important difference is the frequency of updates.
On average, a prolific blogger may update her blog once
every few days; a microblogger, on the other hand, may post
several updates in a single day.
With the recent popularity of Twitter and similar microblog-
ging systems, it is important to understand why and how
people use these tools. Understanding this will help us
evolve the microblogging idea and improve both microblog-
ging client and infrastructure software. We tackle this prob-
lem by studying the microblogging phenomena and analyz-
ing different types of user intentions in such systems.
Much of the research in user intention detection has focused on
understanding the intent of search queries. According to
Broder [5], the three main categories of search queries are
navigational, informational and transactional. Understand-
ing the intention for a search query is very different from
user intention for content creation. In a survey of bloggers,
Nardi et al. [26] describe different motivations for “why
we blog”. Their findings indicate that blogs are used as a
tool to share daily experiences, opinions and commentary.
Based on their interviews, they also describe how bloggers
form communities online that may support different social
groups in real world. Lento et al. [21] examined the im-
portance of social relationship in determining if users would
remain active in a blogging tool called Wallop. A user’s re-
tention and interest in blogging could be predicted by the
comments received and continued relationship with other
active members of the community. Users who are invited by
people with whom they share pre-existing social relationships
tend to stay longer and remain active in the network. Moreover,
certain communities were found to have a greater retention rate
due to the existence of such relationships. Mutual awareness in
a social network has been found effective in discovering com-
munities [23].
In computational linguistics, researchers have studied the
problem of recognizing the communicative intentions that
underlie utterances in dialog systems and spoken language
interfaces. The foundations of this work go back to Austin
[2], Strawson [32] and Grice [14]. Grosz [15] and Allen [1]
carried out classic studies analyzing the dialogues between
people and between people and computers in cooperative,
task-oriented environments. More recently, Matsubara
[24] has applied intention recognition to improve the
performance of automobile-based spoken dialog systems. While
this work focuses on the analysis of ongoing dialogs between
two agents in a fairly well defined domain, studying
user intention in Web-based systems requires looking at both
the content and link structure.
In this paper, we describe how users have adopted a spe-
cific microblogging platform, Twitter. Microblogging is rel-
atively nascent, and to the best of our knowledge, no large
scale studies have been done on this form of communication
and information sharing. We study the topological and geo-
graphical structure of Twitter’s social network and attempt
to understand the user intentions and community structure
in microblogging. From our analysis, we find that the main
types of user intentions are: daily chatter, conversations,
sharing information and reporting news. Furthermore, users
play different roles of information source, friends or informa-
tion seeker in different communities.
The paper is organized as follows: in Section 2, we describe
the dataset and some of the properties of the underlying
social network of Twitter users. Section 3 provides an anal-
ysis of Twitter’s social network and its spread across geogra-
phies. Next, in Section 4 we describe aggregate user behav-
ior and community level user intentions. Section 5 provides
a taxonomy of user intentions. Finally, we summarize our
findings and conclude with Section 6.
Twitter is currently one of the most popular microblogging
platforms. Users interact with this system by either using a
Web interface, IM agent or sending SMS updates. Members
may choose to make their updates public or available only to
friends. If a user’s profile is made public, her updates appear
in a “public timeline” of recent updates. The dataset used
in this study was created by monitoring this public timeline
for a period of two months starting from April 01, 2007 to
May 30, 2007. A set of recent updates were fetched once
every 30 seconds. There are a total of 1,348,543 posts from
76,177 distinct users in this collection.
Twitter allows a user, A, to “follow” updates from other
members who are added as “friends”. An individual who is
not a friend of user A but “follows” her updates is known as
a “follower”. Thus friendships can either be reciprocated or
one-way. By using the Twitter developer API, we fetched
the social network of all users. We construct a directed
graph G(V, E), where V represents the set of users and E
represents the set of “friend” relations. A directed edge e
exists between two users u and v if user u declares v as
a friend. There are a total of 87,897 distinct nodes with
829,053 friend relations between them. There are more nodes
in this graph than users in the post collection because some
users discovered through the link structure did not post during the
period in which the data was collected. For each user, we
also obtained their profile information and mapped their
location to a geographic coordinate, details of which are
provided in the following section.
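As a rough illustration of the graph construction described above, the following Python sketch builds G(V, E) from a list of (u, v) pairs meaning "u declares v as a friend" and counts in-degree (followers) and out-degree (friends). The example pairs and the function names are hypothetical; this is a minimal sketch under those assumptions, not the code used to produce the statistics in this paper.

```python
from collections import defaultdict

def build_friend_graph(friend_pairs):
    """Return (nodes, edges) for a directed graph G(V, E).

    Each pair (u, v) means that user u declares user v as a friend.
    """
    nodes = set()
    edges = set()
    for u, v in friend_pairs:
        if u == v:
            continue  # self-loops are ignored in this sketch
        edges.add((u, v))
        # Users found only through links still become nodes,
        # even if they never posted during data collection.
        nodes.update((u, v))
    return nodes, edges

def degrees(edges):
    """Return (indegree, outdegree) dictionaries for the edge set."""
    indeg = defaultdict(int)   # number of followers
    outdeg = defaultdict(int)  # number of declared friends
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg

# Hypothetical friend declarations.
pairs = [("alice", "bob"), ("bob", "alice"),
         ("carol", "alice"), ("dave", "alice")]
nodes, edges = build_friend_graph(pairs)
indeg, outdeg = degrees(edges)
```

Storing edges as a set of ordered pairs keeps one-way and reciprocated friendships distinct: a reciprocated friendship simply appears as two edges, (u, v) and (v, u).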
This section describes some of the characteristic properties
of Twitter’s social network, including its network topology
and geographical distribution.
3.1 Growth of Twitter
Since Twitter provides a sequential user and post identifier,
we can estimate the growth rate of Twitter. Figure 2 shows
the growth rate for users and Figure 3 shows the growth rate
for posts in this collection. Since we do not have access to
historical data, we can only observe its growth over a two
month time period. For each day we identify the maximum
value for the user identifier and post identifier as provided
[Figure 2 plot: Twitter Growth Rate (Users), showing Max UserID over April - May 2007]
Figure 2: Twitter User Growth Rate. The figure shows
the maximum user ID observed for each day in the
dataset. After an initial period of interest around
March 2007, the rate at which new users are joining
Twitter has slowed.
by the Twitter API. By observing the change in these values,
we can roughly estimate the growth of Twitter. It is
interesting to note that even though Twitter launched in
2006, it became widely popular soon after it won at the South
by SouthWest (SXSW) conference Web Awards in March
2007. Figure 2 shows the initial growth in users as a result
of the interest and publicity that Twitter generated at this
conference. After this period, the rate at which new users are
joining the network has slowed. Despite the slowdown, the
number of new posts is constantly growing, approximately
doubling every month, indicating a steady base of users
generating content.
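The estimate described above can be sketched in a few lines of Python: because identifiers are sequential, the largest identifier seen on a day approximates the cumulative count, and the difference between consecutive days approximates daily growth. The observation tuples and function names below are hypothetical, and the sketch assumes dates are given as ISO strings so that they sort chronologically.

```python
from collections import defaultdict

def max_id_per_day(observations):
    """Map each day to the largest identifier observed on that day.

    `observations` is an iterable of (day, identifier) pairs, where
    day is an ISO date string such as "2007-04-01".
    """
    daily_max = defaultdict(int)
    for day, ident in observations:
        daily_max[day] = max(daily_max[day], ident)
    # ISO date strings sort in chronological order.
    return dict(sorted(daily_max.items()))

def daily_growth(daily_max):
    """Difference in maximum identifier between consecutive observed days."""
    days = list(daily_max)
    return {cur: daily_max[cur] - daily_max[prev]
            for prev, cur in zip(days, days[1:])}

# Hypothetical (day, user ID) observations.
observations = [
    ("2007-04-01", 3100), ("2007-04-01", 3250),
    ("2007-04-02", 3900),
    ("2007-04-03", 4300), ("2007-04-03", 4200),
]
daily_max = max_id_per_day(observations)
growth = daily_growth(daily_max)
```

This is only a rough estimate: it assumes identifiers are assigned without large gaps, which the data cannot confirm.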
Following Kolari et al. [18], we use the following definition
of user activity and retention:
Definition A user is considered active during a week if he
or she has posted at least one post during that week.
Definition An active user is considered retained for the
given week, if he or she reposts at least once in the following
X weeks.
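The two definitions above translate directly into set operations. The sketch below assumes posts are available as (user, week index) pairs; that representation, the names, and the sample data are hypothetical and are used only to make the definitions concrete.

```python
from collections import defaultdict

def active_users_by_week(posts):
    """Map each week index to the set of users with at least one post."""
    active = defaultdict(set)
    for user, week in posts:
        active[week].add(user)
    return active

def retained_users(active, week, x=1):
    """Users active in `week` who post again in the following `x` weeks."""
    later = set()
    for w in range(week + 1, week + x + 1):
        later |= active.get(w, set())
    return active.get(week, set()) & later

# Hypothetical (user, week index) pairs.
posts = [("u1", 0), ("u2", 0), ("u3", 0),
         ("u1", 1), ("u2", 1), ("u4", 1),
         ("u3", 2)]
active = active_users_by_week(posts)
retained_one_week = retained_users(active, week=0, x=1)
retained_two_weeks = retained_users(active, week=0, x=2)
```

A larger X counts more users as retained, since a user only needs to post once anywhere in the window.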
Due to the short time period for which the data is available
and the nature of microblogging, we decided to use X as a
period of one week. Figure 4 shows the user activity and
retention for the duration of the data. About half of the
users are active, and about half of those repost in the
following week. Lower activity is recorded during the
last week of the data because updates from the
public timeline were not available for two days during this period.
3.2 Network Properties
The Web, blogosphere, online social networks and human
contact networks all belong to a class of “scale-free net-
works” [3] and exhibit a “small world phenomenon” [33]. It
[Figure 3 plot: Twitter Growth Rate (Posts), showing Max PostID over April - May 2007]
Figure 3: Twitter Posts Growth Rate. The figure shows
the maximum post ID observed for each day in the
dataset. Although the rate at which new users are
joining the network has slowed, the number of posts
is increasing at a steady rate.
has been shown that many properties including the degree
distributions on the Web follow a power law distribution
[19, 6]. Recent studies have confirmed that some of these
properties also hold true for the blogosphere [31].
Property                 Twitter    WWE
Total Nodes              87,897     143,736
Total Links              829,247    707,761
Average Degree           18.86      4.924
Indegree Slope           -2.4       -2.38
Outdegree Slope          -2.4       NA
Degree Correlation       0.59       NA
Diameter                 6          12
Largest WCC Size         81,769     107,916
Largest SCC Size         42,900     13,393
Clustering Coefficient   0.106      0.0632
Reciprocity              0.58       0.0329
Table 1: Twitter Social Network Statistics
Table 1 describes some of the properties of Twitter’s social
network. We also compare these properties with the
corresponding values for the Weblogging Ecosystems Workshop
(WWE) collection [4] as reported by Shi et al. [31]. The
Twitter network shows a high degree correlation (also
shown in Figure 6) and high reciprocity. This implies that
there are a large number of mutual acquaintances in the
graph. New Twitter users often initially join the network
on invitation from friends. Further, new friends are added
to the network by browsing through user profiles and adding
other known acquaintances. A high proportion of reciprocal
links has also been observed in other online social networks
like Livejournal [22]. Personal communication and contact
networks such as cell phone call graphs [25] also have a high
degree correlation. Figure 5 shows the cumulative degree
distributions [27, 8] of Twitter’s network. It is interesting to
note that the slopes γin and γout are both approximately -2.4. This
value for the power law exponent is similar to that found for
[Figure 4 plot: User Activity and Retention, showing the number of retained users and active users]
Figure 4: Twitter User Activity and Retention
Continent        Number of Users
North America    21,064
Europe           7,442
Asia             6,753
Oceania          910
South America    816
Africa           120
Others           78
Unknown          38,994
Table 2: Geographical distribution of Twitter users.
North America, Europe and Asia have the highest
adoption of Twitter.
the Web (typically -2.1 for indegree [11]) and blogosphere
(-2.38 for the WWE collection).
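To make the notion of a cumulative degree distribution and its slope concrete, the following sketch computes P(x ≥ K) for each observed degree K and fits a straight line on log-log axes by ordinary least squares. The least-squares fit is an assumption made for illustration; the fitting procedure behind the exponents reported here is not specified, and the sample degrees are hypothetical.

```python
import math
from collections import Counter

def cumulative_distribution(degrees):
    """Return sorted (K, P(x >= K)) pairs over nodes with positive degree."""
    positive = [d for d in degrees if d > 0]
    n = len(positive)
    counts = Counter(positive)
    points = []
    remaining = n  # number of nodes with degree >= current K
    for k in sorted(counts):
        points.append((k, remaining / n))
        remaining -= counts[k]
    return points

def loglog_slope(points):
    """Least-squares slope of log10(P) against log10(K)."""
    xs = [math.log10(k) for k, _ in points]
    ys = [math.log10(p) for _, p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical degree sequence.
sample_degrees = [1, 1, 1, 1, 2, 2, 4, 8]
points = cumulative_distribution(sample_degrees)
slope = loglog_slope(points)
```

On real data a plain least-squares fit is sensitive to the sparse tail of the distribution, so the resulting slope should be read as a rough indicator rather than a precise exponent.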
3.3 Geographical Distribution
Twitter provides limited profile information such as name,
bio, timezone and location. For the 76K users in our
collection, about 39K had specified locations that could be parsed
correctly and resolved to their respective latitude and
longitude coordinates (using the Yahoo! Geocoding API). Figure 7
and Table 2 show the geographical distribution of
Twitter users and the number of users in each continent.
Twitter is most popular in the US, Europe and Asia (mainly
Japan). Tokyo, New York and San Francisco are the major
cities where user adoption of Twitter is high [16].
Twitter’s popularity is global and the social network of its
users crosses continental boundaries. By mapping each user’s
latitude and longitude to a continent, we can extract
the origin and destination location for every edge. Table 3
shows the distribution of friendship relations across the major
continents represented in the dataset. Oceania is used to
represent Australia, New Zealand and other island nations.
A significant portion (about 45%) of the social network still
lies within North America. Moreover, there are more intra-
[Figure 5 panels: (a) Indegree Distribution of Twitter Social Network, Indegree K vs. P(x ≥ K), γin = −2.412; (b) Outdegree Distribution of Twitter Social Network, Outdegree K vs. P(x ≥ K), γout = −2.412]
Figure 5: Twitter social network has a power law
exponent of about -2.4, which is similar to the Web
and blogosphere.
continent links than across continents. This is consistent
with observations that the probability of friendship between
two users is inversely proportional to the geographic distance
between them [22].
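The cross-continent link distribution can be computed with a simple counting pass once each user has been assigned a continent. The sketch below assumes a dictionary from user to continent label and skips edges with an unresolved endpoint; the data and names are hypothetical, and normalizing over all located edges is our reading of how the percentages are formed.

```python
from collections import Counter

def continent_link_distribution(edges, continent_of):
    """Percentage of located edges for each (from, to) continent pair."""
    counts = Counter()
    for u, v in edges:
        src = continent_of.get(u)
        dst = continent_of.get(v)
        if src is None or dst is None:
            continue  # skip users whose location could not be resolved
        counts[(src, dst)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: 100.0 * n / total for pair, n in counts.items()}

# Hypothetical users and friend edges; user "e" has no known location.
continent_of = {"a": "N.A", "b": "N.A", "c": "Europe", "d": "Asia"}
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a"), ("e", "a")]
shares = continent_link_distribution(edges, continent_of)
intra_share = sum(p for (src, dst), p in shares.items() if src == dst)
```

Summing the diagonal entries, as `intra_share` does, gives the share of links that stay within a continent.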
In Table 4, we compare some of the network properties
across the three continents with the most users: North
America, Europe and Asia. For each continent the social network
is extracted by considering only the subgraph where both the
source and destination of the friendship relation belong to
the same continent. Asian and European communities have
a higher degree correlation and reciprocity than their North
American counterparts. Language plays an important role
in such social networks. Many users from Japan and the
Spanish-speaking world connect with others who speak the same
language. In general, users in Europe and Asia tend to have
higher reciprocity and clustering coefficient values in their
corresponding subgraphs.
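The per-continent subgraph extraction and the reciprocity measure can be sketched as follows. Reciprocity is taken here to be the percentage of directed edges whose reverse edge is also present, which is one common definition and an assumption on our part; the users and edges are hypothetical.

```python
def continent_subgraph(edges, continent_of, continent):
    """Edges whose source and destination both lie in `continent`."""
    return {(u, v) for u, v in edges
            if continent_of.get(u) == continent
            and continent_of.get(v) == continent}

def percent_reciprocity(edges):
    """Percentage of directed edges (u, v) for which (v, u) also exists."""
    edges = set(edges)
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if (v, u) in edges)
    return 100.0 * mutual / len(edges)

# Hypothetical users and friend edges.
continent_of = {"a": "Asia", "b": "Asia", "c": "Asia", "x": "Europe"}
edges = {("a", "b"), ("b", "a"), ("a", "c"), ("a", "x"), ("x", "a")}
asia_edges = continent_subgraph(edges, continent_of, "Asia")
asia_reciprocity = percent_reciprocity(asia_edges)
```

Note that the reciprocated pair between "a" and "x" is dropped from the Asian subgraph because it crosses a continental boundary.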
[Figure 6 plot: Twitter Social Network Scatter Plot, Outdegree (follows) vs. Indegree (followed by), axis ticks from 1 to 10000; Correlation Coefficient = 0.59]
Figure 6: Scatter plot showing the degree correlation
of the Twitter social network. A high degree correlation
signifies that users who are followed by many
people also have a large number of friends.
[Figure 7 map: Distribution of Twitter Users Across the World]
Figure 7: Global distribution of Twitter users.
Though initially launched in the US,
Twitter is popular across the world.
In this paper, we propose a two-level framework for user
intention detection. First, we use the HITS algorithm [17] to
find the hubs and authorities in the network. Hubs and
authorities have a mutually reinforcing property and are
computed iteratively, where H(p) represents the hub value of page
p and A(p) represents the authority value of page p.
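In the standard formulation of HITS, the authority value A(p) is the sum of the hub values H(q) of all nodes q that link to p, and the hub value H(p) is the sum of the authority values A(q) of all nodes q that p links to; the two updates are repeated with normalization until the scores stabilize. The sketch below implements that iteration on a directed edge list. Normalizing the scores to sum to 1 and using a fixed number of iterations are assumptions made for illustration, as are the user names.

```python
def hits(edges, iterations=50):
    """Return (hub, authority) score dictionaries for a directed edge list.

    An edge (u, v) means that u follows v, i.e. u links to v.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of nodes linking to p.
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        # Hub update: sum of authority scores of nodes that p links to.
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += new_auth[v]
        # Normalize so that each score vector sums to 1 (an assumption).
        auth_total = sum(new_auth.values()) or 1.0
        hub_total = sum(new_hub.values()) or 1.0
        auth = {n: s / auth_total for n, s in new_auth.items()}
        hub = {n: s / hub_total for n, s in new_hub.items()}
    return hub, auth

# Hypothetical follow edges: three fans follow "star"; fan1 also follows "other".
edges = [("fan1", "star"), ("fan2", "star"), ("fan3", "star"),
         ("fan1", "other")]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

In this toy graph the most followed user receives the highest authority score, and the user who follows the most accounts receives the highest hub score, mirroring the roles discussed in this section.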
Table 5 shows a listing of the top ten hubs and authorities. From
this list, we can see that some users have both a high authority
score and a high hub score. For example, Scobleizer,
JasonCalacanis, bloggersblog, and Webtickle, who have many
followers and friends in Twitter, fall into this category.
Some users with very high authority scores have relatively
low hub scores, such as Twitterrific, ev, and springnet. They
have many followers but fewer friends in Twitter, and thus
fall into this category. Some other users with very
From \ To   Asia    Europe  Oceania  N.A     S.A     Africa
Asia        13.45   0.64    0.10     5.97    0.005   0.01
Europe      0.53    9.48    0.25     6.16    0.17    0.02
Oceania     0.13    0.40    0.60     1.92    0.02    0.01
N.A         5.19    5.46    1.23     45.60   0.60    0.10
S.A         0.06    0.26    0.02     0.75    0.62    0.00
Africa      0.01    0.03    0.00     0.11    0.00    0.03
Table 3: Distribution of Twitter social network links
across continents (values in percent). Most of the
social network lies within North America. (N.A =
North America, S.A = South America)
Property                 N.A       Europe    Asia
Total Nodes              16,998    5,201     4,886
Total Edges              205,197   42,664    60,519
Average Degree           24.15     16.42     24.77
Degree Correlation       0.62      0.78      0.92
Clustering Coefficient   0.147     0.54      0.18
Percent Reciprocity      62.64     71.62     81.40
Table 4: Network properties of social networks
within a continent. Europe and Asia have a higher
reciprocity, indicating closer ties in these social
networks. (N.A = North America)
high hub scores have relatively low authority scores, such as
dan7, startupmeme, and aidg. They follow many other users
but have fewer followers. Based on this rough categorization,
we can see that user intention can be roughly
categorized into three types: information sharing,
information seeking, and friendship-wise relationship.
After the hub/authority detection, we identify communities
within friendship-wise relationships by considering only
the bidirectional links where two users regard each other as
friends. A community in a network can be loosely defined
as a group of nodes more densely connected to each other
than to nodes outside the group. Often communities are
topical or based on shared interests. To construct web
communities, Flake et al. [12] proposed a method using HITS
User             Authority   User             Hub
Scobleizer       0.002354    Webtickle        0.003655
Twitterrific     0.001765    Scobleizer       0.002338
ev               0.001652    dan7             0.002079
JasonCalacanis   0.001557    startupmeme      0.001906
springnet        0.001525    aidg             0.001734
bloggersblog     0.001506    lisaw            0.001701
chrispirillo     0.001503    bhartzer         0.001599
darthvader       0.001367    bloggersblog     0.001559
ambermacarthur   0.001348    JasonCalacanis   0.001534
Table 5: Top 10 Hubs and Authorities in Twitter.
Some of the top authorities are also popular bloggers.
Top hubs include users like startupmeme and
aidg, which are microblogging versions of blogs and
other web sites.
and maximum flow/minimum cut to detect communities. In the
social network area, Newman and Girvan [13, 7] proposed a
metric called modularity to measure the strength of the
community structure. The intuition is that a good division of a
network into communities is not merely one in which the
number of edges running between communities is small; rather, it
is one in which the number of edges between groups is smaller
than expected. Only if the number of between-group edges is significantly
lower than what would be expected purely by chance can we
justifiably claim to have found significant community
structure. Based on the modularity measure of the network,
optimization algorithms have been proposed to find good divisions of
a network into communities by optimizing the modularity
over possible divisions. This optimization process can also
be related to the eigenvectors of matrices. However, in the
above algorithms, each node has to belong to exactly one
community, while in real networks, communities often overlap. One
person can play a totally different role in different
communities. In an extreme case, one user can serve as the
information source in one community and the information
seeker in another community.
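To illustrate the modularity measure discussed above, the sketch below computes Q for an undirected graph and a hard partition, using the per-community form Q = Σ_c [ e_c / m − (d_c / 2m)² ], where m is the number of edges, e_c the number of edges inside community c, and d_c the total degree of its nodes. The example graph and the names are hypothetical; this is an illustration of the measure, not of the optimization algorithms that build on it.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a hard partition of an undirected graph."""
    # Collapse direction and duplicates; ignore self-loops.
    undirected = {frozenset((u, v)) for u, v in edges if u != v}
    m = len(undirected)
    if m == 0:
        return 0.0
    intra = {}       # e_c: number of edges inside each community
    degree_sum = {}  # d_c: total degree of nodes in each community
    for edge in undirected:
        u, v = tuple(edge)
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical graph: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in range(1, 7)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Placing every node in a single community yields Q = 0, since the observed and expected numbers of internal edges coincide, while splitting the graph at the bridging edge gives a positive score.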
People in friendship communities often know each other.
Prompted by this intuition, we applied the Clique Perco-
lation Method (CPM) [28, 10] to find overlapping commu-
nities in networks. The CPM is based on the observation
that a typical member in a community is linked to many
other members, but not necessarily to all other nodes in the
same community. In CPM, the k-clique-communities are
identified by looking for the unions of all k-cliques that can
be reached from each other through a series of adjacent k-
cliques, where two k-cliques are said to be adjacent if they
share k-1 nodes. This algorithm is suitable for detecting the
dense communities in the network.
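The definition of k-clique-communities above can be made concrete with a small brute-force sketch: enumerate all k-cliques, treat two cliques as adjacent when they share k-1 nodes, and take the union of each connected group of cliques. Exhaustive enumeration is a simplification that is practical only for small graphs and is not the procedure of [28, 10]; the example graph and names are hypothetical, and edges are treated as mutual (undirected) links.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return a list of node sets, one per k-clique-community."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # Brute-force k-clique enumeration; node labels must be sortable.
    cliques = [frozenset(group)
               for group in combinations(sorted(adjacency), k)
               if all(b in adjacency[a] for a, b in combinations(group, 2))]
    # Union-find over cliques: merge cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())

# Hypothetical mutual-friend links: triangles abc and bcd share an edge,
# while triangle def touches them only at node d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
communities = k_clique_communities(edges, k=3)
overlap = set.intersection(*communities)
```

In this example node d belongs to both communities, which is exactly the kind of overlap that a hard partition cannot express.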
Here we list a few specific examples of how communities form
in Twitter and what user intentions are present in each
community. Figure 8 illustrates a representative community of
58 users closely communicating with each other through the
Twitter service. The key terms they talk about include work,
Xbox, game, and play. It appears that users with gaming
interests get together to discuss new products on this topic
or to share gaming experiences. When we visit specific users’
pages, we also find this type of conversation:
“BDazzler@Steve519 I don’t know about the Jap PS3’s. I
think they have region encoding, so you’d only be able to play
Jap games. Euro has no ps2 chip” or “BobbyBlackwolf Playing
with the PS3 firmware update, can’t get WMP11 to share
MP4’s and the PS3 won’t play WMV’s or AVI’s...Fail.” We
also noticed that users in this community share
their personal feelings and daily life experiences with each other
in addition to comments on “gaming”. Based on our study
of the communities in the Twitter dataset, we observed that this
is a representative community in the Twitter network: people
in one community have certain common interests, and they
also share their personal feelings and daily experiences
with each other.
Using CPM, we are able to find how communities are connected
to each other by overlapping components. Figure 9 illustrates
two communities with podcasting interests, where GSPN and
pcamarata are the users who connect these two communities.
In GSPN’s bio, he mentions that he is the producer of the
Generally Speaking Podcast Network, while in pcamarata’s
bio, he mentions that he is a family man, a neurosurgeon, and
a podcaster. By looking at the top key terms of these two
communities, we can see that the focus of the green
community is a little more diversified: people occasionally talk
about podcasting, while the topic of the red community is
a little more focused. In a sense, the red community is like
a professional community of podcasting while the green one
is an informal community about podcasting.
Figure 10 illustrates five communities connected by Scobleizer,
who is a tech geek blogger. People follow his posts to
get technology news. People in different communities share
different interests with Scobleizer. Specifically,
AndruEdwards, Scobleizer, daryn, and davidgeller get together to
share video-related news. CaptSolo et al. have an interest
in the Semantic Web. AdoMatic et al. are engineers and
are interested in coding-related issues.
Studying intentions at a community level, we observe that users
participate in communities which share similar interests.
Individuals may have different intentions for joining these
communities. While some act as information providers,
others are merely looking for new and interesting information.
Next, by analyzing aggregate trends across users spread over
many communities, we can identify certain distinct themes.
Often there are recurring patterns in word usage. Such
patterns may be observed over a day or a week. For example,
Figure 11 shows the trends for the terms “friends” and
“school” in the entire corpus. While school is of interest
during weekdays, friends take over on the weekends. The
Figure 11: Daily trends for the terms “school” and
“friends”. The term “school” is more frequent early in
the week; “friends” takes over during the weekend.
log-likelihood ratio is used to determine terms that are of
significant importance for a given day of the week. Using a
technique described by Rayson and Garside [30], we create
a contingency table of term frequencies for a given day and
the rest of the week.
Key Terms
just:273 com:225
work:185 like:172
good:168 going:157
got:152 time:142
live:136 new:133
xbox:125 tinyurl:122
today:121 game:115
playing:115 twitter:109
day:108 lol:10
play:100 halo:100
night:90 home:89
getting:88 need:86
think:85 gamerandy:85
ll:85 360:84
watching:79 want:78
Figure 8: An example of a “gaming” community who also share daily experiences.
                      Day    Rest of the Week   Total
Freq of word          a      b                  a+b
Freq of other words   c-a    d-b                c+d-a-b
Total                 c      d                  c+d
Comparing the terms that occur on a given day with the
histogram of terms for the rest of the week, we find the most
descriptive terms. The log-likelihood score is calculated as
$$LL = 2\left(a \log\frac{a}{E_1} + b \log\frac{b}{E_2}\right)$$
where $E_1 = c\,\frac{a+b}{c+d}$ and $E_2 = d\,\frac{a+b}{c+d}$.
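As a worked illustration of the log-likelihood score, the sketch below follows the contingency table: a and b are the frequencies of the term on the given day and in the rest of the week, and c and d are the total word counts for the given day and the rest of the week. The expected values are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d). The function name and the sample counts are hypothetical, and treating a zero count as contributing nothing to the sum is a convention we assume.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood score of a term for one day versus the rest of the week.

    a, b: frequency of the term on the given day / in the rest of the week
    c, d: total number of words on the given day / in the rest of the week
    """
    e1 = c * (a + b) / (c + d)  # expected frequency on the given day
    e2 = d * (a + b) / (c + d)  # expected frequency in the rest of the week
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2 * score

# Hypothetical counts: the term is three times as frequent on the given day.
skewed = log_likelihood(a=30, b=10, c=1000, d=1000)
# A term used equally often in both parts scores zero.
balanced = log_likelihood(a=20, b=20, c=1000, d=1000)
```

Ranking the terms of a day by this score surfaces words that are unusually frequent on that day relative to the rest of the week.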
Figure 12 shows the most descriptive terms for each day
of the week. Some of the extracted terms correspond to
recurring events and activities significant for a particular
day of the week for example “school” or “party”. Other
terms are related to current events like “easter” and “EMI”.
The following section presents a brief taxonomy of user
intentions on Twitter. The apparent intention of a Twitter post
was determined manually by the first author. Each post
was read and categorized. Posts that were highly ambiguous
or for which the author could not make a judgement were
placed in the category UNKNOWN. Based on this analysis,
we have found the following to be some of the main user intentions
on Twitter:
Daily Chatter Most posts on Twitter talk about daily
routine or what people are currently doing. This is the
largest and most common use of Twitter.
Conversations In Twitter, since there is no direct way
for people to comment or reply to their friend’s posts,
early adopters started using the @ symbol followed by
a username for replies. About one eighth of all posts
in the collection contain a conversation and this form
of communication was used by almost 21% of users in
the collection.
Key Terms
going:222 just:218
work:170 night:143
bed:140 time:139
good:137 com:130
lost:124 day:122
listening:111 today:100
new:98 got:97
watching:92 kids:88
morning:81 twitter:79
getting:77 tinyurl:75
lunch:74 like:72
podcast:72 watch:71
ready:70 tv:69
need:64 live:61
tonight:61 trying:58
love:58 cliff:58
Key Terms
just:312 com:180
work:180 time:149
listening:147 home:145
going:139 day:134
got:126 today:124
good:116 bed:114
night:112 tinyurl:97
getting:88 podcast:87
dinner:85 watching:83
like:78 mass:78
lunch:72 new:72
ll:70 tomorrow:69
ready:64 twitter:62
working:61 tonight:61
morning:58 need:58
great:58 finished:55
Figure 9: Example of how two communities connect to each other
Sharing information/URLs About 13% of all the posts
in the collection contain some URL in them. Due to
the small character limit, a URL shortening service like
TinyURL is frequently used to make this feature feasible.
Reporting news Many users report latest news or comment
about current events on Twitter. Some automated
users or agents post updates like weather reports
and news stories from RSS feeds. This is an
interesting application of Twitter that has evolved due
to easy access to the developer API.
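Two of the categories above leave simple surface cues in the text: conversations use the @ symbol followed by a username, and information sharing involves a URL. The sketch below counts the share of posts showing each cue. The regular expressions, the function names, and the sample posts (including the shortened URL) are hypothetical; the categorization in this study was done manually, so this is only an approximation of those two cues.

```python
import re

# "@" followed by a username, but not when it is part of an email address.
REPLY_PATTERN = re.compile(r"(?<!\w)@\w+")
# A deliberately simple URL pattern; real posts may need a stricter one.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

def surface_features(post):
    """Flag whether a post looks like a conversation or contains a URL."""
    return {
        "conversation": bool(REPLY_PATTERN.search(post)),
        "url": bool(URL_PATTERN.search(post)),
    }

def feature_shares(posts):
    """Fraction of posts showing each surface cue."""
    if not posts:
        return {"conversation": 0.0, "url": 0.0}
    flags = [surface_features(p) for p in posts]
    return {key: sum(f[key] for f in flags) / len(posts)
            for key in ("conversation", "url")}

# Hypothetical posts.
posts = [
    "@Steve519 I don't know about that",
    "Reading http://tinyurl.com/abc123",
    "Making dinner",
    "mail me at bob@example.com",
]
shares = feature_shares(posts)
```

Daily chatter and news reporting have no such unambiguous marker, which is one reason the remaining categories require reading each post.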
Based on the link structure, the following are the main categories
of users on Twitter:
Information Source An information source is also a
hub and has a large number of followers. This user
may post updates on regular intervals or infrequently.
Despite infrequent updates, certain users have a large
number of followers due to the valuable nature of their
updates. Some of the information sources were also
found to be automated tools posting news and other
useful information on Twitter.
Friends Most relationships fall into this broad category.
There are many sub-categories of friendships
on Twitter. For example a user may have friends,
family and co-workers on their friend or follower lists.
Sometimes unfamiliar users may also add someone as
a friend.
Information Seeker An information seeker is a person
who might post rarely, but follows other users regularly.
Our study has revealed different motivations and utilities
of microblogging platforms. A single user may have multi-
ple intentions or may even serve different roles in different
communities. For example, there may be posts meant to
com:175 twitter:134
just:133 like:86
good:82 tinyurl:75
time:74 new:74
jasona:73 going:68
day:63 don:61
work:58 think:56
ll:54 scottw:54
today:52 hkarthik:50
nice:49 getting:47
got:47 really:46
yeah:44 need:43
watching:41 love:41
night:40 home:40
com:93 twitter:74
just:35 new:32
tinyurl:29 going:24
ll:22 blog:21
jaiku:21 don:21
leo:21 flickr:21
like:19 video:18
google:18 today:18
feeds:18 getting:16
yeah:16 good:15
com:93 twitter:76 tinyurl:34
just:32 new:28 video:26
going:24 ll:22 jaiku:22
blog:21 leo:21 like:19
don:19 gamerandy:19 yeah:18
google:17 live:16 people:16
got:16 know:15 time:15
com:121 twitter:76 just:50
ustream:43 tv:42 live:42
today:39 hawaii:36 day:33
new:33 time:33 good:33
video:32 leo:30 work:30
like:28 watching:28 tinyurl:28
com:198 twitter:132 just:109
tinyurl:87 going:59 blog:56
like:55 good:51 new:50
url:50 day:49 people:46
time:45 today:45 google:42
don:41 think:40 night:38
ll:38 need:35 got:33
ireland:33 great:31 looking:29
work:29 thanks:28 video:26
Figure 10: Example Communities in Twitter Social Network. Key terms indicate that these communities
are talking mostly about technology. The user Scobleizer connects multiple communities in the network.
update your personal network on a holiday plan or a post
to share an interesting link with co-workers. Multiple user
intentions have led to some users feeling overwhelmed by
microblogging services [20]. Based on our analysis of user
intentions, we believe that the ability to categorize friends
into groups (e.g. family, co-workers) would greatly benefit
the adoption of microblogging platforms. In addition fea-
tures that could help facilitate conversations and sharing
news would be beneficial.
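
The key terms listed for each community in Figure 10 read as
term counts aggregated over the posts of the community's members.
The following is only a minimal sketch of how such counts could be
produced, not the procedure used in this study. The function name
key_terms, the stop list and the tokenization rule (lowercase, split
on anything that is not a letter) are all assumptions made for
illustration. Splitting on non-letters would at least be consistent
with fragments such as "ll", "don" and "com" appearing in the lists.

```python
import re
from collections import Counter

# Hypothetical stop list: the paper does not say which words were filtered.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "it",
              "my", "for", "on", "at", "with", "this", "that"}


def key_terms(posts, top_n=5):
    """Return the top_n (term, count) pairs over all posts of one community."""
    counts = Counter()
    for post in posts:
        # Lowercase and split on non-letters, so "I'll" yields "i" and "ll"
        # and "twitter.com" yields "twitter" and "com".
        tokens = re.findall(r"[a-z]+", post.lower())
        # Drop single letters and stop words before counting.
        counts.update(t for t in tokens if len(t) > 1 and t not in STOP_WORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    community_posts = [
        "Just posted at twitter.com",
        "I'll check twitter.com today",
    ]
    for term, count in key_terms(community_posts, top_n=3):
        print(f"{term}:{count}")
```

Running the function once per detected community gives one term
list per community, which can then be compared across communities.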
In this study we have analyzed a large social network in a
new form of social media known as microblogging. Such net-
works were found to have a high degree correlation and reci-
procity, indicating close mutual acquaintances among users.
While determining an individual user’s intention in using
such applications is challenging, by analyzing the aggregate
behavior across communities of users, we can describe the
community intention. Understanding these intentions and
learning how and why people use such tools can be helpful
in improving them and adding new features that would re-
tain more users. In this work, we have identified different
types of user intentions and studied the community structures.
Currently, we are working on automated approaches
for detecting user intentions with related community structures.
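
As an illustration of the reciprocity property mentioned above, the
sketch below computes one common definition: the fraction of directed
"follows" links whose reverse link also exists. This is an assumed
reading of the term, not necessarily the exact measure used in this
study, and the function name and the edge-list input format are
hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present.

    edges: iterable of (follower, followed) pairs. Duplicate pairs and
    self-loops are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    # Count every edge whose reverse edge is also in the graph.
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


if __name__ == "__main__":
    follows = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]
    print(round(reciprocity(follows), 3))
```

A value close to 1 means most follow relationships are mutual, which
is the sense in which high reciprocity suggests close mutual
acquaintance among users.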
We would like to thank Twitter Inc. for providing an API
to their service and Pranam Kolari, Xiaolin Shi and Amit
Karandikar for their suggestions.
[1] J. Allen. Recognizing intentions from natural language
utterances. Computational Models of Discourse, pages
107–166, 1983.
[2] J. Austin. How to Do Things with Words. Oxford
University Press, Oxford, 1976.
[3] A.-L. Barabasi and R. Albert. Emergence of scaling in
random networks. Science, 286:509, 1999.
Figure 12: Distinctive terms for each day of the week
ranked using Log-likelihood ratio.
[4] Blogpulse. The 3rd annual workshop on weblogging
ecosystem: Aggregation, analysis and dynamics, 15th
world wide web conference, May 2006.
[5] A. Broder. A taxonomy of web search. SIGIR Forum,
36(2):3–10, 2002.
[6] A. Broder, R. Kumar, F. Maghoul, P. Raghavan,
S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener.
Graph structure in the web. In Proceedings of the 9th
international World Wide Web conference on
Computer networks: the international journal of
computer and telecommunications networking, pages
309–320, Amsterdam, The Netherlands, 2000.
North-Holland Publishing Co.
[7] A. Clauset, M. E. J. Newman, and C. Moore. Finding
community structure in very large networks. Physical
Review E, 70:066111, 2004.
[8] A. Clauset, C. R. Shalizi, and M. E. J. Newman.
Power-law distributions in empirical data, Jun 2007.
[9] Comscore.
[10] I. Derenyi, G. Palla, and T. Vicsek. Clique percolation
in random networks. Physical Review Letters,
94:160202, 2005.
[11] D. Donato, L. Laura, S. Leonardi, and S. Millozzi.
Large scale properties of the webgraph. European
Physical Journal B, 38:239–243, March 2004.
[12] G. W. Flake, S. Lawrence, C. L. Giles, and F. Coetzee.
Self-organization of the web and identification of
communities. IEEE Computer, 35(3):66–71, 2002.
[13] M. Girvan and M. E. J. Newman. Community
structure in social and biological networks, Dec 2001.
[14] H. Grice. Utterer's meaning and intentions.
Philosophical Review, 78(2):147–177, 1969.
[15] B. J. Grosz. Focusing and Description in Natural
Language Dialogues. Cambridge University Press, New
York, New York, 1981.
[16] A. Java.
[17] J. M. Kleinberg. Authoritative sources in a
hyperlinked environment. Journal of the ACM,
46(5):604–632, 1999.
[18] P. Kolari, T. Finin, Y. Yesha, Y. Yesha, K. Lyons,
S. Perelgut, and J. Hawkins. On the Structure,
Properties and Utility of Internal Corporate Blogs. In
Proceedings of the International Conference on
Weblogs and Social Media (ICWSM 2007), March 2007.
[19] R. Kumar, P. Raghavan, S. Rajagopalan, and
A. Tomkins. Trawling the Web for emerging
cyber-communities. Computer Networks (Amsterdam,
Netherlands: 1999), 31(11–16):1481–1493, 1999.
[20] A. Lavallee. Friends swap twitters, and frustration -
new real-time messaging services overwhelm some
users with mundane updates from friends, March 16,
[21] T. Lento, H. T. Welser, L. Gu, and M. Smith. The ties
that blog: Examining the relationship between social
ties and continued participation in the wallop
weblogging system, 2006.
[22] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan,
and A. Tomkins. Geographic routing in social
networks. Proceedings of the National Academy of
Sciences, 102(33):11623–11628, 2005.
[23] Y.-R. Lin, H. Sundaram, Y. Chi, J. Tatemura, and
B. Tseng. Discovery of Blog Communities based on
Mutual Awareness. In Proceedings of the 3rd Annual
Workshop on Weblogging Ecosystem: Aggregation,
Analysis and Dynamics, 15th World Wide Web
Conference, May 2006.
[24] S. Matsubara, S. Kimura, N. Kawaguchi,
Y. Yamaguchi, and Y. Inagaki. Example-based Speech
Intention Understanding and Its Application to In-Car
Spoken Dialogue System. Proceedings of the 19th
International Conference on Computational
Linguistics, Volume 1, pages 1–7, 2002.
[25] A. A. Nanavati, S. Gurumurthy, G. Das,
D. Chakraborty, K. Dasgupta, S. Mukherjea, and
A. Joshi. On the structural properties of massive
telecom call graphs: findings and implications. In
CIKM ’06: Proceedings of the 15th ACM international
conference on Information and knowledge
management, pages 435–444, New York, NY, USA,
2006. ACM Press.
[26] B. A. Nardi, D. J. Schiano, M. Gumbrecht, and
L. Swartz. Why we blog. Commun. ACM,
47(12):41–46, 2004.
[27] M. E. J. Newman. Power laws, Pareto distributions
and Zipf's law. Contemporary Physics, 46:323, 2005.
[28] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek.
Uncovering the overlapping community structure of
complex networks in nature and society. Nature,
435:814, 2005.
[29] J. Pontin. From many tweets, one loud voice on the
internet. The New York Times, April 22, 2007.
[30] P. Rayson and R. Garside. Comparing corpora using
frequency profiling, 2000.
[31] X. Shi, B. Tseng, and L. A. Adamic. Looking at the
blogosphere topology through different lenses. In
Proceedings of the International Conference on
Weblogs and Social Media (ICWSM 2007), March 2007.
[32] P. Strawson. Intention and Convention in Speech
Acts. The Philosophical Review, 73(4):439–460, 1964.
[33] D. J. Watts and S. H. Strogatz. Collective dynamics of
‘small-world’ networks. Nature, 393(6684):440–442,
June 1998.