Conference Paper

Connecting the dots – deriving insights into social behaviour around metro transport nodes through social media data in Singapore and beyond


Abstract

There is increasing interest in understanding the use of urban space around heavily trafficked areas such as transport interchanges, both in terms of their cultural, social and commercial use (Cha, 2001) and in the context of 'last mile' journeys. However, this type of use is often not captured in traditional census, survey or transportation data. This research focuses on the use of social media data to develop insight into how spaces around public transportation stations are used, through a comparative study of metro stations in several Asian cities. It focuses on finding similarities and differences between such spaces, including how different master planning and key urban amenities affect use. Specifically, this research aids in devising strategies to promote wider use and interconnectedness of the areas around stations and in understanding the impact of large, dense, agglomerated building typologies, which are becoming a feature of Asian cities in particular (Frampton et al., 2012).
Verina Cristie*, Ate Poorthuis*, and Sam Conrad Joyce*
*Singapore University of Technology and Design, 487372, Singapore
Email (corresponding):
1. Introduction
The collection of social media data offers powerful insight into the
activities of people that would usually be opaque to researchers. As social
media usage has grown in popularity throughout the world, it is becoming a
tool to capture the sentiment of a significant proportion of a population
(Pak & Paroubek 2010). Indeed, many of these services generate their main
revenue by using this data to help decision makers, and such data has been
successfully applied to marketing products and services, finance (Bollen et
al 2011) and even elections (Tumasjan et al 2010).
Services such as Twitter offer even richer information, as in many cases
they record not only the user's profile (age, gender, nationality) but also
the location of each message, giving a unique insight into the thoughts and
activities of different types of people or of people in specific areas.
Spatialized data has been shown to have applications such as detecting
illness outbreaks faster than conventional methods (Paul et al 2014), so
that better preventative actions can be taken.
Whilst the examples above refer to insight relevant to broader policy-level
decisions for governments, companies or marketers, there is potential to
apply such methods to provide more localized insight into the activities of
a neighborhood's users. This could be significant to urban designers and
architects who study how space is utilized. Conventional approaches to
recording such activity, whilst valuable, are typically very expensive and
time consuming (Whyte 1980). Methods such as computer vision offer automated
ways to record urban use; however, they do not scale well over time or
distance, as they require cameras to record every place, and it is currently
difficult to work out population activities automatically en masse in this
way.
Social media, however, offers messages with data potentially representing
self-tagged activities or opinions by location and time. Whilst this
information is limited by being provided by a set of self-selected
individuals, it nonetheless provides a unique window into a socio-geographic
area: one which can be pointed anywhere in the world with a significant
number of social media users, and one which can be processed digitally to
compare different places.
In this paper we look at the applicability of this data source to inform
planning around metro stations. We look at Asia specifically, as Asian
cities are both growing rapidly and modernizing by building new metro lines.
The Asian context is especially interesting as there tends to be a much
greater concentration of heavily and intentionally programmed space, such as
malls or business centers, allowing for more focused analysis of the
relationship between zoning and the activities actually being undertaken.
Here we examine whether we can correlate the urban configuration with the
activities, and whether we can compare across different cities and stations,
investigating how insight from one place might be understood and applied
elsewhere.
2. Methodology
A range of different approaches have been used in previous studies of social
media platform data, focusing on message content, user, technology, and
concept (Williams et al., 2013). This paper uses geotagged messages sent
through the social media platform Twitter to gain insights into the social
use of urban spaces around transportation nodes in Singapore and beyond. We
develop an approach that takes the position and content of messages sent
within 1 kilometer of selected metro stations and compares them against
local urban planning features to find correlative relationships. Coding is
applied to the message data to infer activities (Crampton et al., 2013;
Tomarchio et al., 2016). Activity points are then plotted so that insight
can be gained through visualisation.
2.1 Plotting Activities around Interchanges
Nine metro stations in Singapore were selected for investigation, chosen to
represent places with different characteristics. Specifically: (1) City Hall
station as the city centre / central interchange, (2) Jurong East station as
the secondary interchange, (3) Pasir Ris station at the east end of the
city, (4) Woodlands station at the north end of the city, (5) Tiong Bahru
station, near the city centre but in an older district, (6) Punggol station
at the north east end of the city, (7) Harbour Front station, located near
the sea, (8) Orchard station, inside an upscale shopping area, and (9) Paya
Lebar station as a smaller interchange at almost the east end of the city.
For comparison we then chose four other cities in Asia with similarly
large, dense, agglomerated transport hubs across the city. Hong Kong was
selected due to its density and its similarity to Singapore as a financial
center of the region. Osaka (Japan) was chosen as it is a city with a long
urban history and a size comparable to Singapore. Japan in general also has
a unique urban typology and a huge Twitter user base. Twitter is Japan’s
favourite social network¹, in part because in Japanese kanji one can convey
much more in 140 characters than in a Latin-based alphabet. Additionally,
two South East Asian locations, Bangkok (Thailand) and Kuala Lumpur
(Malaysia), were chosen. Thailand has a large base of social media users,
amounting to 4.5 million users in 2013².
Metro stations with subjectively similar characteristics to the nine
stations in Singapore were then chosen in the other four cities. The
latitude and longitude of every metro station were recorded in a
comma-separated values (.csv) text file. Twitter data was extracted from the
DOLLY system, which records every geotagged tweet sent around the world
(Poorthuis and Zook, 2016). We extracted all tweets sent in the year 2015
from within the boundaries of each of the five cities and then used the
coordinates of the metro stations to further filter the data, selecting all
tweets within a radius of 1 kilometer of each station.
Figure 1. Active Twitter users in 2015 and the city population
[Table: city; number of tweets; active Twitter users; city population (2010 census); active Twitter users as a share of the population]
3 Bounding boxes used for each city:
Singapore (lon:[103.605537 TO 104.030571] AND lat:[1.19296 TO 1.470976]),
Hong Kong (lon:[113.832779 TO 114.426041] AND lat:[22.172781 TO 22.519534]),
Bangkok (lon:[100.327814 TO 100.938408] AND lat:[13.494088 TO 13.955111]),
Kuala Lumpur (lon:[101.61545 TO 101.758529] AND lat:[3.033633 TO 3.243379]), and
Osaka (lon:[135.372886 TO 135.599171] AND lat:[34.586147 TO 34.768754]).
4 The 2010 census was used for all cities except Hong Kong (2011 census). Information taken
from
Kuala Lumpur city population figure was taken from ‘’ due to the suburban
area included in the bounding box.
Local facilities can be categorised into ‘Life (residential, educational)’,
‘Work’, ‘Shopping’, ‘Eat’, and ‘Entertainment’ (Lee et al., 2013). Given the
types of activity that are tweeted about more often and are more commonly
found around metro stations, shopping, eating, and entertainment were chosen.
In addition, exercise is separated from entertainment for a clearer
definition. Hence, four broad types of activity are investigated in this
paper: (1) food, to record eating-related activity; (2) entertainment, to
record activity related to watching movies, concerts, music, cultural
activities, performances, and the like; (3) shopping, to record
shopping/retail activity; and (4) exercise, to record exercise, sports, or
outdoor-related activity.
To identify these activities from the text of a tweet, we curated a
dictionary of keywords from ConceptNet⁵, a crowdsourced semantic network
originally built by the MIT Media Lab to help computers understand the
meanings of the words that people use. We then used Google Translate⁶ to
obtain the keywords in Thai and Japanese, as English is not the predominant
language in Bangkok and Osaka. These keywords are saved in JavaScript Object
Notation (.json) format. Once one or more keywords from a category are found
in a tweet, the tweet is assigned to that category. In this research, a
tweet can be assigned to only one category.
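The keyword lookup described above can be sketched as follows. The dictionary here is a tiny hypothetical stand-in for the curated .json file, and the tie-breaking rule (the first matching category in dictionary order wins) is an assumption of this sketch, since the text only states that a tweet receives a single category. The tokenizer splits on Latin letters, so Thai or Japanese keywords would need a different matching strategy, such as substring search.

```python
import json
import re

# Hypothetical miniature keyword dictionary in the .json layout assumed here:
# {category: [keywords]}. The real lists were curated from ConceptNet.
KEYWORDS_JSON = """
{
  "food": ["eat", "eating", "lunch", "dinner", "chicken"],
  "entertainment": ["movie", "concert", "music"],
  "shopping": ["shopping", "mall", "market"],
  "exercise": ["jogging", "run", "fit"]
}
"""


def tokenize(text):
    """Lower-case the tweet and return its set of Latin-letter words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text, keywords):
    """Return (category, matched keywords) for one tweet.

    Categories are tried in dictionary order and the first one with a match
    wins (an assumed rule); tweets with no match are labelled "others".
    """
    tokens = tokenize(text)
    for category, words in keywords.items():
        matched = tokens & set(words)
        if matched:
            return category, matched
    return "others", set()


keywords = json.loads(KEYWORDS_JSON)
category, matched = classify(
    "Enjoy eating ..honey mustard chicken half at piri-piri", keywords)
```

Matching on whole words rather than substrings is a design choice of the sketch: it avoids, for instance, counting "eat" inside "great", but it cannot resolve words with several meanings.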
2.2 Plotting Activities within a Land Plot
To understand the activities of people within a region of interest, we plot
the tweets that fall within that land plot. Once tweet content has been
categorised, it can be matched against official land use and the overlap
percentage can be calculated (Frias-Martinez et al., 2012). In this paper,
OpenStreetMap⁷, a readily available crowdsourced map, is used as a proxy for
land use. The bounding box of a metro station is used, and the
OpenStreetMap-formatted data (.osm) within that bounding box is extracted
from the website.
The downloaded .osm file can be read as plain eXtensible Markup Language
(.xml). It contains tags whose key and value indicate the usage of a
building or land plot. A <way> element gives the set of points that make up
a polygon plot. Once the points are known, we can redraw the polygon on the
map and check whether a tweet was sent from within it. A <tag> element, with
its key (k) and value (v), tells us what the polygon plot is used for. If k
is landuse and v is residential, for example, we know that the polygon
corresponds to a residential plot. At building scale, for example, the
boundary plot of a church can be found through <tag> elements with
k=building and v=church, and k=amenity and v=place_of_worship.
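As a rough illustration of the procedure above, the sketch below parses a small .osm fragment with Python's standard XML parser, rebuilds each way as a polygon, and tests whether a tweet coordinate falls inside it using ray casting. The XML fragment, its coordinates, and the function names are hypothetical, and ray casting is one common point-in-polygon test chosen for this sketch; the paper does not say which test was used.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal .osm fragment: four nodes forming a square plot
# tagged as residential land use.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="1.2840" lon="103.8300"/>
  <node id="2" lat="1.2840" lon="103.8320"/>
  <node id="3" lat="1.2860" lon="103.8320"/>
  <node id="4" lat="1.2860" lon="103.8300"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="4"/><nd ref="1"/>
    <tag k="landuse" v="residential"/>
  </way>
</osm>"""


def load_polygons(osm_text):
    """Return a list of (tags, points) pairs, one per <way> element.

    `tags` maps each k to its v; `points` is a list of (lon, lat) tuples
    resolved from the way's <nd ref="..."/> node references.
    """
    root = ET.fromstring(osm_text)
    nodes = {n.get("id"): (float(n.get("lon")), float(n.get("lat")))
             for n in root.iter("node")}
    polygons = []
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        points = [nodes[nd.get("ref")] for nd in way.iter("nd")
                  if nd.get("ref") in nodes]
        polygons.append((tags, points))
    return polygons


def point_in_polygon(lon, lat, points):
    """Ray casting: count polygon edges crossed by a ray going east."""
    inside = False
    j = len(points) - 1
    for i in range(len(points)):
        xi, yi = points[i]
        xj, yj = points[j]
        if (yi > lat) != (yj > lat):  # edge straddles the point's latitude
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


polygons = load_polygons(OSM_XML)
tags, points = polygons[0]
tweet_is_in_plot = point_in_polygon(103.8310, 1.2850, points)
```

Treating longitude and latitude as planar x and y is adequate at the scale of a single land plot; a file downloaded from the website would be read with `ET.parse(path)` instead of `ET.fromstring`.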
3. Results
The tweets are plotted in a different colour for each category to give a
quick visual overview. Tweets that cannot be assigned to any of the four
categories are labelled as Others and plotted in blue in the background.
Most tweets fall into the Others category. For example, in Singapore only
about 3% of the tweets can be categorised as food, with even lower
percentages for the other categories (entertainment, shopping, and
exercise). In Bangkok only about 2% of tweets fall into a recognisable
category. The latitude and longitude of each tweet are used as the position
of a point on a radial point plot. The centre of each plot is the coordinate
of the metro station. In the figures below, red represents food activity;
green represents entertainment activity; yellow represents shopping
activity; and orange represents exercise/outdoor activity.
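One way to place tweets on a station-centred plot is to convert each coordinate into an east/north offset in metres from the station. The sketch below uses a simple equirectangular approximation, which is an assumption of this illustration (the paper does not describe its projection); the metres-per-degree constant and the function name are likewise hypothetical. The colour mapping follows the scheme described in the text.

```python
import math

# Approximate length of one degree of latitude in metres (assumed constant).
METRES_PER_DEGREE = 111_320.0

# Colour scheme as described in the text.
CATEGORY_COLOURS = {
    "food": "red",
    "entertainment": "green",
    "shopping": "yellow",
    "exercise": "orange",
    "others": "blue",
}


def offset_from_station(lat, lon, station_lat, station_lon):
    """Return the (east, north) offset of a tweet from the station, in metres.

    Longitude differences are scaled by cos(latitude) so that both axes use
    the same unit; this is adequate within about 1 km of the station.
    """
    north = (lat - station_lat) * METRES_PER_DEGREE
    east = ((lon - station_lon) * METRES_PER_DEGREE
            * math.cos(math.radians(station_lat)))
    return east, north
```

Plotting the "others" points first and the four activity categories afterwards would reproduce the layering described above, with uncategorised tweets in the background.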
Below we show the data for Singapore, Hong Kong and Kuala Lumpur with
Osaka and Bangkok plots shown in subsequent sections.
Figure 2. Singapore Tweet Plot (English)
From the general pattern in Figure 2, we can identify several
characteristics of Singapore urban life and of the individual metro
stations. In general, out of 839,297 tweets within a 1 km radius of the nine
stations, food is the most frequently discussed category, amounting to
25,820 of the 128,092 categorisable tweets. In Punggol, known for its
Waterway Park and recreation opportunities, exercise is the most tweeted
category. Tweets remain centralised around the metro at most stations,
reflecting the transit-oriented development of many of Singapore’s stations.
This is less obvious in Tiong Bahru and Punggol, again reflecting the
underlying urban fabric. Tiong Bahru is an older neighbourhood, often
heralded as Singapore’s prime example of a gentrified area. It has many
smaller shops and cafes throughout the area. Punggol has its Waterway Park
directly adjacent to the station, drawing people out and away from the
immediate vicinity of the station.
In Orchard station, a clear linear stretch of tweets represents the row of
shopping malls along Orchard Road. Punggol, Harbour Front, Jurong East, and
Pasir Ris all have areas covered by water, which explains the relative lack
of tweets in those areas.
Figure 3. Hong Kong Tweet Plot (English)
We collected 263,938 tweets around Hong Kong’s metro stations. Owing to the
geography, many parts of the city include water (sea) or mountainous areas.
This explains the empty spaces found in many of the plots in Figure 3. While
food remains the top tweet category, an exception is found at Wong Tai Sin
station, where the exercise category tops the tweet counts. Similar to
Punggol in Singapore, this is due to the many parks in Wong Tai Sin. We can
also see a stretch of shopping and eating activity around Mongkok. Yau Tong,
an area at almost the east end of Hong Kong, has the fewest tweets. It is
not a busy area and is also the nearest metro station to the large Tseung
Kwan O cemetery.
Figure 4. Kuala Lumpur Tweet Plot (English)
Kuala Lumpur has a larger set of 4,326,212 tweets. We can observe that the
tweets mostly form a central cluster at the metro station. In areas such as
Chan Sow Lin this is less obvious, as the metro station is surrounded by a
college and administrative/office buildings. In Sri Petaling, KL Sentral,
Bandaraya, and Bandar Tasik Selatan there are parks near the station, which
appear as empty spaces. Around Mid Valley, the empty spaces correspond to
residential clusters rather than parks. While food remains the major
category, we also observe a number of exercise-related tweets around Kuala
Lumpur metro stations.
4. Discussion
4.1 The Tale of Five Cities
We compare the percentage of tweeted activity types around each metro
station across the different cities to see their similarities and
differences. In general, food is the most tweeted activity across metro
stations in the five cities. Higher percentages for entertainment and
shopping also seem to be a major characteristic of Bangkok and Osaka. At the
north eastern metro stations, the food percentage is typically low and the
other categories are higher, except in Kuala Lumpur, where we used Asia Jaya
station, which is located in the south west. The exercise percentage
generally remains low across the stations, except at the north eastern and
east end interchanges. At the upscale shopping metro stations in particular,
Hong Kong, Singapore, and Kuala Lumpur show very similar characteristics.
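The per-station comparison above rests on turning category labels into percentages. A minimal sketch of that step is given below; it assumes the shares are computed over categorised tweets only (ignoring "others"), which the text does not state explicitly, and the function name is hypothetical.

```python
from collections import Counter

CATEGORIES = ("food", "entertainment", "shopping", "exercise")


def category_shares(labels, categories=CATEGORIES):
    """Return the percentage of categorised tweets falling in each category.

    `labels` is the list of category names assigned to the tweets around one
    station; labels outside `categories` (e.g. "others") are ignored.
    """
    counts = Counter(label for label in labels if label in categories)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in categories}
    return {c: 100.0 * counts[c] / total for c in categories}
```

Computing the same dictionary for each station in each city gives rows that can be compared directly, for example the upscale shopping stations across Hong Kong, Singapore, and Kuala Lumpur.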
Figure 5. Comparison of Activity based on Tweets in Different Cities
4.1.1 English or Non-English?
Figure 6. Bangkok Tweet Plot (Thai and English)
In general for Bangkok, we can observe that the points are scattered more
sparsely than in the other cities, especially for the stations further from
the city center (Lat Krabang, Ramkhamhaeng, and Udom Suk), even though there
was a sizable total of 1,748,272 tweets. We can also observe distinct road
lines rather clearly, perhaps due to the road-alley typology, in which more
residential areas are enclosed within road blocks and reached by alleys. We
also do not see as much activity at the central station, Hua Lamphong, which
is an older train station for intercity connections. More activity can be
detected at Sala Daeng, another interchange in the city centre. Victory
Monument and Siam also show dense activity, as the former is a famous
tourist attraction and the latter a very popular shopping centre.
Interestingly, if we compare Twitter usage in Thai and English, we find a
strong contrast in what people tweet about (see Fig. 5 above). Thai is the
predominant language used by locals (the national language), and perhaps
more tourists or expats tweet in English. Entertainment and Shopping are
the top categories for tweets in Thai. Given the tweet results, we can
infer, for example, that more locals shop in the Siam area and more tourists
shop in the Mo Chit area. In English, Food is the dominant category,
followed by Exercise/Outdoor related tweets. Walking/jogging tracks in the
parks can be clearly identified in the English plot (see Fig. 7).
Figure 7. Tweet Plots of Category ‘Exercise’ formed the shape of running tracks
on the park.
Figure 8. Osaka Tweet Plot (Japanese and English)
General Observation: In Osaka there were 3,174,925 tweets, a high number,
especially around the major interchanges, where the tweet plots are densely
populated. Not many activities are detected at the Ferry Terminal station,
as it is mostly surrounded by water. Most tweets remain centralised at the
metro station. We do not see much activity at the center of Hommachi and
Moriguchi stations themselves; however, we can see more activity in the
surrounding commercial areas.
We can also see a difference between tweets in Japanese and English. Tweets
in Japanese have more content related to the Entertainment category, while
in English we see more from the Food category. At Osaka, Hommachi, and Namba
stations in particular, food tweets make up a high percentage of tweets in
English. After food, exercise/outdoor related activity is the most popular
category for tweets in English.
4.1.2 Keywords: between the most popular keywords and the wrongly interpreted
Figure 9. Finding improper use of keywords in tweets (shown in grey)
Enjoy eating ..honey mustard chicken half 🍗🍗🍗 at piri-piri #mk #mystyle #myworld #food #dinner…
set(['chicken', 'eating'])
Home sweet home💕 (@ ลาดกระบัง (Lat Krabang) in Lat Krabang, Bangkok)
Japanese food for today 😜 🍴 Yummy mixed tempura 🍤 #คือหิวมากพูดเลย @ Kin Donburi Cafe' At I'm Park…
Kumamonned's cake @ Pancake House Siamsquare One
Famous food eat.. 😘🍰🎂 @ Mr. Jones' Orphanage Siam Center
home sweet home (at @Siam_Paragon (สยามพารากอน) in Pathum Wan, Bangkok)
Starter... The best eating with salted butter😂 @ The House on Sathorn
Figure 10. Keywords count for Kuala Lumpur tweets (left) and Hong Kong tweets (right)
Kuala Lumpur:
'lunch', 7089
'sweet', 5827
'dinner', 5096
'food', 4236
'breakfast', 2654
'eat', 2533
'ice', 2031
Hong Kong:
'lunch', 830
'dinner', 821
'food', 673
'ice', 337
'breakfast', 330
'sweet', 308
'eat', 298
In curating the keywords, we also noted that a word can have more than one
meaning or context of use, which is a limitation of the classification
method used in this paper. For example, the word ‘sweet’ has a high
occurrence in the Food category. It was included as a food keyword with the
intention of capturing ‘sweet taste’ while eating. However, in reality we
also found tweets unrelated to food, such as ‘home sweet home’, denoting a
place of stay or a rest activity. Thus, either people really like sweet
food, or other usages of the word ‘sweet’ are contributing to the category
count.
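Keyword counts of the kind shown in Figure 10 can be produced with a simple tally, and the effect of an ambiguous word such as ‘sweet’ can be inspected by stripping known idioms before counting. The sketch below is illustrative only: the food keywords are those listed in Figure 10, while the idiom list and the `ignore_phrases` parameter are hypothetical additions and not part of the method used in the paper.

```python
import re
from collections import Counter

# Food keywords as listed in Figure 10.
FOOD_KEYWORDS = {"lunch", "sweet", "dinner", "food", "breakfast", "eat", "ice"}

# Hypothetical list of idioms to strip before matching (not from the paper).
NON_FOOD_PHRASES = ["home sweet home"]


def count_keywords(tweets, keywords, ignore_phrases=()):
    """Count in how many tweets each keyword appears as a whole word.

    Phrases in `ignore_phrases` are removed from the lower-cased text first,
    so that idioms do not contribute to the keyword counts.
    """
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for phrase in ignore_phrases:
            lowered = lowered.replace(phrase, " ")
        tokens = set(re.findall(r"[a-z]+", lowered))
        counts.update(tokens & keywords)
    return counts


sample = [
    "Home sweet home",
    "Sweet dessert after dinner",
    "Japanese food for today",
]
raw_counts = count_keywords(sample, FOOD_KEYWORDS)
filtered_counts = count_keywords(sample, FOOD_KEYWORDS, NON_FOOD_PHRASES)
```

Comparing the raw and filtered counts for a keyword gives a rough sense of how much of its total comes from non-food usage; a hand-written idiom list will not catch every case.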
Figure 11. Tweet categorised as Exercise/Outdoor activity due to fit keyword
(3, 'This #Banking #job might be a great fit for you: Compliance Officer, Private Bank - #Singapore #Hiring #Caree
(3, 'This #HR #job might be a great fit for you: Senior/ Compensation and Benefits Analyst - #Singapore #Hiring')
4.2 Detecting Land Use
4.2.1 What do people do in the parks?
Figure 12. Tweets within land use of Park category gathered from OpenStreetMap
By using OpenStreetMap (OSM) as a data source, we can detect the location
of parks by finding land use tags with values such as ‘grass’,
‘greenfield’, ‘meadow’, ‘recreation_ground’, and ‘reservoir’. Although OSM
is not 100% complete, we can redraw the plots that are tagged in
OpenStreetMap and plot the tweets whose geolocation falls within the park.
In this manner, we learn about the activity taking place on the plot. In
Fig. 12 above, the majority of tweets cannot be categorised; however, we
also see tweets corresponding to the original purpose of the land use. The
orange points represent exercise or outdoor activity. Upon further checking,
the red and yellow points (food and shopping) come from the underground
shopping mall below the park. Some of the green dots (entertainment)
represent the F1 concerts in the park.
4.2.2 The residential area
Figure 13. Tiong Bahru tweet points within identifiable land/building plot from
The figure above is a map of the Tiong Bahru area, one of the older estates
in Singapore, located at the heart of the city. We plotted residential areas
in grey, entertainment areas in green, parks in orange, and commercial areas
in yellow. Mixed activity can be found in the shopping mall, from shopping
to eating to entertainment. We can also see exercise activity in the parks.
In the residential area, there is a mix of eating, shopping, exercise, and
entertainment activity. While we can detect shopping and entertainment
activity, after further investigation we found that it mostly stems from
keywords used in tweets sent from home. For example, we see a tweet about a
bag sent from home, or people selling tickets from home. Home is also where
some entertainment activity takes place (for example, playing music), and
people going out for an entertainment activity may tweet about it from home.
Shopping and eating can also happen in the common spaces usually found below
or within housing / residential plots in Singapore, such as a market or food
court (see Fig. 14 below). A new method of differentiating tweets about
activity in public space from those in private space could be developed
further, perhaps by measuring the distance from streets or other public
spaces. Another limitation is that we currently cannot measure the
verticality of the origin of a tweet (this relates to distinguishing tweets
from ground-level public space from those sent from an underground shopping
mall, as in 4.2.1).
Figure 14. Shopping Activity (upper) and Entertainment Activity (lower) from the Tiong Bahru residential area
(2, "Quite amusing to be sitting at a market with Arsenal legend Ian Wright and Everton's Graham Stuart.…")
(2, "I like this store name. It's just so classic #beocrescent #singapore #market @ Beo Cresent Market
(2, 'Tote bag day \n\n#whistlebeesg #handsinframe #vsco #vscocam #vscophile #totebag #exploresingapore…')
(2, 'For Sale. Ticket FMFA 2015 for 2 days pass. Get your special price here with me. Put your email here…')
(1, '@PinasMusicZone please play stuck by Darren Espanto, thanks')
(1, 'Headed to a school in the east tomorrow morning to play a show with @thelioncityboy as part of the…')
We could also use Twitter data to detect emerging eating places not
previously recorded in OpenStreetMap. In the figure above, there is a clear
cluster of red dots in the north eastern part, above Jalan Bukit Ho Swee.
OpenStreetMap does not record this, but upon further investigation we found
a food court and a Siam eating place there. Many of the new eating places
are located at the outer part of the housing block (see Fig. 15). This is a
phenomenon of gentrification, which is especially visible in the Tiong Bahru
part of Singapore, where more upscale cafés are emerging in the residential
area to suit the taste of the middle class.
Figure 15. Finding new eating place in Tiong Bahru
Screenshot from OpenStreetMap (above) and Google Maps (below)
Figure 16. Cafes as result of gentrification at the outer part of residential block
5. Conclusion and Future Work
In this paper, we looked at tweet activity around metro stations in five
cities in Asia: Singapore, Bangkok, Kuala Lumpur, Hong Kong and Osaka. The
study helps in understanding activities around metro stations as places
where crowds agglomerate, and in identifying the distinct character of each
metro station using a set of keywords. By comparing distinct metro stations
within a city, one gains insight into how to plan better for the whole city.
One could also pinpoint a metro station that needs specific attention. By
comparing metro stations across different cities, a city planner could
observe a typical character of one city and possibly adapt it to the
planner’s own city. For example, food (eating) remains a major activity
across all stations in most cities, and we also see more exercise around
metro stations away from the city center. Upscale shopping centers across
cities also show a distinct stretch of dense tweets, representing mainly
shopping and eating, along the main road.
Secondly, we demonstrated cases in which a change in the usage of an urban
plot can be inferred by categorizing Twitter data and comparing the result
against the original urban plan. OpenStreetMap is used in this research,
but given any map with the boundary shapes of buildings, a change in usage
or an activity can be detected.
Further study is currently under way, with plans to improve the keyword
dictionary in order to improve the detection of activity. To differentiate
the multiple meanings of a word, for example, a machine learning method
could be used.
The characteristics of the urban typology surrounding the metro could also
be quantified for an easier comparison of the different regions. A more
structured and concise map of buildings and their usage will also be used to
see the trends at a bigger scale.
References
Cha, T. W., & Graduate School of Design. (2001). Project on the city. 2.
Harvard Design School guide to shopping. C. J. Chung (Ed.). Taschen.
Frampton, A., Solomon, J. D., & Wong, C. (2012). Cities without ground:
a Hong Kong guidebook. Oro Editions.
Williams, S. A., Terras, M. M., & Warwick, C. (2013). What do people
study when they study Twitter? Classifying Twitter related academic
papers. Journal of Documentation, 69(3), 384-410.
Crampton, J. W., Graham, M., Poorthuis, A., Shelton, T., Stephens, M.,
Wilson, M. W., & Zook, M. (2013). Beyond the geotag: situating ‘big
data’ and leveraging the potential of the geoweb. Cartography and
Geographic Information Science, 40(2), 130-139.
Tomarchio, L., Tuncer, B., You, L., & Klein, B. (2016). Mapping Planned
and Emerging Art Places in Singapore through Social Media Feeds.
Complexity & Simplicity: Proceedings of the 34th eCAADe Conference,
Volume 2, 437-446.
Pak, A., & Paroubek, P. (2010, May). Twitter as a Corpus for Sentiment
Analysis and Opinion Mining. In LREC (Vol. 10, No. 2010).
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock
market. Journal of Computational Science, 2(1), 1-8.
Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010).
Predicting elections with Twitter: What 140 characters reveal about
political sentiment. ICWSM, 10(1), 178-185.
Paul, M. J., Dredze, M., & Broniatowski, D. (2014). Twitter improves
influenza forecasting. PLOS Currents Outbreaks.
Whyte, W. H. (1980). The social life of small urban spaces.
Lee, R., Wakamiya, S., & Sumiya, K. (2013). Urban area characterization
based on crowd behavioral lifelogs over Twitter. Personal and Ubiquitous
Computing, 17(4), 605-620.
Frias-Martinez, V., Soto, V., Hohwald, H., & Frias-Martinez, E. (2012,
September). Characterizing urban landscapes using geolocated tweets. In
Privacy, Security, Risk and Trust (PASSAT), 2012 International
Conference on and 2012 International Conference on Social Computing
(SocialCom) (pp. 239-248). IEEE.
ResearchGate has not been able to resolve any citations for this publication.
Conference Paper
Full-text available
This paper presents a methodology to collect and visualize social media data about art, in order to map art locations in cities using geo-localized data, and comparing planning decisions with the actual use of spaces. As various social networks have penetrated into the daily life of people, these become one important and effective data source to understand how people perform 'arts' around the city [Shah, 2015]. The case study for this methodology is Singapore, a vibrant city where art and culture are being promoted in the light of an emerging creative economy. The Singapore government promotes art and creates 'art clusters', such as art districts, galleries, fairs and museums in the city. Additionally, artists, creative entrepreneurs, consumers, and critics seek and explore alternative spaces. Understanding where art and creativity are discussed, broadcasted and consumed in Singapore is a key point to have better insights into art space planning, and study its effects on the city.The paper will try to answer the following research question:Is it possible to discover, through social network data, spaces where art is produced, discussed, and broadcasted to an audience in Singapore? How?
Full-text available
Accurate disease forecasts are imperative when preparing for influenza epidemic outbreaks; nevertheless, these forecasts are often limited by the time required to collect new, accurate data. In this paper, we show that data from the microblogging community Twitter significantly improves influenza forecasting. Most prior influenza forecast models are tested against historical influenza-like illness (ILI) data from the U.S. Centers for Disease Control and Prevention (CDC). These data are released with a one-week lag and are often initially inaccurate until the CDC revises them weeks later. Since previous studies utilize the final, revised data in evaluation, their evaluations do not properly determine the effectiveness of forecasting. Our experiments using ILI data available at the time of the forecast show that models incorporating data derived from Twitter can reduce forecasting error by 17-30% over a baseline that only uses historical data. For a given level of accuracy, using Twitter data produces forecasts that are two to four weeks ahead of baseline models. Additionally, we find that models using Twitter data are, on average, better predictors of influenza prevalence than are models using data from Google Flu Trends, the leading web data source.
Full-text available
Purpose Since its introduction in 2006, messages posted to the microblogging system Twitter have provided a rich dataset for researchers, leading to the publication of over a thousand academic papers. This paper aims to identify this published work and to classify it in order to understand Twitter based research. Design/methodology/approach Firstly the papers on Twitter were identified. Secondly, following a review of the literature, a classification of the dimensions of microblogging research was established. Thirdly, papers were qualitatively classified using open coded content analysis, based on the paper's title and abstract, in order to analyze method, subject, and approach. Findings The majority of published work relating to Twitter concentrates on aspects of the messages sent and details of the users. A variety of methodological approaches is used across a range of identified domains. Research limitations/implications This work reviewed the abstracts of all papers available via database search on the term “Twitter” and this has two major implications: the full papers are not considered and so works may be misclassified if their abstract is not clear; publications not indexed by the databases, such as book chapters, are not included. The study is focussed on microblogging, the applicability of the approach to other media is not considered. Originality/value To date there has not been an overarching study to look at the methods and purpose of those using Twitter as a research subject. The paper's major contribution is to scope out papers published on Twitter until the close of 2011. The classification derived here will provide a framework within which researchers studying Twitter related topics will be able to position and ground their work.
The pervasiveness of cell phones and mobile social media applications is generating vast amounts of geolocalized user-generated content. Since the addition of geotagging information, Twitter has become a valuable source for the study of human dynamics. Its analysis is shedding new light not only on understanding human behavior but also on modeling the way people live and interact in their urban environments. In this paper, we evaluate the use of geolocated tweets as a complementary source of information for urban planning applications. Our contributions are focused on two urban planning areas: (1) a technique to automatically determine land uses in a specific urban area based on tweeting patterns, and (2) a technique to automatically identify urban points of interest as places with high activity of tweets. We apply our techniques in Manhattan (NYC) using 49 days of geolocated tweets and validate them using land use and landmark information provided by various NYC departments. Our results indicate that geolocated tweets are a powerful and dynamic data source to characterize urban environments.
This article presents an overview and initial results of a geoweb analysis designed to provide the foundation for a continued discussion of the potential impacts of ‘big data’ for the practice of critical human geography. While Haklay's (2012) observation that social media content is generated by a small number of ‘outliers’ is correct, we explore alternative methods and conceptual frameworks that might allow for one to overcome the limitations of previous analyses of user-generated geographic information. Though more illustrative than explanatory, the results of our analysis suggest a cautious approach toward the use of the geoweb and big data that are as mindful of their shortcomings as their potential. More specifically, we propose five extensions to the typical practice of mapping georeferenced data that we call going ‘beyond the geotag’: (1) going beyond social media that is explicitly geographic; (2) going beyond spatialities of the ‘here and now’; (3) going beyond the proximate; (4) going beyond the human to data produced by bots and automated systems, and (5) going beyond the geoweb itself, by leveraging these sources against ancillary data, such as news reports and census data. We see these extensions of existing methodologies as providing the potential for overcoming existing limitations on the analysis of the geoweb. The principal case study focuses on the widely reported riots following the University of Kentucky men's basketball team's victory in the 2012 NCAA championship and its manifestation within the geoweb. Drawing upon a database of archived Twitter activity, including all geotagged tweets since December 2011, we analyze the geography of tweets that used a specific hashtag (#LexingtonPoliceScanner) in order to demonstrate the potential application of our methodological and conceptual program. By tracking the social, spatial, and temporal diffusion of this hashtag, we show how large databases of such spatially referenced internet content can be used in a more systematic way for critical social and spatial analysis.
Microblogging today has become a very popular communication tool among Internet users. Millions of users share opinions on different aspects of life every day. Therefore microblogging web-sites are rich sources of data for opinion mining and sentiment analysis. Because microblogging has appeared relatively recently, there are a few research works that were devoted to this topic. In our paper, we focus on using Twitter, the most popular microblogging platform, for the task of sentiment analysis. We show how to automatically collect a corpus for sentiment analysis and opinion mining purposes. We perform linguistic analysis of the collected corpus and explain discovered phenomena. Using the corpus, we build a sentiment classifier that is able to determine positive, negative and neutral sentiments for a document. Experimental evaluations show that our proposed techniques are efficient and perform better than previously proposed methods. In our research, we worked with English; however, the proposed technique can be used with any other language.
Recent location-based social networking sites provide a novel capability for monitoring massive crowd lifelogs in real-world space. In particular, they make it easier to collect publicly shared crowd lifelogs over a large geographic area, reflecting the crowd’s daily lives and, moreover, characterizing urban space through what people have in mind and how they behave in that space. In this paper, we attempt to analyze urban characteristics in terms of crowd behavior by utilizing crowd lifelogs in urban areas from social networking sites. In order to collect crowd behavioral data, we exploit the most famous microblogging site, Twitter, where a great deal of geo-tagged micro lifelogs emitted by massive crowds can be easily acquired. We first present a model that treats crowds’ behavioral logs on social network sites as a representative feature of an urban space’s characteristics, which is used to conduct crowd-based urban characterization. Based on this crowd behavioral feature, we extract significant crowd behavioral patterns over a period of time. In the experiment, we conducted the urban characterization by extracting the crowd behavioral patterns and examined the relation between the regions of common crowd activity patterns and the major categories of local facilities.
Twitter is a microblogging website where users read and write millions of short messages on a variety of topics every day. This study uses the context of the German federal election to investigate whether Twitter is used as a forum for political deliberation and whether online messages on Twitter validly mirror offline political sentiment. Using LIWC text analysis software, we conducted a content analysis of over 100,000 messages containing a reference to either a political party or a politician. Our results show that Twitter is indeed used extensively for political deliberation. We find that the mere number of messages mentioning a party reflects the election result. Moreover, joint mentions of two parties are in line with real world political ties and coalitions. An analysis of the tweets' political sentiment demonstrates close correspondence to the parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. We discuss the use of microblogging message content as a valid indicator of political sentiment and derive suggestions for further research.
Behavioral economics tells us that emotions can profoundly affect individual behavior and decision-making. Does this also apply to societies at large, i.e., can societies experience mood states that affect their collective decision making? By extension is the public mood correlated or even predictive of economic indicators? Here we investigate whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time. We analyze the text content of daily Twitter feeds by two mood tracking tools, namely OpinionFinder that measures positive vs. negative mood and Google-Profile of Mood States (GPOMS) that measures mood in terms of 6 dimensions (Calm, Alert, Sure, Vital, Kind, and Happy). We cross-validate the resulting mood time series by comparing their ability to detect the public's response to the presidential election and Thanksgiving day in 2008. A Granger causality analysis and a Self-Organizing Fuzzy Neural Network are then used to investigate the hypothesis that public mood states, as measured by the OpinionFinder and GPOMS mood time series, are predictive of changes in DJIA closing values. Our results indicate that the accuracy of DJIA predictions can be significantly improved by the inclusion of specific public mood dimensions but not others. We find an accuracy of 87.6% in predicting the daily up and down changes in the closing values of the DJIA and a reduction of the Mean Average Percentage Error by more than 6%.