Connecting the dots – deriving insights into
social behaviour around metro transport nodes
through social media data in Singapore and
Verina Cristie*, Ate Poorthuis*, and Sam Conrad Joyce*
There is increasing interest in understanding the use of urban space around
heavily trafficked areas such as transport interchanges, both in terms of
their cultural, social, and commercial use (Cha, 2001) and in the context of
‘last mile’ journeys. However, this type of use is often not captured in
traditional census, survey, or transportation data. This research uses social
media data to develop insight into how the spaces around public
transportation stations are used, through a comparative study of metro
stations in several Asian cities. It focuses on finding similarities and
differences between such spaces, including how different master planning
approaches and key urban amenities affect use. Specifically, this research
aids in devising strategies to promote wider use and interconnectedness of
the areas around stations, and in understanding the impact of large, dense,
agglomerated building typologies, which are becoming a feature of cities,
especially in Asia (Frampton et al., 2012).
*Singapore University of Technology and Design, 487372, Singapore
Email (corresponding): email@example.com
The collection of social media data offers a powerful insight into the
activities of people that would usually be opaque to researchers. As social
media usage has grown in popularity throughout the world, it is becoming a
tool for capturing the sentiment of a significant proportion of a population
(Pak & Paroubek, 2010). Indeed, many of these services generate their main
revenue by leveraging this data to help decision makers, and such data has
been successfully applied to marketing products and services, finance
(Bollen et al., 2011), and even elections (Tumasjan et al., 2010).
Services such as Twitter offer even richer information, as in many cases
they record not only the user’s profile (age, gender, nationality) but also
the location of each message, giving a unique insight into the thoughts and
activities of different types of people or of people in specific areas.
Spatialized data has been shown to have applications such as detecting
illness outbreaks faster than conventional methods (Paul et al., 2014), so
that better preventative action can be taken.
Whilst the examples covered refer to insight relevant to broader
policy-level decisions for governments, companies, or marketers, there is
potential to apply such methods to provide more localized insight into the
activities of a neighborhood’s users. This could be of significant value to
urban designers and architects who look into how space is utilized.
Conventional approaches to recording such activity, whilst valuable, are
typically very expensive and time consuming (Whyte, 1980). Methods such as
computer vision offer automated ways to record urban use; however, they do
not scale well over time or distance, as they require cameras to record
every place, and it is currently difficult to work out population activities
automatically en masse in this way.
Social media, however, offers messages with data potentially representing
self-tagged activities or opinions by location and time. Whilst this
information is limited by being provided by a set of self-selected
individuals, it nonetheless provides a unique window into a socio-geographic
area: one which can be pointed anywhere in the world with a significant
number of social media users, and one which can be processed digitally to
compare different places.
In this paper we look at the applicability of social media as a data source
to inform planning around metro stations. We look at Asia specifically, as
Asian cities are both growing rapidly and modernizing by building new metro
lines. The Asian context is especially interesting as there tends to be a
much greater concentration of heavily and intentionally programmed space,
such as malls or business centers, allowing for more focused analysis of the
relationship between zoning and the activities actually being undertaken.
Here we look to see whether we can correlate urban configuration with
activities, and whether we can compare across different cities and stations,
investigating how insight from one place might be understood and applied
elsewhere.
A range of different approaches have been used in previous studies of social
media platform data, focusing on message content, user, technology, and
concept (Williams et al., 2013). This paper uses geotagged messages sent
through the social media platform Twitter to gain insights into the social
use of urban spaces around transportation nodes in Singapore and beyond. We
develop an approach that takes the position and content of messages sent
within 1 kilometer of selected metro stations and compares them against
local urban planning features to find correlative relationships. The message
data is coded to infer activities (Crampton et al., 2013; Tomarchio et al.,
2016). Activity points are then plotted so that insight can be gained
through visualisation.
2.1 Plotting Activities around Interchanges
Nine metro stations in Singapore were selected for investigation, chosen to
represent places with different characteristics. Specifically:
(1) City Hall station as the city centre / central interchange, (2) Jurong
East station as the secondary interchange, (3) Pasir Ris station as the station
at the east end of the city, (4) Woodlands station as the station at the north
end of the city, (5) Tiong Bahru station as the station nearing the city centre,
but of an older region, (6) Punggol station as the station at the north east end
of the city, (7) Harbour Front station as the station that is located near the
sea, (8) Orchard station as the station inside an upscale shopping area, and
(9) Paya Lebar station as the smaller interchange at almost the east end of
For comparison we then chose four other cities in Asia with similarly large,
dense, agglomerated transport hubs across the city. Hong Kong was selected
for its density and its similarity to Singapore as a financial center of the
region. Osaka (Japan) was chosen as it is a city with a long urban history
and a size comparable to Singapore’s. Japan in general also has a unique
urban typology and a huge Twitter user base. Twitter is Japan’s favourite
social network¹; there is a huge base of Twitter users in Japan because
written Japanese can convey much more in 140 characters than a Latin-based
alphabet. Additionally, two cities in South East Asia, Bangkok (Thailand)
and Kuala Lumpur (Malaysia), were chosen. Thailand has a large base of
social media users, amounting to 4.5 million users in 2013².
Metro stations with subjectively similar characteristics to the nine
stations in Singapore were then chosen in the other four cities. The
latitude and longitude of every metro station were recorded in a
comma-separated values (.csv) text file. Twitter data was extracted from the
DOLLY system, which records every geotagged tweet sent around the world
(Poorthuis and Zook, 2016). We extracted all tweets sent in the year 2015
from within the boundaries of each of the five cities and then used the
coordinates of the metro stations to further filter the data, selecting all
tweets within a radius of 1 kilometer.
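The two-stage filter described above (city bounding box first, then a 1 km
radius around a station) can be sketched as follows. This is a minimal
illustration and not the DOLLY query interface: the haversine distance, the
Earth radius constant, the function names, and the example coordinates are
all assumptions of the sketch. Only the Singapore bounding box is taken from
the paper's footnote.

```python
import math

# Bounding box for Singapore, copied from the paper's footnote.
SINGAPORE_BBOX = {"lon": (103.605537, 104.030571), "lat": (1.19296, 1.470976)}
EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumption of this sketch


def in_bbox(lat, lon, bbox):
    """Return True if the point lies inside the city bounding box."""
    return (bbox["lat"][0] <= lat <= bbox["lat"][1]
            and bbox["lon"][0] <= lon <= bbox["lon"][1])


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def tweets_near_station(tweets, station, bbox, radius_m=1000.0):
    """Keep tweets inside the city box and within radius_m of the station."""
    return [t for t in tweets
            if in_bbox(t["lat"], t["lon"], bbox)
            and haversine_m(t["lat"], t["lon"],
                            station["lat"], station["lon"]) <= radius_m]


# Hypothetical example data: approximate coordinates, not from the paper.
station = {"name": "City Hall", "lat": 1.2931, "lon": 103.8520}
tweets = [
    {"text": "lunch", "lat": 1.2940, "lon": 103.8530},     # roughly 150 m away
    {"text": "far away", "lat": 1.3500, "lon": 103.9000},  # several km away
]
nearby = tweets_near_station(tweets, station, SINGAPORE_BBOX)
```

Here `nearby` keeps only the first tweet; the second lies inside the city
box but well outside the 1 km radius.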
Figure 1. Active Twitter users in 2015 and city population
Local facilities can be categorised into ‘Life (residential, educational)’,
‘Work’, ‘Shopping’, ‘Eat’, and ‘Entertainment’ (Lee et al., 2013). Given the
3 Bounding boxes used for each city:
Singapore (lon:[103.605537 TO 104.030571] AND lat:[1.19296 TO 1.470976]),
Hong Kong (lon:[113.832779 TO 114.426041] AND lat:[22.172781 TO 22.519534]),
Bangkok (lon:[100.327814 TO 100.938408] AND lat:[13.494088 TO 13.955111]),
Kuala Lumpur (lon:[101.61545 TO 101.758529] AND lat:[3.033633 TO 3.243379]), and
Osaka (lon:[135.372886 TO 135.599171] AND lat:[34.586147 TO 34.768754]).
4 The 2010 census was used for all cities except Hong Kong (for which the 2011 census was used).
Information taken from https://www.citypopulation.de/. The Kuala Lumpur city population figure was
taken from ‘http://www.newgeography.com/content/003395-the-evolving-urban-form-kuala-lumpur’
because the bounding box includes suburban areas.
types of activity that are tweeted about more often and are more commonly
found around metro stations, shopping, eating, and entertainment were
chosen. In addition, exercise is separated from entertainment for a clearer
definition. Hence, four broad types of activity are investigated in this
paper: (1) food, to record eating-related activity; (2) entertainment, to
record activity related to watching movies, concerts, music, cultural
activities, performances, and the like; (3) shopping, to record
shopping/retail activity; and (4) exercise, to record exercise, sports, or
outdoor-related activity.
To identify these activities based on the text of the tweet, we curated a
dictionary of keywords from ConceptNet⁵, a crowdsourced semantic network
originally built by MIT Media Lab to help computers understand the meanings
of words that people use. We then used Google Translate⁶ to obtain the
keywords in Thai and Japanese, as English is not used predominantly in
Bangkok and Osaka. The keywords are stored in JavaScript Object Notation
(.json) format. Once one or more keywords from a category are found in a
tweet, the tweet is assigned to that category. In this research, one tweet
can only be assigned to one category.
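The keyword lookup can be illustrated with a short sketch. The dictionary
below is a hypothetical, heavily shortened stand-in for the curated
ConceptNet keywords, and two behaviours are assumptions of the sketch rather
than details given in the paper: matching is done on whole lower-cased
words, and ties between categories are broken by dictionary order.

```python
import re

# Hypothetical, heavily shortened keyword dictionary; the paper's actual
# dictionary was curated from ConceptNet and is not reproduced here.
KEYWORDS = {
    "food": ["eat", "dinner", "lunch", "sweet"],
    "entertainment": ["movie", "concert", "music"],
    "shopping": ["shopping", "mall", "market"],
    "exercise": ["jog", "gym", "fit"],
}


def classify(text, keywords=KEYWORDS):
    """Return the first category with a keyword in the tweet, else 'others'.

    Assumption: ties are broken by dictionary order, because the paper only
    states that a tweet is assigned to a single category.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    for category, words in keywords.items():
        if tokens & set(words):
            return category
    return "others"
```

For example, `classify("Dinner with friends")` gives `"food"` and a tweet
with no keyword falls into `"others"`. Whole-word matching only suits
languages written with spaces; for Thai or Japanese text a substring or
tokenizer-based match would be needed instead.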
2.2 Plotting Activities within a Land Plot
To understand the activities of people within a region of interest, we plot
the tweets within that land plot. Once tweet content has been categorised,
it can be matched against official land use and the overlap percentage can
be calculated (Frias-Martinez et al., 2012). In this paper, OpenStreetMap⁷
is used as a readily available source of crowdsourced map data and as a
proxy for land use. The bounding box of a metro station is used, and the
OpenStreetMap formatted data (.osm) within that bounding box is extracted
from the website.
The downloaded .osm file can be read as plain eXtensible Markup Language
(.xml). It contains tags with a key and a value that indicate the usage of a
building or land plot. A <way> element gives the set of points that make up
the polygon of a plot. Once we know the points, we can redraw the polygon on
the map and check whether a tweet was sent from within it. A <tag> element,
with its key (k) and value (v), shows what the polygon plot is used for. If
k is landuse and v is residential, for example, we know that the polygon
corresponds to a residential plot. At building scale, for example, the
boundary of a church can be found from a <tag> with k=building
and v=church, and k=amenity and v=place_of_worship.
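A minimal sketch of this step, using only the Python standard library, is
given below. The XML fragment is hand-written example data, the function
names are hypothetical, and the ray-casting test is one common way to check
point containment; the paper does not state which test was used. The sketch
assumes closed ways, where the first node is repeated at the end.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written .osm fragment (hypothetical data) with one closed way.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="1.000" lon="103.000"/>
  <node id="2" lat="1.000" lon="103.010"/>
  <node id="3" lat="1.010" lon="103.010"/>
  <node id="4" lat="1.010" lon="103.000"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="4"/><nd ref="1"/>
    <tag k="landuse" v="residential"/>
  </way>
</osm>"""


def load_polygons(osm_text):
    """Return a list of (tags, [(lon, lat), ...]) for every tagged way."""
    root = ET.fromstring(osm_text)
    nodes = {n.get("id"): (float(n.get("lon")), float(n.get("lat")))
             for n in root.iter("node")}
    polygons = []
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        ring = [nodes[nd.get("ref")] for nd in way.findall("nd")
                if nd.get("ref") in nodes]
        if tags and len(ring) >= 4:  # a closed triangle already has 4 refs
            polygons.append((tags, ring))
    return polygons


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: count edge crossings of a ray going east."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # edge straddles the tweet's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


polygons = load_polygons(OSM_XML)
tags, ring = polygons[0]
```

A tweet at (lon 103.005, lat 1.005) falls inside the example residential
plot, while one at lon 103.02 does not.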
The tweets are plotted in a different colour for each category to give a
quick visual overview. Tweets that cannot be assigned to any of the four
categories are labelled Others and plotted in blue at the back. Most tweets
fall into the Others category. For example, in Singapore only about 3% of
the tweets are categorisable as food, with even lower percentages for the
other categories (entertainment, shopping, and exercise). In Bangkok, only
about 2% of tweets fall into a recognisable category. The latitude and
longitude of each tweet are used as the position of its point on a radial
point plot, whose centre is the coordinate of the metro station. In the
figures below, red represents food activity, green represents entertainment
activity, yellow represents shopping activity, and orange represents
exercise/outdoor activity.
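As a rough sketch of how the plotting coordinates could be prepared, the
code below converts each tweet to an offset in metres from the station and
attaches the colour named in the text. The equirectangular approximation,
the metres-per-degree constant, and the function names are assumptions of
this sketch; the paper does not describe its projection.

```python
import math

# Colours as described in the text; 'others' is drawn first (at the back).
CATEGORY_COLOURS = {
    "others": "blue",
    "food": "red",
    "entertainment": "green",
    "shopping": "yellow",
    "exercise": "orange",
}
METRES_PER_DEGREE = 111_320  # approximate length of one degree of latitude


def to_local_xy(lat, lon, station_lat, station_lon):
    """Offset of a tweet from the station in metres (x east, y north)."""
    x = ((lon - station_lon) * METRES_PER_DEGREE
         * math.cos(math.radians(station_lat)))
    y = (lat - station_lat) * METRES_PER_DEGREE
    return x, y


def plot_points(tweets, station_lat, station_lon):
    """Return (x, y, colour) triples, with 'others' ordered first."""
    points = [(*to_local_xy(t["lat"], t["lon"], station_lat, station_lon),
               CATEGORY_COLOURS[t["category"]]) for t in tweets]
    # Stable sort: blue points come first so later colours draw on top.
    return sorted(points, key=lambda p: p[2] != "blue")
```

The resulting triples can be passed to any scatter-plot routine, drawing
them in order so that categorised tweets sit on top of the blue layer.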
Below we show the data for Singapore, Hong Kong and Kuala Lumpur with
Osaka and Bangkok plots shown in subsequent sections.
Figure 2. Singapore Tweet Plot (English)
From the general pattern in Figure 2, we can identify several
characteristics of Singapore’s urban life and of the individual metro
stations. In general, out of 839,297 tweets within a 1 km radius of the nine
stations, food is the most frequently discussed category, amounting to
25,820 out of 128,092 categorisable tweets. In Punggol, known for its
Waterway Park and recreation opportunities, exercise is the most tweeted
category. Tweets remain centralised around the metro at most stations,
reflecting the transit-oriented development around many of Singapore’s
stations. This is less obvious in Tiong Bahru and Punggol, again reflecting
the underlying urban fabric. Tiong Bahru is an older neighbourhood, often
heralded as Singapore’s prime example of a gentrified area, with many
smaller shops and cafes throughout the area. Punggol has its Waterway Park
directly adjacent to the station, drawing people out and away from the
immediate vicinity of the station.
At Orchard station, a clear linear stretch of tweets represents the row of
shopping malls along Orchard Road. Punggol, Harbour Front, Jurong East, and
Pasir Ris all have areas covered by water, which explains the relative lack
of tweets in those areas.
Figure 3. Hong Kong Tweet Plot (English)
We collected 263,938 tweets around Hong Kong’s metro stations. Owing to the
city’s geography, many parts of it include water (sea) or mountainous areas.
This explains the empty spaces found in many of the plots in Figure 3. While
food remains the top tweet category, an exception is found at Wong Tai Sin
station, where the exercise category tops the tweet counts. As with Punggol
in Singapore, this is due to the many parks in Wong Tai Sin. We can also see
a stretch of shopping and eating activity around Mongkok. Yau Tong, an area
at almost the east end of Hong Kong, has the fewest tweets. It is not a busy
area and is also the nearest metro station to the large Tseung Kwan O
cemetery.
Figure 4. Kuala Lumpur Tweet Plot (English)
Kuala Lumpur has a larger set of 4,326,212 tweets. We can observe that the
tweets mostly form a central cluster at the metro station. In areas such as
Chan Sow Lin this is less obvious, as the metro station is surrounded by a
college and administrative/office buildings. In Sri Petaling, KL Sentral,
Bandaraya, and Bandar Tasik Selatan there are parks near the station
(visible as empty spaces). Around Mid Valley, the empty spaces correspond to
residential clusters rather than parks. While food remains the major
category, we also observe a number of exercise-related tweets around Kuala
Lumpur’s metro stations.
4.1 The Tale of Five Cities
We compare the percentage of tweeted activity types around each metro
station across the different cities to see their similarities and
differences. In general, food is the most tweeted activity category across
metro stations in the five cities. A higher percentage for entertainment and
shopping also seems to be a major characteristic of Bangkok and Osaka. At
the north-eastern metro stations, the food percentage is typically low and
the other categories are higher, except in Kuala Lumpur, where we used Asia
Jaya station, which is located in the south west. The exercise percentage
generally remains low across the stations, except at the north-eastern and
east-end interchanges. At the upscale shopping stations in particular, Hong
Kong, Singapore, and Kuala Lumpur have very similar characteristics.
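The per-station percentages behind such a comparison can be computed with a
few lines of code. In this sketch the shares are taken over categorised
tweets only, which is an assumption (the paper does not state the
denominator), and the station names and labels are made-up illustration
data.

```python
from collections import Counter

CATEGORIES = ("food", "entertainment", "shopping", "exercise")


def category_shares(labels):
    """Percentage of each activity among the categorised tweets only."""
    counts = Counter(label for label in labels if label in CATEGORIES)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


# Hypothetical labels for two stations; not figures from the paper.
stations = {
    "Station A": ["food", "food", "shopping", "others", "exercise"],
    "Station B": ["entertainment", "shopping", "shopping", "others"],
}
shares = {name: category_shares(labels) for name, labels in stations.items()}
```

Normalising to percentages makes stations with very different tweet volumes,
such as those in Hong Kong and Kuala Lumpur, directly comparable.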
Figure 5. Comparison of Activity based on Tweets in Different Cities
4.1.1 English or Non-English?
Figure 6. Bangkok Tweet Plot (Thai and English)
In general for Bangkok, we can observe that the points are more sparsely
scattered than in the other cities, especially for stations further from the
city center (Lat Krabang, Ramkhamhaeng, and Udom Suk), even though there
were a sizable 1,748,272 tweets. We can also observe distinct road lines
rather clearly, perhaps due to the road-alley typology, in which more
residential areas are enclosed within road blocks and accessed by alleys. We
also do not see as much activity at the central station, Hua Lamphong, which
is an older train station for intercity connections. More activity can be
detected at Sala Daeng, another interchange in the city centre. Victory
Monument and Siam also show dense activity, as the former is a famous
tourist attraction and the latter a very popular shopping centre.
Interestingly, if we compare Twitter usage in Thai and English, we find a
strong contrast in what people tweet about (see Fig. 5 above). Thai is the
predominant language used by locals (it is the national language), and
tourists or expats are perhaps more likely to tweet in English.
Entertainment and Shopping are the top categories for tweets in Thai. Given
the tweet results we can infer, for example, that more locals shop in the
Siam area and more tourists shop in the Mo Chit area. In English, Food is
the dominant category, followed by Exercise/Outdoor related tweets. Walking
and jogging tracks in the parks can be clearly identified in the English
plot (see Fig. 7).
Figure 7. Tweet plots of category ‘Exercise’ form the shape of running tracks
in the park.
Figure 8. Osaka Tweet Plot (Japanese and English)
General Observation: In Osaka there were 3,174,925 tweets, a high number,
especially around the major interchanges, where the tweet plots are densely
populated. Not many activities are detected at the Ferry Terminal station,
as it is mostly surrounded by water. Most tweets remain centralised at the
metro station. We do not see much activity at the center of Hommachi and
Moriguchi stations themselves; however, we can see more activity at the surrounding
We can also see a difference between tweets in Japanese and English. Tweets
in Japanese have more content related to the Entertainment category, while
in English we see more from the Food category. At Osaka, Hommachi, and Namba
stations in particular, food tweets make up a high percentage in English.
After food, exercise/outdoor related activity is the most popular tweet
category in English.
4.1.2 Keywords: between the most popular keywords and the
Figure 9. Finding improper use of keywords in tweets (shown in grey)
Enjoy eating ..honey mustard chicken half 🍗🍗🍗 at piri-piri #mk #mystyle #myworld #food #dinner…
Home sweet home💕 (@ ลาดกระบัง (Lat Krabang) in Lat Krabang, Bangkok) https://t.co/MPfU0V2AVf
Japanese food for today 😜 🍴 Yummy mixed tempura 🍤 #คือหิวมากพูดเลย @ Kin Donburi Cafe' At I'm Park… http://t.co/OEHIQ74TNX
Kumamonned's cake @ Pancake House Siamsquare One https://t.co/F0uBDG4P6X
Famous food eat.. 😘🍰🎂 @ Mr. Jones' Orphanage Siam Center http://t.co/ZziVyUZuIG
home sweet home (at @Siam_Paragon (สยามพารากอน) in Pathum Wan, Bangkok)
Starter... The best eating with salted butter😂 @ The House on Sathorn https://t.co/AKbJVPt4y4
Figure 10. Keywords count for Kuala Lumpur tweets (left) and Hong Kong tweets
In curating the keywords, we also noted that a word can have more than one
meaning or context of use, which may be a limitation of the classification
method used in this paper. For example, the word ‘sweet’ has a high
occurrence in the Food category. It is included in the food dictionary to
capture the sense of a ‘sweet taste’ while eating. In reality, however, we
also found tweets unrelated to food, such as ‘home sweet home’, which
denotes a place of stay or a rest activity. Thus, either people really like
sweet food, or other usages of the word ‘sweet’ are contributing to the
category count.
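One simple way to surface such ambiguous keywords is to count how often each
dictionary keyword fires and then inspect the tweets behind the most
frequent ones. The sketch below is illustrative only: the mini-dictionary
and the function name are hypothetical, the example tweets are abridged from
those shown in the paper, and whole-word matching is an assumption.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary mapping each keyword to its category.
KEYWORD_TO_CATEGORY = {"sweet": "food", "eating": "food", "fit": "exercise"}


def keyword_counts(tweets):
    """Count how many tweets contain each dictionary keyword."""
    counts = Counter()
    for text in tweets:
        tokens = set(re.findall(r"\w+", text.lower()))
        counts.update(tokens & KEYWORD_TO_CATEGORY.keys())
    return counts


# Abridged from the example tweets shown in the paper.
tweets = [
    "home sweet home",
    "Home sweet home (at Siam Paragon)",
    "The best eating with salted butter",
]
counts = keyword_counts(tweets)
```

In this tiny sample ‘sweet’ is the most frequent keyword even though neither
matching tweet is about food, which is the kind of signal that would prompt
a manual review of the dictionary entry.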
Figure 11. Tweets categorised as Exercise/Outdoor activity due to the keyword ‘fit’
(3, 'This #Banking #job might be a great fit for you: Compliance Officer, Private Bank - https://t.co/1AzRQ1mHcx #Singapore #Hiring #Caree
(3, 'This #HR #job might be a great fit for you: Senior/ Compensation and Benefits Analyst - https://t.co/5GLuxvcrs6 #Singapore #Hiring')
4.2 Detecting Land Use
4.2.1 What do people do in the parks?
Figure 12. Tweets within land use of Park category gathered from OpenStreetMap
By using OpenStreetMap (OSM) as a data source, we can detect the location of
parks by finding landuse tags with values such as ‘grass’, ‘greenfield’,
‘meadow’, ‘recreation_ground’, and ‘reservoir’. Although OSM is not 100%
complete, we can redraw the plots that are tagged in OpenStreetMap and plot
the tweets whose geolocation falls within a park. In this manner, we learn
about the activity taking place on the plot. In Fig. 12 above, the majority
of tweets are not categorisable; however, we also see tweets that correspond
to the original purpose of the land use. The orange points represent
exercise or outdoor activity. Upon further checking, the red and yellow
points (food and shopping) come from the underground shopping mall below the
park. Some of the green dots (entertainment) represent the F1 concerts held
in the park.
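The tag filter and the resulting activity tally can be sketched as below.
The landuse values are the ones listed in the text; the function names and
the example pairs are hypothetical, and the sketch assumes each tweet has
already been matched to the tag dictionary of the plot that contains it.

```python
from collections import Counter

# landuse values the text treats as park-like.
PARK_LANDUSE = {"grass", "greenfield", "meadow", "recreation_ground",
                "reservoir"}


def is_park(tags):
    """True if an OSM tag dictionary marks the plot as park-like."""
    return tags.get("landuse") in PARK_LANDUSE


def park_activity(matched):
    """Tally categories of tweets whose containing plot is park-like.

    `matched` holds (category, tags) pairs, one per tweet that has already
    been located inside an OSM plot with the given tag dictionary.
    """
    return Counter(category for category, tags in matched if is_park(tags))


# Hypothetical matched tweets for illustration only.
matched = [
    ("exercise", {"landuse": "recreation_ground"}),
    ("food", {"landuse": "grass"}),
    ("shopping", {"landuse": "residential"}),
    ("others", {"landuse": "meadow"}),
]
tally = park_activity(matched)
```

A tally dominated by exercise would match the intended use of the plot,
whereas food or shopping counts inside a park are the cases worth checking
by hand, as with the underground mall described above.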
4.2.2 The residential area
Figure 13. Tiong Bahru tweet points within identifiable land/building plot from
The figure above is a map of the Tiong Bahru area, one of the older estates
in the heart of Singapore. We plotted residential areas in grey,
entertainment areas in green, parks in orange, and commercial areas in
yellow. Mixed activity can be found in the shopping mall, from shopping to
eating to entertainment. We can also see exercise activity in the parks.
In the residential area, there is a mix of eating, shopping, exercise, and
entertainment activity. While we can detect shopping and entertainment
activity, further investigation showed that these are mostly keywords found
in tweets sent from home. For example, we see tweets about a bag sent from
home, or people selling tickets from home. Home is also where entertainment
activity takes place (for example, playing music), and when people are going
out for an entertainment activity they tweet about it from home. Shopping
and eating can also happen in the common space usually shared below or
within a housing / residential plot in Singapore, such as a market or food
court (see Fig. 14 below). A new method of differentiating tweets about
activity in public space from those in private space could then be
developed, perhaps by measuring the distance from streets or other public
spaces. Another limitation is that we currently cannot measure the
verticality of the origin of the tweets (this relates to tweets from
ground-level public space versus an underground shopping mall, as in 4.2.1).
Figure 14. Shopping Activity (upper) and Entertainment Activity (lower)
from the Tiong Bahru residential area
(2, "Quite amusing to be sitting at a market with Arsenal legend Ian Wright and Everton's Graham Stuart.…
(2, "I like this store name. It's just so classic #beocrescent #singapore #market @ Beo Cresent Market https://t.co/uSYYNU-
(2, 'Tote bag day \n\n#whistlebeesg #handsinframe #vsco #vscocam #vscophile #totebag #exploresingapore…
(2, 'For Sale. Ticket FMFA 2015 for 2 days pass. Get your special price here with me. Put your email here…
(1, '@PinasMusicZone please play stuck by Darren Espanto, thanks')
(1, 'Headed to a school in the east tomorrow morning to play a show with @thelioncityboy as part of the…
We can also use Twitter data to detect emerging eating places not previously
recorded in OpenStreetMap. In the figure above, there is an obvious cluster
of red dots in the north-eastern part, above Jalan Bukit Ho Swee.
OpenStreetMap does not record this, but upon further investigation we find a
food court and a Siam eating place. Many of the new eating places are
recorded at the outer part of the housing block (see Fig. 15). This is a
phenomenon of gentrification, especially in the Tiong Bahru part of
Singapore, where it is more obvious that more upscale cafés are emerging in
the residential area to suit the taste of the middle class.
Figure 15. Finding new eating place in Tiong Bahru
Screenshot from OpenStreetMap (above) and Google Maps (below)
Figure 16. Cafes as a result of gentrification at the outer part of a residential block
5. Conclusion and Future Work
In this paper, we looked at tweet activity around metro stations in five
cities in Asia: Singapore, Bangkok, Kuala Lumpur, Hong Kong, and Osaka. The
study helps in understanding the activities around metro stations as
agglomerations of crowds, and in seeing the distinct character of each metro
station by using a set of keywords. By comparing distinct metro stations
within a city, one gains insight into how to plan better for the city as a
whole. One could also pinpoint a metro station that needs specific
attention. By comparing metro stations across different cities, a city
planner could observe a typical character in one city and possibly adapt it
to the planner’s own city. For example, food (eating) remains a major
activity across all stations in most cities, and we also see more exercise
around the metro stations away from the city center. Upscale shopping
centers across cities also show a distinct stretch of dense tweets,
representing mainly shopping and eating, along the main road.
Secondly, we demonstrated cases in which a change in the usage of an urban
plot is inferred by categorizing Twitter data and comparing the result
against the original urban plan. OpenStreetMap is used in this research, but
given any map with the boundary shapes of buildings, detection of activity
or of a change in usage can be carried out.
Further study is currently under way, with plans to improve the keyword
dictionary so as to improve the detection of activity. To differentiate
between the multiple meanings of a word, for example, a machine learning
method could be used.
The characteristics of urban typology surrounding the metro could also
be quantified for an easier comparison of the different regions. A more struc-
tured and concise map of buildings and their usage will also be used to see
the trends at a bigger scale.
Cha, T. W., & Graduate School of Design. (2001). Project on the city. 2.
Harvard Design School guide to shopping. C. J. Chung (Ed.). Taschen.
Frampton, A., Solomon, J. D., & Wong, C. (2012). Cities without ground:
a Hong Kong guidebook. Oro Editions.
Williams, S. A., Terras, M. M., & Warwick, C. (2013). What do people
study when they study Twitter? Classifying Twitter related academic pa-
pers. Journal of Documentation, 69(3), 384-410.
Crampton, J. W., Graham, M., Poorthuis, A., Shelton, T., Stephens, M.,
Wilson, M. W., & Zook, M. (2013). Beyond the geotag: situating ‘big
data’ and leveraging the potential of the geoweb. Cartography and
Geographic Information Science, 40(2), 130-139.
Tomarchio, L., Tuncer, B., You L., & Klein, B. (2016). Mapping Planned
and Emerging Art Places in Singapore through Social Media Feeds. Com-
plexity & Simplicity – Proceedings of the 34th eCAADe Conference – Vol-
ume 2, 437-446.
Pak, A., & Paroubek, P. (2010, May). Twitter as a Corpus for Sentiment
Analysis and Opinion Mining. In LREC (Vol. 10, No. 2010).
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock
market. Journal of Computational Science, 2(1), 1-8.
Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010).
Predicting elections with Twitter: What 140 characters reveal about
political sentiment. ICWSM, 10(1), 178-185.
Paul, M. J., Dredze, M., & Broniatowski, D. (2014). Twitter improves
influenza forecasting. PLOS Currents Outbreaks.
Whyte, W. H. (1980). The social life of small urban spaces.
Lee, R., Wakamiya, S., & Sumiya, K. (2013). Urban area characterization
based on crowd behavioral lifelogs over Twitter. Personal and Ubiquitous
Computing, 17(4), 605-620.
Frias-Martinez, V., Soto, V., Hohwald, H., & Frias-Martinez, E. (2012,
September). Characterizing urban landscapes using geolocated tweets. In
Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference
on and 2012 International Confernece on Social Computing (SocialCom)
(pp. 239-248). IEEE.