Conference Paper

Understanding Happiness in Cities using Twitter: Jobs, Children, and Transport


Abstract

The demographics and landscape of cities are changing rapidly, and there is growing emphasis on better understanding the factors that influence citizen happiness in order to design smarter urban systems. Few studies have attempted to understand how large-scale sentiment maps to urban human geography. Inferring sentiment from social media data is one scalable way to do so. In this paper, we apply natural language processing (NLP) techniques to 0.4 million geo-tagged Tweets in the Greater London area to understand the influence of socioeconomic and urban geography parameters on happiness. Our results not only verify established thinking, that job opportunities correlate with positive sentiment, but also reveal two insights: (1) happiness is negatively correlated with the number of children, and (2) happiness has a U-shaped (parabolic) relationship with access to public transportation. The latter implies that the happiest people are those who have good access to public transport, or such poor access that they use private transportation. The number of jobs, the number of children, and transportation availability are everyday facets of urban living and individually account for up to 47% of the variation in people's happiness. Our results show that they influence happiness more significantly than long-term socioeconomic parameters such as degradation, education, income, housing, and crime. This study will enable urban planners and system designers to move beyond the traditional cost-benefit methodology and to incorporate citizens' happiness.
Weisi Guo1,2*, Neha Gupta1,4, Ganna Pogrebna1,3, Stephen Jarvis1,4
Index Terms—happiness; social media data; sentiment;
1Warwick Institute for the Science of Cities (WISC), University of Warwick, UK. 2School of Engineering, University of Warwick, UK. 3Warwick Manufacturing Group, University of Warwick, UK. 4Department of Computer Science, University of Warwick, UK. *Corresponding Author. Funding Acknowledgement: EPSRC Centre for Doctoral Training in Urban Science and Progress - EP/L016400/1, and ESRC Centre for Competitive Advantage in the Global Economy (CAGE).
For the longest part of our existence, human beings have primarily lived in rural environments. This close proximity to nature has fashioned both our social and biological evolution. It is only in the last 200 years (a few generations) that the number of people living in cities has risen from 3% to over 50% of the global population. In the past 50 years, within a single generation, there has been a 6-fold increase in the number of large metropolitan areas [1]. The urbanisation trend has presented new economic and technological opportunities to humanity, but it has also created a set of urban development challenges related to health and happiness. Existing studies have shown that both the benefits and challenges of cities scale super-linearly with the city's size [2], [3], and the growing global urban population certainly exacerbates the hidden competition between urban improvement and decay. The question of how the high density of opportunities (e.g., jobs) and urban threats (e.g., crime and pollution) affects our happiness has become more pertinent than ever.
Surveying citizen happiness is an important area of research [4]–[6]. Qualitatively, the pursuit of happiness is a cornerstone philosophy in the governance theories of many ancient cultures. In modern history, quantitative measures such as the Gross National Happiness (GNH) gained traction after 2005. Given the subjective nature of happiness, it is typically measured through self-reported surveys that are validated and normalised against more objective metrics that are widely accepted as supporting positive sentiment (e.g., income and lifespan). Existing research projects have pursued both qualitative and empirical experiments to understand the sentiment of urban spaces [4]. Indeed, census data is extensively used by governments to create well-being scores (an example can be found for London1). However, the data from survey-based methods are limited in their spatial and temporal resolution. Alternative sentiment data collection methods employ wearable monitoring systems such as electro-dermal-activity sensors [7]. These systems can yield precise longitudinal data with high spatial-temporal accuracy. However, their expense means that scaling to the general public and establishing pervasive and non-intrusive sensing remain challenging.
Fig. 1. Mapping the Sentiment in London: (a) 0.4 million geo-tagged Tweets in Greater London over a 2-week period. (b) Tweets labelled as negative (red triangle), positive (green diamond), or neutral (pale circle) on a scale of 11. (c) Ward level sentiment where dark red indicates negative sentiment and dark blue indicates positive sentiment.
The proliferation of online social interactions has in recent years provided an opportunity to study the sentiment of urban dwellers (residents, workers, tourists, etc.). Social media platforms, such as Twitter, have achieved significant penetration (25% of the adult population in the UK) and usage (over 500 million messages per day worldwide). Whilst detecting sentiment using social media data as a proxy incurs bias, it offers attractive benefits in scalability, and there is growing research to validate and benchmark the sentiment labels. Topic-based sentiment analysis utilizing natural language processing (NLP) has been extensively exploited in business intelligence [8]. There has also been a growing body of work applying similar methodologies to examine the sentiment of citizens in urban spaces: numerous studies have detected emotions from Twitter data [9], created mood heat maps of city locations [10], and compared sentiment between cities [5], [6]. Social media data has the added benefit of not only providing real-time meta-data at high spatial-temporal resolution, but also reaching across urban demographics to include residents, workers, and tourists.
Despite the growing abundance of urban sentiment studies based on social media data, as far as we are aware, very few research outputs have attempted to understand how large-scale sentiment data (obtained from social media) maps to urban socioeconomic and infrastructure features. Without such a mapping, we are no closer to understanding the underlying causes of happiness. Furthermore, without understanding how human beings feel about their urban environment, urban planners are limited to planning services using traditional cost-benefit analysis based on economic indicators, and cannot accurately consider the consequential effects on citizen sentiment [11]. This study is, as far as we are aware, the first attempt to map and correlate large-scale sentiment data to urban geography features, and consequently to understand the main sources of happiness in the city landscape.
A. The Data
The data used in this paper comes from two sources: (1) 0.4 million geo-tagged tweets purchased from Twitter, covering a 2-week period (see Fig. 1a), and (2) UK government ward-level socioeconomic and urban geographical data (open access) from the London Data Store2. In terms of spatial resolution, the analysis in this paper focuses on Greater London, which is made up of 628 wards, each roughly analogous to a neighbourhood. Many services are delegated to the ward level, including policing, and a range of census statistics are available at the ward level. The ward-level census data considers 64 key metrics, including demographics, education, housing, and business statistics.
This paper's focus on using Twitter data (aggregated from all urban dwellers) as a proxy and comparing it to census data (mainly registered residential and business data) means that we are concerned with how all people in London (including residents, workers, and tourists) feel as a function of the urban geography and its socioeconomic parameters. It is extremely challenging to determine which portion of the social media data belongs to which demographic, so in this paper we treat all data as equally important (uniform weighting) and do not consider demographic categories within the sentiment data.
B. Sentiment Labelling using NLP
In this paper we employ unigram (i.e., keyword) based sentiment analysis. Whilst state-of-the-art methods often classify entire sentences using machine learning (e.g., Maximum Entropy, Support Vector Machine), it can be challenging to scale such methods accurately to reflect the diversity and veracity of millions of Twitter users over a large urban area. Therefore, as a first approach, we apply established unigrams to find the polarity of the tweets, and measure a general happiness averaged over a small area (i.e., a ward). This technique was successfully implemented in previous research to analyse sentiment [12], but has not been applied to urban contexts to understand the underlying sources of happiness.
To assign each tweet a sentiment score, we first apply tokenization filtering to remove language noise and transform all text to a common lower-case format with no punctuation. We then extract single words or features (unigrams) independently to determine the orientation of the tweet. Researchers in opinion mining have focused on finding suitable lexicons for classifying tweet sentiment, annotating tweets for negative or positive polarity (henceforth happiness) by recognising words as carrying positive or negative sentiment. We apply the opinion lexicon [13] (full list is approximately 6800 words 3) to each tweet. Our algorithm calculates the score of each tweet by simply subtracting the number of occurrences of negative words from the number of occurrences of positive words. An example of the sentiment-labelled tweets is shown in Fig. 1b, and clustered to ward level in Fig. 1c. An interesting trend can be observed: the happy wards (blue) are either in the centre or on the outer edges of Greater London, and the unhappy wards (red) are in the middle. We analyse this in greater detail in Section III-D.
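The scoring rule described above (positive-word occurrences minus negative-word occurrences, then averaging over a ward) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `POSITIVE` and `NEGATIVE` sets stand in for the roughly 6800-word opinion lexicon [13], the sample tweets and ward names are invented, and the plain per-tweet mean used for the ward value is an assumption.

```python
import re
from collections import defaultdict

# Hypothetical miniature lexicon; the paper applies the full opinion lexicon [13].
POSITIVE = {"happy", "love", "great", "good"}
NEGATIVE = {"sad", "hate", "awful", "bad"}


def tokenize(text):
    """Lower-case the text, replace punctuation with spaces, split into unigrams."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def tweet_score(text):
    """Count of positive-word occurrences minus count of negative-word occurrences."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def ward_sentiment(tweets):
    """Mean tweet score per ward, given (ward, text) pairs (assumed aggregation)."""
    scores_by_ward = defaultdict(list)
    for ward, text in tweets:
        scores_by_ward[ward].append(tweet_score(text))
    return {ward: sum(s) / len(s) for ward, s in scores_by_ward.items()}


# Invented sample data for illustration only.
sample = [
    ("Ward A", "Love this great park!"),
    ("Ward A", "Awful traffic again..."),
    ("Ward B", "Just another Monday"),
]
scores = ward_sentiment(sample)
```

A tweet with no lexicon words scores 0 and is treated as neutral, which matches the three-way labelling (negative, positive, neutral) shown in Fig. 1b.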
3 liub/FBS/sentiment-analysis.html
Fig. 2. Sentiment Data Analysis: (a) Ward level aggregate sentiment can accurately explain 96% of the variance in individual sentiments. (b) People who tweet more also express stronger aggregate sentiments, but on average express a lower sentiment per tweet. (Plot annotations: High Sentiment per Tweet; Low Sentiment per Tweet.)
C. Metrics for Comparison
To conduct cross-dataset comparisons, we use the coefficient of determination, denoted $R^2$, which indicates how well a statistical regression model fits the data; in other words, the fraction of variance in the data that can be explained by the proposed model. For a data vector $y = [y_1, y_2, \ldots, y_K]$ (with mean $\bar{y}$) and a predicted data vector $\hat{y}$ from the regression model, the residue vector is defined as $e = y - \hat{y}$. The coefficient of determination $R^2$ is defined as:
$$R^2 = 1 - \frac{\sum_{k=1}^{K} e_k^2}{\sum_{k=1}^{K} (y_k - \bar{y})^2},$$
where the numerator is the residual sum of squares and the denominator is the total sum of squares. In this paper, we use the adjusted $R^2 = 1 - (1 - R^2)\frac{K-1}{K-P-1}$ to discount against the extra variables $P$ in the model.
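The two metrics defined above can be computed directly from their definitions. The sketch below uses NumPy; the function names and the five-point example vectors are illustrative assumptions, not data from the paper.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residual_ss = np.sum((y - y_hat) ** 2)   # sum of e_k^2
    total_ss = np.sum((y - y.mean()) ** 2)   # sum of (y_k - mean)^2
    return 1.0 - residual_ss / total_ss


def adjusted_r_squared(y, y_hat, p):
    """Adjusted R^2 for K observations and P model variables."""
    k = len(y)
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (k - 1) / (k - p - 1)


# Invented example: K = 5 observations, predictions from a model with P = 1.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, y_hat)                    # residual SS 0.10, total SS 10
adj_r2 = adjusted_r_squared(y, y_hat, p=1)  # scaled by (K-1)/(K-P-1) = 4/3
```

Because the factor $(K-1)/(K-P-1)$ exceeds 1 whenever $P \geq 1$, the adjusted value is never larger than the plain $R^2$, and the gap grows as more variables are added for a fixed number of observations.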
A. Baseline Sentiment Data
We first present baseline sentiment data results, to gain a better understanding of the sentiment data of individual people, their tweets, and the averaged sentiment of a ward. In order to understand the representativeness of ward-level sentiment relative to individual sentiments in the ward, we plot the average sentiment per person (in the ward) against the aggregate sentiment in the ward in Fig. 2a. The results show that a simple linear regression with gradient 1 can relate the ward-level sentiment to the average individual sentiment. The regression explains 96% of the variance in the ward's individual sentiments. The outlier (Harefield ward in Hillingdon borough) shows a large discrepancy (negative bias) between individual sentiments and the ward average. This is due to a few people tweeting a high number of negative sentiments. It is also of interest to understand the relationship between the number of tweets and the aggregate sentiment of tweets. The results in Fig. 2b show that people who tweet more also express stronger aggregate sentiments (in absolute value, either positive or negative), but on average express a lower sentiment per tweet.
Fig. 3. Relating Avg. Sentiment per Person to Jobs Opportunities in London: (a) The number of jobs available in a ward is positively correlated with the sentiment in the ward (adjusted $R^2 = 0.45$). (b) The number of job opportunities (jobs normalised against working population) in a ward is positively correlated with the sentiment in the ward (adjusted $R^2 = 0.47$).
The paper will now focus on 3 key areas that were identified through a correlation panel analysis (see Fig. 6 in Appendix): (1) Employment Opportunities, (2) Children and Fertility Rate, and (3) Accessibility to Public Transport. These are areas which affect urban lives on a daily or monthly basis and as such have a direct impact on sentiment (see Table I in Appendix). It is worth noting that, given the census data lists over 60 urban geography features that can potentially affect happiness, a coefficient of determination of 33% to 47% for a single feature, as in the results presented below, is a significant result.
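A correlation panel of the kind mentioned above can be approximated with a pairwise Pearson correlation matrix over the ward-level columns. The sketch below is an assumption about the general approach, not the authors' code: it uses three synthetic columns (`jobs`, `children`, `sentiment`) with made-up coefficients in place of the census metrics and the Twitter-derived sentiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wards = 200

# Synthetic, illustrative columns only; none of these numbers come from the paper.
jobs = rng.uniform(0.0, 1.0, n_wards)
children = rng.uniform(0.05, 0.30, n_wards)
sentiment = 0.8 * jobs - 2.0 * children + rng.normal(0.0, 0.05, n_wards)

names = ["sentiment", "jobs", "children"]
table = np.vstack([sentiment, jobs, children])  # one row per variable

# np.corrcoef treats each row as a variable and returns the
# symmetric matrix of pairwise Pearson correlation coefficients.
corr = np.corrcoef(table)

# First row: how every metric correlates with sentiment.
sentiment_panel = dict(zip(names, corr[0]))
```

Scanning the sentiment row of such a matrix is one way to shortlist candidate features before fitting the higher-order regressions, since a linear correlation alone would miss U-shaped relationships.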
B. Employment Opportunities
The two main attributes of employment opportunity measured by the census data are: (i) the number of jobs in a ward (data from businesses) and (ii) the number of jobs normalised against the number of people of working age (16-64) in a ward. Both sets of employment data are highly positively correlated with each other, as well as with crime and ambulance incident data (see Fig. 6 in Appendix). This reinforces the notion that increased opportunities often lead to increased challenges [2], [3]. In terms of how employment relates to online sentiment, Fig. 3a shows that the number of jobs available in a ward is positively correlated with the sentiment in the ward (adjusted $R^2 = 0.45$). Similarly, Fig. 3b shows that the number of jobs normalised against the working population is positively correlated with the sentiment in the ward (adjusted $R^2 = 0.47$). The adjusted $R^2$ of 0.45 to 0.47 indicates that the regressions (which both use quadratic functions, $P = 2$) explain almost 50% of the variance in sentiment, with the remaining variation due to other factors. In other words, the availability of jobs accounts for almost half of the variation in expressed sentiment. Yet, the sentiment is correlated with the number of jobs available and not with the number of employed people (see Fig. 6 in Appendix). This seems to indicate that the existence of businesses in close proximity promotes positive sentiments.
C. Number of Children
The main attributes measuring the distribution of children in census data are the number and percentage of children (aged 0-15) in a ward. This percentage is negatively correlated with sentiment, as well as with other data such as the general fertility rate (see Fig. 6 in Appendix). Fig. 4a shows that the percentage of the population that are children in a ward is negatively correlated with the sentiment in the ward (adjusted $R^2 = 0.33$, $P = 3$). That is, the percentage of children accounts for 33% of the variation in expressed sentiment. More specifically, there is a steep decline in sentiment as the percentage of children rises from 5% to 15%, and the relationship saturates thereafter. It is worth noting that the percentage of children does not correlate with other socioeconomic factors such as the deprivation level in the ward, but is negatively correlated with the employment level in the ward. Without inferring causality, the data is consistent with our previous finding that increased job availability is associated with higher sentiment, and also with a decrease in the percentage of children. We suspect that the wider applicability of this result will depend on the family cultural context.
Fig. 4. Relating Avg. Sentiment per Person to Number of Children and Access to Public Transport in London: (a) The percentage of population that are children in a ward is negatively correlated with the sentiment in the ward (adjusted $R^2 = 0.33$). (b) The accessibility to public transport in a ward has a parabolic relationship with the sentiment in the ward (adjusted $R^2 = 0.44$), such that those with good access to public transport are happy and those who are in areas with poor public transport are also happy (rely on personal transport), whilst those that are in between are generally less happy.
Fig. 5. Public Transport Access vs. Number of Private Vehicles ($R^2 = 0.71$; axes: Access to Public Transport, Number of Private Owned Vehicles per Household). Those with poor public transport access levels (PTALs) own up to 4x more private vehicles per household, and the PTALs explain 71% of the variance in car ownership numbers.
D. Accessibility to Public Transport
The main attribute measuring public transport availability in census data is the Public Transport Accessibility Level (PTAL). It is a detailed and accurate measure of the accessibility of a point to the public transport network, taking into account walk access time and service availability. The method is essentially a way of measuring the density of the public transport network at any location within Greater London. The measure reflects 4 main attributes: (1) walking time to the transport access point, (2) reliability of services, (3) number of services, and (4) the average waiting time. It does not consider the speed or utility of the service, crowding effects, or the ease or efficiency of interchange. The PTAL methodology was developed for London, where a dense integrated public transport network means that nearly all destinations can be reached within a reasonable amount of time. Research using the ATOS (Access to Opportunities and Services) methodology shows that there is a strong correlation between PTALs and the time taken to reach key services, i.e., high PTAL areas generally have good access to services and low PTAL areas have poor access to services. Each area is graded between 0 and 6b, where a score of 0 indicates very poor access to public transport and 6b indicates excellent access.
Fig. 4b shows that the accessibility to public transport in a ward has a U-shaped (parabolic) relationship with the sentiment in the ward (adjusted $R^2 = 0.44$, $P = 4$), such that those with good access to public transport are happy and those who are in areas with poor public transport are also happy (possibly because they rely on personal means of transportation), whilst those that are in between are generally unhappy. The results in Fig. 5 seem to strongly support this hypothesis: the PTAL values explain 71% of the variance in the number of private vehicles per household, showing that those with poor public transport access own up to 4 times more private vehicles per household. Overall, the availability of public transport explains 44% of the variance in sentiment scores. The wider applicability of this result beyond London is difficult to determine. Yet, we speculate that economies with a high number of privately owned vehicles will exhibit similar patterns, i.e., people are happy when they are either close to public transport or far removed from it, and struggle when they are in between the two.
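One way to check for a U-shaped relationship of the kind described above is to fit a second-order polynomial and inspect the sign of its leading coefficient and the location of its turning point. The sketch below makes several assumptions: the data are synthetic, the access score is treated as a continuous value from 0 to 6 (the real PTAL grades run from 0 to 6b), and a quadratic is used for simplicity even though the paper reports $P = 4$ for its transport fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ward data (illustrative only): sentiment is lowest
# near the middle of the access range and higher at both ends.
access = rng.uniform(0.0, 6.0, 300)
sentiment = 0.05 * (access - 3.0) ** 2 + rng.normal(0.0, 0.02, 300)

# Least-squares fit of sentiment ~ a*x^2 + b*x + c.
# np.polyfit returns coefficients from the highest power down.
a, b, c = np.polyfit(access, sentiment, deg=2)

is_u_shaped = a > 0             # parabola opens upwards
turning_point = -b / (2.0 * a)  # access level with the lowest fitted sentiment

# The turning point only supports a U-shape if it lies inside the observed range.
inside_range = access.min() < turning_point < access.max()
```

If the turning point fell outside the observed access range, the fitted curve would be monotonic over the data and the U-shaped reading would not be supported, so both checks are worth reporting together.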
The demographics and landscape of cities are changing rapidly, and there is growing emphasis on better understanding the factors that influence citizen happiness in order to design smart urban systems. In this paper, we apply natural language processing to 0.4 million geo-tagged tweets in the Greater London area to understand the underlying socioeconomic and urban geography parameters that influence happiness. Our results not only verify established thinking, that job opportunities explain 45-47% of the sentiment variations, but also reveal two additional insights: (1) happiness is negatively correlated with the number of children (which accounts for 33% of sentiment variations) and (2) happiness has a U-shaped (parabolic) relationship with access to public transportation (44% of variations). The latter implies that happy people are those who have good access to public transport, or such poor access that they drive (owning up to 4 times more cars than those who have the best access). The unhappy people are those that rely on, but do not have strong access to, public transport. The number of jobs and children, as well as accessibility to public transport, are everyday facets of urban living (see Table I in Appendix) and individually explain up to 47% of the variations in happiness. Our results show that they influence happiness more significantly than more ambient parameters such as degradation, education quality, and crime.
The wider applicability of these results beyond London depends on the context. We expect that the effect of job availability is widely applicable across cultures, whereas the effect of the number of children will depend on the culture, and the effect of public transport availability will depend on the ownership level of personal vehicles as well as the culture of transport usage. Future work will focus on creating proprietary sentiment labels for each city by combining meta-data to boost sentiment analysis accuracy [14]. This will enable large-scale cross-country and cross-city comparisons to be made [4].
The general study of how sentiment is linked to urban features and socioeconomic parameters is useful for urban planners and urban system designers. The results will allow decision makers to move beyond planning services using traditional cost-benefit analyses, and enable them to consider the consequences for citizens' happiness. Further research on understanding how these patterns change with different cities and cultures is of interest, as is research on how more reliable methods of labelling sentiment in social media data can be applied.
The authors would like to acknowledge the
EPSRC Doctoral Training Centre (EP/L016400/1),
RCUK/EPSRC Grant (EPL023911/1), and the Centre
for Competitive Advantage in the Global Economy
(CAGE) at the University of Warwick.
A linear regression of sentiment vs. ward level socioeconomic and infrastructure metrics is shown in Fig. 6. The linear regression does not uncover more complex parabolic relationships such as those found between sentiment and accessibility to public transportation. Nonetheless, it serves as an overview of the first order relationship between all 67 parameters. A categorized table of census data is given in Table I, with challenges that affect citizens daily, yearly, or long-term.
Daily / Monthly Annual Long Term
Pop. Density Open Space
No. Children Ethnic Diversity Fertility
Rent/Buy, Tax Housing Types
Jobs Income/Benefits Deprivation
Public Transport Cars
Obesity Life Expectancy
Fig. 6. Linear Regression Matrix of Sentiment vs. Ward Level Socioeconomic and Infrastructure Metrics. Sentiment correlations are
[1] “World Urbanization Prospects,” United Nations, Technical Report, 2014.
[2] L. Bettencourt, J. Lobo, D. Helbing, C. Kuhnert, and G. West, “Growth, innovation, scaling, and the pace of life in cities,” Proceedings of the National Academy of Sciences (PNAS), vol. 104, 2007.
[3] L. Bettencourt, “The Origins of Scaling in Cities,” Science, vol. 340, 2013.
[4] H. Engelbrecht, “Natural capital, subjective well-being, and the new welfare economics of sustainability: Some evidence from cross-country regressions,” Ecological Economics, 2009.
[5] L. Mitchell, M. Frank, K. Harris, P. Dodds, and C. Danforth, “The Geography of Happiness: Connecting Twitter Sentiment and Expression, Demographics, and Objective Characteristics of Place,” PLOS ONE, vol. 8, 2013.
[6] M. Frank, L. Mitchell, P. Dodds, and C. Danforth, “Happiness and the Patterns of Life: A Study of Geolocated Tweets,” Scientific Reports, vol. 3, 2013.
[7] E. Kanjo and A. Chamberlain, “Emotions in context: examining pervasive affective sensing systems, applications, and analyses,” Personal and Ubiquitous Computing, 2015.
[8] E. Qualman, Socialnomics: How Social Media Transforms the Way We Live and Do Business. New York, USA: Wiley, 2010.
[9] R. Mitchell and F. Popham, “Greenspace, urbanity and health: Relationships in England,” Journal of Epidemiology and Community Health, 2007.
[10] T. Lansdall-Welfare, V. Lampos, and N. Cristianini, “Nowcasting the mood of the nation,” Significance, vol. 9, 2012.
[11] A. Duarte, C. Garcia, G. Giannarakis, S. Limao, A. Polydoropoulou, and N. Litinas, “New approaches in transportation planning: happiness and transport economics,” Economic Research and Electronic Networking, vol. 10, 2010.
[12] J. Fiaidhi, O. Mohammed, S. Mohammed, S. Fong, and T. H. Kim, “Opinion mining over twitterspace: Classifying tweets programmatically using the R approach,” in ACM Int. Conf. Digit. Inf. Manag. (ICDIM 2012), 2012.
[13] M. Hu and B. Liu, “Mining and Summarizing Customer Reviews,” in ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2004.
[14] F. Bravo-Marquez, M. Mendoza, and B. Poblete, “Combining strengths, emotions and polarities for boosting Twitter sentiment analysis,” in ACM Proceedings on Issues of Sentiment Discovery and Opinion Mining, 2013.
... From those 19 articles, 15 were published in conference proceedings and only four were published in scientific journals [9][10][11][12]. After the respective analyses, these articles were divided into healthy lifestyles promotion [11,12,13] and the following types of surveillance: i) accidents [14]; ii) environmental conditions [15,16]; iii) electromagnetic radiation [9]; iv) health conditions of older adults [17,18]; v) emotions [10,[19][20][21][22][23]; vi) epidemics [24,25]; vii) fitness activities [26]; and viii) food quality [27]. ...
... Regarding the surveillance of emotions, six articles were explored [10,[19][20][21][22][23]. Guthier et al. [19] provide an overview of the relevant affective states, showing how they can be detected individually and then aggregated into a global model of affect. ...
... Guo et al. [21] attempted to map and correlate large-scale sentiment data to urban geography features, and consequently endeavored to understand the main sources of happiness in the city landscape. The data used from two sources: 0.4 million geo-tagged social media data purchased from Twitter, covering a two weeks period, and United Kingdom government ward-level socioeconomic and urban geographical data (open access) from the London Data Store. ...
Full-text available
Objectives - The study reported in this article aimed to identify: i) the most relevant smart cities’ applications with impact in public health; ii) the types of technologies being used; and iii) the maturity level of the applications being reported. Methods - A systematic review was performed based on a search of the literature. Results - A total of 19 articles were retrieved. The articles report applications to support surveillance of populations and environmental conditions and to promote healthy lifestyles. Conclusion - Although relevant arguments were made regarding the importance of smart cities’ infrastructures to support public health, most of the articles report applications in an early stage of development.
... With the progression of sentiment analysis techniques, NLPbased research has been employed to comprehend public sentiment more effectively, a crucial aspect of smart city management (Guo et al., 2016;Serna et al., 2017;Wang and Taylor, 2019). This research area has primarily concentrated on enhancing the accuracy and scalability of sentiment analysis techniques, including developing deep learning-based models for more precise classification of the sentiment (Ghahramani et al., 2021;Song et al., 2021;Dutt et al., 2023). ...
Full-text available
This study presents an advanced review of policy and governance research in the context of smart cities and artificial intelligence (AI). With cities playing a crucial role in achieving the United Nations Sustainable Development Goals, it is vital to understand the opportunities and challenges that arise from the applications of smart technologies and AI in promoting urban sustainability. Using the Latent Dirichlet Allocation (LDA) method based on a three-layer Bayesian algorithm model, we conducted a systematic review of approximately 3700 papers from Scopus. Our analysis revealed prominent topics such as “service transformation,” “community participation,” and “sustainable development goals.” We also identified emerging concerns, including “open user data,” “ethics and risk management,” and “data privacy management.” These findings provide valuable insights into the current progress and frontiers of policy and governance research in the field, informing future research directions and decision-making processes.
... Other studies focused on identifying user sentiment during or after a particular event (such as a pandemic [11,12] or terrorism event [13]) or analyzed user sentiment at different geographical areas with the aim of investigating the mobility patterns of people within a city [14]. In a number of these studies, data about the sentiment of the tweets themselves were correlated with various metrics affecting urban living (such as job opportunities and access to public transportation) to provide insight into the effect these measures have on the happiness of the population [15]. In a more applied setting, researchers have also incorporated sentiment analysis data from geotagged social media as part of a larger system or predictive model to recommend safe walking routes [16] or improve the accuracy of crime prediction [17]. ...
Full-text available
The proliferation of Social Media and Open Web data has provided researchers with a unique opportunity to better understand human behavior at different levels. In this paper, we show how data from Open Street Map and Twitter could be analyzed and used to portray detailed Human Emotions at a city wide level in two cities, San Francisco and London. Neural Network classifiers for fine-grained emotions were developed, tested and used to detect emotions from tweets in the two cites. The detected emotions were then matched to key locations extracted from Open Street Map. Through an analysis of the resulting data set, we highlight the effect different days, locations and POI neighborhoods have on the expression of human emotions in the cities.
... A variety of methods already exist to monitor cities and their citizens, ranging from geo-tagged social media and Google search data to gauge happiness and public health response [6]-[10], mobile apps to measure public activity [11], and Internet-of-Things sensors to monitor critical infrastructures [12]. What is lacking is a remote sensing approach to estimate, in real time, the social and economic damage from a variety of natural and man-made disasters. ...
Conference Paper
Humanitarian disasters and political violence cause significant damage to our living space. The reparation cost to homes, infrastructure, and the ecosystem is often difficult to quantify in real time. Real-time quantification is critical both to informing relief operations and to planning ahead for rebuilding. Here, we use satellite images taken before and after major crises around the world to train a robust baseline Residual Network (ResNet) and a disaster quantification Pyramid Scene Parsing Network (PSPNet). ResNet offers robustness to poor image quality and can identify areas of destruction with high accuracy (92%), whereas PSPNet offers contextualised quantification of built environment damage with good accuracy (84%). As there are multiple damage dimensions to consider (e.g. economic loss and fatalities), we fit a multi-linear regression model to quantify the overall damage. To validate our combined system of deep learning and regression modeling, we successfully match our prediction to the ongoing recovery in the 2020 Beirut port explosion. These innovations provide a better quantification of overall disaster magnitude and inform intelligent humanitarian systems of unfolding disasters.
... The network that has traditionally attracted the most interest is Twitter (Durahim & Coşkun, 2015; Guo, Gupta, Pogrebna, & Jarvis, 2016; Nguyen et al., 2016), although the visual power of the image is raising the prominence of Instagram as a setting for measuring happiness through interactive platforms (Pittman & Reich, 2016). In this regard, a study by the Royal Society for Public Health (2017) identifies Instagram as the social network most harmful to the mental health and well-being of young people. ...
Emoticons are a highly useful expressive resource for users of digital social networks because they convey ideas and concepts visually and immediately. This study explores the possibility of measuring happiness through the emojis used by Instagram users. Since the application allows posts to be geolocated, we propose a taxonomy based on the classification of Novak et al. (2015) and apply it to the six most populous cities in Spain: Madrid, Barcelona, Valencia, Sevilla, Zaragoza, and Málaga. Between 10 and 21 December 2017, a total of 15234 posts were identified for analysis, each containing at least one of the emojis selected for our coding sheet, which comprises variables such as the type of post (photograph or video) and its number of "likes" and comments. An "Instagram Happiness Index" (IFI, from the Spanish "Índice de Felicidad en Instagram") was also generated from the weighted assignment of numerical values to each emoji, which allowed us to calculate the level of happiness expressed through this social network in each of the six cities. The cities, in turn, showed statistically significant differences.
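As a rough illustration of an emoji-weighted index of the kind described above, the sketch below averages per-emoji scores over a set of posts. The abstract does not give the study's actual weights or formula, so the `EMOJI_WEIGHTS` table, the plain averaging rule, and the function name `happiness_index` are all assumptions made for illustration only.

```python
# Hypothetical emoji weights in [-1, 1]; NOT the values used in the study.
EMOJI_WEIGHTS = {
    "😀": 1.0,
    "🙂": 0.5,
    "😐": 0.0,
    "😢": -1.0,
}


def happiness_index(posts):
    """Mean weight of all scored emojis found in the posts.

    Returns None when no post contains a scored emoji.
    """
    values = [
        EMOJI_WEIGHTS[ch]
        for post in posts
        for ch in post
        if ch in EMOJI_WEIGHTS
    ]
    if not values:
        return None
    return sum(values) / len(values)


# Illustrative posts for one city (made-up data).
city_posts = ["great day 😀", "meh 😐", "😢😀"]
index = happiness_index(city_posts)
```

Computing one such index per city would give values that can then be compared with a statistical test; the choice of test is outside what the abstract states.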
... Similarly, Frank et al. (2013) applied the same assessment tool to examine the relationship between happiness and the patterns of life in the US. To explore the relationship between sentiment and socio-economic parameters, Guo et al. (2016) conducted unigram-based sentiment analysis with geotagged tweets for different socio-demographic groups and found that the number of jobs, children, and transportation availability can well explain the sentiment variations. However, the content from social media is not just text but also emojis used to express users' emotions. ...
The penetration of devices integrated with location-based services and internet services has generated massive data about the everyday life of citizens and tracked their activities happening in cities. Crowdsourced data, such as social media data, POIs data and collaborative websites, generated by the crowd, has become fine-grained proxy data of urban activity and widely used in research in urban studies. However, due to the heterogeneity of data types of crowdsourced data and the limitation of previous studies mainly focusing on a specific application, a systematic review of crowdsourced data mining for urban activity is still lacking. In order to fill the gap, this paper conducts a literature search in the Web of Science database, selecting 226 highly related papers published between 2013 and 2019. Based on those papers, the review firstly conducts a bibliometric analysis identifying underpinning domains, pivot scholars and papers around this topic. The review also synthesises previous research into three parts: main applications of different data sources and data fusion; application of spatial analysis in mobility patterns, functional areas and event detection; application of socio-demographic and perception analysis in city attractiveness, demographic characteristics and sentiment analysis. The challenges of this type of data are also discussed in the end. This study provides a systematic and current review for both researchers and practitioners interested in the applications of crowdsourced data mining for urban activity.
The IoT, often referred to as the “new industrial revolution,” has changed how governments and the world around them interact with the virtual world and technology, reshaping life, work, entertainment, and travel. IoT applications have influenced every person’s life and made everyday tasks more convenient. The best-known IoT applications are health care systems, transportation, military, structural monitoring, and disaster recovery. These applications are tied to one another, and human intervention is reduced in all aspects of a smart city. Decision-making is done with the aid of machine and deep learning algorithms. Quality of life has improved thanks to smart city applications and the major advances in deep learning. In this chapter, we review some smart applications in this area. The innovations, challenges, and upcoming changes in smart city applications attract many researchers, and the field remains a hot topic. We discuss some challenges in smart city applications and their influence on everyday life.
Sentiment analysis, also called opinion mining, studies people's opinions towards products and services. Opinions are very important because organizations always want to know what the public thinks about their products and services. People give their opinions via social media. With the advent of social media such as Twitter, Facebook, blogs, and forums, sentiment analysis has become important in fields as varied as automobiles, medicine, film, fashion, the stock market, mobile phones, and insurance. Sentiment analysis consists of analyzing and predicting opinions. It is carried out on opinion words, using either classification methods or sentiment lexicons. This chapter compares different methods and algorithms for solving the sentiment analysis problem, their merits and demerits, and their applications, and also investigates different research problems in sentiment analysis.
Pervasive sensing has opened up new opportunities for measuring our feelings and understanding our behavior by monitoring our affective states while mobile. This review paper surveys pervasive affect sensing by examining three major elements of affective pervasive systems, namely “sensing”, “analysis”, and “application”. Sensing investigates the different sensing modalities that are used in existing real-time affective applications, Analysis explores different approaches to emotion recognition and visualization based on different types of collected data, and Application investigates different leading areas of affective applications. For each of the three aspects, the paper includes an extensive survey of the literature and finally outlines some of the challenges and future research opportunities of affective sensing in the context of pervasive computing. Keywords: Mobile Sensing, Affective Computing, Affect Sensing, Emotion Recognition, Context, Pervasive Computing, Ubiquitous Computing, Mobile, Applications
The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies have had limited access to conversation content, rendering changes in expression as a function of movement invisible. In addition, they typically use the communication between a mobile phone and its nearest antenna tower to infer position, limiting the spatial resolution of the data to the geographical region serviced by each cellphone tower. We use a collection of 37 million geolocated tweets to characterize the movement patterns of 180,000 individuals, taking advantage of several orders of magnitude of increased spatial accuracy relative to previous work. Employing the recently developed sentiment analysis instrument known as the hedonometer, we characterize changes in word usage as a function of movement, and find that expressed happiness increases logarithmically with distance from an individual's average location.
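The logarithmic relationship reported above can be illustrated with a small least-squares fit of the form happiness = a + b · ln(distance). This is only a sketch: the function name `fit_log_trend`, the distances, and the scores are hypothetical, and the data points are synthetic values generated from a known curve rather than figures from the study.

```python
import math


def fit_log_trend(distances_km, happiness):
    """Ordinary least-squares fit of happiness = a + b * ln(distance)."""
    xs = [math.log(d) for d in distances_km]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(happiness) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, happiness))
    b = sxy / sxx            # change in happiness per unit of ln(distance)
    a = mean_y - b * mean_x  # happiness at distance = 1 (ln 1 = 0)
    return a, b


# Synthetic example: scores generated from a = 6.0, b = 0.02 (made-up values).
distances = [1, 10, 100, 1000]
scores = [6.0 + 0.02 * math.log(d) for d in distances]
a, b = fit_log_trend(distances, scores)
```

A positive `b` in such a fit would correspond to expressed happiness rising with distance from a person's usual location, with each tenfold increase in distance adding the same increment.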
Conference Paper
Twitter sentiment analysis or the task of automatically retrieving opinions from tweets has received an increasing interest from the web mining community. This is due to its importance in a wide range of fields such as business and politics. People express sentiments about specific topics or entities with different strengths and intensities, where these sentiments are strongly related to their personal feelings and emotions. A number of methods and lexical resources have been proposed to analyze sentiment from natural language texts, addressing different opinion dimensions. In this article, we propose an approach for boosting Twitter sentiment classification using different sentiment dimensions as meta-level features. We combine aspects such as opinion strength, emotion and polarity indicators, generated by existing sentiment analysis methods and resources. Our research shows that the combination of sentiment dimensions provides significant improvement in Twitter sentiment classification tasks such as polarity and subjectivity.
Despite the increasing importance of cities in human societies, our ability to understand them scientifically and manage them in practice has remained limited. The greatest difficulties to any scientific approach to cities have resulted from their many interdependent facets, as social, economic, infrastructural, and spatial complex systems that exist in similar but changing forms over a huge range of scales. Here, I show how all cities may evolve according to a small set of basic principles that operate locally. A theoretical framework was developed to predict the average social, spatial, and infrastructural properties of cities as a set of scaling relations that apply to all urban systems. Confirmation of these predictions was observed for thousands of cities worldwide, from many urban systems at different levels of development. Measures of urban efficiency, capturing the balance between socioeconomic outputs and infrastructural costs, were shown to be independent of city size and might be a useful means to evaluate urban planning strategies.
We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated in 2011 on the social network service Twitter and (2) annually-surveyed characteristics of all 50 states and close to 400 urban populations. Among many results, we generate taxonomies of states and cities based on their similarities in word use; estimate the happiness levels of states and cities; correlate highly-resolved demographic characteristics with happiness levels; and connect word choice and message length with urban characteristics such as education levels and obesity rates. Our results show how social media may potentially be used to estimate real-time levels and changes in population-scale measures such as obesity rates.
Vast data-streams from social networks like Twitter and Facebook contain people's opinions, fears and dreams. Thomas Lansdall-Welfare, Vasileios Lampos and Nello Cristianini exploit a whole new tool for social scientists.
The measurement of social and psychological phenomena has been advanced by recent progress in the fields of behavioural economics and hedonic psychology. In addition, the increased interest in understanding how individuals perceive their own quality of life has led to investigation of the relations between various macro and individual level variables, generically subsumed as happiness. For many, “happiness is considered to be an ultimate goal in life,” and it plays an important role in the way people perceive the overall society they live in. Therefore, social scientists and behavioural economists are now stressing the importance of well-being measures, related to people’s evaluations of their quality of life, in addition to economic indicators. In the transport sector, project evaluation is mainly based on cost–benefit analyses using economic indicators. However, any transportation project or service affects the quality of the travel experience, the well-being of travellers, and their travel behaviour. The competitiveness of modes may also be affected by promoting travellers’ derived or experienced well-being. Thus, existing behavioural travel choice models should be enhanced in their behavioural validity by incorporating the impacts of travelling happiness and satisfaction. This study aims to understand and model the impact of stated (anticipated) happiness on the choice between a private transport mode (car) and a public transport mode (metro). Keywords: Happiness; Well-being; Discrete choice models; Latent variables; Transport surveys
Conference Paper
Merchants selling products on the Web often ask their customers to review the products that they have purchased and the associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision on whether to purchase the product. It also makes it difficult for the manufacturer of the product to keep track and to manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the same product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.
Conference Paper
Today the channels for expressing opinions seem to increase daily. When these opinions are relevant to a company, they are important sources of business insight, whether they represent critical intelligence about a customer's defection risk, the impact of an influential reviewer on other people's purchase decisions, or early feedback on product releases, company news or competitors. Capturing and analyzing these opinions is a necessity for proactive product planning, marketing and customer service, and it is also critical in maintaining brand integrity. The importance of harnessing opinion is growing as consumers use technologies such as Twitter to express their views directly to other consumers. Tracking the disparate sources of opinion is hard - but even harder is quickly and accurately extracting the meaning so companies can analyze and act. The language of tweets is complicated and contextual, especially when people are expressing opinions, and it requires reliable sentiment analysis based on parsing many linguistic shades of gray. This article argues that using the R programming platform to analyze tweets programmatically simplifies the task of sentiment analysis and opinion mining. An R programming technique has been used for testing different sentiment lexicons as well as different scoring schemes. Experiments on analyzing the tweets of users about six NHL hockey teams reveal the effectiveness of using the opinion lexicon and the Latent Dirichlet Allocation (LDA) scoring scheme.
The measurement of natural capital and its management during the economic development process are important aspects of the capital approach to sustainable development. However, the assessment of social welfare in terms of genuine savings (or changes in total wealth per capita) is arguably too limited. This paper tries to make a case for the incorporation of subjective well-being measures in debates about sustainable development by exploring the macro-level relationship between subjective well-being and natural capital in a cross-country setting. It is tested whether natural capital per capita is correlated with subjective well-being in a sample of fifty-eight developed and developing countries, using natural capital data from the World Bank's Millennium Capital Assessment. Bivariate regressions indicate that it is. When multiple regression models are estimated that include (a) major country-level determinants of subjective well-being (GNI per capita, social capital, income distribution, unemployment, inflation), and (b) regional dummy variables for ex-Soviet Union and Latin American countries, the positive correlation remains. The role of data outliers is carefully explored, and the sensitivity of the results to the use of alternative subjective well-being measures (i.e. life satisfaction, happiness, and a combined life satisfaction and happiness index) is investigated. This does not change the nature of the results. The findings arguably strengthen the case for a 'new welfare economics of sustainability' that takes subjective well-being measures into account.