An advanced Systematic Literature Review (SLR) on Spatiotemporal Analyses of Twitter Data – Technical Report

Enrico Steiger, João Porto de Albuquerque, Alexander Zipf
GIScience Research Group, Institute of Geography, University of Heidelberg
Berliner Straße 48
D-69120 Heidelberg, Germany
+49(0)6221 / 54 4546
{enrico.steiger,joao.porto,zipf}@geog.uni-heidelberg.de
Abstract
The growing number of research contributions from multiple academic disciplines that
are relevant to geographical information science (GIScience), covering a wide range
of topics and research questions, makes it challenging to select and assess
high-quality research articles. The heterogeneity of the available electronic
databases underlines the demand for more sophisticated methods that include relevant
articles in a statistically sound way, so that empirical observations made during
the review can be substantiated and specific research questions can be answered. The
aim is therefore to conduct a Systematic Literature Review (SLR) that provides the
current state of research concerning methods and applications of crowdsourced Web
2.0 data for the Location Based Social Network (LBSN) Twitter. We introduce an
advanced framework enabling an automated, systematic and reproducible literature
review process.
Keywords: Crowdsourcing, Location Based Social Networks, Twitter, Volunteered
geographic information, VGI, Systematic Literature Review
1 Introduction
Interactive social media platforms offer a tremendous amount of voluntarily
contributed, user-generated content. The enormous potential of LBSN has been
increasingly recognized by numerous research domains over recent years. At the same
time we are facing an interdisciplinary and relatively new research field that lacks
common online databases and literature sources. Established and applied methods for
LBSN, and their applications, are hard to identify at first sight. Hence the overall goal
Location Based Social Networks Systematic Literature Review
of this paper is to provide an objective summary of the current state of research
concerning where Twitter has been used, for which specific use cases and which
methods have been applied. The reviewed articles allow a more detailed evaluation
of the potential of LBSN; the review also summarizes remaining challenges
and investigates possible drawbacks. A key element of this review is to identify where
extensive research already exists and where new research is needed. By cross-analyzing
the reviewed papers with regard to research disciplines, applications and methods,
we identify research gaps and provide a solid foundation for further studies. We
are also able to give recommendations for future research directions. GIScience can
contribute essential research methods to advance research on LBSN by
further integrating methods of spatial analysis.
In the first part of this paper we outline existing, well-defined review methods from
other professions in order to bring the methodology of systematic literature
reviews closer to the field of information systems and the related discipline of
GIScience. Section 2 describes the implemented quantitative and
qualitative review workflow and states the research questions. Review results
are presented in part 3, followed by a discussion of the results in section 4. Part 5
concludes the review with some final remarks.
1.1 Background of VGI, Social Media and LBSN
Emerging technologies have created new approaches to the distribution and
acquisition of crowdsourced information. The growing availability of mobile devices
equipped with GPS sensors, high-performance computers and broadband internet
connections, together with advanced server- and client-side technologies, allows users to
actively participate and create content through mobile applications and location based
services. The role of the user is shifting from a previously
distinct position as either producer or consumer towards the more dynamic
role of a prosumer (Tapscott 1996). The participation of individuals and
the vast amount of data they generate have become commonly known under the term
Web 2.0 (O’Reilly 2009). Facilitated by new technologies, audiences are contributing
their local knowledge without needing prior expertise. Goodchild names
this phenomenon Citizens as Sensors, where Volunteered Geographic Information
(VGI) is created, assembled, and disseminated by individuals or groups with
knowledge or capabilities using the Web 2.0 (Goodchild 2007). Within this interactive,
networked, participatory model of People as Sensors (Resch 2013), information is
supplied free of charge and voluntarily. Haklay terms this development of new
innovative social web mapping applications the evolution of the GeoWeb (Haklay
et al. 2008). Social networks are a key part of this development, incorporating new
information and communication tools and attracting millions of users. Boyd & Ellison
(2007) outline the term Social Network Sites (SNS), typified by individuals who
construct an online profile and communicate with other users, sharing common ideas,
activities, events and interests. Location Based Social Networks (LBSN) further
enhance existing social networks by adding a spatial dimension with
location-embedded services. For example, users upload geotagged photos via Flickr,
check in at a venue with Foursquare or comment on a local event via Twitter.
These location-driven social structures allow mobile device owners with ubiquitous
internet access to exchange details of their personal location as a key point of
interaction (Zheng 2011). LBSN bridge the gap between our physical world and
online social network services and contain three layers of information according to
Symeonidis et al. (2014): a social network (user layer), a geographical network
(location layer) and a semantic metadata network (content layer).
1.2 Background of Systematic Literature Review
Systematic Literature Review (SLR) is a well-established review method, first notably
applied by professionals in the field of medicine and health care. According to
Cochrane, one of the best-known and most highly ranked medical review
libraries, set up in 1994, “systematic reviews seek to collate all evidence that fits
pre-specified eligibility criteria in order to address a specific research question”
(Cochrane 2011). The Cochrane Handbook for Systematic Reviews of Interventions
contains methodological guidance and aims to minimize bias by using explicit,
systematic methods (Cochrane 2011). Coming from a medical and public health
background, Fink (2005) considers four key criteria to be essential when
conducting a research review: systematic examination of all literature sources,
comprehensive elaboration of a structured review, explicit explanation of all
methods used, and exclusion of subjective assessments so that others can
reproduce the results. Webster & Watson (2009) state a lack of theoretical progress in
information systems due to the complex and difficult delineation of research in an
interdisciplinary field. SLRs can therefore assist in summarizing existing evidence
concerning a technology, in identifying research gaps for further investigation and
in providing a reproducible methodology for new research activities
(Brereton et al. 2007). Levy & Ellis (2006) introduce a framework for conducting and
writing an effective literature review, adapting the review concept to the field of
information systems. Kitchenham refined the approach of Evidence-based
Software Engineering (EBSE), aiming “to improve decision making related to software
development and maintenance by integrating current best evidence from research with
practical experience” (Kitchenham 2004). Keele (2007) adapted the concept of
systematic reviews to software engineering research by setting up distinct guidelines
derived from those in the medical field. In further tertiary studies (Kitchenham et al.
2009 and Kitchenham et al. 2010), automated searches for systematic literature
reviews were performed in order to evaluate the applied SLR methodologies
quantitatively and qualitatively. One study outcome was an increased number of SLRs
published between 2004 and 2008. However, many researchers still conduct
informal literature surveys. In geographic information science, Horita et al.
(2013) assessed the current state of research on VGI for disaster management
by applying an SLR that included a screening of important literature databases.
Roick & Heuser (2013) provided a general, non-systematic review article on
current research on Location Based Social Networks, stating the need for further
studies investigating how social networks can be applied to specific use cases.
However, literature reviews in this field have so far been performed in a rather
non-systematic manner, lacking statistical techniques such as meta-analysis.
2 Review method
This review follows the guidelines developed by Kitchenham & Keele (2007),
dividing the research into three main phases: planning the review, conducting the
review by selecting studies from electronic databases, and reporting the
review. The procedure of the literature review,
including all derived results, has been documented in the review protocol. Furthermore,
test reviews with preliminary trial searches have been carried out in order to detect
and minimize bias in the defined search strings and in the subsequent
data extraction process. The flowchart in fig. 1 visualizes our automated
workflow. The following sections are organized according to
the review process steps shown in the flowchart.
Fig. 1: Flowchart Review Process and number of included papers
2.1 Electronic Databases
The initial step of selecting eligible literature sources is based on the following criteria:
- consideration of journal articles and conference proceedings published between 2005
and 2013 in English (technical drafts etc. are excluded)
- selection of multiple digital libraries with relevance to information research,
as identified by Brereton et al. (2007) and supplemented by further sources
Several electronic literature sources have been evaluated during the search
documentation with regard to citation export capabilities, maximum number of keyword
terms, query limits etc. Other factors were crawl and search limitations (e.g.
Springer) and research papers not being fully accessible. Table 2 lists our initial
282 and final 92 reviewed papers by publication origin. Duplicate search
results found in multiple electronic databases have been excluded: papers appearing
in several electronic databases, e.g. in both the Google Scholar search engine
and Web of Knowledge, are included only once, so that only unique search
results are stored.
Source                 | URL                            | Unique Search Result | Meta Analysis | Text Analysis | Paper Screening | Backward Reference Search | Final Review
IEEE Library           | http://www.ieeexplore.ieee.org | 36                   | 16            | 14            | 5               | 9                         | 14
ACM Digital Library    | http://dl.acm.org              | 149                  | 33            | 35            | 20              | 21                        | 41
AIS Electronic Library | http://aisel.aisnet.org        | 4                    | 1             | 1             | 1               | 0                         | 1
Google Scholar         | http://scholar.google.de       | 12                   | 8             | 12            | 8               | 8                         | 16
Science Direct         | http://www.sciencedirect.com   | 12                   | 3             | 4             | 0               | 0                         | 0
Elsevier               | http://www.scopus.com          | 23                   | 10            | 0             | 3               | 1                         | 4
Springer Link          | http://www.springerlink.com    | 9                    | 7             | 1             | 0               | 3                         | 3
Taylor & Francis       | http://www.tandfonline.com/    | 15                   | 10            | 1             | 0               | 0                         | 0
Wiley Online Library   | http://onlinelibrary.wiley.com | 2                    | 2             | 1             | 1               | 1                         | 2
Web of Knowledge       | http://www.webofknowledge.com  | 18                   | 11            | 2             | 2               | 0                         | 2
AAAI                   | https://www.aaai.org/          | 2                    | 0             | 3             | 2               | 7                         | 9
Total                  |                                | 282                  | 101           | 74            | 42              | 50                        | 92
Table 2: Used electronic databases with included and excluded papers during the review process.
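The exclusion of duplicate search results across databases can be sketched as follows. This is a minimal illustration, assuming that duplicates are matched on a normalized title; the report does not state which matching key was used, and the function names and example records are hypothetical.

```python
import re

def normalize_title(title):
    """Lowercase a title and collapse punctuation/whitespace so near-identical exports match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def unique_records(records):
    """Keep the first record seen for each normalized title."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical export rows; the source names mirror Table 2, the titles are placeholders.
records = [
    {"source": "Google Scholar", "title": "Example Paper on Twitter Data"},
    {"source": "Web of Knowledge", "title": "Example paper on  Twitter data."},
    {"source": "ACM Digital Library", "title": "Another Example Paper"},
]

deduplicated = unique_records(records)
print(len(deduplicated))
```

Matching on titles alone can merge distinct papers with identical titles; adding the publication year to the key would be a stricter variant.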
2.2 Search Terms
The defined electronic databases have been searched with an automatic,
predefined keyword search including research-relevant terms which have been
collected through an iterative training step proposed by Kitchenham et al. (2010) (table
3).
Table 3: Defined search terms
As described by Levy & Ellis (2006), defining the search terms can present multiple
problems for the novice researcher. In addition, it can introduce bias;
therefore an iterative approach beginning with predefined terms was used in this
review. All retrieved papers and their metadata are semantically analyzed
for new keywords. These keywords form additional search terms for the
next search iteration, and so on. Finally we generate a cloud of terms and their
probabilistic occurrences within all screened research papers, indicating the search
terms with the highest relevance to our research questions. All search terms have been
queried pairwise and crosswise against the digital libraries, connected through “and” as well
as “or” operators. The search term “social media” has been excluded, as the search results
including the meta-analysis have shown that no relevant research papers with specific
methods and use cases were extracted. “Volunteered Geographic Information” and
its abbreviation “VGI” have been queried as two individual search terms, because
term occurrences within selected research papers and in the metadata after our
training phase revealed an inconsistent use by authors: some papers
only included the abbreviation as a keyword, others mentioned the full term. Keywords
arbitrarily defined by researchers can be an issue, since these buzzwords appear and
disappear over time and with technological development (Levy & Ellis 2006).
The underlying methodologies might be subject to a more stable
development, but are difficult to assess quantitatively. To assist the selection process, a
backward reference search, as proposed by Webster & Watson (2009), was performed
within the qualitative review. Implementing an automatic citation search during
the quantitative review, however, is not possible at this stage, due to the large number of
initially included papers and the fact that the metadata of research papers currently do
not contain machine-readable information on the references used. The
broad automated search was finished when no new results and only identical
literature references were identified (Okoli & Schabram 2010). After the search
process all papers were screened for duplicates, which were excluded
in order to minimize publication bias (Brereton et al. 2007).
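The pairwise combination of search terms with “and” and “or” operators can be illustrated with a short sketch. The query syntax and the function name are assumptions, since each digital library has its own search interface, and the three terms are placeholders rather than the full list of Table 3.

```python
from itertools import combinations

def build_queries(terms):
    """Return one OR query over all terms plus an AND query for every unordered pair."""
    quoted = [f'"{t}"' for t in terms]
    queries = [" OR ".join(quoted)]
    queries += [f"{a} AND {b}" for a, b in combinations(quoted, 2)]
    return queries

# Placeholder terms; the actual search terms of Table 3 are not reproduced here.
terms = ["Twitter", "VGI", "Volunteered Geographic Information"]
queries = build_queries(terms)
for q in queries:
    print(q)
```

In an iterative search, new keywords found in the retrieved metadata would be appended to `terms` and the queries rebuilt until no new results appear.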
Quantitative review
With the inclusion and exclusion parameters described above, 282 research papers
have been identified. Searching the literature generates a large number
of studies, and a single reviewer is not able to review all of them qualitatively.
Therefore all selected papers are further processed in a metadata analysis,
for which citations and references are exported in common file formats (bibtex, endnote, ris).
2.3 Metadata Analysis
Fig. 4 shows, as an example, the metadata of a conference paper we selected, including
author, title, year and keywords.
Fig. 4: meta-data example in ris reference format
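Reading such exported metadata can be sketched with a small parser for the RIS tag layout (two-character tag, two spaces, hyphen, value). This is a simplified illustration, not the tooling used in the review; the sample record is hypothetical and does not reproduce the content of Fig. 4.

```python
import re

# A RIS line starts with a two-character tag, two spaces and a hyphen.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

def parse_ris(text):
    """Parse RIS text into a list of records mapping each tag to a list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip("\ufeff\r"))
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":            # end of record
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records

# Hypothetical record; author, title and keywords are placeholders.
sample = """TY  - CONF
AU  - Doe, J.
TI  - An example conference paper
PY  - 2009
KW  - twitter
KW  - crowdsourcing
ER  -
"""

records = parse_ris(sample)
print(records[0]["TI"], records[0]["KW"])
```

Repeated tags such as `KW` are collected into lists, which keeps all keywords of a paper available for the term analysis.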
The term corpus created from our metadata was stripped of white space,
converted to lower case and cleared of numbers, (English) stop words and
special characters. We compare the total number of term occurrences for all 282
screened papers in our meta-analysis, once with keywords only and once with
keywords plus paper titles (Fig. 5). The specific terms “social”, “twitter”,
“information”, “data” and “network” each occur more than 60 times in absolute numbers and
appear in almost 43% of all papers when semantically analyzing keywords and topics.
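The cleaning and counting steps can be sketched in a few lines. The report used the R text mining package for this; the Python version below is only an illustration, with a deliberately tiny stop word list and hypothetical input strings.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real run would use a full English list.
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "of", "on", "the", "to", "with"}

def clean_terms(text):
    """Lowercase, drop digits and special characters, collapse whitespace, remove stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)   # removes numbers and special characters
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def term_counts(documents):
    """Total term occurrences and the number of documents each term appears in."""
    total, doc_freq = Counter(), Counter()
    for doc in documents:
        tokens = clean_terms(doc)
        total.update(tokens)
        doc_freq.update(set(tokens))
    return total, doc_freq

# Hypothetical keyword/title strings, one per paper.
papers = [
    "Twitter as a Social Sensor: 2 case studies",
    "Social network data for event detection",
    "Mining Twitter data in the social web",
]
total, doc_freq = term_counts(papers)
print(total.most_common(3))
```

Keeping total occurrences and document frequencies separate mirrors the two figures reported in the text: absolute counts and the share of papers containing a term.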
Fig. 5: total occurring term frequencies of 282 screened papers comparing only keywords and
keywords with title. The terms “urban” and “routing” only appear as keywords, not within the title.
Looking for association rules and terms showing a significant correlation with other
extracted words, we were able to build a term adjacency matrix, represented as a
graph (fig. 6), highlighting the most frequently occurring terms. The more strongly terms
correlate with each other, the higher the edge weight and the closer they appear in our
graph. For example, the term pairs “social” + “media” and “social” + “network” each
correlate by more than 0.5 and are therefore associated.
Fig 6: weighted graph network generated from term adjacency matrix
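One way to derive such a weighted term graph is to correlate the occurrence vectors of terms across papers and keep pairs above a threshold. The sketch below assumes Pearson correlation on binary occurrence vectors and a threshold of 0.5 as in the example above; whether the report computed the association in exactly this way is an assumption, and the term sets are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def term_graph(doc_terms, threshold=0.5):
    """Weighted edges between terms whose occurrence vectors correlate above the threshold."""
    vocab = sorted(set().union(*doc_terms))
    vectors = {t: [1 if t in doc else 0 for doc in doc_terms] for t in vocab}
    edges = {}
    for a, b in combinations(vocab, 2):
        r = pearson(vectors[a], vectors[b])
        if r > threshold:
            edges[(a, b)] = r
    return edges

# Hypothetical term sets, one per paper.
doc_terms = [
    {"social", "media", "twitter"},
    {"social", "media"},
    {"twitter", "routing"},
    {"social", "media", "network"},
    {"urban", "routing"},
]
edges = term_graph(doc_terms)
print(edges)
```

The resulting dictionary is an edge list; the correlation value can be used directly as edge weight when drawing the graph.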
In the next step all semantic annotations and occurring terms are scored using
the term frequency-inverse document frequency (tf-idf) algorithm implemented in the
R text mining package (Feinerer 2013):
$$ w_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right) $$
$tf_{i,j}$ = number of occurrences of i (term) in j (paper)
$df_i$ = number of papers containing i
$N$ = total number of selected papers
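As a worked illustration, the standard tf-idf weight multiplies the number of occurrences of term i in paper j by log(N / df_i), where df_i is the number of papers containing i and N the number of papers. The sketch below uses the natural logarithm and raw counts; the R text mining package may apply a different logarithm base or normalization, so the absolute values are only indicative, and the token lists are hypothetical.

```python
from collections import Counter
from math import log

def tf_idf(documents):
    """Return, per document, a dict term -> tf * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))       # count each term once per document
    weights = []
    for tokens in documents:
        tf = Counter(tokens)
        weights.append({t: tf[t] * log(n_docs / doc_freq[t]) for t in tf})
    return weights

# Hypothetical token lists, one per paper.
docs = [
    ["twitter", "social", "network"],
    ["twitter", "crowdsourcing"],
    ["twitter", "routing", "routing"],
]
weights = tf_idf(docs)
print(weights[0]["twitter"], weights[2]["routing"])
```

A term that appears in every paper, such as "twitter" here, receives a weight of zero, while a term concentrated in one paper receives a high weight.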
We calculate the frequency of each term within the specific research papers and divide
it by the total number of term occurrences over all papers. The obtained weighting
factor indicates whether a term occurs rarely or is commonly used within all
selected research papers. We apply the density-based clustering algorithm DBSCAN
(Ester et al. 1996) in order to detect statistically significant clusters of term
occurrences. The minimum number of points to form a cluster was set to 5, with
a reachability distance of ε=1.5. As a result (fig. 7), cluster 1 has a high overall
document frequency with a tf-idf score close to zero. The research papers in cluster 1
(n=101) appear to cover similar topics, because we compute a pairwise two-dimensional
semantic distance from the keywords and titles used. In addition, these
papers form a strong cluster without scattering and are our targeted research papers.
Cluster 2 shows a medium document/term frequency, being more dispersed and
diffusing into local sub-clusters of semantic similarity. Cluster 3 has a high term
frequency for the specific paper with a low document frequency over all papers
(higher tf-idf score), becoming a noisy outlier cluster. We can identify the most frequently
occurring terms from all 101 papers assigned to cluster
1: “Crowdsourcing”, “Twitter”, “Volunteered Geographic Information” and “Social
Networks”. We have thus extracted a group of terms that are clearly semantically related to
each other and suitable to answer our research questions.
Fig. 7: Term frequency-inverse document frequency of research papers with DBSCAN clustering
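The clustering step can be sketched with a compact DBSCAN implementation using the parameters stated in the text (minimum of 5 points, ε=1.5). The two-dimensional points are hypothetical stand-ins for the per-paper values plotted in Fig. 7; the sketch is an illustration and not the implementation used in the review.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (1, 2, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # A point counts as its own neighbour, as in the original algorithm.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:   # j is a core point, so expand through it
                queue.extend(reach)
    return labels

# Hypothetical 2-D points: six close together and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (1.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
labels = dbscan(points, eps=1.5, min_pts=5)
print(labels)
```

Points that never fall within ε of a core point keep the label -1, which corresponds to the noisy outliers described for cluster 3.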
2.4 Text Analysis
The next goal is to find papers within the identified semantic cluster that intersect with
most of the frequent terms from our previous meta-analysis (Kofod-Petersen 2012).
At the same time we assess whether the cluster extracted in our meta-analysis
corresponds to the content of the papers. Therefore we now focus on
semantically analyzing the full text. All initially stored papers are converted from PDF
into plain text using the open source Java toolkit Tika. Afterwards the
converted papers undergo a natural language processing (NLP) step. We
process all documents with tokenization, stop word filtering
and stemming using RapidMiner, a powerful data and text mining tool
(Mierswa et al. 2006). References at the end of research papers have been filtered out
by the processing parameters for minimum and maximum word length and the removal of
numbers and punctuation. For our case we use Latent Dirichlet allocation (LDA),
a probabilistic topic extraction model introduced by Blei et al.
(2003). This unsupervised machine learning model identifies latent topics and
corresponding document clusters in a large text collection.
Fig. 8: Topic model results, showing the log-likelihood of the data for different numbers of topics (k)
Fig. 9: LDA topic model for our example paper Longueville & Smith (2009) and document-term matrix results with no sparse terms:
    1 document-term matrix (306 terms)
    Non-/sparse entries: 306/0
    Maximal term length: 8
    Weighting: term frequency (tf)
As the number of topics k for LDA needs to be specified in advance, we estimate this
parameter by maximum likelihood, iterating through the
topic models for every paper generated from the document-term matrix. Results have
shown the highest log-likelihood for all papers with 2 topics (k = 2),
following the topic model selection by Griffiths & Steyvers (2004) (Fig. 8).
Out of all 282 papers, we extracted 564 individual topics (2 topics per paper), each
consisting of 5 associated terms (2820 terms in total). For our example paper “OMG,
from here, I can see the flames!: a use case of mining Location Based Social Networks
to acquire spatio-temporal data on forest fires” by Longueville & Smith
(2009) (from fig. 4), we uncovered 2 latent topics intersecting with each other (fig.
9). When qualitatively reviewing this paper we can indeed discern a temporal analysis of
the distribution of tweets for a fire event in Marseille, where users provide
information through LBSN. In the next step we select only those topic-related terms
that occur several times across all of our papers. When analyzing the most
frequently occurring associated terms in a correlation matrix (fig. 10), we are able to
detect topics in our papers which significantly overlap with each other. Each dot in
our scatterplot represents the correlation of one paper with another paper
for our given topics. For example, papers 1-4, 10-18 and 234-249 are highly mutually
correlated and contain semantically the same topics. Papers are highly
semantically correlated with each other for the specific topics “social network”, “tweet”
and “twitter”. During the text analysis we only include papers which show
a strong positive correlation coefficient for these specific topics, equal to or higher
than the chosen threshold. On this basis we reject the null hypothesis of no
correlation using a t-test. In total, 74 documents have been identified based on semantic
topic extraction by measuring topic correlations.
Fig. 10: Correlation matrix of all papers for three extracted topic related terms “social network”, “tweet”
and “twitter”.
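The significance test for the correlation coefficients is not spelled out in detail. One common way to test whether a Pearson correlation coefficient r, computed from n paired observations, differs from zero is the following t-test; that the review used exactly this form is an assumption.

```latex
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}, \qquad \text{with } n - 2 \text{ degrees of freedom}
```

The null hypothesis of no correlation is rejected when |t| exceeds the critical value of the t-distribution at the chosen significance level.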
The most frequent topics from the text analysis are shown in fig. 11. The
frequency of occurrence of each topic is encoded by color and term
label size. Fig. 12 shows our 74 selected papers by year of publication. Comparing the
count of publications per year for the initial 282 records with the results of the
meta-analysis and the final text analysis shows similar frequencies. This analysis
helps to verify whether there is a one-sided selection of papers in any of the quantitative
review steps, which might bias the result.
Fig. 11: Word cloud with most frequent extracted topics (applying LDA) over all research papers (n=282)
Fig. 12: Comparison of year of publication of initially selected papers (n=282) with results after meta-analysis (n=101) and text analysis (n=74)
2.5 Merge Process
Comparing the papers identified by the meta-analysis and the text analysis, not all of
them match. Out of 74 papers from the text analysis, 54 (approx. 73%) were part of the
previous cluster from our meta-analysis. 14 are journal publications, 60 are conference
proceedings. 20 papers from the text analysis have not been identified within the
meta-analysis. Therefore all papers need to go through a merging process (Fig. 13), in
which the frequent terms from the metadata analysis are compared with the extracted
topics from the text analysis for each paper.
Fig. 13: review of identical and non-identical papers
47 papers and 20 papers, respectively, are non-identical. Out of the 47 additional
papers identified by the meta-analysis, 39 contain none of our relevant topics
“twitter”, “tweet” and “social network” when focusing on the terms extracted by LDA in
the text analysis. Correlating text analysis topics with meta-analysis terms, 8 papers from the text
analysis have shown similarities with metadata terms and have therefore been
included. Validating the merge results, we can indeed confirm that the non-identical papers
which have been excluded cover different topics (e.g. dealing with Foursquare or Flickr
data). One reason why meta-analysis and text analysis results differ is the character and
keyword limitations within the metadata of papers. In addition, the metadata of each paper
contain different numbers of keywords, or no keywords at all. Based on our relevant topics, we
have quantitatively extracted scientific articles (n=62) and can now begin the qualitative
review phase by screening the remaining articles.
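The counts in the merge process follow from simple set operations on the two result lists. The sketch below uses hypothetical paper ids; only the set sizes (101 papers from the meta-analysis, 74 from the text analysis, 54 in both) are taken from the text.

```python
# Paper ids are hypothetical; only the set sizes follow the reported counts.
meta = set(range(0, 101))        # 101 papers from the meta-analysis cluster
text = set(range(47, 121))       # 74 papers from the text analysis

both = meta & text               # identified by both analyses
only_meta = meta - text          # identified by the meta-analysis only
only_text = text - meta          # identified by the text analysis only

# 8 further papers were included after correlating topics with metadata terms.
included = len(both) + 8

print(len(both), len(only_meta), len(only_text), included)
```

The intersection and the two differences reproduce the 54 identical and the 47 and 20 non-identical papers, and adding the 8 re-included papers gives the 62 articles passed on to the qualitative review.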
Qualitative review
During the qualitative review all clearly irrelevant results are discarded, i.e. papers
that do not address any aspect of the research questions. Drafting a clear and
concise research question is an essential task for successfully identifying primary
studies that provide a detailed state of the art (Okoli & Schabram 2010).
2.6 Research questions
As the review's objectives are to extract use cases, focused research areas and
methods when utilizing Twitter as one LBSN, the following three research questions
have been selected:
1. What are the applications where Twitter as one LBSN has been used?
2. Which of the academic disciplines are mainly focused on researching Twitter?
3. What are the methods aiming to use voluntary information and crowdsourced social
media data from Twitter?
2.7 Paper Screening
A practical screening of the included papers further synthesizes the review by examining
methods and use cases. During the paper screening process, papers which do not show
any relevance to our previously formulated research questions have been
excluded. Papers from the Association for the Advancement of Artificial Intelligence
(AAAI) were extracted by the text analysis but not detected within the
meta-analysis. The qualitative review has shown that these articles are relevant to our
research questions, and therefore all of them have been included. 15 papers not
explaining their methodological approach or application of Twitter fell under the
exclusion criteria. Another 5 papers have been excluded because of self-citation.
These cross citations were not excluded quantitatively in the meta-analysis and
text analysis, as they are semantically very close. 42 papers remain.
2.8 Backward Reference Search
Given the topic- and term-related semantic inclusion of papers, we started a manual
backward reference search according to Webster & Watson (2009), as referred to by
Levy & Ellis (2006). This approach looks through all citations of our 42 finally selected
articles to follow methods and their development. However, papers by authors
referencing their own papers covering similar topics have been excluded. 50
additional articles have been included through backward reference searching.
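The backward reference search was carried out manually. Its logic, including the exclusion of self-citations, can nevertheless be sketched as follows; the data structure, the function name and the rule of treating any shared author as a self-citation are assumptions for illustration.

```python
def backward_candidates(selected, references):
    """Collect cited papers that are not yet selected and share no author with the citing paper."""
    candidates = {}
    for paper_id, paper in selected.items():
        for ref_id in paper["cites"]:
            ref = references[ref_id]
            if ref_id in selected:
                continue                      # already part of the review
            if set(ref["authors"]) & set(paper["authors"]):
                continue                      # self-citation, skipped
            candidates[ref_id] = ref
    return candidates

# Hypothetical papers and reference lists.
selected = {
    "p1": {"authors": ["Author A", "Author B"], "cites": ["r1", "r2", "p2"]},
    "p2": {"authors": ["Author C"], "cites": ["r2", "r3"]},
}
references = {
    "r1": {"authors": ["Author A"]},   # self-citation of p1
    "r2": {"authors": ["Author D"]},
    "r3": {"authors": ["Author E"]},
    "p2": {"authors": ["Author C"]},   # already selected
}
new_papers = backward_candidates(selected, references)
print(sorted(new_papers))
```

The candidates returned by such a step would still need the same qualitative screening as the other papers before being added to the final review.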
Conclusions
This technical report has presented an advanced framework for conducting a quantitative
and qualitative review of studies, providing the state of research concerning
methodologies, applications and use cases of Twitter as one main Location Based
Social Network (LBSN). The proposed systematic literature review method
combines search results from multiple heterogeneous digital libraries and allows
an effective, reproducible assessment of relevant research studies. By combining
methods from computational linguistics, namely the tf-idf algorithm in the
meta-analysis and the probabilistic LDA topic model for the semantic content of
all documents, we achieved a successful quantitative inclusion of papers. Together
with the implementation of an iterative keyword search considering meta-analysis
results, we were able to minimize bias during the overall review process. The major
research outcome, answering our research questions after the qualitative
review, has been accomplished in detail and provides new statistical insights
for LBSN.
6 Acknowledgement
This research has been funded through the graduate scholarship program
Crowdanalyser - spatiotemporal analysis of user-generated content, supported by the
state of Baden-Württemberg. We also thank Prudence Carr for proofreading this
research article.
References
Abel, F. et al., 2012. Semantics + Filtering + Search = Twitcident Exploring Information in Social
Web Streams Categories and Subject Descriptors.
ACM Transactions on Intelligent Systems
and Technology
, pp.285294.
Andrienko, G. & Andrienko, N., 2013. Thematic Patterns in Georeferenced Tweets through
Space-Time Visual Analytics.
Becker, H. & Gravano, L., 2011. Beyond Trending Topics: Real-World Event Identification on
Twitter.
AAAI
, pp.438441.
Blei, D., Ng, A. & Jordan, M., 2003. Latent dirichlet allocation.
the Journal of machine Learning
research
.
Boettcher, A. & Lee, D., 2012. EventRadar: A Real-Time Local Event Detection Scheme Using
Twitter Stream.
2012 IEEE International Conference on Green Computing and
Communications
, pp.358367.
Boyd, D.M. & Ellison, N.B., 2007. Social Network Sites: Definition, History, and Scholarship.
Journal of Computer-Mediated Communication
, 13(1), pp.210230.
Brereton, P. et al., 2007. Lessons from applying the systematic literature review process within the
software engineering domain.
Journal of Systems and Software
, 80(4), pp.571583.
Cha, M. et al., 2010. Measuring User Influence in Twitter: The Million Follower Fallacy.
ICWSM
.
Chae, J. et al., 2012. Spatiotemporal social media analytics for abnormal event detection and
examination using seasonal-trend decomposition.
2012 IEEE Conference on Visual Analytics
Science and Technology (VAST)
, pp.143152.
Chu, Z., Gianvecchio, S. & Wang, H., 2010. Who is Tweeting on Twitter: Human , Bot , or Cyborg?
ACM
, pp.2130.
Cochrane Reviewers’ Handbook Glossary, Version 5.1.0. Cochrane Collaboration.
Corvey, W. et al., 2010. Twitter in Mass Emergency: What NLP Techniques Can Contribute.
Computational Linguistics
, 4(June), pp.2324.
Cranshaw, J. et al., 2012. The Livehoods Project: Utilizing Social Media to Understand the
Dynamics of a City.
ICWSM
.
Crooks, A. et al., 2013. #Earthquake: Twitter as a Distributed Sensor System.
Transactions in GIS
,
17(1), pp.124147.
Cui, A. et al., 2012. Discover breaking events with popular hashtags in twitter. In
Proceedings of
the 21st ACM international conference on Information and knowledge management - CIKM
’12
. New York, New York, USA: ACM Press, p. 1794.
Location Based Social Networks Systematic Literature Review
18
Dalvi, N., Kumar, R. & Pang, B., 2012. Object matching in tweets with spatial models. In
Proceedings of the fifth ACM international conference on Web search and data mining -
WSDM ’12
. New York, New York, USA: ACM Press, p. 43.
Earle, P.S., Bowden, D.C. & Guy, M., 2011. Twitter earthquake detection: earthquake monitoring
in a social world.
Annals of Geophysics
.
Ester, M. et al., 1996. A density-based algorithm for discovering clusters in large spatial databases
with noise.
KDD
.
Feinerer, I., 2013. Introduction to the tm Package Text Mining in R.
tiré de http://cran. r-project.
org/web/packages/tm/
Ferrari, L. et al., 2011. Extracting urban patterns from location-based social networks. In
Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social
Networks - LBSN ’11
. New York, New York, USA: ACM Press, p. 1.
Finin, T. et al., 2010. Annotating Named Entities in Twitter Data with Crowdsourcing. ACM pp.
8088.
Fink, A., 2005.
Conducting Research Literature Reviews: From the Internet to Paper
Fuchs, G., Jankowski, P. & Augustin, S., 2013. Extracting Personal Behavioral Patterns from
Geo-Referenced Tweets.
AGILE
.
Gelernter, J. & Balaji, S., 2013. An algorithm for local geoparsing of microtext.
GeoInformatica
.
Gerais, M. et al., 2012. Traffic Observatory: a system to detect and locate traffic events and
conditions using Twitter.
ACM
, pp.511.
Go, A., Huang, L. & Bhayani, R., 2009. Sentiment Analysis of Twitter Data.
Entropy
, 2009(June),
pp.3038.
Gonzalez, R. & Chen, Y., 2012. TweoLocator: A Non-Intrusive Geographical Locator System for
Twitter.
ACM
, pp.2431.
Goodchild, M., 2007. Citizens as sensors: the world of volunteered geography.
GeoJournal
.
Griffiths, T. & Steyvers, M., 2004. Finding scientific topics.
… academy of Sciences of the United
.
Gupta, A. & Kumaraguru, P., 2012. Credibility Ranking of Tweets during High Impact Events.
ACM
.
Haklay, M., Singleton, A. & Parker, C., 2008. Web mapping 2.0: The neogeography of the GeoWeb.
Geography Compass
.
Hecht, B. et al., 2011. Tweets from Justin Bieber ’ s Heart: The Dynamics of the Location Field
in User Profiles.
Location Based Social Networks Systematic Literature Review
19
Hiruta, S. et al., 2012. Detection , Classification and Visualization of Place-triggerd Geotagged
Tweets.
ACM
.
Hong, L. et al., 2012. Discovering geographical topics in the twitter stream.
Proceedings of the 21st
international conference on World Wide Web - WWW ’12
, p.769.
Hong, L., Convertino, G. & Chi, E.H., 2011. Language Matters in Twitter: A Large Scale Study
Characterizing the Top Languages in Twitter Characterizing Differences across Languages
Including URLs and Hashtags. , (1), pp.518521.
Horita, F.E.A. et al., 2013. The use of Volunteered Geographic Information and Crowdsourcing in
Disaster Management: a Systematic Literature Review. In
Proceedings of the Nineteenth
Americas Conference on Information Systems, Chicago Illinois, August 15-17, 2013
. Atlanta,
GA, USA: AIS, pp. 110.
Hughes, A.L. & Palen, L., 2009. Twitter adoption and use in mass convergence and emergency
events.
International Journal of Emergency Management
, 6(3/4), p.248.
Jackoway, A., Samet, H. & Sankaranarayanan, J., 2011. Identification of live news events using
Twitter. In
Proceedings of the 3rd ACM SIGSPATIAL International Workshop on
Location-Based Social Networks - LBSN ’11
. New York, New York, USA: ACM Press, p. 1.
Keele, S., 2007. Guidelines for performing Systematic Literature Reviews in Software Engineering.
Kinsella, S., Murdock, V. & Hare, N.O., 2011. “ I ’ m Eating a Sandwich in Glasgow ”: Modeling
Locations with Tweets.
ACM
, (June), pp.6168.
Kitchenham, B., 2004. Evidence-based software engineering.
Engineering, 2004. ICSE.
Proceedings. 26th International Conference on. IEEE, 2004.
Kitchenham, B. et al., 2009. Systematic literature reviews in software engineering A systematic
literature review.
Information and Software Technology
, 51(1), pp.715.
Kitchenham, B. et al., 2010. Systematic literature reviews in software engineering A tertiary
study.
Information and Software Technology
, 52(8), pp.792805.
Kitchenham, B. & Keele, S., 2007. Guidelines for performing Systematic Literature Reviews in
Software Engineering.
Kling, F., Kildare, C. & Pozdnoukhov, A., 2012. When a City Tells a Story: Urban Topic Analysis.
ACM
, pp.482485.
Kofod-Petersen, A., 2012. How to do a Structured Literature Review in computer science.
Kosala, R. & Adi, E., 2012. Harvesting Real Time Traffic Information from Twitter.
Procedia
Engineering
, 50(Icasce), pp.111.
Krishnamurthy, B. & Arlitt, M., 2006. A Few Chirps About Twitter. , pp.1924.
Location Based Social Networks Systematic Literature Review
20
Kulshrestha, J. & Gummadi, K.P., 2008. Geographic Dissection of the Twitter Network.
Kwon, K.H. & Hall, B., 2010. AN EXPLORATION OF SOCIAL MEDIA IN EXTREME EVENTS:
RUMOR THEORY AND TWITTER DURING THE HAITI EARTHQUAKE 2010.
AIS
, pp.1
14.
Lampos, V. & Cristianini, N., 2010. Tracking the flu pandemic by monitoring the Social Web. ,
pp.411416.
Lee, B. & Hwang, B.-Y., 2012. A Study of the Correlation between the Spatial Attributes on
Twitter.
2012 IEEE 28th International Conference on Data Engineering Workshops
, pp.337
340.
Lee, R. & Sumiya, K., 2010. Measuring geographical regularities of crowd behaviors for
Twitter-based geo-social event detection. In
Proceedings of the 2nd ACM SIGSPATIAL
International Workshop on Location Based Social Networks - LBSN ’10
. New York, New
York, USA: ACM Press, p. 1..
Levy, Y. & Ellis, T.J., 2006. A Systems Approach to Conduct an Effective Literature Review in
Support of Information Systems Research. , 9.
Li, W. et al., 2011. The where in the tweet. In
Proceedings of the 20th ACM international
conference on Information and knowledge management - CIKM ’11
. New York, New York,
USA: ACM Press, p. 2473.
Longueville, B. De & Smith, R.S., 2009. “ OMG , from here , I can see the flames!: a use case of
mining Location Based Social Networks to acquire spatio- temporal data on forest fires.
ACM
,
(c), pp.7380.
Maceachren, A.M. et al., 2011. SensePlace2: GeoTwitter Analytics Support for Situational
Awareness. , pp.181190.
Mierswa, I., Wurst, M. & Klinkenberg, R., 2006. Yale: Rapid prototyping for complex data mining
tasks.
and data mining
.
Murthy, D. & Longwell, S. a., 2013. Twitter and Disasters.
Information, Communication & Society
,
16(6), pp.837855.
o’Reilly, T., 2009.
What is web 2.0
Okoli, C. & Schabram, K., 2010. A Guide to Conducting a Systematic Literature Review of
Information Systems Research. , (2008).
Pan, C., 2011. Event Detection with Spatial Latent Dirichlet Allocation.
ACM
, pp.349358.
Pennacchiotti, M. & Popescu, A., 2010. to Twitter User Classification. , pp.281288.
Location Based Social Networks Systematic Literature Review
21
Resch, B., 2013. People as sensors and collective sensing-contextual observations complementing
geo-sensor network measurements.
Progress in Location-Based Services
.
Ritterman, J., Osborne, M. & Klein, E., 2009. Using Prediction Markets and Twitter to Predict a
Swine Flu Pandemic. , (2004).
Roick, O. & Heuser, S., 2013. Location Based Social Networks - Definition, Current State of the Art
and Research Agenda.
Transactions in GIS
, p.n/an/a.
Sadilek, A., Krumm, J. & Horvitz, E., 2013. Crowdphysics: Planned and Opportunistic
Crowdsourcing for Physical Tasks.
Mircrosoft Research
.
Sakaki, T., 2009. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors.
Sakaki, T. & Matsuo, Y., 2012. Real-time Event Extraction for Driving Information from Social
Sensors.
IEEE
, pp.221226.
Sakaki, T., Okazaki, M. & Matsuo, Y., 2010. Earthquake shakes Twitter users: real-time event
detection by social sensors. In
Proceedings of the 19th international conference on World wide
web
. ACM, pp. 851860.
Sofean, M. & Smith, M., 2012. A Real-Time Architecture for Detection of Diseases using Social
Networks: Design , Implementation and Evaluation.
ACM
, (figure 1), pp.309310.
Starbird, K. & Muzny, G., 2012. Learning from the Crowd: Collaborative Filtering Techniques for
Identifying On-the-Ground Twitterers during Mass Disruptions.
ISCRAM
, 2011; pp.110.
Stefanidis, A., Crooks, A. & Radzikowski, J., 2011. Harvesting ambient geospatial information
from social media feeds.
GeoJournal
, 78(2), pp.319338.
Symeonidis, P., Ntempos, D. & Manolopoulos, Y., 2014. Location-Based Social Networks.
Recommender Systems for
.
Takhteyev, Y., Gruzd, A. & Wellman, B., 2012. Geography of Twitter networks.
Social Networks
,
34(1), pp.7381.
Tapscott, D., 1996. The digital economy: Promise and peril in the age of networked intelligence.
Terpstra, T., 2012. Towards a realtime Twitter analysis during crises for operational crisis
management.
ISCRAM
, pp.19.
Thomson, R. et al., 2012. Trusting Tweets: The Fukushima Disaster and Information Source
Credibility on Twitter.
ISCRAM
, pp.110.
Veloso, A. & Ferraz, F., 2011. Dengue surveillance based on a computational model of
spatio-temporal locality of Twitter.
Location Based Social Networks Systematic Literature Review
22
Wakamiya, S. & Lee, R., 2012. Crowd-sourced Urban Life Monitoring: Urban Area
Characterization based Crowd Behavioral Patterns from Twitter Categories and Subject
Descriptors.
ACM
.
Wang, H. et al., 2012. A System for Real-time Twitter Sentiment Analysis of 2012 U . S .
Presidential Election Cycle. In
Other
. pp. 115120.
Wanichayapong, N. et al., 2011. Social-based Traffic Information Extraction and Classification.
IEEE
, pp.107112.
Watanabe, K. et al., 2011. Jasmine: a real-time local-event detection system based on geolocation
information propagated to microblogs. In
CIKM ’11 Proceedings of the 20th ACM
international conference on Information and knowledge management
. ACM New York, NY,
USA, pp. 25412544.
Webster, J. & Watson, R.T., 2009. ANALYZING THE PAST TO PREPARE FOR THE FUTURE:
WRITING A REVIEW. , 26(2).
Weng, J. & Lee, B., 2011. Event Detection in Twitter.
AAAI
, pp.401408.
Weng, J., Lim, E. & Jiang, J., 2010. Twitterrank: Finding Topic-Sensitive Influential Twitterers
TwitterRank: Finding Topic-sensitive Influential Twitterers. , pp.261270.
Wu, S. et al., 2011. Who Says What to Whom on Twitter. , pp.705714.
Yardi, S., Tweeting from the Town Square: Measuring Geographic Local Networks.
Yardi, S. & Boyd, D., 2010. Tweeting from the Town Square: Measuring Geographic Local
Networks.
ICWSM
.
Yuan, Q. et al., 2013. Who , Where , When and What: Discover Spatio-Temporal Topics for Twitter
Users.
ACM
, pp.605613.
Zhang, D. et al., 2010. Extracting Social and Community Intelligence from Digital Footprints: An
Emerging Research Area. , pp.418.
Zheng, Y., 2011. Location-based social networks: Users.
Computing with Spatial Trajectories
.
Zielinski, A. & Bügel, U., 2012. Multilingual Analysis of Twitter News in Support of Mass
Emergency Events. , pp.15.
Zielinski, A. & Middleton, S., 2013. Social Media Text Mining and Network Analysis for Decision
Support in Natural Crisis Management. Proceedings of the 10th International ISCRAM
Conference