International Journal of
Geo-Information
Article
Monitoring and Assessing Post-Disaster Tourism
Recovery Using Geotagged Social Media Data
Yingwei Yan *, Melanie Eckle, Chiao-Ling Kuo, Benjamin Herfort, Hongchao Fan and
Alexander Zipf
Institute of Geography, Heidelberg University, 69120 Heidelberg, Germany; eckle@uni-heidelberg.de (M.E.);
kuo@chiaoling.com (C.-L.K.); herfort@uni-heidelberg.de (B.H.); hongchao.fan@uni-heidelberg.de (H.F.);
zipf@uni-heidelberg.de (A.Z.)
*Correspondence: Yingwei.Yan@uni-heidelberg.de or yanyingwei@u.nus.edu; Tel.: +49-6221-54-5509
Academic Editors: Marguerite Madden and Wolfgang Kainz
Received: 27 January 2017; Accepted: 27 April 2017; Published: 3 May 2017
Abstract:
Tourism is one of the most economically important industries. It is, however, vulnerable
to disaster events. Geotagged social media data, as one of the forms of volunteered geographic
information (VGI), has been widely explored to support the prevention, preparation, and response
phases of disaster management, whereas little effort has been devoted to the recovery phase.
This study develops a scientific workflow and methods to monitor and assess post-disaster tourism
recovery using geotagged Flickr photos, involving a viewshed-based data quality enhancement,
a space-time-bin-based quantitative photo analysis, and a crowdsourcing-based qualitative photo
analysis. The developed workflow and methods are also demonstrated in this paper through a case
study conducted for the Philippines, where a magnitude 7.2 earthquake (Bohol earthquake) and
a super typhoon (Haiyan) occurred successively in October and November 2013. In the case study,
we discovered spatiotemporal knowledge about the post-disaster tourism recovery, including the
recovery statuses and trends, and photos visually showing unrepaired damage. The findings
contribute to better tourism rehabilitation of the study area.
Keywords: tourism; post-disaster recovery; geotagged social media data; Flickr; volunteered
geographic information (VGI); data quality; space-time bin; crowdsourcing
1. Introduction
In recent years, geotagged social media data, as one of the forms of volunteered geographic
information (VGI) [1], has been pushed to the research frontier with regard to disaster
management [2]. Such data are rich in coverage and volume, cost-effective, and timely, and have
been widely explored for the prevention, preparation, and response phases of the disaster
management cycle [3,4]. However, they have rarely been explored for the recovery phase.
This study explores social media data to assist post-disaster tourism recovery. It develops a
scientific workflow and methods to monitor and assess post-disaster tourism recovery using
geotagged Flickr photos, which involve a viewshed-based data quality enhancement, a
space-time-bin-based quantitative photo analysis, and a crowdsourcing-based qualitative photo
analysis for revealing recovery statuses and trends. The developed workflow and methods are also
demonstrated in this paper through a case study conducted for the Philippines, where a magnitude
7.2 earthquake (Bohol earthquake) and a super typhoon (Haiyan) occurred successively in October
and November 2013. Spatiotemporal characteristics of the tourism recovery in the study area have
been discovered.
Insights into utilizing social media data to support tourism rehabilitation contribute to
decision-making for rebuilding the attractiveness of an affected tourist destination. They also
contribute to rebuilding consumer confidence, which is highly important to the health or even
survival of the tourism industry.
ISPRS Int. J. Geo-Inf. 2017, 6, 144; doi:10.3390/ijgi6050144 www.mdpi.com/journal/ijgi
Indeed, tourism is one of the most economically important industries worldwide;
it is, however, vulnerable to disaster events [5–7]. In general, the tangible or physical damages
to a destination (e.g., destroyed infrastructure) can be restored over time. The greater
challenge, however, rests in recovering the tarnished image and reputation of a destination,
which otherwise may be destroyed permanently [8]. This is because the attractiveness of a tourist
destination highly depends on tourists' perceptions of its image and reputation [6,9].
Consequently, tourism is often not able to recover as fast as other businesses do.
Realizing post-disaster recovery oftentimes requires considerable financial, manpower, and
intellectual investments [5,10], during which monitoring and assessing the recovery demands
special attention [11]. In general, this is due to the limited resources that can be invested [11].
For tourism in particular, distant tourists may be hesitant to visit affected destinations even if
the destinations have been physically fully recovered, because they lack credible and up-to-date
information regarding the recovery statuses [12]. To monitor and assess post-disaster recovery,
traditional strategies include, but are not limited to, ground surveys and observations, social
audits (key informant interviews, focus groups), household surveys, and satellite imagery
analysis [11]. However, on the one hand, approaches like surveys and social audits are
labor-intensive, costly, and time-consuming, especially when a constant flow of information is
needed from a large region (some affected areas are often inaccessible and insecure) [11]. On the
other hand, approaches like remote sensing fail to capture tourist quantities and only reveal
physical reconstruction statuses (e.g., building and road reconstructions). In addition, images
with high spatial and temporal resolutions are normally expensive.
Alternatively, social media data have the potential to compensate for these shortcomings. Of the
three sources of social media feeds (i.e., Twitter, Flickr, Facebook) that are discussed frequently
in the GIScience community, Flickr (a photo-based social network) offers the highest accessibility.
It is possible to access the full stream of Flickr photos through its application programming
interface (API) [13]. Twitter's API returns only a small sample of the full stream (about one
percent) and requires a persistent connection to its server, so obtaining long-term historical
Twitter data is hardly possible [14]. Facebook is even more restrictive about access to its users'
data contributions [15]. Additionally, monitoring and assessing the recovery of a tourist
destination relies on people in the field providing eye-witness information, which allows
qualitative interpretation of the captured and surrounding environments. Therefore, Flickr is
adopted in this study.
The remainder of this article is organized as follows. Section 2 briefly reviews some related
work. Section 3 presents the workflow and methods. Section 4 presents the case study. Section 5
discusses our research outcomes and the related future work. Lastly, Section 6 concludes
this article.
2. Related Work
2.1. Geotagged Social Media Data for Disaster Management
Geotagged social media data, as one of the various sources of VGI, have been widely explored in
various application domains, such as people's travel patterns and human mobility [16,17],
socioeconomic patterns [18], and road map inference [19]. These explorations imply the great
potential of social media data in answering scientific inquiries.
In particular, effective management of natural disaster events relies on the immediacy and
currency of geospatial information collected as input for relevant agency decision-making, which
stresses the value of VGI in all phases of the disaster management cycle (i.e., prevention,
preparation, response, and recovery) [4]. Geotagged social media data have therefore been explored
to support the management of both highly harmful events (e.g., floods, earthquakes, fires,
tsunamis, and typhoons) and relatively less attended ones such as landslides. For example,
De Albuquerque et al. [20] investigated flood-related Tweets in Germany for both response and
preventive monitoring. Zook et al. [21] discussed how Twitter assisted rescue efforts in the
aftermath of the 2010 Haitian earthquake. Flickr photos were also adopted as an alternative to
official sources to inform the public during the 2007 Zaca wildfire in Southern California, as
described by Goodchild and Glennon [2]. Regarding tsunamis, Peary et al. [22] discussed the
potential of utilizing social media in the related preparedness and response by public, civil
society, and government organizations in Japan. Takahashi et al. [23] explored how people
communicated using Twitter during the immediate aftermath of typhoon Haiyan, which pummeled the
Philippines in 2013. In addition, Pennington et al. [24] explored the use of social media for
data acquisition for the British Geological Survey National Landslide Database.
These explorations largely reflect the belief that disaster management can benefit from the
collection and analysis of social media feeds. However, these studies mostly focus on the
prevention, preparation, and response phases of disaster management. The research community has
rarely explored social media data for the recovery phase.
2.2. Traditional Approaches for Monitoring and Assessing Post-Disaster Recovery
Post-disaster recovery can be defined as bringing the conditions of affected areas back to a
certain level of acceptability through the development and implementation of strategies and
actions for rectifying damages [5]. The recovery may immediately follow the occurrence of a
disaster or start from a time when the destination is able to receive assistance. The Federal
Emergency Management Agency in the United States manages the aftermath of a disaster in three
stages: (1) emergency response (twenty-four hours to two or three weeks); (2) relief (a week to
half a year); and (3) recovery (several weeks to ten years) [25].
Researchers have adopted a variety of methods to monitor and assess post-disaster recovery.
For example, Hoshi et al. [26] monitored and assessed the urban recovery of Pisco after the 2007
Peru earthquake using satellite images. Brown et al. [27] investigated the recovery status after
the 2008 Wenchuan earthquake in China based on both satellite images and field surveys (capturing
detailed georeferenced records of the recovery through photographs, video, and observations).
Resident interviews have also been used to monitor and assess the recovery of Punta Gorda
(Florida, the United States) after hurricane Charley in 2004 [28]. McCarthy and Hanson [29]
compiled and analyzed authoritative building permit data, census data, and damage assessment data
for three counties hit by hurricane Katrina in 2005 in the United States to describe the degree
of their housing recovery. Moreover, Platt, Brown and Hughes [11] discussed a range of approaches
(e.g., satellite imagery analyses, field surveys and observations, social audits, household
surveys, official publications and statistics, outsourced data, and insurance data) to examine
the recovery of Ban Nam Khem, Thailand from the 2004 Indian Ocean tsunami and of Muzaffarabad,
Pakistan from the 2005 Kashmir earthquake.
However, approaches like surveys, social audits, and interviews are labor-intensive, costly, and
time-consuming, especially when a constant flow of information is needed from a large region
(some affected areas are even difficult and insecure to access after a disaster) [11]. In
addition, most published information is unavailable at a small scale [11]. Approaches like remote
sensing fail to capture tourist quantities and only reveal physical reconstruction statuses
(e.g., building and road reconstructions). Satellite images with high spatial and temporal
resolutions are also expensive. Although the International Charter makes some satellite data
available at no cost to support disaster responses, it does not offer data to support
post-disaster recovery [30]. On the contrary, "on the ground" citizens may provide timely,
cheaper, and in-situ information.
3. Monitoring and Assessing Post-Disaster Tourism Recovery Using Geotagged Flickr Photos
Given the limitations of the traditional approaches for monitoring and assessing post-disaster
recovery, and the lack of attention to utilizing social media in this field, we aim to help fill
these research gaps. In this section, we present a workflow and methods to utilize geotagged
Flickr photos for monitoring and assessing post-disaster tourism recovery, which are outlined in
Figure 1. There are three steps after the retrieval of Flickr photos and their metadata through
the Flickr API (https://www.flickr.com/services/api/). Step one is quality enhancement, a
preparatory step for the subsequent steps. VGI's heterogeneity, diversity, lack of adherence to
the standards required in the creation of conventional authoritative data (e.g., government
generated data), and lack of data descriptions for determining its fitness for a particular
purpose [31] make VGI quality enhancement important before any further analysis. Specifically,
we incorporate the enhancement of locational accuracy and thematic accuracy (i.e., discriminating
tourist photos from non-tourist photos) into the workflow. Step two is quantitative tourist photo
analysis. This step performs space-time analysis of the tourist photos that result from step one.
The last step is to qualitatively investigate the visual contents of all available photos
(regardless of whether a photo is a tourist photo or not) that result from step one, in order to
identify those showing the reconstruction status of an affected tourist destination. The
following sub-sections describe the workflow and methods in detail.
Figure 1. The outline of the developed workflow and methods for monitoring and assessing
post-disaster tourism recovery using geotagged Flickr photos.
3.1. Quality Enhancement
The tourist sites of a study area can be extracted from OpenStreetMap, namely by retrieving the
features with tags related to tourism. For OpenStreetMap, being a VGI project, data quality is
generally expected to be high in regions with an active mapping community [32]. With the tourist
sites obtained from OpenStreetMap, we can conduct a viewshed analysis using a digital elevation
model covering the study area [13]. Based on terrain characteristics (e.g., whether there are
mountain barriers), a viewshed analysis tells us from where people can see at least one of the
tourist sites of the study area. Such an analysis can be conducted using the Viewshed tool of
ArcGIS (ESRI Products, Redlands, CA, USA). The viewshed analysis therefore enables us, on the one
hand, to remove those photos that are probably incorrectly positioned (i.e., tourist photos
located at positions from which no tourist site can actually be seen) and, on the other hand, to
enhance the thematic relevancy of the photos to tourism (i.e., only retaining photos within the
viewshed area of the tourist sites). Additionally, the map zoom-in levels for positioning Flickr
photos after being taken by users vary between 1 and 16 (world level is 1, country 2–3, region
4–6, city 7–11, street 12–16). Working only on street-level photos ensures the highest locational
accuracy. Second, to further ensure the relevancy of retrieved Flickr photos to tourism, we
classify the photos into tourist versus non-tourist ones based on user profiles (i.e.,
determining whether a user is a tourist or not based on where he or she resides). Lastly, before
conducting quantitative photo analysis, further data cleansing may be necessary depending on the
characteristics of the retrieved Flickr dataset. For example, some extremely active users may
contribute many more photos than other users do. The photos from such users should be treated
with caution to avoid possible bias towards such users (see our case study below as an example).
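To make the filtering steps above concrete, the sketch below combines the street-level accuracy check, the viewshed membership test, and the profile-based tourist classification. It is a minimal illustration under stated assumptions, not the implementation used in this study: the viewshed result is assumed to be exported as a set of visible grid cells, and the `Photo` fields, cell size, and profile matching are illustrative.

```python
from dataclasses import dataclass

STREET_LEVEL_MIN = 12  # Flickr accuracy levels 12-16 correspond to street level


@dataclass
class Photo:
    photo_id: str
    lon: float
    lat: float
    accuracy: int    # Flickr's map zoom-in level when the photo was positioned (1-16)
    owner_home: str  # free-text home location from the user profile


def in_viewshed(photo, viewshed_cells, cell_size=0.5):
    """Check whether a photo falls inside a cell marked visible by the viewshed analysis.

    viewshed_cells is assumed to be a set of (col, row) indices exported from the
    GIS viewshed raster (an assumption; any point-in-polygon test would do).
    """
    cell = (int(photo.lon // cell_size), int(photo.lat // cell_size))
    return cell in viewshed_cells


def enhance_quality(photos, viewshed_cells, study_area_name="Philippines"):
    """Apply the filters of step one: street-level accuracy, viewshed membership,
    and a crude tourist/non-tourist classification based on the user profile."""
    kept = []
    for p in photos:
        if p.accuracy < STREET_LEVEL_MIN:
            continue  # drop photos with coarse positioning
        if not in_viewshed(p, viewshed_cells):
            continue  # drop photos outside the viewsheds of tourist sites
        is_tourist = study_area_name.lower() not in p.owner_home.lower()
        kept.append((p, is_tourist))  # keep the tourist vs. non-tourist label
    return kept
```

The profile test here is deliberately simple (substring match on the stated home location); a real classification would need more care with ambiguous or empty profiles.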
3.2. Quantitative Tourist Photo Analysis
The second analytic step is to quantitatively analyze the tourist photos that result from the
abovementioned quality enhancement step, which enables the derivation of photo flow patterns as
an indicator of tourism recovery status. Post-disaster recovery has been shown to vary over space
and time due to socioeconomic and political factors and a variety of decisions made throughout
the disaster management cycle [25]. Therefore, when it comes to monitoring and assessing
post-disaster tourism recovery, the spatiotemporal patterns of the recovery are perhaps the top
concern of stakeholders. We put forward a space-time bin method (adapted from [33]) to analyze
tourist photos in both spatial and temporal dimensions. As illustrated in Figure 2, a study area
is first divided into a certain number of tiles (cells). Note that the determination of the tile
size is context-dependent (see how we determined the tile size in our case study below as an
example). A bin is formed when the temporal dimension is incorporated into a tile as the height
of the bin. Each bin aggregates the data from a single tile location and a single time period.
The bins within the same time period form a bin slice, and the bins stacked on a single tile
location form a bin time series. Bin slices and bin time series can extend until reaching the
boundary of the study area and the time needed for recovery, respectively. The advantage of using
space-time bins to encapsulate data is that they allow us to easily perform three-dimensional
comparisons among the data. For instance, to examine the spatiotemporal patterns of tourist photo
flows, two types of analysis are considered in our work: (1) similarity analysis; and
(2) post-disaster trend analysis.
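The aggregation into space-time bins can be sketched as follows, assuming monthly time periods and square tiles in degrees; the tile size is the context-dependent parameter discussed above, and all names are illustrative.

```python
from collections import Counter
from datetime import datetime


def bin_key(lon, lat, taken, tile_size=0.5):
    """Map one photo to its space-time bin: a (tile, year, month) triple.

    tile_size is in degrees and context-dependent; monthly bins are assumed,
    matching the month-based analyses described in the text.
    """
    tile = (int(lon // tile_size), int(lat // tile_size))
    return (tile, taken.year, taken.month)


def build_bins(photos, tile_size=0.5):
    """Count tourist photos per space-time bin.

    photos is an iterable of (lon, lat, datetime) tuples. Bins sharing a
    (year, month) form a bin slice; bins sharing a tile form a bin time series.
    """
    counts = Counter()
    for lon, lat, taken in photos:
        counts[bin_key(lon, lat, taken, tile_size)] += 1
    return counts


def bin_time_series(counts, tile):
    """Extract the chronologically ordered bin time series for one tile."""
    series = [(y, m, n) for (t, y, m), n in counts.items() if t == tile]
    return sorted(series)
```

The resulting per-tile monthly counts are exactly the inputs needed by the similarity and trend analyses that follow.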
Figure 2. Space-time bin for data aggregation (adapted from [33]).
Similarity analysis is concerned with the similarity between a post-disaster bin or bin time
series (with data) and its pre-disaster counterpart. For example, assuming that we have bins that
aggregate tourist photos yearly, one may simply examine to what degree the total number of photos
contributed in a post-disaster year is close to that in the year before the disaster (as an
indicator of the degree to which post-disaster tourism is similar to its pre-disaster state).
However, such an analysis is probably only suited to study areas whose tourism does not depend on
seasonality. For areas with seasonal changes that affect tourism, the recovery may vary across
seasons, and simply comparing yearly sums leads to bias towards the leading seasons in the
recovery. A better way of conducting the similarity analysis is through a month-based analysis.
Assuming that we have bins that aggregate tourist photos monthly, such an analysis measures the
similarity between the monthly photo quantities for a year after a disaster and the corresponding
monthly photo quantities for the year before the disaster. To this end, the modified Jaccard
similarity index, which ranges from zero to one (a greater value represents higher similarity)
[34], is adopted in our work. In the case of month-based photo quantity analysis, the computation
of this index takes into account not only the total numbers of photos contributed in the two
comparative years but also the numbers of photos contributed during individual months of the two
years (see our case study as an example). The index is expressed using Equation (1):
$$J_i = \frac{\sum_{j=1}^{12} \min\left(a_{i,j}, b_{i,j}\right)}{\sum_{j=1}^{12} a_{i,j} + \sum_{j=1}^{12} b_{i,j} - \sum_{j=1}^{12} \min\left(a_{i,j}, b_{i,j}\right)}, \quad (1)$$
where $a_{i,j}$ is the total number of tourist photos contributed during month $j$ (e.g., January,
February) at tile $i$ in a post-disaster year, and $b_{i,j}$ is the total number of tourist photos
contributed during month $j$ at tile $i$ in a pre-disaster year.
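Equation (1) can be computed directly from two 12-month count vectors. A minimal sketch (the handling of tiles with no photos in either year is an implementation choice of this illustration):

```python
def modified_jaccard(post, pre):
    """Modified Jaccard similarity (Equation (1)) between the monthly tourist
    photo counts of a post-disaster year (a_ij) and a pre-disaster year (b_ij)
    at one tile. Returns a value in [0, 1]; 1 means the monthly distributions
    are identical."""
    assert len(post) == len(pre) == 12
    overlap = sum(min(a, b) for a, b in zip(post, pre))  # sum of month-wise minima
    denom = sum(post) + sum(pre) - overlap
    # A tile with no photos in either year is treated as dissimilar here;
    # this edge case is an assumption of this sketch, not of Equation (1).
    return overlap / denom if denom else 0.0
```

For identical monthly distributions the numerator equals each yearly sum and the index is 1; for fully disjoint months the numerator, and hence the index, is 0.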
In addition, post-disaster trend analysis examines whether the quantities of user contributions
within a post-disaster bin time series show an upward, downward, or flat trend, as an indicator
of the tourism recovery trend (recovery, recession, or stagnation). For example, assuming that we
have bins that aggregate tourist photos monthly, such an analysis investigates the trend of the
monthly photo quantities over a number of successive months (e.g., 24 successive months) after a
disaster. However, it is also necessary to take seasonality into account for tourism that highly
depends on seasonal variations. In such cases, we can group the monthly bins from individual
post-disaster years based on seasonality, so as to obtain seasonal bin time series (e.g., a
summer bin time series). Subsequently, we can combine the seasonal bin time series from several
successive post-disaster years and analyze the trend of the monthly photo quantities within the
combined bin time series (see our case study as an example). The Mann–Kendall trend test [35]
describes the trend in a series of values and is therefore adopted for the trend analysis.
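The core of the Mann–Kendall test can be sketched as follows. Only the S statistic and its sign are shown; a complete test would also compute the variance of S and a significance level, and the seasonal grouping described above is left to the caller.

```python
def mann_kendall_s(series):
    """Mann-Kendall S statistic: the number of later-minus-earlier pairs that
    increase, minus the number that decrease. S > 0 suggests an upward trend,
    S < 0 a downward trend, and S near 0 a flat trend."""
    s = 0
    n = len(series)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    return s


def trend_direction(series):
    """Classify a post-disaster bin time series (illustrative labels only;
    a real decision should be based on the test's significance level)."""
    s = mann_kendall_s(series)
    if s > 0:
        return "upward (recovery)"
    if s < 0:
        return "downward (recession)"
    return "flat (stagnation)"
```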
3.3. Qualitative Photo Analysis
Not only can making sense of the quantity of tourist photos generate insights into the recovery
status of disaster-affected tourism, but qualitatively examining the visual contents of both
tourist and non-tourist photos can also be informative for decision-makers. This practice has
already proved to be of great potential for generating situational awareness in disaster
responses [36]. In the recovery phase, we may be able to identify photos showing, for example,
the reconstruction status of an affected tourist destination (e.g., detecting unrepaired damage
that may affect tourism), and inform tourists or disaster managers for decision-making. Image
content recognition can be a tedious task, as the quantity of photos retrieved from a
disaster-affected area can be huge. Therefore, we incorporate a crowdsourcing-based image content
recognition process into the workflow, which can be readily set up with off-the-shelf web-based
crowdsourcing platforms (e.g., pybossa, http://pybossa.com/). This method has also been
used in [36].
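Setting up such a crowdsourcing project largely amounts to turning photo records into small classification tasks. The sketch below builds task payloads in the general shape accepted by PyBossa-style platforms; the field names, question text, and redundancy level are illustrative assumptions, not a fixed PyBossa schema, and the actual submission via the platform's API is not shown.

```python
def build_tasks(photos, question="Does this photo show unrepaired disaster damage?"):
    """Turn photo records into crowdsourcing task payloads.

    photos is an iterable of (photo_id, url) pairs. Each payload could then be
    submitted to a platform such as PyBossa through its API (not shown here).
    """
    tasks = []
    for photo_id, url in photos:
        tasks.append({
            "info": {
                "photo_id": photo_id,
                "url": url,
                "question": question,
            },
            # Illustrative redundancy: ask three volunteers per photo so that
            # disagreements can be resolved by majority vote.
            "n_answers": 3,
        })
    return tasks
```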
4. Case Study
4.1. Study Area
The study area is the central Philippines islands region (Figure 3), which is a popular tourist
destination with cultural and religious heritage, a diversity of plants and animals, geological
wonders, beaches, diving sites, and resorts. Regarding seasonal characteristics, the study area
is located in the tropical zone, where the seasons are not characterized by spring, summer,
autumn, and winter. It has, however, dry seasons and wet seasons that result in the high and low
seasons of its tourism. According to [37], the yearly high season of tourism in the study area
runs from December to April because of the cooler and more pleasant weather, while the low season
runs from May to November due to the hot temperatures, high rainfall amounts, and even extreme
weather such as torrential downpours and typhoons.
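For the seasonal grouping used in the trend analysis, the high/low season split described above can be encoded as a simple lookup; this is an illustrative helper, with the December–April high season taken from the text above.

```python
HIGH_SEASON_MONTHS = {12, 1, 2, 3, 4}  # December through April (dry, cooler weather)


def season(month):
    """Return 'high' for the December-April tourist high season of the study
    area, and 'low' for the May-November wet season."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return "high" if month in HIGH_SEASON_MONTHS else "low"
```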
Figure 3. Study area. The central Philippines islands region.
4.2. Data Collection and Pre-Processing
Following the access policies and regulations of Flickr API, we developed a tool based on a PHP
script to collect Flickr photos and their metadata. The tool retrieves Flickr data by scanning the study
area using a 0.5 degree by 0.5 degree moving window, starting from the upper left corner. Since the
Flickr API allows for accessing a maximum of 4000 photos in a single API query execution, a window
is subdivided into four equal-sized sub-windows in case more than 4000 photos are contained within
that window. This subdivision is recursively performed until no API query returns more than 4000
photos. For the case study, using the tool, we collected 71,329 geo-tagged (WGS 84 bounding box:
123.220, 9.371, 124.696, 11.604) time-stamped (ranging from 1 April 2004 to 6 July 2016) Flickr photos
contributed by 3790 users (Figure 4). Although the disasters occurred in 2013, we collected all
historical photos dating back to the startup of Flickr, which were beneficial for the tourist versus non-
tourist photo classification (see the classification details below).
With the collected photos, a viewshed analysis was first conducted using the Viewshed tool of
ArcGIS 10.4.1 with tourist sites locations extracted from OpenStreetMap and ASTER GDEM 2 data
Figure 3. Study area. The central Philippines islands region.
Unfortunately, the area was devastated by a magnitude 7.2 earthquake (Bohol earthquake) and
a super typhoon (Haiyan) on 15 October 2013 and 8 November 2013, respectively [38,39]. The twin
disasters led to extreme loss of life and widespread damage to the infrastructure and natural landscapes.
The tourism industry of the area has therefore been strongly affected, making it an ideal study area for
us to monitor and assess its post-disaster tourism recovery. According to news media, slow progress
of government response to the post-disaster recovery has been reported even after one year since the
disasters [
40
]. Many tourists were hesitant to visit the islands, due to their perception that the islands
were not ready for tourists [
41
]. Although full recovery is still underway, there could be leading areas
that have recovered first and have geared up to accommodate guests [
41
]. Therefore, this case study
utilizes the above introduced workflow and methods to monitor and assess the post-disaster tourism
recovery of the study area by making sense of Flickr photos.
4.2. Data Collection and Pre-Processing
Following the access policies and regulations of Flickr API, we developed a tool based on a PHP
script to collect Flickr photos and their metadata. The tool retrieves Flickr data by scanning the study
area using a 0.5 degree by 0.5 degree moving window, starting from the upper left corner. Since the
Flickr API allows for accessing a maximum of 4000 photos in a single API query execution, a window
is subdivided into four equal-sized sub-windows in case more than 4000 photos are contained within
that window. This subdivision is recursively performed until no API query returns more than 4000
photos. For the case study, using the tool, we collected 71,329 geo-tagged (WGS 84 bounding box:
123.220, 9.371, 124.696, 11.604) time-stamped (ranging from 1 April 2004 to 6 July 2016) Flickr photos
contributed by 3790 users (Figure 4). Although the disasters occurred in 2013, we collected all historical
photos dating back to the launch of Flickr, which were beneficial for the tourist versus non-tourist
photo classification (see the classification details below).
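The recursive window subdivision described above can be sketched as follows (a simplified illustration: the `count_photos` callback stands in for an actual Flickr API query and is a hypothetical helper, not part of the Flickr API):

```python
def subdivide(bbox, count_photos, cap=4000):
    """Recursively split a (min_lon, min_lat, max_lon, max_lat) window
    into four equal sub-windows until each holds at most `cap` photos,
    and return the list of leaf windows to query."""
    if count_photos(bbox) <= cap:
        return [bbox]
    min_lon, min_lat, max_lon, max_lat = bbox
    mid_lon = (min_lon + max_lon) / 2
    mid_lat = (min_lat + max_lat) / 2
    quads = [
        (min_lon, min_lat, mid_lon, mid_lat),
        (mid_lon, min_lat, max_lon, mid_lat),
        (min_lon, mid_lat, mid_lon, max_lat),
        (mid_lon, mid_lat, max_lon, max_lat),
    ]
    result = []
    for quad in quads:
        result.extend(subdivide(quad, count_photos, cap))
    return result
```

In practice, `count_photos` would issue a bounding-box search against the Flickr API and read the total result count from the response, and the leaf windows would then be paged through to download the photo metadata.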
Figure 4. (a) The spatial distribution of the initially retrieved Flickr photos and the tourist sites; (b) the
viewshed area of the tourist sites, from where people can see at least one of the tourist sites.
With the collected photos, a viewshed analysis was first conducted using the Viewshed tool of
ArcGIS 10.4.1 with tourist site locations extracted from OpenStreetMap and ASTER GDEM 2 data [42]
obtained from USGS's EarthExplorer [43]. We extracted all features (1477 point features, one line
feature, and 573 polygon features) with tourism, historic site, and place of worship related tags
from OpenStreetMap as the tourist sites. The point features and the mean centers of the polygon and
polyline features were used as the inputs of the viewshed analysis. OpenStreetMap mapping in the
Philippines has been ongoing since 2006 and has been constantly developing thanks to a growing
Filipino mapping community that continues to add to, edit, and validate the data [44]. Moreover, the
study area has been covered by the mapping tasks of the Humanitarian OpenStreetMap Team after
typhoon Haiyan in 2013, and the related map products have already been used on the ground by
aid agencies such as the American Red Cross [45]. This high interest in the mapping supports the
assumption that the OpenStreetMap data in the study area are of a good level of quality. Regarding
ASTER GDEM 2 (30 m by 30 m cell size), this digital elevation model includes building and canopy
heights rather than the "bare Earth", which is necessary in the context of this study as users may
climb to the top of buildings to take photos. The viewshed area of the tourist sites is shown in
Figure 4b in light brown, and photos located outside this area were removed. After the viewshed
analysis, 49,608 photos were retained, of which we further removed 13,242 non-street-level photos
(36,365 photos left).
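To illustrate the visibility test underlying a viewshed (a heavily simplified sketch, not the ArcGIS Viewshed implementation: it uses nearest-cell terrain sampling and ignores Earth curvature and atmospheric refraction), a line-of-sight check between two DEM cells might look like:

```python
def line_of_sight(dem, observer, target, obs_height=1.7):
    """Check whether `target` cell is visible from `observer` cell on a
    DEM grid (list of rows of elevations). Samples along the straight
    line and compares the terrain elevation with the sight line."""
    (r0, c0), (r1, c1) = observer, target
    z0 = dem[r0][c0] + obs_height  # eye level above the observer cell
    z1 = dem[r1][c1]
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(1, steps):
        t = i / steps
        r = r0 + (r1 - r0) * t
        c = c0 + (c1 - c0) * t
        terrain = dem[round(r)][round(c)]       # nearest-cell sample
        sight = z0 + (z1 - z0) * t              # height of the sight line
        if terrain > sight:
            return False                        # terrain blocks the view
    return True
```

A full viewshed repeats this test from each tourist site to every cell within a maximum distance and unions the visible cells, which is what photos were subsequently filtered against.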
The next step was to classify the remaining 36,365 photos into tourist and non-tourist photos
based on where the users reside. A user living outside the study area was considered a tourist. Note
that approximately 30% of the 36,365 photos lacked information about where their contributors live;
these were manually classified based on the photos' visual contents. That is, a user was classified as a
tourist if most of the photos contributed by this user could be visually recognized as related to tourism
(e.g., sightseeing), and then all the photos from the user were classified as tourist photos. Indeed, it
was observed that most of the photos contributed by non-tourists (locals) were about the activities of
their daily lives (e.g., parties, friend or family gatherings). This was also the reason we collected all
historical photos through Flickr's API: the more photos from a user were available for the tourist
versus non-tourist recognition, the easier the recognition would be. The classification results showed
71.2% tourist photos, 27.4% non-tourist photos, and 1.3% unclassified photos, which were the ambiguous
ones (e.g., some users only uploaded photos about food and also did not indicate where they live).
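The residence-based part of this rule can be sketched as below (the place-name list and the `classify_user` helper are hypothetical illustrations; in practice, matching free-text profile locations needs more care, and users without a location fall back to manual inspection of their photo content):

```python
# Hypothetical place names covering the study area (illustrative only).
STUDY_AREA_PLACES = {"cebu", "bohol", "tagbilaran"}

def classify_user(home_location):
    """Classify a Flickr user as 'tourist', 'local', or 'unknown'
    from the free-text home location on their profile."""
    if not home_location:
        return "unknown"  # no profile location: inspect photos manually
    home = home_location.lower()
    if any(place in home for place in STUDY_AREA_PLACES):
        return "local"
    return "tourist"
```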
The tourist photos resulting from the viewshed analysis and photo classification are shown in
Figure 5. It can be seen that the resulting tourist photos are highly associated with the tourist sites
(Figure 5), compared with the initial photos (Figure 4a). The initially retrieved photos are scattered
across the study area, many of them even in the middle of the ocean, represented as the white area in
Figure 4a. Lastly, we noticed that 10.5% of the resulting tourist photos were contributed by one single
tourist from Switzerland (extremely active compared with usual users). We removed all of the photos
from this user before we conducted the quantitative photo analysis (introduced in the next section),
in order to avoid analytic bias toward this user.
Figure 5. The tourist photos resulting from the viewshed analysis and photo classification.
4.3. Photo Analysis
With the tourist photos resulting from the pre-processing, quantitative similarity analysis and
post-disaster trend analysis were conducted based on the space-time bin method introduced above.
A tile size of 14,465.3 m was adopted here, which was determined with the Calculate Distance Band
from Neighbor Count tool of ArcGIS 10.4.1. In our case, given the tourist sites as the input feature
class, we used the tool to calculate the distance for each tourist site to its nearest neighboring tourist
site. Among the distances generated, the maximum one was used as the tile size, which ensured that
every feature in the input feature class had at least one neighbor.
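The tile-size computation is equivalent to taking the maximum nearest-neighbor distance over all tourist sites. A brute-force planar sketch (assuming projected coordinates; a GIS tool may instead apply geodesic distances):

```python
from math import hypot

def max_nearest_neighbor_distance(points):
    """For each point, find the distance to its nearest neighbor, and
    return the maximum of these distances: the smallest distance band
    that guarantees every point has at least one neighbor."""
    worst = 0.0
    for i, (x1, y1) in enumerate(points):
        nearest = min(
            hypot(x2 - x1, y2 - y1)
            for j, (x2, y2) in enumerate(points) if j != i
        )
        worst = max(worst, nearest)
    return worst
```

For large point sets, the quadratic scan would normally be replaced with a spatial index (e.g., a k-d tree).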
The similarity analysis was conducted first. Since the first year after the disasters was the transition
from the response phase to the recovery phase of the disaster management [46], we compared the
similarity between the bin time series for the second year after the twin disasters (from 1 October
2014 to 30 September 2015) and its counterpart for the year before the disasters (1 October 2012 to 30
September 2013). Each bin in the bin time series aggregated tourist photos monthly. The similarity was
investigated using the modified Jaccard similarity index introduced above [34], considering that the
tourism of the study area varies over different seasons.
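For count-valued bin time series, one standard generalization of the Jaccard index takes the sum of element-wise minima over the sum of element-wise maxima. The exact modification used in [34] is defined earlier in the paper, so the sketch below illustrates only this basic count-based form:

```python
def generalized_jaccard(a, b):
    """Jaccard similarity generalized to non-negative count vectors:
    sum of element-wise minima over sum of element-wise maxima.
    Returns 1.0 for two all-zero (or empty) series, which are identical."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0
```

Applied per tile, `a` would hold the twelve monthly photo counts of the pre-disaster year and `b` those of the second post-disaster year.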
The similarity analysis indicates the degree to which the tourism is similar to its pre-disaster state.
However, it does not indicate the direction of the tourism recovery. A post-disaster trend analysis
answers this question. Still using bins aggregating tourist photos monthly, for the two successive years
after the disasters, we examined the trends of the monthly photo quantities for high seasons and low
seasons of the tourism, separately. Therefore, we had two separate seasonal bin time series for each
tile of the study area. In chronological order, the high season bin time series involved ten months:
December 2013 to April 2014 + December 2014 to April 2015; and the low season bin time series
involved 14 months: October to November 2013 + May to November 2014 + May to September 2015.
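The section does not restate the trend estimator here; as one simple possibility (an illustrative choice, not necessarily the method used in the paper), each seasonal bin series can be classified by the sign of an ordinary least-squares slope:

```python
def trend_direction(series, tol=1e-9):
    """Classify a monthly photo-count series as 'upward', 'downward',
    or 'flat' via the sign of the ordinary least-squares slope."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(range(n), series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    if slope > tol:
        return "upward"
    if slope < -tol:
        return "downward"
    return "flat"
```

A non-parametric alternative such as the Mann-Kendall test would be more robust to outliers in short, noisy count series.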
After the quantitative analyses, we further qualitatively examined the visual contents of the
photos contributed during the last three months of the second year after the disasters (i.e., July, August,
and September 2015, which were the three most recent bins involved in the quantitative analyses).
Note that the photos we examined in this step involved all available street-level photos resulting from
the viewshed analysis, regardless of whether a photo is a tourist photo or not. In this qualitative photo
analysis, we aimed to identify unfixed damages that may affect the tourism. As mentioned above, a
crowdsourcing-based method can be used. However, as a simplified demonstration, we did the task
manually by ourselves instead of seeking help from the crowd.
4.4. Results and Interpretations
The similarity between the monthly tourist photo quantities of the second year after the disasters
and those of the year before the disasters is shown in Figure 6. Higher similarity values are generally
distributed in the relatively more popular tourist areas, such as Cebu city, Southwest Bohol province,
and Bantayan and Malapascua islands in Northern Cebu (Figure 6). This observation implies that
such areas are relatively more robust after the disasters, perhaps because their restoration received
more attention. Indeed, not long after the disasters, there was an effort to boost the tourism of the
top destinations (e.g., Cebu and Bohol) by ensuring that they remained open for business with their
respective ports of entry still accessible to tourists [47]. Additionally, the Philippine Red Cross spent
much effort on constructing core shelters, full houses, and structures in Bohol [48]. Rowena Lu
Montecillo, the regional director of the Department of Tourism Central Visayas, further echoed our
interpretation: trying to convince tourists to come back, she stated that some regions, for instance
Bantayan and Malapascua islands, had been ready to accommodate tourists since the middle of
2014, although full recovery was still underway [41]. The recovery of Bantayan island even attracted
volunteering support from overseas [49].
Figure 6. The similarity between the monthly tourist photo quantities of the second year after the
disasters and those of the year before the disasters (only the tiles with photo contributions are shown).
The similarity map can be useful for both disaster management decision makers and tourists.
The former can decide which areas need more attention for their tourism recoveries, while the latter
can decide which areas to prioritize for visiting (i.e., following preceding tourists to those areas
that appear to be more robust after the disasters). Note that the similarity map could be qualitatively
validated by 8List [40] regarding Bohol province. 8List [40] states that Bohol's tourism has been
returning to its pre-disaster state during the two post-disaster years. However, more effort is needed
for the recovery, and there is indeed ongoing effort to bring Bohol back on track [40]. In fact, the
restoration of Southwest Bohol province's infrastructure and historical heritage remains a work in
progress after two years, although some buildings and roads have been restored [50]. This echoes the
recovery status of Bohol illustrated in Figure 6 (i.e., the illustrated similarity to the pre-disaster state).
Those areas with low similarity values clearly need more attention for their tourism recoveries.
Yet even for those areas with relatively higher similarity values, we should not treat the recoveries
lightly: despite the higher similarity values, their tourism may still be fading. Therefore, we looked
into 14 such tiles from eight areas (Figure 7) to further investigate their seasonal tourism recovery
trends, the results of which are shown in Figure 8. None of the 14 selected tiles shows a downward
trend during high seasons (Figure 8), which is a good sign. This is perhaps due to the cooler and drier
weather compensating for the tarnished image and reputation of the tourism. In addition, the attractive
Sinulog-Santo Niño Festival is also celebrated during high seasons (in January every year).
Unfortunately, we observed nine tiles suffering from downward trends during low seasons, possibly
because the unfavorable weather in low seasons worsened the attractiveness of these areas. Adding
the reported slow progress of government response to the post-disaster recovery even one year after
the disasters [40], tourist confidence might be even lower, and further losses of tourists could occur.
Therefore, the findings imply that more effective tourism recovery tactics are needed for the hotter
and wetter seasons, especially for Bantayan island, Cebu city, Malapascua island, Matalom
municipality, Santander municipality, and part of Southwest Bohol province. In addition, as for the
two most popular and largest tourist destinations in the study area, namely Southwest Bohol province
and Cebu city, it appears that Southwest Bohol province has recovered better than Cebu city (Figure 8).
All three of the tiles covering Cebu city faced downward trends, while three out of four tiles covering
Southwest Bohol province had upward trends. This is perhaps due to people's perception that the
infrastructure and tourist attractions of Cebu city are not fully prepared to accommodate tourists.
Instead of man-made attractions, the tourism of Southwest Bohol province relies relatively more on
natural landscapes and seascapes.
Lastly, regarding the visual contents of the photos, we identified two photos showing that the
restoration of the damaged Baclayon Church was still a work in progress. They were uploaded by two
different tourists on 24 July 2015 (Figure 9a) and 26 July 2015 (Figure 9b), respectively, and were
positioned at tile 13 in Figure 7. Indeed, it was reported that this church was devastated by the Bohol
earthquake [51], and the tourists' observations have been validated by Camongol [50]. In addition, one
of the two tourists also contributed a photo on 26 July 2015 further showing that the damaged seashore
infrastructure near the church had not been fixed (Figure 9c). On the one hand, these photos send a
message to disaster managers about the restoration progress up to 26 July 2015. On the other hand,
these photos send an alert to tourists so that they can plan their sightseeing in Bohol more effectively,
unless they are indeed interested in seeing the ruins. Note that we identified only three photos showing
the unfixed damage illustrated in Figure 9. Beyond this case study, in more developed countries with
more Flickr users, it is possible that more photos (both tourist and non-tourist photos) showing various
damaged properties such as recreational sites, transportation facilities, and parks could be detected.
Figure 7. Fourteen tiles selected from eight areas for post-disaster trend analysis, due to their relatively
higher popularity in the local tourism sector and higher similarity values generated by the similarity
analysis. Identification (ID) numbers of the tiles are labeled using Arabic numerals. The eight areas
are Bantayan island (tile IDs: 1 and 2), Cebu city (tile IDs: 3–5), Malapascua island (tile ID: 6),
Matalom municipality (tile ID: 7), Moalboal municipality (tile ID: 8), Ormoc city (tile ID: 9), Santander
municipality (tile ID: 10), and Southwest Bohol province (tile IDs: 11–14).
Figure 8. Results of the post-disaster seasonal trend analysis for the 14 selected tiles. Blue indicates an
upward trend, red indicates a downward trend, and green indicates a flat trend.
Figure 9. (a,b) Photos uploaded on 24 July 2015 and 26 July 2015, respectively, showing the
reconstruction of the damaged Baclayon Church; (c) photo uploaded on 26 July 2015, showing the
unfixed seashore infrastructure near the church.
5. Discussion
The features of social media data in monitoring and assessing post-disaster tourism recovery are
manifold. First, social media data are rich in coverage and volume, and the analysis can easily cover
a large area, as exemplified in our case study. Second, social media data are cost-effective: the Flickr
data collected in the case study were free of charge, and the only effort required in retrieving the data
was to gain the necessary technical skills for using the Flickr API. Third, social media data are timely.
For example, Flickr data are shared continuously, reflecting the near real-time situation on the ground.
In comparison, the collection of traditional data is often focused on a limited number of selected
sampling or survey sites and is also time-consuming due to financial and manpower constraints [11].
Although remote sensing images can cover a large area of interest, obtaining images with both high
spatial and temporal resolutions normally comes with high costs. It is also hardly possible to reveal
tourist quantities from such images.
ISPRS Int. J. Geo-Inf. 2017,6, 144 14 of 17
Nevertheless, several pertinent issues must be pointed out. First, the quantity of social media contributions only reflects tourists' visiting preferences during a tourism recovery process, rather than the actual status of infrastructure recovery. In fact, an area with physically well-restored infrastructure but an unrestored reputation can still attract few tourists. Another issue of using social media data for monitoring and assessing tourism recovery concerns sampling bias. Traditional data collection methods involve scientifically pre-designed sampling strategies and strict controls during data collection. For example, when using a key informant interview method, investigators can select the respondents who are considered most representative [27]. In the case of social media, the data contributors tend to be diverse and heterogeneous, and acquiring detailed personal particulars of users is rarely possible. Taking the Flickr photos contributed by non-locals in our case study as an example, it is hard to determine whether those non-local photo contributors are genuine tourists or photographers who are less representative of local tourism. In addition, representativeness is unequal even among genuine tourists, because different users contribute unequal numbers of photos. Even though we can remove extremely active users (i.e., cases of extreme inequality, as shown in the case study), milder inequalities in representativeness may remain.
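One simple way to damp the unequal representativeness among contributors is to weight each photo inversely by its contributor's total photo count, so that every user adds the same total weight to a bin count. This is our own illustrative sketch, not a method used in the study:

```python
from collections import Counter

def user_equalized_count(photo_owners):
    """Count photos with each photo weighted by 1 / (owner's total photos),
    so every contributor adds exactly 1.0 regardless of activity level.

    'photo_owners' is a list with one owner identifier per photo
    falling in a given space-time bin.
    """
    per_user = Counter(photo_owners)  # photos per user in this bin
    return sum(1.0 / per_user[owner] for owner in photo_owners)

# Three photos by user 'a' and one by user 'b' count as 2.0
# (two contributors), not 4 raw photos.
owners = ["a", "a", "a", "b"]
print(user_equalized_count(owners))  # 2.0
```

Under this weighting, a bin's value approximates the number of distinct visitors rather than the raw photo volume, which reduces the influence of a few hyperactive users.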
Therefore, these issues point to our future work. First, although we qualitatively examined the visual contents of Flickr photos showing unfixed damage, it is necessary to explore methods to further mine the qualitative metadata of Flickr photos, such as user-generated captions (i.e., short descriptions) and system-generated auto-tags. These qualitative metadata can be heterogeneous [52], but mining them may generate further insights into the physical recovery status of an affected tourist destination. The second item of future work pertains to mitigating the aforementioned sampling bias. One potential solution is to classify photos based on user behaviors, rather than on user profiles as in the case study presented above. For example, Zheng et al. [16] introduced the concept of mobility entropy. Tourists and non-tourists may have different mobility entropies, and the same may hold when comparing genuine tourists with photographers. In addition, methods should be developed to equalize the weights of users who contributed different numbers of photos. Apart from the future work related to Flickr photos, since active OpenStreetMap mapping cannot be guaranteed throughout the world, it will be necessary to incorporate a better way to retrieve tourist sites. It is also likely that the tourist sites extracted in our case study are incomplete, although the OpenStreetMap data in the study area can be expected to be of good quality. Finally, this study focused on major tourist areas (i.e., the viewshed areas of tourist sites). Some tourists may prefer to explore areas far from tourist attractions, and it is worth examining how such areas recover in future work.
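The mobility entropy idea can be approximated, for illustration only, as the Shannon entropy of a user's visit distribution over spatial cells; the cell assignment and interpretation below are our own assumptions, not the exact formulation of Zheng et al. [16].

```python
import math
from collections import Counter

def mobility_entropy(cells):
    """Shannon entropy (bits) of a user's visits over spatial cells.

    'cells' is a list of grid-cell identifiers, one per geotagged photo.
    Low entropy suggests a user anchored to a few habitual places;
    high entropy suggests movement across many distinct sites, as a
    tourist touring attractions might produce.
    """
    counts = Counter(cells)
    n = len(cells)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h + 0.0  # normalize -0.0 to 0.0 for the single-cell case

# A user photographing one place only vs. four distinct sites equally.
print(mobility_entropy(["c1"] * 5))                # 0.0 bits
print(mobility_entropy(["c1", "c2", "c3", "c4"]))  # 2.0 bits
```

A threshold on such an entropy score (or a classifier using it as a feature) could separate locals, resident photographers, and touring visitors more robustly than profile fields alone, though the threshold itself would need calibration against labeled users.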
6. Conclusions
Despite the growing popularity of utilizing social media data to assist disaster management, the research community has rarely explored such data for the recovery phase of disaster management. This study contributes to this line of research by creating a scientific workflow and methods that enable the monitoring and assessment of post-disaster tourism recovery based on Flickr photos. The workflow and methods were demonstrated through a case study conducted for the Philippines. In the case study, we discovered spatiotemporal knowledge about the post-disaster tourism recovery, including recovery statuses and trends mined from tourist photos, as well as photos visually showing unfixed damage identified among both tourist and non-tourist photos. Future work mainly includes investigating the qualitative metadata of Flickr photos, classifying photos based on user behaviors, and developing methods to minimize user sampling bias.
Acknowledgments: This research was supported by the Deutsche Forschungsgemeinschaft Initiative of Excellence, Heidelberg Karlsruhe Research Partnership (HEiKA) Programme. It was also partly supported by funding from the Klaus Tschira Foundation, Heidelberg. We also acknowledge the financial support of the Deutsche Forschungsgemeinschaft and Ruprecht-Karls-Universität Heidelberg within the funding programme Open Access Publishing. We thank the Karlsruhe Institute of Technology, Geophysical Institute, for the partnership in conducting this research.
Author Contributions: Yingwei Yan, Melanie Eckle, Chiao-Ling Kuo, Benjamin Herfort, Hongchao Fan, and Alexander Zipf contributed ideas to the development of the workflow and methods. Yingwei Yan, Melanie Eckle, and Chiao-Ling Kuo collected the data involved in the study. Yingwei Yan and Melanie Eckle conducted the data analyses. Yingwei Yan wrote the paper.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal 2007, 69, 211–221. [CrossRef]
2. Goodchild, M.F.; Glennon, J.A. Crowdsourcing geographic information for disaster response: A research frontier. Int. J. Digit. Earth 2010, 3, 231–241. [CrossRef]
3. Horita, F.E.A.; Degrossi, L.C.; de Assis, L.F.G.; Zipf, A.; de Albuquerque, J.P. The use of volunteered geographic information (VGI) and crowdsourcing in disaster management: A systematic literature review. In Proceedings of the Nineteenth Americas Conference on Information Systems, Chicago, IL, USA, 15–17 August 2013.
4. Haworth, B.; Bruce, E. A review of volunteered geographic information for disaster management. Geogr. Compass 2015, 9, 237–250. [CrossRef]
5. Mair, J.; Ritchie, B.W.; Walters, G. Towards a research agenda for post-disaster and post-crisis recovery strategies for tourist destinations: A narrative review. Curr. Issues Tour. 2016, 19, 1–26. [CrossRef]
6. Walters, G.; Mair, J. The effectiveness of post-disaster recovery marketing messages—The case of the 2009 Australian bushfires. J. Travel Tour. Mark. 2012, 29, 87–103. [CrossRef]
7. Ryu, K.; Bordelon, B.M.; Pearlman, D.M. Destination-image recovery process and visit intentions: Lessons learned from Hurricane Katrina. J. Hosp. Mark. Manag. 2013, 22, 183–203. [CrossRef]
8. Santana, G. Crisis management and tourism. J. Travel Tour. Mark. 2004, 15, 299–321. [CrossRef]
9. Pearlman, D.; Melnik, O. Hurricane Katrina's effect on the perception of New Orleans leisure tourists. J. Travel Tour. Mark. 2008, 25, 58–67. [CrossRef]
10. Liu, M.; Scheepbouwer, E.; Giovinazzi, S. Critical success factors for post-disaster infrastructure recovery: Learning from the Canterbury (NZ) earthquake recovery. Disaster Prev. Manag. Int. J. 2016, 25, 685–700. [CrossRef]
11. Platt, S.; Brown, D.; Hughes, M. Measuring resilience and recovery. Int. J. Disaster Risk Reduct. 2016, 19, 447–460. [CrossRef]
12. Rittichainuwat, B. Ghosts: A travel barrier to tourism recovery. Ann. Tour. Res. 2011, 38, 437–459. [CrossRef]
13. Senaratne, H.; Bröring, A.; Schreck, T. Using reverse viewshed analysis to assess the location correctness of visually generated VGI. Trans. GIS 2013, 17, 369–386. [CrossRef]
14. Gerlitz, C.; Rieder, B. Mining one percent of Twitter: Collections, baselines, sampling. M/C J. 2013, 16, 620.
15. Frank, M.; Dong, B.; Felt, A.P.; Song, D. Mining permission request patterns from Android and Facebook applications. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012.
16. Zheng, Y.-T.; Zha, Z.-J.; Chua, T.-S. Mining travel patterns from geotagged photos. ACM Trans. Intell. Syst. Technol. 2012, 3, 56. [CrossRef]
17. Huang, W.; Li, S. Understanding human activity patterns based on space-time-semantics. ISPRS J. Photogramm. Remote Sens. 2016, 121, 1–10. [CrossRef]
18. Li, L.; Goodchild, M.F.; Xu, B. Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr. Cartogr. Geogr. Inf. Sci. 2013, 40, 61–77. [CrossRef]
19. Li, J.; Qin, Q.; Han, J.; Tang, L.-A.; Lei, K.H. Mining trajectory data and geotagged data in social media for road map inference. Trans. GIS 2015, 19, 1–18. [CrossRef]
20. De Albuquerque, J.P.; Herfort, B.; Brenning, A.; Zipf, A. A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management. Int. J. Geogr. Inf. Sci. 2015, 29, 667–689. [CrossRef]
21. Zook, M.; Graham, M.; Shelton, T.; Gorman, S. Volunteered geographic information and crowdsourcing disaster relief: A case study of the Haitian earthquake. World Med. Health Policy 2010, 2, 7–33. [CrossRef]
22. Peary, B.D.M.; Shaw, R.; Takeuchi, Y. Utilization of social media in the east Japan earthquake and tsunami and its effectiveness. J. Nat. Disaster Sci. 2012, 34, 3–18. [CrossRef]
23. Takahashi, B.; Tandoc, E.C., Jr.; Carmichael, C. Communicating on Twitter during a disaster: An analysis of tweets during Typhoon Haiyan in the Philippines. Comput. Hum. Behav. 2015, 50, 392–398. [CrossRef]
24. Pennington, C.; Freeborough, K.; Dashwood, C.; Dijkstra, T.; Lawrie, K. The National Landslide Database of Great Britain: Acquisition, communication and the role of social media. Geomorphology 2015, 249, 44–51. [CrossRef]
25. Brown, D.; Saito, K.; Spence, R.; Chenvidyakarn, T.; Adams, B.; Mcmillan, A.; Platt, S. Indicators for measuring, monitoring and evaluating post-disaster recovery. In Proceedings of the 6th International Workshop on Remote Sensing for Disaster Applications, Pavia, Italy, 11–12 September 2008.
26. Hoshi, T.; Murao, O.; Yoshino, K.; Yamazaki, F.; Estrada, M. Post-disaster urban recovery monitoring in Pisco after the 2007 Peru earthquake using satellite image. J. Disaster Res. 2014, 9, 1059.
27. Brown, D.; Saito, K.; Liu, M.; Spence, R.; So, E.; Ramage, M. The use of remotely sensed data and ground survey tools to assess damage and monitor early recovery following the 12.5.2008 Wenchuan earthquake in China. Bull. Earthq. Eng. 2012, 10, 741–764. [CrossRef]
28. Rathfon, D.; Davidson, R.; Bevington, J.; Vicini, A.; Hill, A. Quantitative assessment of post-disaster housing recovery: A case study of Punta Gorda, Florida, after Hurricane Charley. Disasters 2013, 37, 333–355. [CrossRef] [PubMed]
29. McCarthy, K.; Hanson, M. Technical Report: Post-Katrina Recovery of the Housing Market along the Mississippi Gulf Coast; Gulf States Policy Institute: New Orleans, LA, USA, 2008.
30. The International Charter. Available online: https://www.disasterscharter.org/documents/10180/187832/CHARTER_UA_ENG.pdf (accessed on 16 March 2017).
31. Goodchild, M.F.; Li, L. Assuring the quality of volunteered geographic information. Spat. Stat. 2012, 1, 110–120. [CrossRef]
32. Haklay, M.; Basiouka, S.; Antoniou, V.; Ather, A. How many volunteers does it take to map an area well? The validity of Linus' law to volunteered geographic information. Cartogr. J. 2010, 47, 315–322. [CrossRef]
33. Create Space Time Cube. Available online: http://pro.arcgis.com/en/pro-app/tool-reference/space-time-pattern-mining/create-space-time-cube.htm (accessed on 27 December 2016).
34. Wiens, J.A.; Stralberg, D.; Jongsomjit, D.; Howell, C.A.; Snyder, M.A. Niches, models, and climate change: Assessing the assumptions and uncertainties. Proc. Natl. Acad. Sci. USA 2009, 106, 19729–19736. [CrossRef] [PubMed]
35. Hamed, K.H. Trend detection in hydrologic data: The Mann–Kendall trend test under the scaling hypothesis. J. Hydrol. 2008, 349, 350–363. [CrossRef]
36. Imran, M.; Castillo, C.; Lucas, J.; Meier, P.; Vieweg, S. AIDR: Artificial intelligence for disaster response. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Korea, 7–11 April 2014.
37. When to Go and Weather. Available online: http://www.lonelyplanet.com/philippines/weather (accessed on 21 December 2016).
38. Lagmay, A.M.F.; Eco, R. Brief communication: On the source characteristics and impacts of the magnitude 7.2 Bohol earthquake, Philippines. Nat. Hazards Earth Syst. Sci. 2014, 14, 2795–2801. [CrossRef]
39. Lagmay, A.M.F.; Agaton, R.P.; Bahala, M.A.C.; Briones, J.B.L.T.; Cabacaba, K.M.C.; Caro, C.V.C.; Dasallas, L.L.; Gonzalo, L.A.L.; Ladiero, C.N.; Lapidez, J.P.; et al. Devastating storm surges of Typhoon Haiyan. Int. J. Disaster Risk Reduct. 2015, 11, 1–12. [CrossRef]
40. The Bohol Quake: Then and Now (How Has Bohol's Recovered after this Disaster?). Available online: http://8list.ph/bohol-quake-third-anniversary/ (accessed on 10 December 2016).
41. A Year after Yolanda: Malapascua, Bantayan Now Ready for Tourists. Available online: http://www.philstar.com/cebu-business/2014/11/08/1389447/year-after-yolanda-malapascua-bantayan-now-ready-tourists (accessed on 10 December 2016).
42. Tachikawa, T.; Kaku, M.; Iwasaki, A.; Gesch, D.B.; Oimoen, M.J.; Zhang, Z.; Danielson, J.J.; Krieger, T.; Curtis, B.; Haase, J.; et al. Aster Global Digital Elevation Model Version 2—Summary of Validation Results; NASA Land Processes Distributed Active Archive Center and the Joint Japan-US ASTER Science Team: Sioux Falls, SD, USA, 2011.
43. USGS Earthexplorer. Available online: https://earthexplorer.usgs.gov/ (accessed on 23 December 2016).
44. OpenStreetMap.org.ph Mina-Mapa ang Pilipinas. Available online: https://openstreetmap.org.ph/ (accessed on 23 December 2016).
45. Humanitarian OpenStreetMap Team. Available online: https://hotosm.org/projects/typhoon_haiyan (accessed on 23 December 2016).
46. Gocotano, A.; Geroy, L.S.; Alcido, M.R.; Dorotan, M.M.; Balboa, G.; Hall, J.L. Is the response over? The transition from response to recovery in the health sector post-Typhoon Haiyan. West. Pac. Surveill. Response 2015, 6, 5–9. [CrossRef] [PubMed]
47. Philippine Tourism Continues. Available online: http://tourism.gov.ph/Pages/20131118aPHILIPPINETOURISMCONTINUES.aspx (accessed on 10 March 2017).
48. Bohol Earthquake Recovery Operation Comes to an End; Boholanos Grateful for Repaired Shelter, New Homes. Available online: http://reliefweb.int/report/philippines/bohol-earthquake-recovery-operation-comes-end-boholanos-grateful-repaired-shelter (accessed on 9 March 2017).
49. After Typhoon Haiyan, Australians Help Philippines with Recovery. Available online: http://www.smh.com.au/world/after-typhoon-haiyan-australians-help-philippines-with-recovery-20140912-10eu1e.html (accessed on 9 March 2017).
50. Look: Bohol's Centuries-old Churches Two Years after the Quake. Available online: http://lifestyle.inquirer.net/198313/look-bohols-centuries-old-churches-two-years-after-the-quake/ (accessed on 10 March 2017).
51. 2 Centuries-old Bohol Churches Devastated by 7.2 Magnitude Earthquake. Available online: http://newsinfo.inquirer.net/507125/centuries-old-baclayon-church-damaged-by-7-2-magnitude-quake (accessed on 15 December 2016).
52. Feick, R.; Roche, S. Understanding the value of VGI. In Crowdsourcing Geographic Knowledge: Volunteered Geographic Information (VGI) in Theory and Practice; Sui, D., Elwood, S., Goodchild, M., Eds.; Springer: Dordrecht, The Netherlands, 2013; pp. 15–29.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
... In tracking recovery progress, VGI has shown VGI has shown its potential in various studies. Yan et al., (2017) presented a VGI-based methodology using Flickr data. Similarly, Lowrie et al., (2022) evaluated the value of using VGI from the Waze platform to report flash floods and assist with the postdisaster recovery process with the case study of Hurricane Harvey. ...
... To plan, monitor, and evaluate recovery in tourist locations, Yan et al., (2020) used public sentiment and perspectives from geotagged social media data to examine the post-disaster recovery process. Using geotagged Flickr images, the study Yan et al., (2017) established a scientific approach and methodologies to track and assess post-disaster tourism recovery. The methodology included crowdsourcing-based qualitative photo analysis, space-time bin-based quantitative photo analysis, and view shed-based data quality enrichment. ...
Chapter
Volunteered geographic information is a crucial tool in disaster response and recovery, transforming data collection and sharing. It has significantly improved disaster management by providing insights into urgent needs, damage estimation, and resource allocation. The distribution of VGI during relief efforts has greatly benefited from social media platforms, supplemented by hashtags and machine learning techniques. Mobile and web-based technologies and community involvement have improved coordination and decision-making. VGI is crucial for monitoring development, evaluating infrastructure, and assisting risk assessment during the recovery phase. It's important to pay attention to issues like data quality, privacy, inclusivity, complexity, obsolescence, appropriateness, accessibility, and technological infrastructure. Future studies should concentrate on automated systems for data validation, resolving privacy and security issues, and encouraging marginalized people to participate.
... In addition, few studies have taken into consideration of the unprecedented high infectivity, rapid transmission, and long duration of the COVID-19 pandemic which makes exploring its spatiotemporal psychological impact on the public more complicated. This is because collecting representative and reliable data using traditional approaches like surveys can be time-consuming and costly for large areas (Platt et al., 2016;Yan et al., 2017). As VGI is utilized to support crisis management and epidemiology, there is a new potential to explore the spatiotemporal characteristics of the public's panic levels aimed at infectious disease crises such as SARS, Ebola, and COVID-19. ...
... Fung et al. (2014) found that Americans expressed more anxiety about Ebola based on Twitter and Google Trends data. Yan et al. (2017) used Flicker photos to monitor and assess post-disaster tourism recovery in the central Philippines islands region. ...
Article
The existing crisis management research mostly reveals the patterns of the public's panic levels from the perspectives of public management, sociology, and psychology, only a few studies have revealed the spatiotemporal characteristics. Therefore, this study investigates the spatial distribution and temporal patterns and influencing factors on the general public's panic levels using the Baidu Index data from a geographic perspective. The results show that: (1) The public's panic levels were significantly correlated with the spatial distance between the epicenter and the region of investigation, and with the number of confirmed cases in different regions when the pandemic began to spread. (2) Based on the spatial distance between the epicenter and the region, the public's panic levels in different regions could be divided into three segments: core segment (0–500 km), buffer segment (500–1300 km), and peripheral segment (>1300 km). The panic levels of different people in the three segments were consistent with the Psychological Typhoon Eye Effect and the Ripple Effect can be detected in the buffer segment. (3) The public's panic levels were strongly correlated with whether the spread of the infectious disease crisis occurred and how long it lasted. It is suggested that crisis information management in the future needs to pay more attention to the spatial division of control measures. The type of crisis information released to the general public should depend on the spatial relationship associated with the place where the crisis breaks out.
... Such monitoring schemes and analyses could be used for different purposes. For instance, Yan et al. [52] used geotagged social media data for monitoring and assessing post-disaster tourism recovery. Moreover, social media imagery could also be used to encourage people for the donation. ...
... Previous studies have shown that people were likely to communicate situational updates and damages during disasters via social media. In this context, damage assessment and recovery can be carried out by analyzing the potential relationship between social media posts, disaster-related ratios, sentiment change of the public, and the social media activity of visitors (Yan et al., 2017;Zou et al., 2018). ...
Article
Full-text available
Detecting and collecting public opinion via social media can provide near real‐time information to decision‐makers, which plays a vital role in urban disaster management and sustainable development. However, there has been little work focusing on identifying the perception and the sentiment polarity expressed by users during and after disasters, particularly regional flood events. In this article, we comprehensively analyze tweets data related to the “European floods in 2021” over time, topic, and sentiment, forming a complete workflow from data processing, topic modeling, sentiment analysis, and topic and sentiment prediction. The aim is to address the following research questions: (1) What are the public perception and main concerns during and after floods? (2) How does the public sentiment change during and after floods? Results indicate that there is a significant correlation between a flood's trend and the heat of corresponding tweets. The three topics that receive the most public concern are: (1) climate change and global warming; (2) praying for the victims: and (3) disaster situations and information. Negative sentiments are predominant during the floods and will continue for some time. We tested five different classifiers, of which TextCNN‐attention turned out to deliver the best predictions in topic and sentiment prediction, and performed well for sparse flood tweets, it can be used to predict the topic and sentiment polarity of a single tweet in real‐time during the flood events. Our findings can help disaster agencies to better understand the dynamics of social networks and develop stronger situational awareness towards a disaster, which can contribute to scientifically justified decision‐making in urban risk management and also meet the challenges associated with the global sustainable development goal 11 (SDGs) on Sustainable Cities and Communities.
... Such monitoring schemes and analyses could be used for different purposes. For instance, Yan et al. [52] used geotagged social media data for monitoring and assessing post-disaster tourism recovery. Moreover, social media imagery could also be used to encourage people for the donation. ...
Chapter
The recent literature reports several practical and important use cases of social media informatics where artificial intelligence (AI), machine learning (ML), and other relevant technologies are employed to analyze human sufferings and infrastructure damage in natural disasters. While the textual content of social media platforms conveys relevant and useful information during a disaster, social media imagery content has also been proven very effective in analyzing the scale of damage to infrastructures such as roads, bridges, and buildings. Moreover, disaster-related visual content could also be analyzed to extract people’s perceptions, emotions, sentiments, and responses to disasters, which can help different stakeholders, such as humanitarian organizations and policy-makers. Assessing such aspects of disaster events requires effective and efficient image processing methods to process a large amount social media content. This chapter reviews state-of-the-art techniques and shows their utility in processing social media image streams during disaster response for a diversified set of applications. It also highlights the key applications, challenges, available shared resources (datasets and models), tasks, and future research directions. This chapter will provide a ground for future research and a good starting point for the researchers in the domain.
... As these data are uploaded by the users of the property websites, we postulate that real estate data may be considered as a latent form of volunteered geographic information (VGI), and one that warrants further investigations, as contemplated above (Goodchild 2007). To be more specific, for the first time, we deem that they are a type of passive and implicitly volunteered VGI (Craglia et al. 2012;See et al. 2016;Ghermandi & Sinclair 2019;Hopf 2018), as contributing spatial data is not the contributor's primary intention, similarly to social media and geo-tagged imagery such as Flickr (Yan et al. 2017(Yan et al. , 2018. Considering real estate data from such an angle, we posit Fig. 1 The idea of our research: real estate datasets (i.e. ...
Article
Full-text available
Acquiring spatial data of fine and dynamic urban features such as buildings remains challenging. This paper brings attention to real estate advertisements and property sales data as valuable and dynamic sources of geoinformation in the built environment, but unutilised in spatial data infrastructures. Given the wealth of information they hold and their user-generated nature, we put forward the idea of real estate data as an instance of implicit volunteered geographic information and bring attention to their spatial aspect, potentially alleviating the challenge of acquiring spatial data of fine and dynamic urban features. We develop a mechanism of facilitating continuous acquisition, maintenance, and quality assurance of building data and associated amenities from real estate data. The results of the experiments conducted in Singapore reveal that one month of property listings provides information on 7% of the national building stock and about half of the residential subset, e.g. age, type, and storeys, which are often not available in sources such as OpenStreetMap, potentially supporting applications such as 3D city modelling and energy simulations. The method may serve as a novel means to spatial data quality control as it detects missing amenities and maps future buildings, which are advertised and transacted before they are built, but it exhibits mixed results in identifying unmapped buildings as ads may contain errors that impede the idea.
... By applying modern innovation in Android gadgets, individuals seek data of a particular land through utilizing the Global Positioning System (GPS) and android advance mobile phone gadget Bhuvan Geotag for Urban Planning [11]. The use of geotagging was used to help the prevention, planning, and response processes of disaster management when a magnitude 7.2 Mw earthquake occurred in the Philippines in 2013 [12]. To provide a basis to enhance the crustal movement monitoring, three continuous GPS stations were installed on Mindanao Island [13]. ...
Thesis
Full-text available
Earthquake occurs due to tectonic movement of a fault. Davao Region has warned by the Philippine Institute of Volcanology and Seismology (PHIVOLCS) that new sets of fault lines appeared in the region's southern vicinity with a possible magnitude of 6.8Mw to 7.1Mw. This phenomenon is possible since Davao Region is part of the Philippine Fault Zone. This study was conducted to give specific information of the area within the vicinity's fault lines, such as soil profile and seismic information. This information is present on the area's geotagged images taken in a specific coordinate of the region. The coordinates were located using Global Positioning System (GPS), and it is used to geotag the images. The Bureau of Soils and Water Management (BSWM) provides the soil profile data such as depth of soil and its classification. In contrast, the Philippines Institute of Volcanology and Seismology provides seismic data such as magnitude and the fault line's name. In contrast, the seismic analysis was calculated through seismic provisions of the 2015 National Structural Code of the Philippines (NSCP 2015). This study shows that there are seven active fault lines Southern Davao Region namely Dacudao Fault with a magnitude of 6.5Mw, Lacson Fault with a magnitude of 6.8Mw, Tamugan Fault with a magnitude of 6.7Mw, Pangyan-Biao Fault with a magnitude of 6.8Mw, New Carmen fault with a magnitude of 6.3Mw, Tangbulan fault with a magnitude of 6.1Mw and Makilala-Malungon fault with a magnitude of 6.5Mw. This study concludes that adding detailed data on the fault line's geotagged images, such as soil and seismic information, will benefit the structural building since modern technology is applied.
Chapter
Tourism sites around the world which are often hit by calamities caused by climate change normally affect extremely the regions and economies. Disasters affect directly or indirectly the number of tourist arrival, the hotel industry, tourism receipts, employment, and the overall economy of a region (Naeem, Bhatti, & Khan, 2021). To thrive or adapt in this novel and rapidly changing environment, tourism communities need to be resilient in order to maintain the economic benefits (Wu, Chiu, & Chen, 2019). This requires strategic approach in local tourism development with strong public private partnership and collaboration. Economy, environment, emergency management and response, disaster risk management, community-based participation, post-disaster tourism recovery management, psychological behavior of people, nature-based tourism, dark tourism, responsive consumer behavior, and transportation are the key areas to focus on. Developing resilient and sustainable local tourism communities must be guided by the carefully defined goals and objectives depending on the dynamics and resources of the communities, and anchored of guidelines, pertinent laws and policies implemented by the local, national, and international governing and regulatory bodies.
Chapter
Full-text available
This study aims to determine the role of stakeholder orientation, strategic capability, and joint value creation on the competitiveness of Banten’s cultural tourism destinations. This research is located in Banten Province, Indonesia. This research was conducted by distributing electronic questionnaires to 321 respondents. Furthermore, focus group discussions among stakeholders were conducted to balance and strengthen the data collected. The findings of the test results indicate that Stakeholder Orientation (OS), Strategic Capabilities (KS), and Shared Value Creation (PNB) have a significant role in the competitiveness variable of cultural tourism destinations. The results of the partial test found that the diversity of OS and KS was not significant for the competitiveness variable of cultural tourism destinations (DS). This means that in the future the OS, KS, and GNP indicators must be improved, especially the shared value creation (PNB) variable which has the biggest role.
Conference Paper
Full-text available
This paper introduces the Recovery Project, which aims to identify indicators of post-disaster recovery using satellite imagery, internet-based statistics and advanced field survey techniques. This paper reviews the recovery literature as a means of introducing the recovery process and the considerations that must be made when evaluating recovery. This is followed by an introduction to the Recovery project and its two case study sites: 1. Ban Nam Khem, Thailand and 2. Muzaffarabad, Pakistan. A review of the recovery process at Ban Nam Khem is presented along with a diagram of potential indicators obtained from the literature research. The paper concludes with a short discussion on how remote sensing may be used to monitor some of these indicators.
Article
Understanding human activity patterns plays a key role in various applications in an urban environment, such as transportation planning and traffic forecasting, urban planning, public health and safety, and emergency response. Most existing studies modeling human activity patterns focus mainly on the spatiotemporal dimensions and lack consideration of the underlying semantic context. In fact, what people do and discuss at a place, which indicates what is happening there, cannot simply be neglected, because it is the root of human mobility patterns. We believe that the geo-tagged semantic context, representing what individuals do and discuss at a place and a specific time, drives the formation of specific human activity patterns. In this paper, we aim to model human activity patterns not only based on space and time but also with consideration of the associated semantics, and attempt to prove the hypothesis that similar mobility patterns may have different motivations. We develop a spatiotemporal-semantic model to quantitatively express human activity patterns based on topic models, leading to an analysis of space, time and semantics. A case study is conducted using Twitter data in Toronto based on our model. By computing the similarities between users in terms of spatiotemporal pattern, semantic pattern and spatiotemporal-semantic pattern, we find that only a small number of users (2.72%) have very similar activity patterns, while the majority (87.14%) show different activity patterns (i.e., similar spatiotemporal patterns and different semantic patterns, similar semantic patterns and different spatiotemporal patterns, or different in both). The population of users with very similar activity patterns decreases by 56.41% after incorporating semantic information into the corresponding spatiotemporal patterns, which quantitatively supports the hypothesis.
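The core comparison in the abstract above — user similarity with and without the semantic dimension — can be illustrated with a minimal sketch. This is not the authors' implementation; the data, the `(cell, hour, topic)` binning, and the helper names are hypothetical, and cosine similarity over sparse count vectors stands in for whatever similarity measure the paper uses.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse vectors given as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def activity_vector(posts, use_semantics=True):
    """Bin a user's posts into (cell, hour[, topic]) counts.

    Each post is (grid_cell, hour_of_day, topic_id); dropping the
    topic dimension yields a purely spatiotemporal pattern.
    """
    key = (lambda p: p) if use_semantics else (lambda p: p[:2])
    return Counter(key(p) for p in posts)

# Two hypothetical users visiting the same cells at the same hours,
# but posting about different topics.
u1 = [("cell_A", 9, "food"), ("cell_B", 18, "food")]
u2 = [("cell_A", 9, "sports"), ("cell_B", 18, "sports")]

st_sim = cosine(activity_vector(u1, False), activity_vector(u2, False))
sts_sim = cosine(activity_vector(u1), activity_vector(u2))
print(st_sim, sts_sim)  # identical spatiotemporally, dissimilar once semantics is added
```

The contrast between the two scores mirrors the paper's finding: users who look alike in space and time can diverge once what they talk about is taken into account.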
Article
This paper reports systematic attempts to measure and assess recovery after recent major earthquakes. The aim is to compare different methods, both quantitative and qualitative, and to assess which are more cost effective, rather than to detail the process of recovery after particular events. The paper also discusses how resilience relates to recovery. The methods trialled are all capable of measuring the speed, and to some extent the quality, of recovery, but the merits of each depend on the resources available and the level of detail or precision required. The recommended approach is to use the methods in combination. Specifically, satellite imagery analysis can be combined with ground survey, social audit and published data to develop a spatial-temporal geo-database that can be used to monitor recovery. To date, however, it appears challenging to measure resilience. One approach might be to isolate the factors underlying resilience and focus on measuring these. This might be achieved by analysing recovery after a wide variety of events and building models of resilience based on these factors. Such predictions of resilience for a wide range of countries at risk could then be compared with the speed and quality of recovery after future events.
Article
A devastating earthquake struck Bohol, Philippines, on 15 October 2013. The earthquake originated at 12 km depth on an unmapped reverse fault, which manifested on the surface for several kilometers with a maximum vertical displacement of 3 m. The earthquake resulted in 222 fatalities, with damage to infrastructure estimated at USD 52.06 million. Widespread landslides and sinkholes formed in the predominantly limestone region during the earthquake. These remain a significant threat to communities, as destabilized hillside slopes, landslide-dammed rivers and incipient sinkholes are still vulnerable to collapse, triggered possibly by aftershocks and heavy rains in the upcoming months of November and December. The temblor originated from a previously unmapped fault, herein referred to as the Inabanga Fault. As with the hidden or previously unmapped faults responsible for the 2012 Negros and 2013 Bohol earthquakes, there may be more unidentified faults that need to be mapped through field and geophysical methods. This is necessary to mitigate the possible damaging effects of future earthquakes in the Philippines.
Article
You can view the paper here: http://nora.nerc.ac.uk/510521/ The British Geological Survey (BGS) is the national geological agency for Great Britain that provides geoscientific information to government, other institutions and the public. The National Landslide Database has been developed by the BGS and is the focus for national geohazard research for landslides in Great Britain. The history and structure of the geospatial database and associated Geographical Information System (GIS) are explained, along with the future developments of the database and its applications. The database is the most extensive source of information on landslides in Great Britain with over 17,000 records of landslide events to date, each documented as fully as possible for inland, coastal and artificial slopes. Data are gathered through a range of procedures, including: incorporation of other databases; automated trawling of current and historical scientific literature and media reports; new field- and desk-based mapping technologies with digital data capture; and citizen science through social media and other online resources. This information is invaluable for directing the investigation, prevention and mitigation of areas of unstable ground in accordance with Government planning policy guidelines. The national landslide susceptibility map (GeoSure) and a national landslide domains map currently under development, as well as regional mapping campaigns, rely heavily on the information contained within the landslide database. Assessing susceptibility to landsliding requires knowledge of the distribution of failures, an understanding of causative factors, their spatial distribution and likely impacts, whilst understanding the frequency and types of landsliding present is integral to modelling how rainfall will influence the stability of a region.
Communication of landslide data through the Natural Hazard Partnership (NHP) and Hazard Impact Model contributes to national hazard mitigation and disaster risk reduction with respect to weather and climate. Daily reports of landslide potential are published by BGS through the NHP partnership and data collected for the National Landslide Database are used widely for the creation of these assessments. The National Landslide Database is freely available via an online GIS and is used by a variety of stakeholders for research purposes.
Article
Purpose: The purpose of this paper is to synthesise critical success factors (CSFs) for advancing post-disaster infrastructure recovery and underpinning recovery authorities in decision making when facing future disasters.

Design/methodology/approach: The seismic recovery after the Canterbury (NZ) earthquake sequence in 2010-2011 was selected as a case study for identifying CSFs for an efficient recovery of infrastructure post-disaster. A combination of research approaches, including archival study, observations and semi-structured interviews, was used for collecting data and evidence by engaging with participants involved at various tiers in the post-disaster recovery and reconstruction. The CSFs are evaluated and analysed by tracking the decision-making process, examining resultant consequences and foreseeing onwards challenges.

Findings: Six salient CSFs for strengthening infrastructure recovery management after disasters are identified. Furthermore, the study shows how each of these CSFs has been incorporated into the decision-making process in support of the post-disaster recovery, and what difficulties were encountered when implementing them.

Practical implications: The proposed CSFs provide future reference and guidance to be drawn on by decision makers when project-managing post-disaster recovery operations.

Originality/value: The paper bridges the gap between managerial contexts and technical aspects of the post-disaster recovery process in an effort to rapidly and efficiently rebuild municipal infrastructure.
Article
Introduction

Social media platforms present numerous challenges to empirical research, making it different from researching cases in offline environments, but also different from studying the “open” Web. Because of the limited access possibilities and the sheer size of platforms like Facebook or Twitter, the question of delimitation, i.e. the selection of subsets to analyse, is particularly relevant. Whilst sampling techniques have been thoroughly discussed in the context of social science research (Uprichard; Noy; Bryman; Gilbert; Gorard), sampling procedures in the context of social media analysis are far from being fully understood. Even for Twitter, a platform having received considerable attention from empirical researchers due to its relative openness to data collection, methodology is largely emergent. In particular the question of how smaller collections relate to the entirety of activities of the platform is quite unclear. Recent work comparing case based studies to gain a broader picture (Bruns and Stieglitz) and the development of graph theoretical methods for sampling (Papagelis, Das, and Koudas) are certainly steps in the right direction, but it seems that truly large-scale Twitter studies are limited to computer science departments (e.g. Cha et al.; Hong, Convertino, and Chi), where epistemic orientation can differ considerably from work done in the humanities and social sciences. The objective of the paper is to reflect on the affordances of different techniques for making Twitter collections and to suggest the use of a random sampling technique, made possible by Twitter’s Streaming API (Application Programming Interface), for baselining, scoping, and contextualising practices and issues. We discuss this technique by analysing a one percent sample of all tweets posted during a 24-hour period and introduce a number of analytical directions that we consider useful for qualifying some of the core elements of the platform, in particular hashtags.
To situate our proposal, we first discuss how platforms propose particular affordances but leave considerable margins for the emergence of a wide variety of practices. This argument is then related to the question of how medium and sampling technique are intrinsically connected.

Indeterminacy of Platforms

A variety of new media research has started to explore the material-technical conditions of platforms (Rogers; Gillespie; Hayles), drawing attention to the performative capacities of platform protocols to enable and structure specific activities; in the case of Twitter that refers to elements such as tweets, retweets, @replies, favourites, follows, and lists. Such features and conventions have been both a subject and a starting point for researching platforms, for instance by using hashtags to demarcate topical conversations (Bruns and Stieglitz), @replies to trace interactions, or following relations to establish social networks (Paßmann, Boeschoten, and Schäfer). The emergence of platform studies (Gillespie; Montfort and Bogost; Langlois et al.) has drawn attention to platforms as interfacing infrastructures that offer blueprints for user activities through technical and interface affordances that are pre-defined yet underdetermined, fostering sociality in the front end whilst mining for data in the back end (Stalder). Doing so, they cater to a variety of actors, including users, developers, advertisers, and third-party services, and allow for a variety of distinct use practices to emerge. The use practices of platform features on Twitter are, however, not solely produced by users themselves, but crystallise in relation to wider ecologies of platforms, users, other media, and third party services (Burgess and Bruns), allowing for sometimes unanticipated vectors of development.
This becomes apparent in the case of the retweet function, which was initially introduced by users as verbatim operation, adding “retweet” and later “RT” in front of copied content, before Twitter officially offered a retweet button in 2009 (boyd, Golder, and Lotan). Now, retweeting is deployed for a series of objectives, including information dissemination, promotion of opinions, but also ironic commentary. Gillespie argues that the capacity to interface and create relevance for a variety of actors and use practices is, in fact, the central characteristic of platforms (Gillespie). Previous research for instance addresses Twitter as medium for public participation in specific societal issues (Burgess and Bruns; boyd, Golder, and Lotan), for personal conversations (Marwick and boyd; boyd, Golder, and Lotan), and as facilitator of platform-specific communities (Paßmann, Boeschoten, and Schäfer). These case-based studies approach and demarcate their objects of study by focussing on particular hashtags or use practices such as favoriting and retweeting. But using these elements as basis for building a collection of tweets, users, etc. to be analysed has significant epistemic weight: these sampling methods come with specific notions of use scenarios built into them or, as Uprichard suggests, there are certain “a priori philosophical assumptions intrinsic to any sample design and the subsequent validity of the sample criteria themselves” (Uprichard 2). Building collections by gathering tweets containing specific hashtags, for example, assumes that a) the conversation is held together by hashtags and b) the chosen hashtags are indeed the most relevant ones. Such assumptions go beyond the statistical question of sampling bias and concern the fundamental problem of how to go fishing in a pond that is big, opaque, and full of quickly evolving populations of fish. The classic information retrieval concepts of recall (How many of the relevant fish did I get?) 
and precision (How many fish caught are relevant?) fully apply in this context. In a next step, we turn more directly to the question of sampling Twitter, outlining which methods allow for accessing which practices – or not – and what the role of medium-specific features is.

Sampling Twitter

Sampling, the selection of subsets from a larger set of elements (the population), has received wide attention especially in the context of empirical sociology (Uprichard; Noy; Bryman; Gilbert; Gorard; Krishnaiah and Rao). Whilst there is considerable overlap in sampling practices between quantitative sociology and social media research, some key differences have to be outlined: first, social media data, such as tweets, generally pre-exist their collection rather than having to be produced through surveys; secondly, they come in formats specific to platforms, with analytical features, such as counts, already built into them (Marres and Weltevrede); and third, social media assemble very large populations, yet selections are rarely related to full datasets or grounded in baseline data as most approaches follow a case study design (Rieder). There is a long history to sampling in the social sciences (Krishnaiah and Rao), dating back to at least the 19th century. Put briefly, modern sampling approaches can be distinguished into probability techniques, emphasising the representative relation between the entire population and the selected sample, and non-probability techniques, where inference on the full population is problematic (Gilbert). In the first group, samples can either be based on a fully random selection of cases or be stratified or cluster-based, where units are randomly selected from a proportional grid of known subgroups of a population. Non-probability samples, on the contrary, can be representative of the larger population, but rarely are. Techniques include accidental or convenience sampling (Gorard), based on ease of access to certain cases.
Purposive non-probability sampling, however, draws on expert sample demarcation, on quota, case-based or snowball sampling techniques – determining the sample via a priori knowledge of the population rather than strict representational relations. Whilst the relation between sample and population, as well as access to such populations (Gorard), is central to all social research, social media platforms bring to the reflection of how samples can function as “knowable objects of knowledge” (Uprichard 2) the role of medium-specific features, such as built-in markers or particular forms of data access. Ideally, when researching Twitter, we would have access to a full sample, the subject and phantasy of many big data debates (boyd and Crawford; Savage and Burrows), which in practice is often limited to platform owners. Also, growing amounts of daily tweets, currently figuring around 450 million (Farber), require specific logistic efforts, as a project by Cha et al. indicates: to access the tweets of 55 million user accounts, 58 servers were needed to collect a total of 1.7 billion tweets (Cha et al.). Full samples are particularly interesting in the case of exploratory data analysis (Tukey) where research questions are not set before sampling occurs, but emerge in engagement with the data. The majority of sampling approaches on Twitter, however, follow a non-probabilistic, non-representative route, delineating their samples based on features specific to the platform. The most common Twitter sampling technique is topic-based sampling that selects tweets via hashtags or search queries, collected through API calls (Bruns and Stieglitz; Burgess and Bruns; Huang, Thornton, and Efthimiadis). Such sampling techniques rest on the idea that content will group around the shared use of hashtags or topical words.
Here, hashtags are studied with an interest in the emergence and evolution of topical concerns (Burgess and Bruns), to explore brand communication (Stieglitz and Krüger), during public unrest and events (Vis), but also to account for the multiplicity of hashtag use practices (Bruns and Stieglitz). The approach lends itself to address issue emergence and composition, but also draws attention to medium-specific use practices of hashtags. Snowball sampling, an extension of topic-based sampling, builds on predefined lists of user accounts as starting points (Rieder), often defined by experts, manual collections or existing lists, which are then extended through “snowballing” or triangulation, often via medium-specific relations such as following. Snowball sampling is used to explore national spheres (Rieder), topic- or activity-based user groups (Paßmann, Boeschoten, and Schäfer), cultural specificity (Garcia-Gavilanes, Quercia, and Jaimes) or dissemination of content (Krishnamurthy, Gill, and Arlitt). Recent attempts to combine random sampling and graph techniques (Papagelis, Das, and Koudas) to throw wider nets while containing technical requirements are promising, but conceptually daunting. Marker-based sampling uses medium-specific metadata to create collections based on shared language, location, Twitter client, nationality or other elements provided in user profiles (Rieder). This sampling method can be deployed to study the language or location specific use of Twitter. However, an increasing number of studies develop their own techniques to detect languages (Hong, Convertino, and Chi). Non-probability selection techniques, topic-, marker-, and basic graph-based sampling struggle with representativeness (Are my results generalisable?), exhaustiveness (Did I capture all the relevant units?), cleanness (How many irrelevant units did I capture?), and scoping (How “big” is my set compared to others?), which does – of course – not invalidate results.
It does, however, raise questions about the generality of derived claims, as case-based approaches only allow for sense-making from inside the sample and not in relation to the entire population of tweets. Each of these techniques also implies commitments to a priori conceptualisations of Twitter practices: snowball sampling presupposes coherent network topologies, marker-based sampling has to place a lot of faith in Twitter’s capacity to identify language or location, and topic-based samples consider words or hashtags to be sufficient identifiers for issues. Further, specific sampling techniques allow for studying issue or medium dynamics, and provide insights to the negotiation of topical concerns versus the specific use practices and medium operations on the platform. Following our interest in relations between sample, population and medium-specificity, we therefore turn to random sampling, and ask whether it allows to engage Twitter without commitments – or maybe different commitments? – to particular a priori conceptualisations of practices. Rather than framing the relation between this and other sampling techniques in oppositional terms, we explore in what way it might serve as baseline foil, investigating the possibilities for relating non-probability samples to the entire population, thereby embedding them in a “big picture” view that provides context and a potential for inductive reasoning and exploration. As we ground our arguments in the analysis of a concrete random sample, our approach can be considered experimental.

Random Sampling with the Streaming API

While much of the developer API features Twitter provides are “standard fare”, enabling third party applications to offer different interfaces to the platform, the so-called Streaming API is unconventional in at least two ways.
First, instead of using the common query-response logic that characterises most REST-type implementations, the Streaming API requires a persistent connection with Twitter’s server, where tweets are then pushed in near real-time to the connecting client. Second, in addition to being able to “listen” to specific keywords or usernames, the logic of the stream allows Twitter to offer a form of data access that is circumscribed in quantitative terms rather than focussed on particular entities. The so called statuses/firehose endpoint provides the full stream of tweets to selected clients; the statuses/sample endpoint, however, “returns a small random sample of all public statuses” with a size of one percent of the full stream. (In a forum post, Twitter’s senior partner engineer, Taylor Singletary, states: “The sample stream is a random sample of 1% of the tweets being issues [sic] publicly.”) If we estimate a daily tweet volume of 450 million tweets (Farber), this would mean that, in terms of standard sampling theory, the 1% endpoint would provide a representative and high resolution sample with a maximum margin of error of 0.06 at a confidence level of 99%, making the study of even relatively small subpopulations within that sample a realistic option. While we share the general prudence of boyd and Crawford when it comes to the validity of this sample stream, a technical analysis of the Streaming API indicates that some of their caveats are unfounded: because tweets appear in near real-time in the queue (our tests show that tweets are delivered via the API approx. 2 seconds after they are sent), it is clear that the system does not pull only “the first few thousand tweets per hour” (boyd and Crawford 669); because the sample is most likely a simple filter on the statuses/firehose endpoint, it would be technically impractical to include only “tweets from a particular segment of the network graph” (ibid.). 
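The back-of-envelope figure above — a maximum margin of error of 0.06 at 99% confidence for a 1% sample of roughly 450 million daily tweets — can be reproduced with the standard worst-case formula for a simple random sample. This sketch assumes, as the article argues, that the `statuses/sample` stream behaves like a simple random sample; the function name is ours.

```python
from math import sqrt

def margin_of_error(n, z=2.576, p=0.5):
    """Worst-case margin of error (p = 0.5) for a simple random
    sample of size n, as a percentage, at the given z-score."""
    return 100 * z * sqrt(p * (1 - p) / n)

daily_tweets = 450_000_000          # estimated full volume (Farber)
sample_size = daily_tweets // 100   # the 1% statuses/sample stream

moe = margin_of_error(sample_size)  # z = 2.576 -> 99% confidence
print(round(moe, 2))                # ~0.06 percentage points
```

With 4.5 million sampled tweets per day, even subpopulations of a few percent of the sample remain large enough for meaningful estimates, which is what makes the stream attractive as a baseline.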
Yet, without access to the complete stream, it is difficult to fully assess the selection bias of the different APIs (González-Bailón, Wang, and Rivero). A series of tests in which we compared the sample to the full output of high volume bot accounts can serve as an indicator: in particular, we looked into the activity of SportsAB, Favstar_Bot, and TwBirthday, the three most active accounts in our sample (respectively 38, 28, and 27 tweets captured). Although Twitter communicates a limit of 1000 tweets per day and account, we found that these bots consistently post over 2500 messages in a 24 hour period. SportsAB attempts to post 757 tweets every three hours, but runs into some limit every now and then. For every successful peak, we captured between five and eight messages, which indicates a pattern consistent with a random selection procedure. While more testing is needed, various elements indicate that the statuses/sample endpoint provides data that are indeed representative of all public tweets. Using the soon to be open-sourced Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT) we set out to test the method and the insights that could be derived from it by capturing 24 hours of Twitter activity, starting on 23 Jan. 2013 at 7 p.m. (GMT). We captured 4,376,230 tweets, sent from 3,370,796 accounts, at an average rate of 50.65 tweets per second, leading to about 1.3GB of uncompressed and unindexed MySQL tables. While a truly robust approach would require a longer period of data capture, our main goal – to investigate how the Streaming API can function as a “big picture” view of Twitter and as baseline for other sampling methods – led us to limit ourselves to a manageable corpus. We do not propose our 24-hour dataset to function as a baseline in itself, but to open up reflections about representative metrics and the possibilities of baseline sampling in general. 
By making our scripts public, we hope to facilitate the creation of (background) samples for other research projects. (DMI-TCAT is developed by Erik Borra and Bernhard Rieder. The stream capture scripts are already available at https://github.com/bernorieder/twitterstreamcapture.) A Day of Twitter Exploring how the Twitter one percent sample can provide us with a contrast foil against other collection techniques, we suggest that it might allow to create relations between entire populations, samples and medium-specific features in different ways; as illustration, we explore four of them. a) Tweet Practices Baseline: Figure 1 shows the temporal baseline, giving indications for the pace and intensity of activity during the day. The temporal pattern features a substantial dip in activity, which corresponds with the fact that around 60% of all tweets have English language settings, which might indicate sleeping time for English-speaking users. Figure 1: temporal patterns Exploring the composition of users, the sample shows how “communicative” Twitter is; the 3,370,796 unique users we captured mentioned (all “@username” variants) 2,034,688 user accounts. Compared to the random sample of tweets retrieved by boyd et al. in 2009, our sample shows differences in use practices (boyd, Golder, and Lotan): while the number of tweets with hashtags is significantly higher (yet small in relation to all tweets), the frequency of URL use is lower. While these averages gloss over significant variations in use patterns between subgroups and languages (Poblete et al.), they do provide a baseline to relate to when working with a case-based collection. Tweets containing boyd et al. 2010 our findings a hashtag 5% 13.18% a URL 22% 11.7% an @user mention 36% 57.2% tweets beginning with @user 86% 46.8% Table 1: Comparison between boyd et al. and our findings b) Hashtag Qualification: Hashtags have been a focus of Twitter research, but reports on their use vary. 
In our sample, 576,628 tweets (13.18%) contained 844,602 occurrences of 227,029 unique hashtags. Following the typical power law distribution, only 25.8% appeared more than once and only 0.7% (1,684) more than 50 times. These numbers are interesting for characterising Twitter as a platform, but can also be useful for situating individual cases against a quantitative baseline. In their hashtag metrics, Bruns and Stieglitz suggest a categorisation derived from a priori discussions of specific use cases and case comparison in literature (Bruns and Stieglitz). The random sample, however, allows for alternative, a posteriori qualifying metrics, based on emergent topic clusters, co-appearance and proximity measures. Beyond purely statistical approaches, co-word analysis (Callon et al.) opens up a series of perspectives for characterising hashtags in terms of how they appear together with others. Based on the basic principle that hashtags mentioned in the same tweet can be considered connected, networks of hashtags can be established via graph analysis and visualisation techniques – in our case with the help of Gephi. Our sample shows a high level of connectivity between hashtags: 33.8% of all unique hashtags are connected in a giant component with an average degree (number of connections) of 6.9, a diameter (longest distance between nodes) of 15, and an average path length between nodes of 12.7. When considering the 10,197 hashtags that are connected to at least 10 others, the network becomes much denser, though: the diameter shrinks to 9 and the average path length of 3.2 indicates a “small world” of closely related topic spaces. Looking at how hashtags relate to this connected component, we detect that out of the 1,684 hashtags with a frequency higher than 50, 96.6% are part of it, while the remaining 3.4% are spam hashtags that are deployed by a single account only. In what follows, we focus on the 1,627 hashtags that are part of the giant component. 
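The co-word principle used above — hashtags mentioned in the same tweet are considered connected — can be sketched as a weighted edge list from which per-node degrees follow directly. The tweets here are hypothetical, and this plain-Python sketch stands in for the graph construction that the authors perform before loading the network into Gephi.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_edges(tweets):
    """Count undirected hashtag co-occurrence edges: two hashtags
    are connected whenever they appear in the same tweet."""
    edges = Counter()
    for tags in tweets:
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical tweets, each reduced to its list of hashtags.
tweets = [
    ["#love", "#me"],
    ["#love", "#instagram", "#me"],
    ["#teamfollowback", "#rt"],
]
edges = cooccurrence_edges(tweets)

# Degree = number of distinct co-occurring hashtags per hashtag.
degree = Counter()
for (a, b), w in edges.items():
    degree[a] += 1
    degree[b] += 1
print(edges[("#love", "#me")], degree["#love"])
```

Edge weights (how often a pair co-occurs) and degrees (how many distinct partners a hashtag has) are the raw material for the cluster detection and the "combination vs. topic marker" distinction discussed next.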
Figure 2: Co-occurrence map of hashtags (spatialisation: Force Atlas 2; size: frequency of occurrence; colour: communities detected by modularity)

As shown in Figure 2, the resulting network allows us to identify topic clusters with the help of “community” detection techniques such as the Gephi modularity algorithm. While there are clearly identifiable topic clusters, such as a dense, high frequency cluster dedicated to following in turquoise (#teamfollowback, #rt, #followback and #sougofollow), a cluster concerning Arab countries in brown or a pornography cluster in bright red, there is a large, diffuse zone in green that one could perhaps most fittingly describe as “everyday life” on Twitter, where food, birthdays, funny images, rants, and passion can coexist. This zone – the term cluster suggesting too much coherence – is pierced by celebrity excitement (#arianarikkumacontest) or moments of social banter (#thingsidowhenigetbored, #calloutsomeonebeautiful) leading to high tweet volumes. Figures 3 and 4 attempt to show how one can use network metrics to qualify – or even classify – hashtags based on how they connect to others. A simple metric such as a node’s degree, i.e. its number of connections, allows us to distinguish between “combination” hashtags that are not topic-bound (#love, #me, #lol, #instagram, the various “follow” hashtags) and more specific topic markers (#arianarikkumacontest, #thingsidowhenigetbored, #calloutsomeonebeautiful, #sosargentinosi).

Figure 3: Co-occurrence map of hashtags (spatialisation: Force Atlas 2; size: frequency of occurrence; colour (from blue to yellow to red): degree)
Figure 4: Hashtag co-occurrence in relation to frequency

Another metric, which we call “user diversity”, can be derived by dividing the number of unique users of a hashtag by the number of tweets it appears in, normalised to a percentage value.
A score of 100 means that no user has used the hashtag twice, while a score of 1 indicates that the hashtag in question has been used by a single account. As Figures 5 and 6 show, this allows us to distinguish hashtags that have a “shoutout” character (#thingsidowhenigetbored, #calloutsomeonebeautiful, #love) from terms that become more “insisting”, moving closer to becoming spam.

Figure 5: Co-occurrence map of hashtags (spatialisation: Force Atlas 2; size: frequency of occurrence; colour (from blue to yellow to red): user diversity)
Figure 6: Hashtag user diversity in relation to frequency

All of these techniques, beyond leading to findings in themselves, can be considered as a useful backdrop for other sampling methods. Keyword- or hashtag-based sampling is often marred by the question of whether the “right” queries have been chosen; here, co-hashtag analysis can easily find further related terms – the same analysis is possible for keywords also, albeit with a much higher cost in computational resources.

c) Linked Sources: Only 11% of all tweets contained URLs, and our findings show a power-law distribution of linked sources. The highly shared domains indicate that Twitter is indeed a predominantly “social” space, with a high presence of major social media, photo-sharing (Instagram and Twitpic) and Q&A platforms (ask.fm). News sources, indicated in red in Figure 7, come with little presence – although we acknowledge that this might be subject to daily variation.

Figure 7: Most mentioned URLs by domain, news organisations in red

d) Access Points: Previously, the increase of daily tweets has been linked to the growing importance of mobile devices (Farber), and relatedly, the sample shows a proliferation of access points. They follow a long-tail distribution: while there are 18,248 unique sources (including tweet buttons), 85.7% of all tweets are sent by the 15 dominant applications.
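The “user diversity” metric introduced above — 100 times the number of unique users of a hashtag divided by the number of tweets it appears in — is simple enough to sketch directly. The `(user, hashtag)` pairs below are hypothetical, not drawn from the authors' sample.

```python
from collections import defaultdict

def user_diversity(usage):
    """User diversity per hashtag: 100 * unique users / tweets it
    appears in. 100 means nobody used the tag twice; a single
    account posting it n times scores 100/n, approaching spam."""
    users, tweets = defaultdict(set), defaultdict(int)
    for user, tag in usage:
        users[tag].add(user)
        tweets[tag] += 1
    return {t: 100 * len(users[t]) / tweets[t] for t in tweets}

# Hypothetical pairs: a "shoutout" tag vs. a single insisting account.
usage = [("u1", "#calloutsomeonebeautiful"), ("u2", "#calloutsomeonebeautiful"),
         ("bot", "#spamtag"), ("bot", "#spamtag"),
         ("bot", "#spamtag"), ("bot", "#spamtag")]
div = user_diversity(usage)
print(div)
```

The shoutout tag scores 100 (every tweet from a different user), while the single-account tag scores 25, the kind of low-diversity signature the text associates with spam-like use.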
Figure 8 shows that the Web is still the most common access point, closely followed by the iPhone. About 51.7% of all tweets were sent from four mobile platforms (iPhone, Android, Blackberry, and Twitter’s mobile Web page), confirming the importance of mobile devices. This finding also highlights the variety and complexity of the contexts that Twitter practices are embedded in.

Figure 8: Twitter access points

Conclusion

Engaging with the one percent Twitter sample allows us to draw three conclusions for social media mining. First, thinking of sampling as the making of “knowable objects of knowledge” (Uprichard 2), it entails bringing data points into different relations with each other. Just as Mackenzie contends in relation to databases that it is not the individual data points that matter but the relations that can be created between them (Mackenzie), sampling involves such a bringing into relation of medium-specific objects and activities. Small data collection techniques based on queries, hashtags, users or markers, however, do not relate to the whole population, but are defined by internal and comparative relations, whilst random samples are based on the relation between the sample and the full dataset.

Second, thinking of sampling as assembly, as relation-making between parts, wholes and the medium, allows research to adjust its focus on either issue or medium dynamics. Small sample research, we suggested, comes with an investment into specific use scenarios and the subsequent validity of how the collection criteria themselves are grounded in medium specificity. The properties of a “relevant” collection strategy can be found in the extent to which use practices align with and can be utilised to create the collection. Conversely, a mismatch between medium-specific use practices and sample purposes may result in skewed findings.
We thus suggest that sampling should not only attend to the internal relations between data points within collections, but also to the relation between the collection and a baseline.

Third, in the absence of access to a full sample, we propose that the random sample provided through the Streaming API can in principle serve as a baseline for case approaches. The experimental study discussed in our paper establishes a starting point for future long-term data collection from which such baselines can be developed. It would allow us to ground the a priori assumptions intrinsic to small data collection design in medium specificity and user practices, determining the relative importance of hashtags, URLs, and @user mentions. Although requiring more detailed specification, such accounts of the internal composition, co-occurrence or proximity of hashtags and keywords may provide foundations to situate case samples, to adjust and specify queries, or to approach hashtags as parts of wider issue ecologies. To facilitate this process logistically, we have made our scripts freely available.

We thus suggest that sampling should attend not only to internal or comparative relations, but, if possible, to the entire population – captured in the baseline – so that medium specificity is reflected both in specific sampling techniques and in the relative relevance of practices within the platform itself.

Acknowledgements

This project was initiated in a Digital Methods Winter School project called “One Percent of Twitter”, and we would like to thank our project members Esther Weltevrede, Julian Ausserhofer, Liliana Bounegru, Guilio Fagolini, Nicholas Makhortykh, and Lonneke van der Velden. Further gratitude goes to Erik Borra for his useful feedback and work on the DMI-TCAT. Finally, we would like to thank our reviewers for their constructive comments.

References

boyd, danah, and Kate Crawford.
“Critical Questions for Big Data.” Information, Communication & Society 15.5 (2012): 662–679.

———, Scott Golder, and Gilad Lotan. “Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter.” 2010 43rd Hawaii International Conference on System Sciences. IEEE, 2010. 1–10.

Bruns, Axel, and Stefan Stieglitz. “Quantitative Approaches to Comparing Communication Patterns on Twitter.” Journal of Technology in Human Services 30.3-4 (2012): 160–185.

Bryman, Alan. Social Research Methods. Oxford: Oxford University Press, 2012.

Burgess, Jean, and Axel Bruns. “Twitter Archives and the Challenges of ‘Big Social Data’ for Media and Communication Research.” M/C Journal 15.5 (2012). 21 Apr. 2013 ‹http://journal.media-culture.org.au/index.php/mcjournal/article/viewArticle/561›.

Callon, Michel, et al. “From Translations to Problematic Networks: An Introduction to Co-word Analysis.” Social Science Information 22.2 (1983): 191–235.

Cha, Meeyoung, et al. “Measuring User Influence in Twitter: The Million Follower Fallacy.” ICWSM ’10: Proceedings of the International AAAI Conference on Weblogs and Social Media. 2010.

Farber, Dan. “Twitter Hits 400 Million Tweets per Day, Mostly Mobile.” cnet. 2012. 25 Feb. 2013 ‹http://news.cnet.com/8301-1023_3-57448388-93/twitter-hits-400-million-tweets-per-day-mostly-mobile/›.

Garcia-Gavilanes, Ruth, Daniele Quercia, and Alejandro Jaimes. “Cultural Dimensions in Twitter: Time, Individualism and Power.” 25 Feb. 2013 ‹http://www.ruthygarcia.com/papers/cikm2011.pdf›.

Gilbert, Nigel. Researching Social Life. London: Sage, 2008.

Gillespie, Tarleton. “The Politics of ‘Platforms’.” New Media & Society 12.3 (2010): 347–364.

González-Bailón, Sandra, Ning Wang, and Alejandro Rivero. “Assessing the Bias in Communication Networks Sampled from Twitter.” 2012. 3 Mar. 2013 ‹http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2185134›.

Gorard, Stephan. Quantitative Methods in Social Science. London: Continuum, 2003.

Hayles, N. Katherine. My Mother Was a Computer: Digital Subjects and Literary Texts. Chicago: University of Chicago Press, 2005.

Hong, Lichan, Gregorio Convertino, and Ed H. Chi. “Language Matters in Twitter: A Large Scale Study.” Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (2011): 518–521.

Huang, Jeff, Katherine M. Thornton, and Efthimis N. Efthimiadis. “Conversational Tagging in Twitter.” Proceedings of the 21st ACM Conference on Hypertext and Hypermedia – HT ’10 (2010): 173.

Krishnamurthy, Balachander, Phillipa Gill, and Martin Arlitt. “A Few Chirps about Twitter.” Proceedings of the First Workshop on Online Social Networks – WOSP ’08. New York: ACM Press, 2008. 19.

Krishnaiah, P. R., and C. R. Rao. Handbook of Statistics. Amsterdam: Elsevier Science Publishers, 1987.

Langlois, Ganaele, et al. “Mapping Commercial Web 2.0 Worlds: Towards a New Critical Ontogenesis.” Fibreculture 14 (2009): 1–14.

Mackenzie, Adrian. “More Parts than Elements: How Databases Multiply.” Environment and Planning D: Society and Space 30.2 (2012): 335–350.

Marres, Noortje, and Esther Weltevrede. “Scraping the Social? Issues in Real-time Social Research.” Journal of Cultural Economy (2012): 1–52.

Marwick, Alice, and danah boyd. “To See and Be Seen: Celebrity Practice on Twitter.” Convergence: The International Journal of Research into New Media Technologies 17.2 (2011): 139–158.

Montfort, Nick, and Ian Bogost. Racing the Beam: The Atari Video Computer System. Cambridge, MA: MIT Press, 2009.

Noy, Chaim. “Sampling Knowledge: The Hermeneutics of Snowball Sampling in Qualitative Research.” International Journal of Social Research Methodology 11.4 (2008): 327–344.

Papagelis, Manos, Gautam Das, and Nick Koudas. “Sampling Online Social Networks.” IEEE Transactions on Knowledge and Data Engineering 25.3 (2013): 662–676.

Paßmann, Johannes, Thomas Boeschoten, and Mirko Tobias Schäfer. “The Gift of the Gab: Retweet Cartels and Gift Economies on Twitter.” Twitter and Society. Eds. Katrin Weller et al. New York: Peter Lang, 2013.

Poblete, Barbara, et al. “Do All Birds Tweet the Same? Characterizing Twitter around the World.” 20th ACM Conference on Information and Knowledge Management, CIKM 2011. Glasgow: ACM, 2011. 1025–1030.

Rieder, Bernhard. “The Refraction Chamber: Twitter as Sphere and Network.” First Monday 11 (5 Nov. 2012).

Rogers, Richard. The End of the Virtual: Digital Methods. Amsterdam: Amsterdam University Press, 2009.

Savage, Mike, and Roger Burrows. “The Coming Crisis of Empirical Sociology.” Sociology 41.5 (2007): 885–899.

Stalder, Felix. “Between Democracy and Spectacle: The Front-End and Back-End of the Social Web.” The Social Media Reader. Ed. Michael Mandiberg. New York: New York University Press, 2012. 242–256.

Stieglitz, Stefan, and Nina Krüger. “Analysis of Sentiments in Corporate Twitter Communication – A Case Study on an Issue of Toyota.” ACIS 2011 Proceedings. 2011. Paper 29.

Tumasjan, A., et al. “Election Forecasts with Twitter: How 140 Characters Reflect the Political Landscape.” Social Science Computer Review 29.4 (2010): 402–418.

Tukey, John Wilder. Exploratory Data Analysis. New York: Addison-Wesley, 1977.

Uprichard, Emma. “Sampling: Bridging Probability and Non-Probability Designs.” International Journal of Social Research Methodology 16.1 (2011): 1–11.
Undoubtedly, the tourism industry is one of the industries most susceptible and vulnerable to crises. Recent major events that had devastating impacts on the industry range from natural disasters to epidemics, and from mismanagement to terrorist attacks. Such episodes are not confined to any geographical region, as crises respect no political or cultural boundaries. Two recent major events illustrate this point: the BSE crisis in the UK in the 1990s, followed by the foot-and-mouth disease outbreak in 2000 and 2001, crippled the industry in several regions of England, and the events of September 11th in New York and Washington changed the way the industry operates forever. Crises are not new to the tourism industry; however, it has been observed that the industry's capability and ability to deal with complex and critical situations are limited. This paper discusses the concept of crisis management and its relevance to tourism. It presents an overview of the general trends in tourism crisis events of the last two decades, assesses the impacts of major man-made crises on the industry, and argues for the importance of crisis management in tourism management. The paper also discusses the complex issue of crisis definition and its implications for organizations, and provides an operational definition of crisis management. Critical issues in crisis management, such as crisis anatomy, crisis incubation, risk perception in tourism, and destination image, are discussed. Finally, the paper explores and analyses, in the context of crisis anatomy, the public sector's handling of a major resort pollution crisis in Southern Brazil.