Integration and Dissemination of Citizen Reported and
Seismically Derived Earthquake Information via Social
Network Technologies
Michelle Guy1, Paul Earle1, Chris Ostrum1, Kenny Gruchalla2, Scott Horvath3
1U.S. Geological Survey National Earthquake Information Center, Golden, CO, USA
2National Renewable Energy Laboratory, Golden, CO, USA
3U.S. Geological Survey National Earthquake Information Center, Reston, VA, USA
Abstract. People in the locality of earthquakes are publishing anecdotal information about the shaking within seconds of their occurrence via social network technologies, such as Twitter. In contrast, depending on the size and location of the earthquake, scientific alerts can take between two and twenty minutes to publish. We describe TED (Twitter Earthquake Detector), a system that adopts social network technologies to augment earthquake response products and the delivery of hazard information. The TED system analyzes data from these social networks for multiple purposes: 1) to integrate citizen reports of earthquakes with corresponding scientific reports, 2) to infer the public level of interest in an earthquake for tailoring outputs disseminated via social network technologies, and 3) to explore the possibility of rapid detection of a probable earthquake, within seconds of its occurrence, helping to fill the gap between the earthquake origin time and the availability of quantitative scientific data.
Keywords: Twitter, micro-blogging, social network, citizen reporting,
earthquake, hazard, geospatial-temporal data, time series
1. Introduction
Social network technologies are providing the general public with anecdotal
earthquake hazard information before scientific information has been published from
authoritative sources [1]. The United States Geological Survey (USGS) National
Earthquake Information Center (NEIC) rapidly determines the location and size of felt
earthquakes within the U.S. and most magnitude 5.0 and greater earthquakes
worldwide. The USGS rapidly disseminates this information to national and
international agencies, scientists and the general public. Due to the propagation time
of seismic energy from an earthquake’s hypocenter to globally-distributed
seismometers and the latencies in the collection, analysis, and validation of these
global seismic data, published scientific alerts can take between two and twenty
minutes to produce, depending on the size and location of the quake. In contrast,
people in the vicinity of earthquakes are publishing information within seconds of
their occurrence via social networking and micro-blogging technologies. This paper
describes how the analysis of geospatial-temporal data from social networking sites is
being adopted by the USGS in an attempt to augment its earthquake response
products and the delivery of hazard information. While the anecdotal and qualitative
information from social networking sites is not a replacement for the high-quality quantitative earthquake information from the USGS, mining and publishing this rapidly available information can provide 1) integration of firsthand hazard accounts with scientific information, 2) a widespread outreach tool, and 3) potentially early detections of reported shaking events.
TED (Twitter Earthquake Detector) is a software application developed to mine
real-time data from popular social networking and micro-blogging sites (e.g., Twitter,
Jaiku), searching for indicators of earthquake (or other hazard) activity directly from
the public. In addition, TED integrates traditional scientific earthquake information,
location and magnitude, from the USGS internal global earthquake data stream with
geospatially and temporally corresponding citizen reports from popular social networking and
micro-blogging sites. One indication of the level of public interest can be inferred
when the density of hazard-related chatter in time and a geographic locality
corresponds to that of an actual hazard event. When an earthquake is picked up from
the USGS internal global earthquake data stream, the system immediately integrates
citizen reported firsthand accounts of experienced shaking with the corresponding
scientific information. TED then uses these same social networking and micro-
blogging technologies to rapidly disseminate the combined scientific and citizen
information to a large number of people potentially already “listening”. Additionally,
analysts working on earthquake response products currently have only scientifically
derived location, corresponding population and magnitude information available in
the minutes following an earthquake. The rapid integration of firsthand hazard
accounts can potentially help guide the initial response actions taken to meet NEIC’s
mission.
This same detected increase in earthquake related chatter used to infer public
interest in an earthquake is being investigated for use as a real-time preliminary
indicator of a potential earthquake. Early work has indicated that such detections are
possible within seconds of an earthquake and could potentially be used to create
preliminary alerts (e.g., emails, pages, and micro-blog updates) for USGS operations
staff as an early hazard warning, thus filling the gap from when an earthquake occurs
until the time scientific data become available to then confirm or refute the reported
shaking event.
We describe the collection, filtering, archiving, and analysis of Twitter data and
show how these data can be effectively correlated against the USGS internal
earthquake stream as one indication of public interest in an earthquake. Integration of
these data successfully augments current earthquake response products produced by
the USGS. We also evaluate the usage of these Twitter data as a real-time hazard
detection tool. Preliminary results suggest that these data, if handled carefully, can be
useful as an early detection indicator.
2. Related Work
Twitter [2] is one of the more widely used micro-blogging platforms, with a global
outreach spreading from developed, urban nations to developing countries [3]. It
enables a form of blogging that allows users to send short status update messages
(maximum of 140 characters) called tweets. Twitter provides access to thoughts,
activities, and experiences of millions of users in real-time, with the option of sharing
the user’s location. This rich source of data is motivating a growing body of scientific
literature about micro-blogging. Most of the work has focused on social aspects such
as studying user motivations [4,5] and user collaboration [6,7,8]. Some micro-
blogging collaboration research has focused specifically on crisis management and
collective problem solving in mass emergency events [9,10,11].
Our interest in the use of Twitter data is not for the crisis management that follows
a hazard event, rather it is in the rapid assessment, reporting, and potentially the near
real-time detection of a hazard event. De Longueville et al. [12] performed a postmortem analysis of tweets related to a wildfire near the French city of Marseille. Their analysis showed that the Twitter traffic was generally well synchronized with the temporal and spatial dynamics of the Marseille fire event, but they warn that tweets from
media sources and aggregators (users that compile and republish existing sources)
will complicate automatic event detection. Intelligent blog-based event detection has
not been limited to hazard events. Online chatter has been used to predict the rank of
book sales [13] and recommend topical news items [14]. Cheong & Lee [3] describe a
general collective intelligence retrieval methodology that can be used to mine micro-
blogs to identify trends for decision-making.
The USGS has an established history of Internet-based citizen reporting using the
“Did You Feel It?” system (“DYFI?”) [15], which generates ground shaking intensity
maps based on volunteered Internet questionnaires. The DYFI questionnaires allow a calibrated assignment of Modified Mercalli Intensity to each submission, producing a quantitative map of intensity. The Modified Mercalli Intensity scale [16] is based on postal questionnaires where respondents summarize shaking effects, damage maps produced by emergency response agencies, and reports produced by the earthquake engineering community. The “DYFI?” system provides a calibrated quantitative
assessment of an earthquake event; however, it depends on users visiting the USGS
website and completing a questionnaire. Collecting a sufficient amount of data to
generate an intensity map typically takes on the order of minutes. The data mined
from Twitter are neither calibrated nor quantitative; however, an earthquake can be
detected on the order of seconds and does not require direct interaction with the
USGS website.
3. Methodology
3.1. Gathering Data
TED harvests real-time tweets by establishing a continuous HTTP connection to Twitter's Streaming API, applying a query parameter that reduces the stream to only tweets containing one or more of the specified keywords: namely earthquake, quake, and tsunami in several languages. The stream of tweets returned from Twitter is in JSON format, which is parsed locally and inserted into a MySQL database. All of this runs 24x7 in multiple separate, redundant processes to compensate for network interruptions or other failures.
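To make the pipeline concrete, the following is a minimal sketch of such a harvesting loop. The endpoint URL, credentials, and table schema are illustrative assumptions, not the operational TED code; the streaming API of that era used HTTP basic authentication and a "track" keyword parameter and has since been retired.

```python
# Sketch of the 24x7 harvesting loop (illustrative only): endpoint URL, basic
# authentication, and table schema are assumptions, not the actual TED code.
import json

import requests
import mysql.connector

KEYWORDS = "earthquake,quake,tsunami,temblor,terremoto,gempa"  # several languages

db = mysql.connector.connect(user="ted", password="...", database="ted")
cursor = db.cursor()

resp = requests.post(
    "https://stream.twitter.com/1.1/statuses/filter.json",  # assumed endpoint
    data={"track": KEYWORDS},
    auth=("username", "password"),  # the era's API used HTTP basic auth
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue  # skip keep-alive newlines
    tweet = json.loads(line)
    cursor.execute(
        "INSERT INTO tweets (tweet_id, user_id, created_at, text, user_location)"
        " VALUES (%s, %s, %s, %s, %s)",
        (tweet["id"], tweet["user"]["id"], tweet["created_at"],
         tweet["text"], tweet["user"].get("location")),
    )
    db.commit()
```

In practice, several copies of this loop would run in parallel and their outputs would be merged downstream, as described next.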
In addition to the keyword filtering, other data cleaning techniques are applied to the incoming tweets. Tweets from the multiple processes are merged: they are ordered, duplicates are removed, and any data gaps are filled. Data from aggregators, users who regularly redistribute secondhand earthquake information, are removed from the data set. The number of aggregator users has thus far remained below one half of a percent of all users that have sent earthquake-related tweets over the past five months. Additionally, tweets containing strings commonly used to indicate retweeting (rebroadcasting a tweet from another user) are removed. All of these removed tweets are archived in a separate table in the database, preserved for historical analysis as necessary.
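A minimal sketch of this cleaning pass follows; the retweet markers are common Twitter conventions, and the aggregator account names are hypothetical stand-ins for the curated list described above.

```python
# Sketch of the cleaning pass: drop retweets and known aggregator accounts.
# The marker strings are common Twitter conventions; the aggregator names
# here are hypothetical placeholders.
RETWEET_MARKERS = ("rt @", "via @")
AGGREGATORS = {"quake_aggregator_example", "eq_news_example"}

def is_noise(tweet):
    """Return True for retweets and secondhand aggregator traffic."""
    text = tweet["text"].lower()
    if any(marker in text for marker in RETWEET_MARKERS):
        return True
    return tweet["user"]["screen_name"].lower() in AGGREGATORS
```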
For each keyword-filtered tweet, TED archives the tweet creation time, text, Twitter user location, Twitter tweet ID, Twitter user ID, and the time the tweet was inserted into the TED database. Additionally, after each tweet is inserted into the database, the latitude and longitude estimate of the sender's location, if provided, is determined via the Google Maps API Geocoding Service [17] and stored with the tweet. Roughly 15% of the earthquake-related tweets that we have archived have come from GPS-enabled devices, generally providing very accurate locations at the time of each tweet. Another 35% of the tweets have generic user locations such as “123 A St. San Francisco, CA, USA”, or “San Francisco, CA, USA”, or “San Francisco”, or “The Bay Area”. The remaining 50% of the tweets do not provide a specific location and are not used by the TED system.
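A hedged sketch of resolving a free-form profile location string to coordinates; the JSON geocoding endpoint shown is the current form of the Google Maps service and may differ from the version in use when this system was built.

```python
# Hedged sketch of geocoding a free-form profile location; the endpoint shown
# is the present-day Google Maps JSON geocoding service, which may differ
# from the version the paper used.
import requests

def geocode(location, api_key):
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": location, "key": api_key},
    )
    results = resp.json().get("results")
    if not results:
        return None  # vague or unparseable location strings stay unused
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]
```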
The TED system also ingests seismically derived earthquake information from the
USGS near real-time internal global earthquake stream 24x7. From these earthquake
messages TED archives the earthquake origin time, region name, hypocenter (latitude,
longitude, and depth), the magnitude, and the authoritative source of the scientifically
derived earthquake information. These scientific earthquake messages arrive
anywhere from two minutes up to around twenty minutes after an earthquake’s origin
time, depending on the size and location of the earthquake.
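For illustration, the archived fields could be represented by a record like the following; the field names are assumptions, not the actual TED schema.

```python
# Illustrative record of the fields archived from the internal earthquake
# stream; names are assumptions, not the actual TED schema.
from dataclasses import dataclass

@dataclass
class EarthquakeMessage:
    origin_time_utc: str  # earthquake origin time
    region: str           # region name
    latitude: float       # hypocenter latitude
    longitude: float      # hypocenter longitude
    depth_km: float       # hypocenter depth
    magnitude: float
    source: str           # authoritative source of the solution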
3.2. Integrating Seismically Derived and Citizen Reported Earthquake
Information
TED integrates firsthand public accounts of shaking with the corresponding
scientific information for an earthquake. For earthquakes in areas populated with
active Twitter users, TED can then gauge a potential level of public interest in that
earthquake by detecting a “significant” and rapid increase in the number of related
tweets. When an earthquake location, from the USGS near real-time internal global
earthquake stream, is inserted into the system, the tweet archive is searched for geo-
spatially and temporally correlated tweets. Geo-spatial correlation is determined by computing the distance from the hypocenter (latitude, longitude, and depth) within which ground shaking may have been felt. We define an earthquake's possible felt area as
all points on the Earth’s surface whose hypo-central distance is less than an estimated
maximum felt distance Rf, in km, which is a function of magnitude M defined as:
Rf = 10^(0.3204*M + 0.602)    (1)
We derived Rf empirically from felt observations submitted to the “Did You Feel
It?” system. This relation does not take into account such factors as spatial variation
in ground-motion attenuation and rupture finiteness. However, for our current system
this simple approximation has proved sufficient over the three months the system has
been running.
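As an illustration of this geospatial screen, the sketch below computes Rf from magnitude via equation (1) and keeps tweets whose hypocentral distance falls inside it; the haversine surface distance combined with the event depth approximates the hypocentral distance.

```python
# Sketch of the geospatial correlation: estimate the maximum felt distance Rf
# from magnitude (equation 1), then keep tweets whose hypocentral distance is
# smaller. Haversine gives the surface distance; depth is folded in below.
from math import radians, sin, cos, asin, sqrt

def felt_radius_km(magnitude):
    return 10 ** (0.3204 * magnitude + 0.602)

def surface_distance_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius in km

def tweet_in_felt_area(tweet_lat, tweet_lon, eq_lat, eq_lon, depth_km, magnitude):
    surface = surface_distance_km(tweet_lat, tweet_lon, eq_lat, eq_lon)
    hypocentral = sqrt(surface ** 2 + depth_km ** 2)
    return hypocentral < felt_radius_km(magnitude)
```

For a magnitude 4.3 event, for example, equation (1) gives Rf of roughly 95 km.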
Temporal correlation is accomplished by collecting the tweets, in the TED archive,
from five minutes before the earthquake origin time up to the time when the data sets
are being integrated, which may range anywhere from two to sixteen minutes after the
origin time. The time frame before the event origin time measures the present noise
level. The time frame after the event is limited to a maximum of sixteen minutes to
help limit the input to context-relevant tweets with firsthand accounts of shaking rather than the conversational chatter, retweets, media reports, and geographically wider-spread reactions that occur in longer time frames following an earthquake.
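A sketch of the temporal correlation query under the stated window; the table and column names are illustrative.

```python
# Sketch of the temporal correlation window (table and column names are
# illustrative): tweets from five minutes before origin time up to the
# integration time, capped at sixteen minutes after origin.
from datetime import datetime, timedelta, timezone

def correlated_tweets(cursor, origin_time):
    now = datetime.now(timezone.utc)
    start = origin_time - timedelta(minutes=5)
    end = min(now, origin_time + timedelta(minutes=16))
    cursor.execute(
        "SELECT tweet_id, created_at, text, lat, lon FROM tweets"
        " WHERE created_at BETWEEN %s AND %s",
        (start, end),
    )
    return cursor.fetchall()
```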
3.3. Significance Detection
TED uses a geospatial-temporal correlated earthquake tweet data set to infer a
level of public interest in an earthquake. Since there are dozens of located, unfelt
earthquakes on the planet every day, a check for a significant increase in related
tweets helps prevent flooding users with information that they may not find useful, which could cause them to ignore the information altogether. We use a significance ratio
function, S, to determine if an earthquake has generated a significant increase in tweet
traffic to warrant public distribution. A trigger is declared if S exceeds one. The
significance ratio function accounts for the possibility of zero pre-event noise and is
defined:
S = A/(mB + Z)    (2)
where A is the tweets-per-minute after the event, B is the tweets-per-minute
before the event, Z is a constant that defines the required value for A when B is zero to
cause a trigger, and m is a constant that controls how much A must increase with
increasing noise levels to cause a trigger. For earthquakes with S greater than one,
the TED system produces 1) an alert tweet with hypocenter, preliminary magnitude,
and region, 2) an interactive map of the plotted epicenter and tweets, 3) a histogram of
tweets per time unit around the earthquake origin time, 4) a downloadable KML file
that plots tweets over time, 5) a list of the top ten cities with the highest number of
tweets, and 6) a web page that includes all of the above and the actual text for all
correlated tweets. The purpose of these integrated output products is to rapidly
provide a summary of personal accounts from the impacted region to earthquake
responders and the public. It is anticipated that TED products will be replaced as
validated and calibrated information becomes available. TED will also rapidly provide
information via Twitter (instead of only the web and email) and hopefully draw users
to the USGS website for detailed information. These output products can augment
current earthquake response information provided to USGS analysts and to the public.
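A minimal sketch of the trigger test in equation (2); the constants m and Z shown here are placeholders, not the tuned operational values.

```python
# Minimal sketch of the significance test S = A/(mB + Z) from equation (2);
# the constants below are assumed placeholders, not the operational values.
M_CONST = 3.0   # assumed: growth required over pre-event noise
Z_CONST = 10.0  # assumed: tweets-per-minute required to trigger when B is zero

def significant(tweets_per_min_after, tweets_per_min_before):
    s = tweets_per_min_after / (M_CONST * tweets_per_min_before + Z_CONST)
    return s > 1.0
```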
3.4. Preliminary Hazard Detection
The analysis of the real-time spatio-temporal data being captured by the TED
system may also allow for the rapid detection of an earthquake before quantitative
scientific information is available. In fact, creating a time series of earthquake-related
tweets and monitoring this time series for spatio-temporal spikes is analogous to how
ground motion data from a seismometer are evaluated for earthquake activity. As a
proof of concept, three months of filtered and archived tweets were discretized per
time unit to create a time series of their temporal distribution. This time series was
then scanned for spikes, which are temporally correlated indications of a citizen
reported earthquake. The times of these spikes were then compared against the USGS
scientifically confirmed catalog of earthquake events [18] as confirmation of an actual
earthquake. The early results are promising; however, more sophisticated heuristics
need to be defined from historical data analysis to better characterize these spikes of
chatter and further reduce false detections. This has been left for future work.
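A proof-of-concept sketch of this discretization and spike scan; the one-minute bin width, threshold factor, and floor are illustrative assumptions rather than the heuristics under development.

```python
# Proof-of-concept sketch: bin archived tweet times (epoch seconds) per minute
# and flag bins far above the recent background. The factor and floor values
# are illustrative assumptions.
from collections import Counter
from statistics import median

def spike_times(tweet_times, bin_secs=60, factor=10, floor=5):
    bins = Counter(int(t // bin_secs) for t in tweet_times)
    if not bins:
        return []
    start, end = min(bins), max(bins)
    counts = [bins.get(b, 0) for b in range(start, end + 1)]
    background = max(median(counts), 1)  # guard against an all-zero baseline
    return [(start + i) * bin_secs
            for i, c in enumerate(counts)
            if c >= floor and c > factor * background]
```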
4. Difficulties and Issues
It is clear that significant limitations exist in a system based on citizen reporting.
The issues that tend to plague the system are a lack of quantitative information, out-of-context tweets, incorrect or missing geo-locations, and dependence on the robustness of external data
sources such as Twitter and geo-locating services. The main drawback, because the
NEIC's mission is scientific information about earthquakes, is the lack of quantitative
information such as epicenter and magnitude. Without quantitative verified data, alerts
provoking response measures are not possible. The main advantage of Twitter is
speed, especially in sparsely instrumented areas.
Not all tweets containing the word earthquake or quake, in any language,
correspond to people feeling shaking caused by an earthquake. Analysis of data for
the past few months indicates that the background level of noise (out-of-context tweets clustered geographically and in time) is generally very low, except following major earthquakes. For example, after the magnitude 4.3 Morgan Hill, CA earthquake on March 30, 2009, the number of earthquake tweets sent from the surrounding region increased from roughly one tweet per hour (about 0.017 tweets per minute) before the event to 150 tweets per minute for a full five minutes after the event [1]: a signal-to-noise ratio of 9000.
However, background noise levels are not constant. For example, in the hours and
days following the magnitude 7 earthquake in Haiti in mid January 2010, people all
over the planet were tweeting about earthquakes. Fortunately, this kind of chatter is
generally not geographically centered and dies down a few days after the event.
However, there are other types of chatter that could produce geographically and time
centered “earthquake” tweets. For example, a geographically concentrated spike of
tweets was observed during the Great California Shake Out [19] in October 2009. One
can imagine a large enough group of Twitter users enjoying a game of Quake while eating Dairy Queen's Oreo Brownie Earthquake dessert, producing misleading data for an automated system.
Inaccurate tweet geo-locations are a serious issue when using geospatially related
tweets for threshold detections and to map indications of the region exposed to the
hazard, or shaking in the case of an earthquake. The location of a tweet is only as
accurate as the location string the user entered in their Twitter profile, as this is the
location provided with tweets. A location is not required to set up a Twitter account
and can be as vague or specific as the user wants. Some Twitter applications for GPS
enabled devices update the location string on a per tweet basis, this is about 15% of
the earthquake tweets we have seen in the past three months. However, most tweets
that provide a location use the static location in the user’s profile. Given this, a tweet
from a New Yorker on vacation in San Francisco will most likely mis-locate to New
York. Since these tweets are likely not spatially correlated, requiring a minimum
number of earthquake tweets in a region before declaring it a felt region will reduce
their contaminating effect. We expect that tweet location accuracy will increase with
time due to both the increased use of GPS enabled devices and Twitter’s introduction,
in November 2009, of its Geolocation API that will allow users to have their tweets
tagged with their current location.
Citizen reporting based hazard detection is only as good as the reporting. It is
conceivable that a motivated group of citizens could attempt to spoof a hazard. Refined characterization of detection spikes would help to reduce such “attacks” aimed at fooling the system, but is unlikely to eliminate them.
5. Results and Evaluation
Analyzing the outputs produced from integrating geo-spatially and temporally
correlated citizen reports of earthquakes with seismically derived earthquake
information confirms their potential to augment existing earthquake products produced by the USGS. For example, Fig. 1 shows an interactive Google Map with the earthquake epicenter and correlated tweets plotted. It provides an indication of areas with perceived shaking and provides access to the geo-located tweets' text. Comparing the geospatial distribution of the tweets against the scientifically calibrated “DYFI?” intensity map indicates that the early arriving tweets can roughly correspond with perceived shaking, as shown in Fig. 2. This correlation is further explored in Earle et al. 2010 [1].

Fig. 1. Earthquake epicenter (circled waveform) and geospatially and temporally corresponding earthquake tweets (balloons) plotted on an interactive Google Map for the magnitude 4.3 earthquake in Southern California on January 16, 2010.
Fig. 2. Comparison of the intensity map (upper left) produced using Internet questionnaires
submitted to the USGS “Did You Feel It?” system (DYFI?) [15] to maps produced by counting
geospatially and temporally correlated tweets (remaining plots at discrete time intervals after
the earthquake) for the magnitude 4.3 earthquake in the San Francisco Bay Area on March 30th,
2009. The colors of the plotted circles indicate the number of tweets in that region. Tweets with
precise latitude and longitude geo-locations are plotted as triangles.
Fig. 3. Histogram of number of earthquake tweets every thirty seconds before and after the
magnitude 3.7 earthquake in Pleasanton, CA on October 13, 2009 (local date), with earthquake
origin time indicated by the red vertical line at 2009-10-14 03:27:41 UTC.
At a glance, the main advantage of mining citizen reports via Twitter is the speed
of information availability, especially compared to areas that are sparsely
instrumented with seismometers. Even using data from hundreds of globally
distributed sensors, we cannot detect many earthquakes below magnitude 4.5 due to a lack of available local instrumentation. In limited cases, these earthquakes can be identified from citizen reports. By manually scanning a real-time Twitter search for earthquake tweets, we
detected two earthquakes in 2009 that were missed by our real-time seismometer-
based earthquake association algorithm. The first was a magnitude 4.7 earthquake
near Reykjavik, Iceland. The second was a magnitude 3.1 earthquake near Melbourne,
Australia. These earthquakes likely would have been detected in days to weeks using
late-arriving data and locations from contributing foreign seismic networks; however,
Twitter enabled quicker USGS distribution of earthquake magnitude and epicenter.
To further investigate the possibility of detecting earthquakes based on citizen
reports, we compared earthquake related tweet activity against the USGS earthquake
catalog. To do this comparison we created a time series of tweets-per-minute using a
month and a half of archived keyword-filtered tweet data, as shown in Fig. 4. We then
searched the time series for sudden increases in temporally related tweets and then
correlated these peaks with earthquakes. All of the major spikes, with the exception of
one on October 15th, coincide with earthquakes. The one on October 15th was an
emergency preparedness drill conducted by the state of California [19]. It is interesting to note that for this spike the onset was much more gradual than the onset for earthquakes. Fig. 3 shows an example of the rapid onset for an actual earthquake.
Correct differentiation between rapid and gradual increases in tweet frequency may
reduce false detections. It is important to note that earthquakes detected by tweets
will only be those felt by human observers. There are dozens of located earthquakes
on any given day that are not felt because they are too deep and/or in sparsely
populated areas. A tweet-based system will not detect such earthquakes. This
comparison has demonstrated a match of tweet-based detection with actual felt
earthquakes.
Fig. 4. Plotted time series of earthquake tweets per minute from September 20, 2009 through
November 8, 2009 with major spikes identified with corresponding earthquake region and
magnitude.
For a tweet-based detection system to be viable, the number of false detections
needs to be low. Background noise from out of context earthquake tweets can
increase false detections. An evaluation of the background noise around geospatially
and temporally related tweets was performed by comparing the number of tweets
before and after a verified earthquake. For every event that TED picks up from the
USGS near real-time internal global earthquake stream, it calculates the average
number of geospatially and temporally correlated tweets-per-minute for ten minutes
prior to and after the event origin time. Looking at this noise level for all earthquakes from December 1, 2009 through January 27, 2010 (2,291 total), 99% of the earthquakes had 0.1 or fewer tweets-per-minute in the ten minutes prior to the event. The remaining 1% were typically in the range of 0.1 to 1.8 tweets-per-minute for the ten minutes prior to the earthquake. Fig. 3 shows an example of an earthquake with high pre-event noise; even so, the onset of the earthquake-related tweets is clear. The influence of background noise in a geospatially and temporally related tweet data set is therefore small.
In order for tweet-based earthquake detection to be useful, it must precede the
availability of seismically derived information. From TED outputs we have seen
many earthquakes that show an increase in earthquake related tweet frequency that
precedes the availability of seismically derived data. For example, the USGS alert for
a small, magnitude 3.7, earthquake in a densely instrumented region in central
California was available 3.2 minutes after the earthquake’s origin time, while a
detectable increase in correlated tweet frequency was seen within 20 seconds. In a second
case, we examined a large, magnitude 7.6, earthquake in a moderately instrumented
region of Indonesia. Initial seismically derived information was available 6.7 minutes
after the earthquake’s origin time, while a rapid increase in "gempa" (Indonesian for
earthquake) tweets was seen within 81 seconds. In both cases, numerous earthquake
tweets were available in considerably less time than it took to distribute the initial
seismically derived estimates of magnitude and location. This demonstrates that rapid
tweet-based detection can potentially fill the gap between when an earthquake occurs
and when seismically derived information is available.
6. Future Work
Future work includes developing a more sophisticated hazard detection algorithm
for the real-time input tweet stream. The current plan is to group keyword-filtered tweets into five- or ten-second time intervals as the tweets are being archived from the
Twitter stream in order to produce a continuous real-time tweet time series. A short
time interval was chosen to both reduce latency and to help make the time series
continuous around clustered earthquake chatter. From this real-time time series, both a long-term average (LTA) and a short-term average (STA) will be calculated and used to produce a second time series of the STA/LTA ratio, which should further improve the signal-to-noise ratio (just as it does for automatic seismic phase-arrival detection from ground-motion time series data from seismometers). This noise-reduced time series is
what will be monitored for significant increases in the temporal density of hazard
related chatter, with the goal of reducing false detections. Current evaluation has
shown significant increases in the temporal density of hazard related chatter with a
rapid, almost instantaneous, onset within seconds of an earthquake’s occurrence.
Heuristics need to be further refined from continued historical data analysis to better
characterize these spikes of chatter and further reduce false detections. This kind of
real-time time series analysis is quite similar to how real-time waveform time series
data from seismometers are monitored for seismic activity.
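A sketch of such an STA/LTA detector over a tweets-per-interval time series, mirroring automatic seismic phase pickers; the window lengths and trigger threshold are illustrative, not tuned values.

```python
# Sketch of the planned STA/LTA detector on a tweets-per-interval series;
# window lengths and threshold are illustrative assumptions.
def sta_lta(series, sta_len=6, lta_len=120):
    """series: tweet counts per 5- or 10-second bin; returns STA/LTA ratios."""
    ratios = []
    for i in range(lta_len, len(series) + 1):
        sta = sum(series[i - sta_len:i]) / sta_len
        lta = sum(series[i - lta_len:i]) / lta_len
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios

def detections(series, sta_len=6, lta_len=120, threshold=8.0):
    """Bin indices (window ends) where the ratio exceeds the threshold."""
    return [i + lta_len
            for i, r in enumerate(sta_lta(series, sta_len, lta_len))
            if r > threshold]
```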
Additionally, further timing analysis is necessary to get a better handle on how
long after a hazard event, or categorized types of events (e.g., a small earthquake in a densely populated area, a large earthquake in a sparsely populated area), tweet-based hazard detection can work. We anticipate that relying on external sources for
services such as geocoding will be an inherent bottleneck, and may require more
robust or internal solutions for such services in order to meet timing requirements in
the long term. One step in reducing geocoding time dependencies will be
incorporating the use of Twitter’s newly added geolocation tags when provided with
incoming keyword-filtered tweets. This will improve over time as more Twitter client
applications incorporate this feature. From a more detailed analysis of our current
system we hope to move from a proof of concept to an operational system with
defined expectations of reliability and accuracy.
7. Conclusions
While TED's detection and distribution of anecdotal earthquake information
cannot replace instrumentally based earthquake monitoring and analysis tools, we
have demonstrated that TED can integrate citizen reported and seismically derived
earthquake information and then, based on inferred degrees of interest, rapidly
disseminate the information to large numbers of people via social networking
technologies. Additionally, we have shown that mining and publishing this
information can fill the gap between the time an earthquake occurs and the time
confirmed scientific information is available. The anticipated impacts of this novel
use of social networking sites for the earth sciences include:
- Rapid preliminary indicators of shaking or other hazards in populated areas, potentially before the arrival of seismically validated alerts.
- The ability to pool together and make readily accessible citizen-contributed earthquake information (e.g., eyewitness reports, shaking felt, photos) from individuals local to a hazard.
- Improved public outreach by providing authoritative earthquake alerts through social media outlets.
- Useful data and products that augment the existing suite of USGS earthquake response products.
There is a high degree of uncertainty and variability in these data derived from
anecdotal micro-blogs. Therefore, TED outputs and notifications based on citizen
reporting alone cannot definitively state that an earthquake occurred, but will state that social network chatter about earthquakes has increased in a specified area at a specified time and that seismically derived information will follow. For more information
on this project, please e-mail USGSted@usgs.gov or follow @USGSted on Twitter.
Acknowledgments. Funding provided by the American Recovery and Reinvestment
Act supported a student, Chris Ostrum, for the development of the TED system.
Chris is currently at Sierra Nevada Corp, Englewood, CO, USA. We thank M. Hearne
and H. Bolton for internal USGS reviews of this manuscript. Any use of trade,
product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
References
1. Earle, P., Guy, M., Buckmaster, R., Ostrum, C., Horvath, S., and Vaughan, A. “OMG Earthquake! Can Twitter Improve Earthquake Response?” Seismological Research Letters, to appear (2010)
2. O'Reilly, Tim, and Sarah Milstein. The Twitter Book. O'Reilly Media, Inc., (2009)
3. Cheong, M., and Lee, V. “Integrating web-based intelligence retrieval and decision-making
from the twitter trends knowledge base.” In SWSM '09: Proceeding of the 2nd ACM workshop
on Social web search and mining, 1–8. (2009)
4. Zhao, D., and Rosson, M.B. “How and why people Twitter: the role that micro-blogging
plays in informal communication at work.” In Proceedings of the ACM 2009 international
conference on Supporting group work, 243-252. (2009)
5. Java, A., Song, X., Finin, T., and Tseng, B. “Why We Twitter: An Analysis of a Micro-
blogging Community.” In Advances in Web Mining and Web Usage Analysis, 118–138, (2009)
6. Honeycutt, C., and Herring S. “Beyond Microblogging: Conversation and Collaboration via
Twitter.” In HICSS '09: Proceedings of the 42nd Hawaii International Conference on System
Sciences, 1–10. (2009)
7. Dixon, J., and Tucker, C.R. “We use technology, but do we use technology?: using existing
technologies to communicate, collaborate, and provide support.” In SIGUCCS '09: Proceedings
of the ACM SIGUCCS fall conference on User services conference, 309–312. (2009)
8. McNely, B. “Backchannel persistence and collaborative meaning-making.” In SIGDOC '09:
Proceedings of the 27th ACM international conference on Design of communication, 297–304.
(2009)
9. Starbird, K., Palen, L., Hughes, A., and Vieweg, S. “Chatter on The Red: What Hazards
Threat Reveals about the Social Life of Microblogged Information.” In CSCW 2010:
Proceedings of the ACM 2010 Conference on Computer Supported Cooperative Work. (2010)
10. Hughes, A., and Palen, L. “Twitter Adoption and Use in Mass Convergence and Emergency
Events.” In ISCRAM 2009: Proceedings of the 2009 Information Systems for Crisis Response
and Management Conference. (2009)
11. Vieweg, S., Palen, L., Liu, S., and Hughes, A. “Collective Intelligence in Disaster: An Examination of the Phenomenon in the Aftermath of the 2007 Virginia Tech Shootings.” In ISCRAM ’08: Proceedings of the Information Systems for Crisis Response and Management Conference. (2008)
12. De Longueville, B., Smith, R.S., and Luraschi, G. “"OMG, from here, I can see the
flames!": a use case of mining location based social networks to acquire spatio-temporal data
on forest fires.” In LBSN '09: Proceedings of the 2009 International Workshop on Location
Based Social Networks, 73–80. (2009)
13. Gruhl, D., Guha, R., Kumar, R., Novak, J., and Tomkins, A. “The predictive power of
online chatter.” In KDD '05: Proceedings of the eleventh ACM SIGKDD international
conference on Knowledge discovery in data mining, 78–87. (2005)
14. Phelan, O., McCarthy, K. and Smyth, B. “Using twitter to recommend real-time topical
news.” In RecSys '09: Proceedings of the third ACM conference on Recommender systems,
385–388. (2009)
15. Wald, D. J., Quitoriano, V., Dengler, L., and Dewey, J. W. “Utilization of the Internet for
Rapid Community Intensity Maps.” Seismological Research Letters 70, 680–697. (1999)
16. Wood, H. O., and Neumann, F. “Modified Mercalli Intensity Scale of 1931.” Bulletin of the Seismological Society of America 21, 277–283. (1931)
17. http://code.google.com/apis/maps/documentation/geocoding/index.html
18. “Earthquakes.” http://earthquake.usgs.gov/earthquakes/
19. “The Great California Shake Out.” http://www.shakeout.org