Social Distance Integrated Gravity Model for Evacuation Destination Choice
Yuqin Jianga*, Zhenlong Lia, Susan L. Cutterb
a Geoinformation and Big Data Research Laboratory, Department of Geography, University of
b Hazards and Vulnerability Research Institute, Department of Geography, University of South
Evacuation is an effective and commonly taken strategy to minimize death and injuries from an
incoming hurricane. For decades, interdisciplinary research has contributed to a better
understanding of evacuation behavior. Evacuation destination choice modeling is an essential
step for hurricane evacuation transportation planning. Multiple factors have been identified as
associated with evacuation destination choices. Among them, long-term social factors have been
found essential, yet they are neglected in most studies because such data are difficult to collect. This study utilized long-term
human movement records retrieved from Twitter to (1) reinforce the importance of social factors
in evacuation destination choices, (2) quantify individual-level familiarity measurement and its
relationship with an individual’s destination choice, (3) develop a big data approach for
aggregated county-level social distance measurement, and (4) demonstrate how gravity models
can be improved by including both social distance and physical distance for evacuation
destination choice modeling.
Keywords: big data, disaster management, evacuation, social media, social distance
1. Introduction
Hurricanes are among the most common and costliest natural hazards in the United States.
In 2016, hurricanes and associated heavy rainfall, storm surges, and strong winds caused death,
injuries, and economic losses to coastal areas in the United States (NOAA, n.d.). One of the
primary mechanisms for protecting people from impending hurricanes and their hazards is
evacuating the potentially affected area. Many disciplines, including geography, sociology,
engineering, and political science, have contributed to a better understanding of evacuation
behavior. Generally speaking, there are two different perspectives or foci for the research
(Trainor et al., 2013). Transportation engineering studies focus on routing and destinations
employing three-step models for evacuation: trip generation, trip distribution, and route
assignment. The trip generation step models the evacuating population size and their response
time (Herrera et al., 2019; Zhu et al., 2018). The second step, trip distribution, models where
trips end based on opportunities provided by each potential destination using origin-destination
metrics (Bian et al., 2019; Cheng G. et al., 2011). Then, in the last step, trips are assigned to
different routes to optimize infrastructure usage (Bayram & Yaman, 2018; Ukkusuri et al.,
Social scientists, however, focused more on individual- and household-level decision-
making, aiming to gain a better understanding of how different factors affect evacuation
decisions and behaviors. For example, studies have identified factors that affect evacuation
decisions, including vehicle ownership, age, income, housing type, and other factors (Dow &
Cutter, 1998; 2002; Huang et al., 2016). Also, multiple social factors have been found essential
to understanding evacuation time estimate (Lindell & Perry, 1992; Lindell & Prater, 2007),
departure time distribution (Huang et al., 2012; 2017), transportation mode (Lindell & Perry,
1992; Lindell & Prater, 2007), evacuation route choices (Dow & Cutter, 2002; Prater et al.,
2000), and other evacuation logistics, including travel time, destinations, and accommodation
(Bian et al., 2019; Lindell et al., 2011; Wu et al., 2012). Findings of social factors in evacuation
models have been reviewed by Lindell et al. (2019), Lindell et al. (2020), and Sorensen et al.
One challenge here is that some potentially useful social factors have not been examined
in previous evacuation research. This is especially true for long-term social factors. For
example, how many counties has an individual visited in the last three years, and how much time
has he/she spent in each county? One potential solution is to collect social media data to retrieve
digital footprints left by social media users. Social media data have been widely used in natural
hazard-related research to understand evacuation behavior (Kumar & Ukkusuri, 2018; Martin et
al., 2017; Sadri et al., 2017). Besides the ability to provide rapid and easily accessible data,
another advantage of social media is that long-term records can be retrieved. However, only
limited studies have utilized the long-term records from social media for evacuation behavior
studies (Jiang et al., 2019b).
This study aims to extend the functionality of social media data in evacuation behavior
studies by utilizing users’ long-term traveling records from Twitter. Specifically, this study asks
the following questions:
1) Are social media users more likely to evacuate to places they are familiar with?
2) How can social distance derived from social media data help to improve evacuation
To address these two questions, this study first reinforces findings from existing studies
that social factors do play important roles in evacuation destination choices by quantifying the
individual-level familiarity of each evacuated Twitter user, and then introduces a big data
approach to measure county-to-county social distance based on geotagged tweets. Lastly, this
study demonstrates how social distance can be integrated into gravity models to improve
evacuation transportation planning.
2. Literature Review
2.1 Human Mobility Measured by Distance
Power law is one of the most commonly used distributions to model displacement
distance in human movements. For example, the trip occurrence probability decreases as travel
distance increases, with the power law written as
$$p(d) = C\,d^{-\beta} \qquad \text{(Eq. 1)}$$
where $p(d)$ is the trip occurrence probability, $d$ is the trip distance, $C$ is a constant, and $\beta$ is the
scaling parameter (Brockmann et al., 2006; Mandelbrot, 1983). Researchers further confirmed
that the scaling parameter should be larger than 1 and smaller than 3. When $\beta = 1$, the trip
occurrence probability forms an inverse proportional relationship with trip distance. When
$\beta \geq 3$, this movement is Brownian, where the length of trip exhibits a Gaussian distribution (Jiang et
Benefitting from the prevalence of Global Positioning System (GPS) and location-
enabled social media platforms, the availability of geotagged social media posts provides
researchers with opportunities to advance understanding of human mobility. Noulas et al. (2011)
collected 12 million user check-in data on Foursquare generated by more than 679,000 users in
111 days. This study provided an exploratory analysis of spatiotemporal distribution of users’
check-in locations for multiple categories of places. Similarly, Cheng Z. et al. (2011) collected
about 22 million check-in data from nine social media platforms and modeled individuals’ travel
distance with power-law distribution. Their power-law model agrees with previous human
mobility studies using non-social media data sources (Brockmann et al., 2006; Gonzalez et al.,
2008). In a cross-city study conducted by Noulas et al. (2012), more than 35 million trips were
retrieved from Foursquare check-in data in 31 cities from different countries. This study shows
that a power law governs human mobility patterns in all the cities, though the pattern varies with
city size and population density.
Human mobility patterns were also studied during previous natural hazards using social
media data. Based on geotagged tweets, for example, Wang and Taylor (2014) examined New
York City residents’ daily travel patterns under the impact of Hurricane Sandy. This study found
that although an individual’s activity center was shifted, their daily travel distances still follow
power-law distribution. The shift of the activity center was caused by evacuation from flood-
prone areas to safer areas. However, the shift of activity center was not the focus of their study
and thus was not further examined. In another study by Wang and Taylor (2016), they further
confirmed their findings by testing whether human travel distance follows power-law distribution
under multiple natural hazards. They collected Twitter users’ movement data during four
typhoons (two in Japan and two in the Philippines), three earthquakes (in the Philippines, Chile,
and the U.S.), three winter storms (in Britain, Germany, and the U.S.), three extreme rainstorms
in the U.S., and two wildfires in Australia.
These studies provided two important contributions. First, although natural hazards
caused perturbation for human movements, an individual’s travel distance distribution was still
governed by the power-law distribution. Second, a shift of an individual’s activity space center
can be observed, but the distance of this shift was unrelated to the individual’s
daily activity space. These studies demonstrated the feasibility of using Twitter
data to study human mobility patterns during natural hazards and the fit of the power-law
distribution to travel distance during natural hazards. Also, the two studies by Wang and Taylor
(2014, 2016) revealed the shift in activity centers caused by hazard-related evacuations, but
patterns of such shifts were not further examined. This latter point raises the question of
whether evacuation distances of individuals still fit a power-law distribution, which is examined
in this paper.
2.2 Evacuation Destination Choices
Existing studies have developed multiple origin-destination models for evacuation
transportation management at the aggregated geographical level. Evacuation destination choice
is an important factor in determining evacuation transportation distribution (Murray-Tuite &
Wolshon, 2013). Evacuation destination choice decisions vary among evacuees and are affected
by multiple factors. Among those factors, accommodation is an important one that may decide
evacuation destination. Common accommodation choices include, but are not limited to, friends’ or
relatives’ places, hotels/motels, and public shelters. Post-hurricane survey data showed that most
evacuees chose to stay at a friend’s or relative’s home, as illustrated from evacuation behavior
studies for Hurricane Floyd (Cheng et al., 2008) and Hurricane Ivan (Mesa-Arango et al., 2013).
To examine factors that affect evacuees’ destination choices, Cheng et al. (2008) developed
multinomial logit models for evacuees who went to friend’s/relative’s places and to
hotels/motels. Their study found influential variables, including evacuation distance, whether the
destination is affected by hurricane, population composition of destination, whether the
destination is in a metropolitan statistical area, transportation convenience, and the probability of
finding a place to stay at the destination. With evacuation-specified Traffic Analysis Zones
(TAZs), Wilmot et al. (2006) developed three models for zonal aggregated evacuation
destination choice. Comparing their gravity model, intervening opportunity model, and an
extended intervening opportunity model that considered evacuation direction and hurricane path,
only small differences were found (Wilmot et al., 2006). They further tested the transferability of
the gravity model. The gravity model calibrated using data from Hurricane Floyd in South
Carolina also worked with Hurricane Andrew in Louisiana (Wilmot et al., 2006). With the same
survey data from Hurricane Floyd in South Carolina, Cheng et al. (2011) further extended the
gravity model with dynamic features considering the storm path, road situation, and destination
accommodation availability. However, the transferability of this dynamic model has not been
tested (Cheng et al., 2011).
Current gravity-derived trip distribution models calculate opportunities to each
destination based on several factors, including pushing factors at the origin, pulling factors at
the destination, and travel distance between origin and destination. Social factors included in
existing models focus heavily on the pushing force at origins, such as information about hurricane
characteristics, evacuees’ vehicle ownership, and risk perceptions of local residents (Dow & Cutter, 2002; Lindell et
al., 2005), and the pulling force at destinations, such as lodging options and cost (Lindell et al.,
Since most evacuation studies rely on survey data, very limited long-term travel
information can be collected. One of the advantages of using social media data is the availability
of long-term data. For example, Jiang et al. (2019b) revealed that evacuated social media users
have significantly larger long-term activity space than non-evacuated social media users.
3. Data and Study Area
3.1 Hurricane Matthew and Its Affected Area
Hurricane Matthew formed on September 28, 2016, and rapidly developed into a
Category 5 storm, becoming the first Category 5 hurricane in the Atlantic basin since 2007 (Stewart,
2017). It caused 585 direct deaths, with 34 in the United States. Based on the predicted track and
the intensity of Hurricane Matthew, coastal residents from Georgia, South Carolina, and North
Carolina were ordered to evacuate.
This study focuses on evacuation behavior of Twitter users living in these ten coastal
counties before Hurricane Matthew: Chatham, GA; Brunswick, NC; Beaufort, Berkeley,
Charleston, Colleton, Dorchester, Georgetown, Horry, and Jasper in SC (Figure 1).
Figure 1. Hurricane Matthew Path and the 10 Selected Counties
3.2 Data Collection and Preprocessing
Geotagged tweets were collected with the Twitter Stream Application Programming
Interface (API) between July 2016 and December 2016. Streamed tweets were stored in a
Hadoop environment and queried with Apache Impala in this study. We defined the resident
county of a Twitter user as the county from which the user has posted the largest number of
tweets (Martin et al., 2017; McNeill et al., 2017; Martin et al., 2020a; Jiang et al., 2019b). From
the massive Twitter dataset we collected, we identified local users whose resident county was
one of these 10 counties.
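The resident-county rule described above can be illustrated with a minimal Python sketch. It is not the Impala query used in this study; the function name `resident_county` and the sample county labels are hypothetical, and the tie-breaking behavior (first county encountered) is an assumption made only for this example.

```python
from collections import Counter

def resident_county(tweet_counties):
    """Return the county from which a user posted the most geotagged tweets."""
    if not tweet_counties:
        return None  # a user with no geotagged tweets has no resident county
    counts = Counter(tweet_counties)
    # most_common(1) yields the (county, count) pair with the largest count;
    # ties fall to the county seen first, an arbitrary choice for this sketch.
    return counts.most_common(1)[0][0]

# Hypothetical user with five geotagged tweets
tweets = ["Charleston, SC", "Richland, SC", "Charleston, SC", "Horry, SC", "Charleston, SC"]
home = resident_county(tweets)
```

A user whose resident county falls within the 10 selected counties would then be kept as a local user.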
The Twitter user selection process followed Martin et al. (2017) and Jiang et al. (2019b).
Based on the predicted path of Hurricane Matthew, the governor of South Carolina issued a
mandatory evacuation order on October 4th, 2016 (hereinafter “10/4”), followed by the governor
of North Carolina and the governor of Georgia, who issued evacuation orders on October 6th.
Given these evacuation orders, we considered the pre-evacuation time span as October 2nd, 2016
(hereinafter “10/2”) to 10/4, as evacuation was assumed to start after the evacuation order (10/4).
The selected counties were under the impact of Hurricane Matthew between October 7th
(hereinafter “10/7”) and October 9th (hereinafter “10/9”). We assumed the evacuation process
had finished before the arrival of Hurricane Matthew (10/7) and that evacuees would not return
before Hurricane Matthew had left (10/9). Therefore, we considered the post-evacuation time
span as 10/7 to 10/9.
If a local user identified from the previous step posted during both pre-evacuation and
post-evacuation periods, this user was collected for further analysis. Then, we compared each
user’s posting location during the pre-evacuation and post-evacuation periods. If this user posted
from within the 10 counties during pre-evacuation time and posted from outside the 10 counties
during post-evacuation time, this user was considered an evacuated user. Also, each user’s pre-
evacuation and post-evacuation locations must be at county level or finer to be considered a valid
location. Users with state-level locations were eliminated as they could not be located in a
All the users we collected so far were manually checked to ensure they were real,
personally owned accounts. We used the Twitter API to retrieve the most recent 3,200 tweets those
users had posted (3,200 was the maximum number of historical tweets that Twitter allowed to be
queried). Those who denied permission or deleted their accounts were eliminated from our
dataset. Eventually, we collected 1,286 evacuated users with accessible historical tweets for
4. Power-Law Distribution
Existing studies agree that power-law distribution governs individuals’ daily travel
distance distribution during multiple natural hazards (Wang & Taylor, 2014; 2016), but not many
tested whether evacuation distance follows power-law distribution (Martin et al., 2017). As
evacuation distance is one of the most important factors for evacuation transportation planning,
understanding the distribution of evacuation distance is an essential step.
The poweRlaw package (Gillespie, 2015) in R was used for this test. This package applies the
bootstrap method to search for the best parameter using maximum likelihood estimation (MLE).
The null hypothesis in this test was that evacuation distance distribution follows power-law
distribution. The bootstrap process converged at about 3,500 iterations and remained stable
through 5,000 iterations. The estimated value was with a 95% confidence interval
between 2.176 and 2.178. The scaling variable . The final estimation generated a
power-law distribution function as:
The p-value for the power-law goodness-of-fit test was 0.526. As a result, we could not reject the
null hypothesis that evacuation distance distribution follows power-law distribution. Also, the
result agreed with previous human mobility studies that the scaling parameter lies between 1 and 3
(Cheng et al., 2011; Jurdak et al., 2015).
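As an illustration of the estimation step, the following Python sketch applies the standard closed-form maximum likelihood estimator for the exponent of a continuous power law. It is not the poweRlaw workflow used in this study: the lower cutoff `x_min` is taken as given rather than estimated, no bootstrap is run, and the synthetic sample, the names `fit_power_law_exponent` and `sample_power_law`, and all numeric values are assumptions chosen only for demonstration.

```python
import math
import random

def fit_power_law_exponent(distances, x_min):
    """MLE of the exponent for a continuous power law p(x) ~ x^(-beta), x >= x_min."""
    tail = [x for x in distances if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need observations strictly above x_min")
    # Closed-form estimator: beta_hat = 1 + n / sum(ln(x_i / x_min))
    return 1.0 + len(tail) / log_sum

def sample_power_law(n, beta, x_min, rng):
    """Draw n synthetic distances by inverse-transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (beta - 1.0)) for _ in range(n)]

# Synthetic check: recover a known exponent from simulated distances (miles)
rng = random.Random(0)
sample = sample_power_law(20000, beta=2.2, x_min=10.0, rng=rng)
estimate = fit_power_law_exponent(sample, x_min=10.0)
```

With a sample of this size, the estimate should land close to the exponent used to generate the data, which is a quick sanity check before fitting real evacuation distances.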
Figure 2 shows a histogram of evacuation distance distribution among the evacuees. There were
a few evacuees who traveled less than 50 miles. They stopped immediately after they left the
evacuation zone. Most of the evacuees evacuated to places about 150 miles away from coastal
areas. That is about the distance from Charleston to Columbia in South Carolina. As one of the
state evacuation strategies was to reverse lanes of Interstate Highway 26 (I-26) so that all the
lanes of I-26 were directed from Charleston to Columbia to accelerate the evacuation process,
29.2% of evacuees from Charleston ended in the Columbia metropolitan area.
Unlike an ideal power-law distribution (the green line in Figure 2), the peak evacuation
distance appears in the range between 100 and 150 miles, rather than at the beginning. Also, a
bump can be found in Figure 2 at a distance of about 600 miles. The power-law distribution of
evacuation distances implicitly assumes that hotels, shelters, and other accommodations are
uniformly distributed on a featureless space. However, in reality, accommodation opportunities
are nonuniformly distributed. This explains why the evacuation distance distribution pattern
differs from the power-law distribution.
Figure 2. Evacuation Distance Distribution
5. Familiarity Measurement Using Twitter Data for Destination Choice
Survey data collected from multiple hurricane evacuations reported that over half of
evacuees chose friends’ or relatives’ places as evacuation destinations (Mesa-Arango et al.,
2013; Bian et al., 2019; Lindell et al., 2019; Smith & McCarty, 2009). Mesa-Arango et al. (2013)
developed a household-level nested logit model to analyze demographic and socioeconomic
characteristics that affect evacuation destination choices based on survey data. Variables used in
the model came directly from the survey, such as race, income, previous experience with hurricanes,
and whether the evacuee needed to work during the evacuation (Mesa-Arango et al., 2013). Long-term
travel behaviors were not discussed since they were not directly available from survey data. Bian et al.
(2019) tackled this problem using community-level data from the American Community Survey
(ACS) as a surrogate measurement for social factors. For example, length of living in the current
community was used to measure the social network size in that study.
This section examines the relationship between evacuation destination choices and long-
term social factors retrieved from social media. We used social media data to quantify the
familiarity of destination for evacuees, using the assumption that people who evacuated to
friends’ or relatives’ places have a high degree of familiarity with that destination. Specifically, we
focused on all the places an individual had visited before, and the likelihood that this individual
would choose a place where he/she spent more time than a place he/she spent less time.
For all the evacuated users, we retrieved each user’s most recent 3,200 historical tweets,
the maximum number of historical tweets accessible through the Twitter API. An independent
dataset was built to store each user’s historical tweets for further individual-level analysis. Then,
we applied three steps to test these two hypotheses. First, for each evacuated Twitter user, we
searched all the counties from which this user tweeted. Second, we retrieved all the available
tweets for users identified from the previous step. As all the users’ historical tweets can be traced
back to 2014, we identified how many days a user tweeted from each county since 2014 as
tweeting frequency. Third, we ranked familiarity for all the counties from which this user had
tweeted based on tweeting frequency identified in the previous step. For example, if an
individual user tweeted from County A 200 days and County B 100 days, for this specific user,
County A was ranked highest in familiarity. If this user evacuated to County A, we counted
this user as having chosen the first-ranked county.
We excluded evacuation origin county from the familiarity rank, so this rank represents
each user’s familiarity rank to evacuation destination county. In other words, the more days a
user tweeted from the county, the more likely this user would evacuate to the county. All the
counties an individual had been to were ranked. This process was applied to all evacuated users,
and their destination choice rank was summarized. Since evacuation origin county, the
residential county of each user, was excluded from the familiarity ranking, all the counties
included in the rank can be viewed as a potential evacuation destination for the specific user.
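The ranking procedure can be sketched in Python as follows. This is a minimal illustration rather than the code used in this study: the function names, the `(date, county)` input format, the sample history, and the alphabetical tie-breaking rule are all assumptions introduced for the example.

```python
from collections import defaultdict

def familiarity_ranks(tweets, origin_county):
    """Rank counties by the number of distinct days a user tweeted from them.

    tweets: iterable of (date_string, county) pairs for one user.
    Returns {county: rank}, where rank 1 is the most familiar county.
    The origin (resident) county is excluded, as in the text.
    """
    days_by_county = defaultdict(set)
    for day, county in tweets:
        if county != origin_county:
            days_by_county[county].add(day)  # several tweets on one day count once
    # Sort by tweeting days, descending; ties are broken alphabetically (assumption).
    ordered = sorted(days_by_county, key=lambda c: (-len(days_by_county[c]), c))
    return {county: rank for rank, county in enumerate(ordered, start=1)}

def destination_rank(tweets, origin_county, destination_county):
    """Familiarity rank of the chosen destination, or None if never visited."""
    return familiarity_ranks(tweets, origin_county).get(destination_county)

# Hypothetical tweet history for one evacuated user
history = [
    ("2015-03-01", "Charleston, SC"),   # origin county, excluded from the ranking
    ("2015-06-10", "Richland, SC"),
    ("2015-06-10", "Richland, SC"),     # same day, counted once
    ("2015-06-11", "Richland, SC"),
    ("2015-07-04", "Richland, SC"),
    ("2015-12-24", "Mecklenburg, NC"),
    ("2016-01-02", "Mecklenburg, NC"),
]
ranks = familiarity_ranks(history, "Charleston, SC")
chosen = destination_rank(history, "Charleston, SC", "Richland, SC")
```

Counting the destination ranks over all evacuated users gives the distribution summarized in Figure 3; a destination that returns `None` corresponds to a county the user had never tweeted from.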
Figure 3. Evacuation Destination Popularity vs. Familiarity Rank
Figure 3 shows evacuees’ destination choices. The x-axis is the familiarity rank, and each
bar represents the percentage of evacuees who chose to evacuate to a county with a
corresponding familiarity rank. Among 1,286 evacuees, 82.4% (1,060 evacuees) chose to
evacuate to a county he/she had visited before. Specifically, 24.7% (318 evacuees) chose the
county with the highest familiarity rank as evacuation destination. Also, 22.9% (295 evacuees)
chose to evacuate to the county with the second highest familiarity rank. This result was further
tested with Spearman’s rank order correlation test (Spearman, 1904). This test resulted in p <
0.001, indicating that the evacuee number and familiarity rank are significantly negatively
correlated, whereby the former decreases with the latter’s increase.
This analysis illustrated that familiarity with places is associated with evacuation
destination decisions. Most evacuees chose their evacuation destination to be a county with a
high familiarity rank. The higher familiarity rank a county has, the more likely an evacuation trip
6. Improved Gravity Model
The gravity model is commonly used to model economic activities, trades, and human
travel between a pair of places (Kepaptsoglou et al., 2010; Lewer & Van den Berg, 2008;
Santana-Gallego et al., 2016). It can be written as Eq. 3:
$$T_{ij} = G\,\frac{P_i^{\alpha}\,P_j^{\lambda}}{d_{ij}^{\gamma}} \qquad \text{(Eq. 3)}$$
When used for evacuation, $T_{ij}$ represents the evacuation population from the origin $i$ to the
destination $j$. $P_i$ and $P_j$ are the total population sizes of the origin $i$ and the destination $j$,
respectively. The denominator part is a fringe function, interpreted as the cost of traveling
between the origin $i$ and the destination $j$. In the traditional gravity model, the fringe function is
based on physical distance ($d_{ij}$) between the origin $i$ and the destination $j$. As the power-law
distribution indicates, when the distance between two places increases, the probability of travel
occurrence decreases. $G$ is the gravitational constant, functioning as a scaling parameter. $\alpha$, $\lambda$,
and $\gamma$ are heuristic parameters for the origin population ($P_i$), the destination population ($P_j$), and
the physical distance ($d_{ij}$).
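To make the structure of the traditional gravity model concrete, the short Python sketch below assumes the conventional form described above: the origin and destination populations, each raised to its own heuristic exponent, are multiplied by the gravitational constant and divided by physical distance raised to a third exponent. The function name, argument names, and all parameter values are hypothetical and are not the calibrated values from this study.

```python
import math

def gravity_flow(pop_origin, pop_dest, distance, G, alpha, lam, gamma):
    """Predicted trips from origin to destination under a traditional gravity model."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    # Numerator: population terms; denominator: distance-based cost term.
    return G * (pop_origin ** alpha) * (pop_dest ** lam) / (distance ** gamma)

# Hypothetical parameters, chosen only to show the distance-decay behavior
params = dict(G=1e-6, alpha=1.0, lam=1.0, gamma=2.0)
near = gravity_flow(400_000, 1_000_000, 100.0, **params)
far = gravity_flow(400_000, 1_000_000, 200.0, **params)
```

With a distance exponent of 2, doubling the distance cuts the predicted flow to one quarter, which is the distance-decay behavior the cost term is meant to capture.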
Previous studies have examined the fitness of the gravity model and the intervening
opportunity model. The relationship between the gravity model and the intervening opportunity
model, and their extended forms, are reviewed by Chen (2005). Existing evacuation models only
consider the physical distance as the difficulty of making the trip between each pair of
origin and destination. As indicated in Eq. 3, an increase in the physical distance decreases the
trip occurrence when other parameters are unchanged. We argue that social distance between a
pair of places also functions in such gravity-based evacuation models. An increase in the social
distance decreases trip occurrence when other parameters are unchanged. Section 6.2 provides a
test of how social distance improves the accuracy of traditional gravity model. In this study,
social distance was represented as the inverse of the familiarity measurement aggregated at
county level, which was calculated as a social connection measurement.
6.1 Social Connection Measurement
Social connection measurements have been developed and used by multiple urban studies
to measure the strength of connectivity between two places (Browning & Cagney, 2002; Zhong
et al., 2014). Among the different variables used in the social connection model, human
movements always play important roles, although different types of human movement data are
deployed in different measurements.
The social connection measurement developed in this study was based on travels
retrieved from Twitter users’ records. It represented the likelihood of a trip occurring between
the given two counties in the long term. It was based on the assumption that the more the travels
between two counties, the stronger the social connection between two counties, the shorter the
social distance, and therefore, the more likely an evacuation trip occurred. Specifically, the
measurement was calculated as the percentage of Twitter users who traveled between the given two
counties based on geotagged tweets collected in a six-month period (July 2016 to December
2016) following Eq. 4.
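$$S_{ij} = \frac{U_{ij}}{U_i} \times 100\% \qquad \text{(Eq. 4)}$$
where $S_{ij}$ is the social connection between the two counties, $U_{ij}$ is the number of Twitter
users found in both the origin county $i$ and the destination county $j$, and $U_i$ is the total
number of Twitter users in the origin county $i$.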
The calculation process involved four main steps. First, we identified users who sent
tweets from the 10 coastal counties in the six-month period. Second, for each individual user, we
found all the counties he/she had tweeted from and treated them as the user’s active counties. If a
user was active in more than one county, this user built a connection between each pair of active counties. For example,
if a user posted geotagged tweets from County A, County B, and County C, connections were
strengthened between Counties A and B, between Counties B and C, and between Counties C
and A. In the third step, we aggregated these connections to the county level. Since our focus was
the social connection between the 10 counties and other counties, connections between a pair of counties
within the 10 counties or a pair of counties not in the 10 counties were not calculated. Finally,
the results from the previous step were divided by the total number of Twitter users in the origin
county to standardize this measurement. For example, between July 1st and December 31st, 2016, we found
that 667 Twitter users had tweeted from both Brunswick County, NC, and Mecklenburg County,
NC, and that the total number of Twitter users found in Brunswick County was 25,150. In this
case, the social connection between Brunswick County
(Wilmington in Figure 4a) and Mecklenburg County (Charlotte in Figure 4a) was 667 / 25,150 = 2.65%, based
on Eq. 4.
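The calculation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each user is represented by the set of counties he/she tweeted from, and the function name `social_connection`, the user identifiers, and the toy data are hypothetical. The final lines reproduce the Brunswick to Mecklenburg figure using the two counts reported in the text.

```python
def social_connection(user_counties, origin, destination):
    """Percentage of the origin county's Twitter users also found in the destination.

    user_counties: {user_id: set of counties the user tweeted from}.
    """
    origin_users = [u for u, counties in user_counties.items() if origin in counties]
    if not origin_users:
        return 0.0  # no users observed in the origin county
    shared = sum(1 for u in origin_users if destination in user_counties[u])
    return 100.0 * shared / len(origin_users)

# Toy data: three users active in Brunswick, one of whom also tweeted from Mecklenburg
toy_users = {
    "u1": {"Brunswick, NC", "Mecklenburg, NC"},
    "u2": {"Brunswick, NC"},
    "u3": {"Brunswick, NC", "Wake, NC"},
    "u4": {"Mecklenburg, NC"},
}
toy_connection = social_connection(toy_users, "Brunswick, NC", "Mecklenburg, NC")

# Counts reported in the text: 667 shared users out of 25,150 Brunswick users
reported_connection = round(100.0 * 667 / 25150, 2)
```

Because the denominator is the origin county's user count, the measure is directional: the connection from county A to county B generally differs from the connection from B to A.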
Figure 4. The Social Connection of the Four Selected Counties
Figure 4 shows the social connections of (a) Brunswick County, NC, (b) Chatham
County, GA, (c) Charleston County, SC, and (d) Horry County, SC. For better visual illustration,
connections to some counties that are too weak to be visible or counties that are too far to be
included in this map scale level were eliminated in this figure. The width of the red line
represents the strength of social connection between two counties. In Figure 4a, Brunswick
County has the strongest connection with Mecklenburg County, stronger than connection with
other counties having shorter physical distances. Chatham County (Figure 4b) has the strongest
social connection with Fulton County, GA. Although some other counties have shorter physical
distance from Chatham County, social connection is actually weaker than the connection
between Fulton County and Chatham County. In Figure 4c, Charleston County has strong
connections with counties near Columbia, SC. It also has a relatively strong connection with
Fulton County, GA, and Orange County, FL. Both counties have larger physical distance than
counties in South Carolina, but social connections with Charleston County are stronger. Figure
4d shows that Horry County, SC, has its strongest connection with Mecklenburg County, NC. Also,
it has relatively strong connections with counties near Nashville, TN. Figure 4 shows that social
connections are not proportional to physical driving distance. Therefore, when modeling
evacuation destination choice, the social connection should also be considered for inclusion in
the fringe function to better model human mobility.
6.2 Social Distance Integrated Gravity Model
Social connection was integrated into the fringe function of the gravity model as an
additional measurement of distance (considered as social distance) besides physical driving
distance (Eq. 5).
represents the county-to-county social connection between the origin and
destination . is the heuristic parameter and is the scaling factor.
Since the driving distance was used in this model as the physical distance, driving was
the only transportation mode we considered in this study. Therefore, counties more than 1,000
miles away from the 10 origin counties were eliminated. Counties without observed
travel involving any of the 10 counties were also eliminated. Specifically, if a county did not
receive any evacuees during Hurricane Matthew and no common users were found with any of
the 10 counties in the six-month period, this county was eliminated even if it was within 1,000
miles. This step eliminated 38 counties from the observed 326 destination counties and left 288
counties for model calibration. The social connection was calculated using the method described
in Section 6.1. A nonlinear optimization function was used in R to optimize the two scaling
parameters and the four heuristic parameters.
For comparison, we first optimized the traditional gravity model (Eq. 3). The
optimization was run in R with nonlinear model optimization. The optimized traditional model is
We conducted an exhaustive cross-validation of this model. This cross-validation process
included two rounds of leave-one-out cross-validation (Molinaro et al., 2005). This process re-
sampled all the data into training and testing datasets to avoid overfitting problems. We
organized the dataset into the following table (Figure 5), where each column is an evacuation
origin and each row is an evacuation destination.
Figure 5. Cross-validation Process
The first round is leave-one-row-out cross-validation and consists of multiple runs. In
each run, one row is left out as the test dataset, and the remaining 287 rows are used to train the
model (Eq. 3). After training, the left-out row is used to test model performance for that
run. A root-mean-squared error (RMSE) is calculated from the differences between
the observed values and the output of the trained model.
Since there are 288 evacuation destinations, the first round includes 288 runs
and generates 288 RMSE values. The second round is leave-one-column-out cross-validation.
In this round, one column is left out as the test dataset and the remaining nine columns are used to
train the model (Eq. 6). As in the first round, an RMSE value is calculated in
each run. The second round includes 10 runs, one for each of the 10 evacuation origins, and
therefore generates 10 RMSE values. Together, the two rounds of
leave-one-out cross-validation generate 298 RMSE values. The overall average
RMSE across all validation runs is 1.24, and the standard deviation is 1.68.
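The two-round procedure can be sketched as follows. This is a minimal illustration, not the study's implementation: `fit_placeholder` is a hypothetical stand-in that predicts the training mean, whereas the study calibrates a gravity model by nonlinear optimization in R, and the toy table values are invented. The use of the population standard deviation to summarize the runs is also an assumption.

```python
import math
from statistics import mean, pstdev


def rmse(observed, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def fit_placeholder(train_cells):
    """Hypothetical stand-in for calibration: predict the training mean."""
    level = mean(v for _, _, v in train_cells)
    return lambda i, j: level


def two_round_cv(table, fit=fit_placeholder):
    """table[i][j] is the flow to destination i (row) from origin j (column)."""
    n_rows, n_cols = len(table), len(table[0])
    cells = [(i, j, table[i][j]) for i in range(n_rows) for j in range(n_cols)]
    scores = []
    # axis 0: leave one destination (row) out; axis 1: leave one origin (column) out
    for axis, size in ((0, n_rows), (1, n_cols)):
        for held_out in range(size):
            train = [c for c in cells if c[axis] != held_out]
            test = [c for c in cells if c[axis] == held_out]
            model = fit(train)
            scores.append(rmse([v for _, _, v in test],
                               [model(i, j) for i, j, _ in test]))
    return scores


# Toy table: 4 destinations x 3 origins (the study uses 288 x 10).
toy = [[1.0, 2.0, 0.0], [0.5, 1.5, 2.5], [3.0, 0.0, 1.0], [2.0, 2.0, 2.0]]
scores = two_round_cv(toy)
print(len(scores), mean(scores), pstdev(scores))
```

With a 288 x 10 table the same function produces 288 + 10 = 298 RMSE values, matching the count reported above.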
With the social distance integrated into the model, the improved gravity model is shown
in Eq. 5. Like the previous model optimization process, the nonlinear model optimization
procedure was run in R for the improved gravity model. The result is shown in Eq. 7.
As with the traditional gravity model, two rounds of leave-one-out cross-validation were
conducted to avoid overfitting. After the two rounds of cross-validation, a total of 298
RMSE values were generated. The overall average RMSE was 0.80 and the standard deviation
Comparing the two models, the improved gravity model reduced the overall average
RMSE from 1.24 to 0.80, a 35% reduction in error. In other words, the social distance
integrated gravity model predicts evacuation destinations with 35% lower RMSE
than the gravity model that considers only physical distance. This result
demonstrates the utility of social distance in evacuation destination prediction models, which can be
applied in practice, for example in evacuation transportation planning.
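The 35% figure follows directly from the two reported average RMSE values. As a worked check (the subscript labels are ours, introduced only for this illustration):

```latex
\[
\frac{\mathrm{RMSE}_{\text{traditional}} - \mathrm{RMSE}_{\text{improved}}}
     {\mathrm{RMSE}_{\text{traditional}}}
= \frac{1.24 - 0.80}{1.24}
= \frac{0.44}{1.24}
\approx 0.355
\]
```

That is, a relative reduction of roughly 35% in average cross-validated RMSE.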
7. Limitations and Future Research
Although the proposed model substantially reduced RMSE, we recognize several limitations
of this research. The first is the representativeness of Twitter data (Jiang et al., 2019a; Malik et al.,
2015). Twitter data are biased toward certain population groups and may not
represent all populations with various demographic and socioeconomic characteristics. Although
the representativeness issue of social media data is recognized and recent studies have used
different methods to advance understanding of the demographic and socioeconomic characteristics
of social media users, no consensus solution has been reached. One potential solution is to develop
a better sampling method that integrates both survey and social media data. For example, Martin
et al. (2020b) compared age, gender, and race between users collected from surveys and social
media in evacuation studies. Integrating multiple data sources and developing a better sampling
method are required for a better understanding of evacuation destination choices of different
The second limitation is variable choice. To demonstrate the functionality of social
distance, the models in this study were modified only with respect to the distance term in the gravity
model (Eq. 5). Undoubtedly, distance draws the most attention in evacuation transportation
planning, but it is not the only factor. Various other social factors identified in existing studies
also play important roles in evacuation destination choices, such as family size, hotel/motel
availability, financial budget, and more. These variables could be used to calibrate
in Eq. 5. The proposed model can potentially be further improved by integrating more variables
in the optimization function.
The third limitation concerns the evacuation transportation mode. This study excluded
evacuees who traveled more than 1000 miles, a reasonable estimate of the maximum distance
that households would travel by car. However, longer-distance evacuation travel was
observed, with some evacuees traveling as far as the west coast, including
Los Angeles and Seattle. How to integrate multiple transportation modes into the evacuation
model optimization process requires further investigation.
This study responded to the calls for interdisciplinary models in evacuation behavior
studies by improving current evacuation destination choice models through integrating social
distance with traditional gravity models. It offered a potential solution to the difficulty of obtaining
long-term data on essential social factors through traditional data
collection methods (e.g., surveys). The main contributions of this study are
threefold. First, this study reinforced and extended the important role of social factors
in evacuation modeling by confirming that familiarity with a previously visited place was
associated with evacuation destination choice. Second, it developed an approach to
quantitatively measure county-to-county social distance using geotagged tweets. Third, it
demonstrated how long-term social factors improved the evacuation destination choice model by
integrating social distance into the gravity model.
Evacuation mobility patterns are complicated, and a single generic mathematical
model can hardly represent them accurately. This study sheds light on how long-term travel
information retrieved from social media can quantitatively improve current transportation
modeling for evacuation destination choice. With the increasing use of social media during
time-critical situations, methodological development in related research areas should be pushed
further. Given the improvement observed in this study, we expect to see more studies using
hazard-related social media data for evacuation model improvement.
The research is supported by the Office of the Vice President for Research, University of South
Carolina [grant number 13540-19-49772] and the National Science Foundation (NSF) [grant number
2028791]. We thank the anonymous reviewers for their insightful comments that significantly
improved the manuscript.
Bayram, V., and H. Yaman. 2018. Shelter location and evacuation route assignment under
uncertainty: A benders decomposition approach. Transportation science, 52(2): 416-436.
Browning, C. R., and K. A. Cagney. 2002. Neighborhood structural disadvantage, collective
efficacy, and self-rated physical health in an urban setting. Journal of health and social
behavior, 43(4): 383-399.
Brockmann, D., L. Hufnagel, and T. Geisel. 2006. The scaling laws of human travel. Nature,
Bian, R., C. G. Wilmot, R. Gudishala, and E. J. Baker. 2019. Modeling household-level
hurricane evacuation mode and destination type joint choice using data from multiple
post-storm behavioral surveys. Transportation research part C: emerging
technologies, 99: 130-143.
Chen, B. 2005. Modeling destination choice in hurricane evacuation with an intervening
opportunity model. Master's thesis. Louisiana State University.
Cheng, G., C. G. Wilmot, and E. J. Baker. 2011. Dynamic gravity model for hurricane
evacuation planning. Transportation research record, 2234(1): 125-134.
Cheng, G., C. G. Wilmot, and E. J. Baker. 2008. A destination choice model for hurricane
evacuation. Proceedings of the 87th Annual Meeting Transportation Research Board: 13-
Cheng, Z., J. Caverlee, K. Lee, and D. Z. Sui. 2011. Exploring millions of footprints in location
sharing services. Proceeding of 5th International AAAI Conference on Weblogs and
Social Media (ICWSM ’11): 81-88.
Dow, K., and S. L. Cutter. 1998. Crying wolf: Repeat responses to hurricane evacuation orders.
Coastal Management, 26(4): 237-252.
Dow, K., and S. L. Cutter. 2002. Emerging hurricane evacuation issues: hurricane Floyd and
South Carolina. Natural hazards review, 3(1): 12-18.
Gillespie, C. S. 2015. Fitting Heavy Tailed Distributions: The poweRlaw Package. Journal of
Statistical Software, 64(2): 1–16.
Gonzalez, M. C., C. A. Hidalgo, and A. L. Barabasi. 2008. Understanding individual human
mobility patterns. Nature, 453(7196): 779-782.
Herrera, N., T. Smith, S. A. Parr, and B. Wolshon. 2019. Effect of trip generation time on
evacuation time estimates. Transportation research record, 2673(11): 101-113.
Huang, S. K., M. K. Lindell, C. S. Prater, H. C. Wu, and L. K. Siebeneck. 2012. Household
evacuation decision making in response to Hurricane Ike. Natural Hazards Review, 13,
Huang, S. K., M. K. Lindell, and C. S. Prater. 2016. Who leaves and who stays? A review and
statistical meta-analysis of hurricane evacuation studies. Environment and
Behavior, 48(8): 991-1029.
Huang, S. K., M. K. Lindell, and C. S. Prater. 2017. Toward a multi-stage model of hurricane
evacuation decision: An empirical study of Hurricanes Katrina and Rita. Natural Hazards
Review, 18(3): 05016008. doi: 10.1061/(ASCE)NH.1527-6996.0000237.
Jiang, B., J. Yin, and S. Zhao. 2009. Characterizing the human mobility pattern in a large street
network. Physical Review E, 80(2): 021136.
Jiang, Y., Z. Li, and X. Ye. 2019a. Understanding demographic and socioeconomic biases of
geotagged twitter users at the county level. Cartography and Geographic Information
Science, 46(3): 228-242.
Jiang, Y., Z. Li, and S. L. Cutter. 2019b. Social network, activity space, sentiment, and
evacuation: what can social media tell us?. Annals of the American Association of
Geographers, 109(6): 1795-1810.
Jurdak, R., K. Zhao, J. Liu, M. AbouJaoude, M. Cameron, and D. Newth. 2015. Understanding
human mobility from Twitter. PloS one, 10(7): e0131469.
Kepaptsoglou, K., M. G. Karlaftis, and D. Tsamboulas. 2010. The gravity model specification
for modeling international trade flows and free trade agreement effects: a 10-year review
of empirical studies. The open economics journal, 3(1): 1-13.
Kumar, D., and S. V. Ukkusuri. 2018. Utilizing geo-tagged tweets to understand evacuation
dynamics during emergencies: A case study of Hurricane Sandy. Companion
Proceedings of The Web Conference 2018: 1613-1620.
Lewer, J. J., and H. van den Berg. 2008. A gravity model of immigration. Economics
letters, 99(1): 164-167.
Lindell, M. K., J. C. Lu, and C. S. Prater. 2005. Household decision making and evacuation in
response to Hurricane Lili. Natural hazards review, 6(4): 171-179.
Lindell, M. K., P. Murray-Tuite, B. Wolshon, and E. J. Baker. 2019. Large-scale evacuation: The
analysis, modeling, and management of emergency relocation from hazardous areas. New
Lindell, M. K., J. E. Kang, and C. S. Prater. 2011. The logistics of household hurricane
evacuation. Natural hazards, 58(3): 1093-1109.
Lindell, M. K., and R. W. Perry. 1992. Behavioral Foundations of Community Emergency Planning.
Washington DC: Hemisphere Press.
Lindell, M. K., and C. S. Prater. 2007. Critical behavioral assumptions in evacuation analysis for
private vehicles: Examples from hurricane research and planning. Journal of Urban
Planning and Development, 133: 18-29.
Lindell, M. K., J. H. Sorensen, E. J. Baker, and W. P. Lehman. 2020. Community response to
hurricane threat: Estimates of household evacuation preparation time distributions.
Transportation Research D: Transport and Environment, 85, 102457. doi:
Malik, M. M., H. Lamba, C. Nakos, and J. Pfeffer. 2015. Population bias in geotagged tweets.
Paper presented at the Ninth international AAAI conference on web and social media,
Mandelbrot, B. B. 1983. The fractal geometry of nature. New York, USA: W. H. Freeman and
Martín, Y., Z. Li, and S. L. Cutter. 2017. Leveraging Twitter to gauge evacuation compliance:
Spatiotemporal analysis of Hurricane Matthew. PLoS one, 12(7): e0181701.
Martín, Y., Z. Li, and Y. Ge. 2020a. Towards real-time population estimates: introducing Twitter
daily estimates of residents and non-residents at the county level. arXiv preprint
Martín, Y., S. L. Cutter, and Z. Li. 2020b. Bridging twitter and survey data for evacuation
assessment of Hurricane Matthew and Hurricane Irma. Natural hazards review, 21(2):
McNeill, G., J. Bright, and S. A. Hale. 2017. Estimating local commuting patterns from
geolocated Twitter data. EPJ Data Science, 6(1): 24.
Mesa-Arango, R., S. Hasan, S. V. Ukkusuri, and P. Murray-Tuite. 2013. Household-level model
for hurricane evacuation destination type choice using hurricane Ivan data. Natural
hazards review, 14(1): 11-20.
Molinaro, A. M., R. Simon, and R. M. Pfeiffer. 2005. Prediction error estimation: a comparison
of resampling methods. Bioinformatics, 21(15): 3301-3307.
Murray-Tuite, P., and B. Wolshon. 2013. Evacuation transportation modeling: An overview of
research, development, and practice. Transportation Research Part C: Emerging
Technologies, 27: 25-45.
National Oceanic and Atmospheric Administration (NOAA). (n.d.). Hurricane Costs. Accessed
August 14, 2020. https://coast.noaa.gov/states/fast-facts/hurricane-costs.html
Noulas, A., S. Scellato, C. Mascolo, and M. Pontil. 2011. An empirical study of geographic user
activity patterns in foursquare. Paper presented at Fifth international AAAI conference on
weblogs and social media, 2011
Noulas, A., S. Scellato, R. Lambiotte, M. Pontil, and C. Mascolo. 2012. A tale of many cities:
universal patterns in human urban mobility. PloS one, 7(5): e37027.
Prater, C. S., D. Wenger, and K. Grady. 2000. Hurricane Bret Post Storm Assessment: A Review
of the Utilization of Hurricane Evacuation Studies and Information Dissemination. Texas
A&M University Hazard Reduction and Recovery Center, College Station TX. Available
Sadri, A. M., S. V. Ukkusuri, and H. Gladwin. 2017. Modeling joint evacuation decisions in
social networks: The case of Hurricane Sandy. Journal of choice modelling, 25: 50-60.
Santana-Gallego, M., F. J. Ledesma-Rodríguez, and J. V. Pérez-Rodríguez. 2016. International
trade and tourism flows: An extension of the gravity model. Economic Modelling, 52:
Smith, S. K., and C. McCarty. 2009. Fleeing the storm(s): An examination of evacuation
behavior during Florida’s 2004 hurricane season. Demography, 46(1): 127-145.
Sorensen, J. H., M. K. Lindell, E. J. Baker, and W. P. Lehman. 2020. Community response to
hurricane threat: Estimates of warning issuance time distributions. Weather, Climate and
Society, 12: 837-846.
Spearman, C. 1904. The proof and measurement of association between two things. American
Journal of Psychology, 15(1): 72-101. doi:10.2307/1412159. JSTOR 1412159.
Stewart, S. R. 2017. National Hurricane Center tropical cyclone report: Hurricane Matthew (AL
142016). Accessed July 19, 2020.
Trainor, J. E., P. Murray-Tuite, P. Edara, S. Fallah-Fini, and K. Triantis. 2013.
Interdisciplinary approach to evacuation modeling. Natural Hazards Review, 14(3): 151-
Ukkusuri, S. V., S. Hasan, B. Luong, K. Doan, X. Zhan, P. Murray-Tuite, and W. Yin. 2017. A-
RESCUE: An Agent based regional evacuation simulator coupled with user enriched
behavior. Networks and Spatial Economics, 17(1): 197-223.
Wang, Q., and J. E. Taylor. 2016. Patterns and limitations of urban human mobility resilience
under the influence of multiple types of natural disaster. PLoS one, 11(1): e0147299.
Wang, Q., and J. E. Taylor. 2014. Quantifying human mobility perturbation and resilience in
Hurricane Sandy. PLoS one, 9(11): e112608.
Wilmot, C. G., N. Modali, and B. Chen. 2006. Modeling hurricane evacuation traffic: testing the
gravity and intervening opportunity models as models of destination choice in hurricane
evacuation (No. FHWA/LA. 06/407). Louisiana State University. Department of Civil
and Environmental Engineering.
Wu, H. C., M. K. Lindell, and C. S. Prater. 2012. Logistics of hurricane evacuation in Hurricanes
Katrina and Rita. Transportation research part F: traffic psychology and
behaviour, 15(4): 445-461.
Zhong, C., S. M. Arisona, X. Huang, M. Batty, and G. Schmitt. 2014. Detecting the dynamics of
urban structure through spatial network analysis. International Journal of Geographical
Information Science, 28(11): 2178-2199.
Zhu, Y., K. Xie, K. Ozbay, and H. Yang. 2018. Hurricane evacuation modeling using behavior
models and scenario-driven agent-based simulations. Procedia computer science, 130: