Conference Paper

Agent-based modeling to simulate road travel using Big Data from smartphone GPS: An application to the continental United States


Sashikanth Gurram¹, Vijayaraghavan Sivaraman¹, Jonathan T. Apple¹, Abdul R. Pinjari²
¹Research and Data Science, AirSage, Atlanta, USA
²Department of Civil Engineering, Indian Institute of Science, Bangalore, India
Abstract: Growing concerns about urban sustainability,
economic and public health vitality, and climate change are
common features across the world. Transportation is often
inextricably linked to these concerns and this necessitates the
development of robust and scalable tools that can assist in timely
understanding of the agent-system interactions. Such expedient
but accurate analyses are critical for policymaking, especially in
the current environment where urban mobility is witnessing a
rapid transformation. To support such analyses, we demonstrate
a novel methodology that implements a top-down large-scale
agent-based simulation of urban travel using Global Positioning
System (GPS) derived raw sightings. Specifically, we constructed
the daily activity and travel patterns of devices (i.e. agents) using
GPS data for a single day (Wednesday, March 6, 2019) for the
entire continental United States. Data filtering techniques were
applied to identify approximately 2.7 million smart devices (out of
a daily total of 30.5 million) that were highly visible and mobile.
We sourced roadway network data for all of North America
from OpenStreetMap (OSM). We then fed the daily activity and
travel records of agents along with the roadway network data into
MATSim, an agent-based travel simulator, to produce highly
spatiotemporally resolved agent activities along with their
estimated travel trajectories. We processed these travel
trajectories (1.5 billion records) to estimate vehicle miles traveled
(VMT) for each U.S. state and modeled vehicle volumes per
roadway link in the continental U.S. Overall, we found strong rank
correlations between our results and the Federal Highway
Administration’s VMT estimates, although absolute measures
displayed higher variability. We observed similar trends (i.e. low
rank correlation errors but higher absolute errors) at the
disaggregate roadway link level when comparing our extrapolated
traffic volumes against roadway count station data from a select
state (Florida). Finally, the root mean squared errors of our roadway
volume estimates are comparable to those of Florida’s
regionwide travel demand models, indicating satisfactory model
performance. The proposed methodology
demonstrates that such big data-powered, large-scale agent-based
simulations may provide value in estimating and predicting travel
demand.
Keywords: geospatial big data, agent-based modeling, daily
activities and travel, travel metrics
I. Introduction
Growing cities where transportation choices and
infrastructure are rapidly evolving need robust tools that can
accurately predict travel demand [1]. This becomes especially
critical with the emergence of alternative travel modes and
services including ridesharing, delivery, and micro-mobility
services [2,3]. Quantifying such shifts in travel demand is
important as these disruptive innovations may significantly
impact urban sustainability and infrastructure, public health,
economic vitality, and the environment [4]-[6].
The traditional four-step modeling
framework is widely used across the U.S. to understand shifts
in travel demand under different scenarios. In this framework,
the unit of analysis is a trip originating from, or destined to, a
specific zone in a study area, which allows for the
estimation of aggregate travel demand between zones.
Although such a framework may be suitable for long-range
planning (e.g., lane addition), it restricts the ability to
understand individual sensitivities to emerging travel choices.
The activity-based framework, constructed from individual
daily activity-travel schedules, allows one to evaluate these
sensitivities and the subsequent shifts in aggregate
demand. However, the complexity of this modeling framework
makes analyses of scenarios with short turnaround times highly
resource intensive. Due to these constraints, numerous small
and medium sized communities (populations less than 500,000)
across the U.S. either do not have these frameworks or utilize
outdated four-step models [7]. Furthermore, both the four-step
and activity-based travel demand paradigms use survey data as
input to travel models. Although the surveys are well designed,
they suffer from two major issues. First, survey sample sizes
are generally small and hence may not provide enough
spatial coverage for high-resolution analyses. Second,
survey data are cross-sectional and so may be inadequate for
longitudinal analyses. Moreover, administering
surveys is a cost- and time-intensive effort. Considering this,
researchers have been exploring alternative data sources.
In the recent past, there has been a growing number of
studies using passive big location data originating from telecom
carriers, smartphone-based applications, and Bluetooth devices
for transportation applications. For example, call detail record
(CDR) data was used in the context of transportation planning
for transport mode imputation [8], aggregated travel origin-
destination (O-D) matrices [9,10], and link-level traffic volume
estimation [11]. A few research efforts have used the more
geospatially accurate Global Positioning System (GPS) data to
study transportation issues such as understanding transport
mode [12], route choice [13], activity and travel patterns of
individuals [14], and freight travel behavior [15,16]. Studies
involving GPS data are far fewer than those that used
CDR data; however, the overall use of big data (i.e. both
CDR and GPS) for transport applications is in its infancy, and
more studies are needed.
Our study builds on previous work in this area while
addressing the following limitations. Most of the prior studies
utilizing passive data (e.g. CDR, GPS) used aggregated O-D
trip estimates to understand travel. As mentioned earlier, trip-
level data, as opposed to full-day tour data, may be limited in
capturing travel sensitivities to transport policy changes. In
addition, the sample sizes in studies that used GPS data,
especially those focusing on personal travel, are small because
they adopted in-house GPS-based data collection instruments
that may not have a wide reach. Thus, such studies may not
provide meaningful and spatially representative insights into
travel demand for an urban region. Finally, as most such passive
data or travel modeling implementations are restricted to a city
or urban region, they are incapable of answering questions on
variation in travel demand across geographies. This is
particularly important for rural geographies where travel
demand efforts are lacking or not up to date. To address these
issues, this paper proposes a novel method where we construct
device-level daily activity and travel schedules from
smartphone-based GPS data for the entire continental United
States on a select date. These estimated daily activity and travel
schedules are then fed as an input into MATSim, a framework
for large-scale agent-based transport simulation. Through this
exercise, this research aims to understand i) variation in the
spatial distribution of the GPS sample across the continental United
States (U.S.), ii) the overall travel demand of continental U.S.
residents through aggregated travel metrics such as vehicle
miles traveled (VMT), iii) variation in travel demand across
U.S. states, iv) uncertainties in our roadway volume
estimates for a specific U.S. state, and v) the complexity involved
in estimating nationwide urban travel demand.
The paper is organized as follows. Section II presents
methodological details of the study including the spatial scope,
simulation tool, input datasets to the simulation (i.e. the activity
data derived from GPS sightings and the roadway network
data), simulation run specifications, and analysis and validation
approach for the results. Results and discussion pertaining to
the U.S. state-level GPS sample size distribution and travel
demand, and validation metrics are presented in section III.
Following this, limitations of the study along with directions for
future research are presented in section IV and conclusions are
presented in section V.
II. Methodology
A. Study Area
We chose the continental United States as our study area. This
includes all the U.S. states except for Hawaii and
unincorporated territories such as Puerto Rico. The study area
covers approximately 320 million residents of the U.S.
according to the 5-year estimates from the 2017 American
Community Survey [17].
B. Simulation Tool
MATSim, a large-scale agent-based framework for
transport simulations, is used to simulate the
daily activities and travel of the study area population.
MATSim is a traffic model that tracks agents’ activity and
travel decisions to provide a microscopic description of the
travel demand on the transportation system [18]. To start the
simulation, MATSim requires an initial travel demand (termed
plans), represented by a detailed chronological set of fixed-
location activities (e.g. home, work, shopping) and their
location coordinates, activity start times, and durations for each
agent. The simulator takes these initial plans and loads them
onto the travel network for routing. Agents’ spatiotemporal
movements (i.e. enter and exit timestamps for each roadway
link along the estimated travel route) along with their activity
participation (i.e. start and end timestamps for each fixed-
location activity) during the simulation are tracked by MATSim
and kept in memory as events. As agents go about their day and
travel between fixed-location activities, some agents may arrive
at their destinations on time while others may be delayed due to
network congestion. Activity participation is associated with a
positive utility score while travel and late arrivals are penalized
with a negative score. Thus, the total utility score for an agent is
higher if they spend less time in travel, arrive at activity
locations on time or early, and spend more time participating in
activities. In addition to this scoring scheme, MATSim utilizes
a few innovation strategies, discussed under simulation
specifications (section E), to optimize the plans and arrive at a
relaxed state generally known as user equilibrium. This user
equilibrium may be defined as a state where a user may no
longer improve their score by unilaterally changing their
strategy. Further details on the input datasets to MATSim, e.g.
the activity data used to create initial plans and the
transportation network, are described in the following sections.
C. Activity Data
To create plans for MATSim we used AirSage’s
geolocation data for a single day: Wednesday, March 6, 2019
as shown in Fig. 1. As individuals access and use applications
on their smart devices (smartphones and tablets), the
applications may collect the spatial information (e.g. the
latitude and longitude associated with a particular sighting) of
the device along with the timestamp depending on the
‘permissions’ of the application. Typically, applications use the
inbuilt GPS of the smart device to collect the spatial coordinates
but may use the less accurate Wi-Fi-based location if GPS is
unavailable. These per-device sequential records of spatial
coordinates can help identify the spatiotemporal movements of
the device.
The single-day geolocation data for the continental U.S.
comprises approximately 30.5 million unique sample devices
with about 1.1 billion spatial points. It should be noted that this
data corresponds to the single day identified above, i.e.
Wednesday, March 6, 2019. However, not all these devices and
their data points can be used to generate a representative sample
of daily activity and travel patterns of a population. This is
because some devices tend to be visible (i.e. produce geospatial
pings) for a short period of time in a day, while others may not
be mobile enough to derive meaningful travel patterns.
MATSim, as an agent-based microsimulator, needs a full day,
or nearly a full day, of fixed-location activity records. This
requires devices that are both visible for a significant portion of
the day and are also mobile. Hence, we sampled highly visible
and mobile devices from the raw dataset to construct plans for
MATSim using heuristics as discussed below.
To identify the highly visible sample, we estimated the
number of unique hours a device is visible on a given day. For
example, a device seen at 3:47 am and 11:23 pm on a given day
is considered to be visible for 2 unique hours. In a previous
research effort, we identified devices that were visible for at
least 16 unique hours (out of 24) in a specific day to have a
relatively stable pollutant exposure distribution [19]. Drawing
on this effort, we excluded all the devices that were visible for
fewer than 20 unique hours on the day of analysis.
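The visibility filter above can be illustrated with a short sketch. It is not the authors' implementation: the input layout (pairs of device id and timestamp), the function names, and the use of clock hours as "unique hours" are assumptions made for illustration; only the 20-hour threshold comes from the text.

```python
from collections import defaultdict
from datetime import datetime

MIN_UNIQUE_HOURS = 20  # threshold stated in the text


def unique_hours_by_device(sightings):
    """Map each device id to the set of clock hours (0-23) in which it was seen.

    `sightings` is an iterable of (device_id, datetime) pairs (assumed layout).
    """
    hours = defaultdict(set)
    for device_id, timestamp in sightings:
        hours[device_id].add(timestamp.hour)
    return hours


def highly_visible_devices(sightings, min_hours=MIN_UNIQUE_HOURS):
    """Return the ids of devices seen in at least `min_hours` distinct hours."""
    hours = unique_hours_by_device(sightings)
    return {device for device, seen in hours.items() if len(seen) >= min_hours}


# Example from the text: sightings at 3:47 am and 11:23 pm give 2 unique hours.
example = [
    ("device_a", datetime(2019, 3, 6, 3, 47)),
    ("device_a", datetime(2019, 3, 6, 23, 23)),
]
print(len(unique_hours_by_device(example)["device_a"]))  # prints 2
```

With this definition, `device_a` would be excluded because 2 is below the 20-hour threshold.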
Next, to build a chronological record of fixed-location
activities including their end times and durations for each agent,
the duplicate spatial points corresponding to an activity episode
and the spatial points along the travel trajectory were excluded.
Here, activity episode refers to the discrete combination of the
spatial location of the device and the corresponding stay
duration. For example, consider a device that leaves for work at
7:45 am, stays at work from 8:30 am to 4:30 pm, and returns
home at 5:30 pm. In this case, the device is considered to have
two home activity episodes (one from 12 am to 7:45 am and the
other from 5:30 pm to 11:59 pm) and one work episode (from
8:30 am to 4:30 pm). Thus, to create a chronological set of
activity episodes we retained the last spatial point at each
distinct activity episode. To achieve this, we first calculated the
straight-line distance between chronologically ordered
subsequent spatial points for each device and excluded all
points that fell within 250 meters. This step allowed us to
identify and remove the potential duplicate spatial points during
an activity episode and retain only the last point that
corresponds to the end of a distinct activity episode. However,
the above method may not be able to exclude spatial points
during travel. To identify and exclude such points, we classified
each spatial point as a home point (HP), work point (WP),
transient point (TP), or end point (EP), along with the stay
duration of those points; HP, WP and EP are broadly referred
to as stationary points and TP may be classified as a moving
point in travel. We assumed that such moving points (i.e. TPs)
and any other point types with an extremely small stay duration
(≤ 60 seconds) reasonably represent points during travel and
filtered them out. In essence, as shown in Fig. 1, by applying
the unique hours seen, subsequent distance between points,
point type, and stay duration filters, we created input plans (i.e.
daily activity and travel records) for MATSim with a resulting
sample of 2.7 million unique devices. Finally, we assumed that
all the agents (i.e. sample devices) traveled between fixed-
activity locations using an automobile as their travel mode.
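The 250-meter distance filter can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the haversine formula, the spherical Earth radius, the function names, and the rule that each point is compared only with the immediately following point are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def keep_last_point_per_episode(points, threshold_m=250.0):
    """Drop a point when the next point lies within `threshold_m` of it.

    `points` is a chronologically ordered list of (lat, lon) tuples for one
    device. The last point of each run of nearby points is retained.
    """
    kept = []
    for i, point in enumerate(points):
        is_last = i == len(points) - 1
        if is_last or haversine_m(*point, *points[i + 1]) > threshold_m:
            kept.append(point)
    return kept


# Three nearby sightings (hypothetical coordinates) followed by a distant one.
track = [(33.7490, -84.3880), (33.7491, -84.3881), (33.7492, -84.3880), (33.7800, -84.3800)]
print(keep_last_point_per_episode(track))
```

In this hypothetical track the first two sightings are dropped as duplicates of the same episode, while the third sighting and the distant one are kept.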
Fig. 1. Simulation framework to estimate daily activity and travel patterns using location-based GPS data
D. Road Network Data
To simulate the plans, we used the roadway network
provided by OpenStreetMap [20]. Specifically, we obtained
the Protocol buffer binary format (PBF) file with all the roads
in North America from
america.html as shown in Fig. 1. We then used Osmosis [21] to
exclude residential roads (e.g. cul-de-sacs) from the roadway
file for North America. The final network file comprises about
8.4 million roadway links. It should be noted that the final
roadway network file we used in simulation retained roads from
Canada and Mexico to allow for edge cases in the U.S.
OpenStreetMap uses WGS84 (EPSG:4326), a geographic
(latitude-longitude) coordinate system, as its default reference system,
but MATSim recommends using a Cartesian coordinate system
for computational efficiency. We therefore reprojected the
roadway network file into the Conus Albers (EPSG:5070)
projection.
E. Simulation Specifications, Infrastructure, and Tools
Fig. 1 shows the simulation specifications for our run.
MATSim is a scalable tool that can simulate the activity and
travel patterns of an entire population; however, information is
often available only for a sample of the population. When
simulating a sample, MATSim uses adjustment parameters to
appropriately scale up the sample-based network demand to the
entire population. Our sample in this study (2.7 million
devices), represents about 0.8% of the continental U.S.
population. However, it should be noted that this sample is
entirely comprised of travelers on the given day. According to
the 2017 National Household Travel Survey [22],
approximately 83% of survey respondents are travelers. Thus,
the study sample of 2.7 million unique devices represents about
1% of continental U.S. travelers. Previous studies using
MATSim have suggested that one-percent sample runs yield
results that are comparable to large-sample simulations [23,24].
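The sample shares quoted above follow from simple arithmetic, reproduced here as a sketch. The function name is hypothetical; the inputs are the rounded figures given in the text (2.7 million devices, about 320 million residents, 83% travelers).

```python
def sample_shares(sample_devices, population, traveler_fraction):
    """Return the sample as a share of all residents and of travelers only."""
    share_of_population = sample_devices / population
    share_of_travelers = sample_devices / (population * traveler_fraction)
    return share_of_population, share_of_travelers


pop_share, traveler_share = sample_shares(2_700_000, 320_000_000, 0.83)
print(f"{pop_share:.2%} of residents, {traveler_share:.2%} of travelers")
# prints "0.84% of residents, 1.02% of travelers"
```

These values agree with the roughly 0.8% and 1% figures reported in the text.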
To scale up the network demand due to the study sample, we
set the flow and storage capacity factors to be 0.02 and 0.03,
respectively. The flow capacity factor helps in the appropriate
scaling of network demand. Typically, sample sizes of 100%,
50%, and 10% use flow factors of 1.0, 0.5, and 0.1,
respectively. The storage capacity factor scales the number of
vehicles that can be stored on a roadway link. In theory,
both flow and storage capacity factors should be set to the same value;
however, this may lead to unstable behavior in simulations with
small sample sizes. Thus, the storage capacity factor is set to a
higher value than the flow capacity factor. The capacity factors
used in this study are based on our experimentations in previous
studies [25].
To optimize the agent plans, we used two of MATSim’s replanning
innovation strategies, time allocation mutator plus reroute
and best score, with weights of 0.2 and 0.8, respectively. The
time allocation mutator plus reroute strategy randomly picks
about 20% of agents (since we used a weight of 0.2) to alter
their activity end times by 15 minutes and re-route their travel
trajectories in every iteration. The remaining 80% of agents
keep their previously estimated best score plan using the best
score strategy. We also set the agent memory to 5, thus every
agent can store a maximum of 5 plans in memory, and if this
threshold is exceeded, the plan with the lowest utility score will
be dropped. Using these co-evolutionary algorithms and
configuration parameters, MATSim arrives at a user
equilibrium state where agents’ daily activity and travel
patterns are optimized and may not be further improved.
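The replanning logic described above can be illustrated with a toy sketch. It does not call MATSim and is not MATSim's code: the strategy labels, function names, and data layout are hypothetical, and only the weights (0.2 and 0.8) and the memory size (5) come from the text.

```python
import random

STRATEGIES = ["time_allocation_mutator_plus_reroute", "best_score"]  # labels are illustrative
WEIGHTS = [0.2, 0.8]  # strategy weights stated in the text
MAX_PLANS = 5  # agent plan memory stated in the text


def pick_strategy(rng):
    """Draw one replanning strategy for an agent according to the weights."""
    return rng.choices(STRATEGIES, weights=WEIGHTS, k=1)[0]


def remember_plan(memory, plan, max_plans=MAX_PLANS):
    """Add a (score, label) plan and drop the lowest-scoring plans beyond the limit."""
    updated = memory + [plan]
    while len(updated) > max_plans:
        updated.remove(min(updated, key=lambda p: p[0]))
    return updated


# Hypothetical agent memory holding five scored plans.
memory = [(-110.7, "p0"), (-100.0, "p1"), (-90.0, "p2"), (-80.0, "p3"), (-75.6, "p4")]
memory = remember_plan(memory, (-95.0, "p5"))
print([label for _, label in memory])  # "p0", the lowest-scoring plan, is gone
```

Over many draws, roughly one in five agents would be assigned the mutation and reroute strategy, while the rest keep their best-scoring plan.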
Since the activity data for the continental U.S. span
multiple time zones, we used Alaska Standard Time (AKST)
as the reference time zone for the simulation and the analysis.
The simulation therefore corresponds to the time period from
12:00 am to 11:59 pm AKST on March 6, 2019.
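Expressing sightings from different time zones in a single reference zone can be sketched as below. This is an illustration only: fixed UTC offsets are used as an assumption, which is reasonable for March 6, 2019 because U.S. daylight saving time had not yet begun on that date; the function name and the sample sighting are hypothetical.

```python
from datetime import datetime, timedelta, timezone

AKST = timezone(timedelta(hours=-9), "AKST")  # Alaska Standard Time, UTC-9
EST = timezone(timedelta(hours=-5), "EST")  # Eastern Standard Time, UTC-5


def to_akst(local_time):
    """Convert a timezone-aware datetime to the AKST reference zone."""
    return local_time.astimezone(AKST)


# A hypothetical sighting at 8:30 am Eastern time on the study date.
sighting = datetime(2019, 3, 6, 8, 30, tzinfo=EST)
print(to_akst(sighting).isoformat())  # prints 2019-03-06T04:30:00-09:00
```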
To run the simulation, we used an Amazon Elastic
Compute Cloud (EC2) instance of the type r5.24xlarge with 96
(virtual) CPU cores and 768 GB of RAM. The simulation was
run for 50 iterations and the run time was approximately 100
hours. The mean plan utility score started at -110.7 for the first
iteration and converged to -75.6 at the end of 50 iterations.
F. Analysis and Validation
MATSim outputs highly spatiotemporally resolved and
optimized activity and travel schedules for each agent in an
events file. As the output was over 90 GB in size with over 1.5
billion activity and travel records, we analyzed the events file
using the distributed computing framework Apache Spark
2.4.3. Specifically, we processed the output events to generate
aggregated travel statistics, including travel volumes, speed,
and volume/capacity per roadway-link and per hour. To
validate the model, we obtained U.S. state-level vehicle miles
traveled (VMT) by aggregating and extrapolating our roadway
link-specific travel estimates, and compared them against the
Federal Highway Administration’s (FHWA) VMT statistics [26].
To understand the variability in our traffic volume
estimates, we compared our extrapolated traffic volumes
against the traffic counts for a select state i.e. Florida; our
modeled and extrapolated volume estimates were compared
against select 2018 annual average daily traffic (AADT)
estimates gathered across 9,768 portable traffic monitoring
sites (PTMS) in Florida [27]. Although Florida has a total of
18,189 PTMS sites, we filtered the PTMS to retain only those
locations that measured traffic flow in both roadway directions,
and for which estimated link volumes are available. The PTMS
near uni-directional roadway links were excluded because a
significant portion of these links were near on/off-ramps with
an approximate geometric representation in OSM (e.g. most of
the curved ramps were represented as straight segments in
OSM). This posed an issue in accurately assigning a count
station to the uni-directional ramps. Thus, we used data from
9,768 PTMS to validate our traffic volumes. To compare the
estimated roadway volumes against the PTMS data, we
calculated the nearest distance from each PTMS to every OSM
roadway link in the state of Florida and picked the two nearest
roadway links. We then obtained the sum of sample volumes
estimated on those two links and extrapolated them, to compare
against the count data from the corresponding PTMS.
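The matching of a count station to its two nearest roadway links can be sketched as below. This is an illustrative approach, not the study's code: it assumes planar (projected) coordinates in meters, represents each link as a single straight segment, and uses hypothetical function names and data.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b, all given as (x, y)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def two_nearest_link_volume(station, links):
    """Sum the sample volumes on the two links nearest to a count station."""
    ranked = sorted(
        links, key=lambda link: point_segment_distance(station, link["start"], link["end"])
    )
    return sum(link["volume"] for link in ranked[:2])


# Hypothetical station between two opposing directions of one road.
links = [
    {"start": (-10.0, 0.0), "end": (10.0, 0.0), "volume": 30},
    {"start": (-10.0, 12.0), "end": (10.0, 12.0), "volume": 25},
    {"start": (-10.0, 100.0), "end": (10.0, 100.0), "volume": 99},
]
print(two_nearest_link_volume((0.0, 5.0), links))  # prints 55
```

The summed sample volume would then be extrapolated before comparison with the station's count.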
III. Results and Discussion
A. Distribution of study sample in continental U.S.
To understand the distribution of our study sample, we
compared the number of devices (that are mobile and with high
daily visibility) per U.S. state with the corresponding tripmaker
sample from the 2017 National Household Travel Survey [22],
and with the mobility-adjusted population from the 2017 5-year
ACS estimates [17], as shown in Figs. 2a-2b and Figs. 2c-2d,
respectively. Fig. 2a shows the ratio of study devices to
the number of tripmakers from 2017 NHTS per U.S. state
whereas Fig. 2b shows the statistical distribution of this metric.
This ratio indicates the order of magnitude by which the study
sample (on a select date) exceeds the tripmaker sample from
2017 NHTS. It should be noted that the NHTS sample is gathered
from respondents over a year. In essence, NHTS assigns
respondents to a predefined date in a given year and considers
their reported travel behavior on that date as being
representative of their typical daily travel. All such reported
travel by respondents over the year is consolidated to
arrive at a daily national travel pattern.
As seen in Fig. 2a, the study sample (from a single day) is
greater than the NHTS sample for every continental U.S. state.
However, this metric is variable across states. Representation
of tripmakers is the highest for Louisiana (with a ratio of 119)
and lowest for Wisconsin (2). Overall, the mean and median
values are 36 and 31, respectively. Thus, at an aggregated state-
level geography, the sample in this study exceeds that of the 2017
NHTS by more than an order of magnitude on average. Note that,
unlike this study sample, survey data tends to provide more
refined information on socio-demographics, household
characteristics, and travel behavior without the need for data
Fig. 2c shows the sample size as a percent of mobility-
adjusted ACS population estimate per U.S. state and Fig. 2d
shows its statistical distribution. Here, mobility-adjusted
population estimate is calculated as a product of U.S. state-
specific mobility factor and ACS population estimate, with
mobility factor for each state calculated as the ratio of
tripmakers to total persons from the 2017 NHTS. Similar to
comparisons with the NHTS travel survey, our comparisons
with ACS estimates revealed variability in sample sizes across
states. However, this population-based sample variability is
more tightly bound and ranges between 0.5% (for Alaska) and
1.4% (Alabama). The mean and median values for population-
based sample are 0.9% and 0.8%, respectively.
Although we see relatively higher sample sizes compared to
the 2017 NHTS travel survey at state level, it is unclear if this
pattern persists across sub-geographies such as counties and
census tracts. For example, sample sizes could be relatively
higher in urban areas compared to rural geographies. Moreover,
the demographic biases in the sample that arise from
smartphone-based application usage patterns are not well
understood. The implications of this sample variability and
potential demographic biases are rather ambiguous for
transportation researchers and practitioners as there is sparse
published literature addressing these issues [28]. Thus, there is
a need for future studies that focus on the quantification of such
potential demographic and geographic biases at different spatial
scales.
Fig. 2. Distribution of sample sizes across the continental United States, a) ratio of unique sample devices over tripmaker sample
from the 2017 National Household Travel Survey, b) statistical distribution of ratio of unique sample devices over NHTS tripmaker
sample from Fig. 2a, c) sample unique devices as a percent of 2017 U.S. ACS estimate (that is assumed to be mobile), and d)
statistical distribution of unique devices as a percent of 2017 U.S. ACS from Fig. 2c. The boxplot whiskers correspond to 1.5*IQR
(i.e. inter-quartile range) whereas the point corresponds to mean.
Sample sizes are relatively low when compared to
population estimates; however, this may be an artifact of the
stringent filtering scheme, which excludes any device that is
visible for fewer than 20 unique hours in a day; the filtering criteria
discarded close to 91% of the devices in this study. Despite this,
a 1% sample that is both visible for a significant portion of the
daily activity and travel record (e.g. 83% of daily time as used
in this study) and persistent across multiple days is valuable and
may help better understand the travel demand of the population
along with its variability due to the longitudinal nature of the
data. Composite datasets that are synthesized by combining
such highly spatiotemporally resolved big data samples with
demographic information may be well positioned to study the
impacts of infrastructure investment and policy decisions on
equity. For example, these composite datasets could be used to
quantify the impact of adding toll lanes on travel in an urban
region, and whether the benefits (or negative outcomes) are
equitably distributed across demographics.
B. Spatial distribution of travel in continental U.S.
To understand the spatial variability in aggregated travel
demand, we obtained estimates of VMT and the associated
errors per state as shown in Fig. 3. Specifically, Figs. 3a and 3b
show the simulation-based sample and extrapolated VMT per
U.S. state, respectively. Fig. 3c shows the U.S. state-level daily-
average VMT estimates from FHWA for March 2019; the
daily-average VMT estimates were obtained by dividing the
monthly VMT by 31 (i.e. the number of days in
March). Finally, Fig. 3d shows the percent error between
extrapolated VMT in this study and daily-average VMT from
FHWA for each U.S. state.
The U.S. state-level distribution of VMT from this study is
largely consistent with that of FHWA, as seen in Figs. 3b and
3c. However, we observe a few differences in the ranking. For
example, California is ranked 3rd in our study but according to
FHWA California accounts for the highest VMT in the U.S.
Similarly, Texas ranks 1st in our study but is ranked 2nd
according to FHWA’s VMT estimates. Thus, VMT for
California is underestimated whereas it is overestimated for
Texas. This rank bias in our VMT estimates is not entirely
surprising given we employed a top-down approach to simulate
travel. Specifically, we used a single extrapolation factor based
on total device sample across the continental U.S. to uniformly
scale up the network demand.
Fig. 3. Distribution of estimated VMT in this study along with the associated errors in comparison to Federal Highway
Administration’s VMT estimates across the continental U.S., a) VMT estimates based on the sample data in this study, b) VMT
estimates extrapolated to population, c) FHWA’s VMT estimates for an average day in March, and d) percent error in extrapolated
VMT when compared with FHWA’s VMT estimates
This single factor approach may work satisfactorily if the
rank of tripmaker sample (i.e. number of filtered devices) by
U.S. state is consistent with that of mobility-adjusted ACS
estimates. However, as seen in Fig. 2c we observed variability
in the percent of tripmakers across the states in the continental U.S.
Specifically, sample size for California is 0.65% (i.e. less than
mean continental U.S. sample size of 1%) and hence leads to
underestimation of travel. Similarly, the sample size for Texas
is 1.2% (i.e. greater than mean continental U.S. sample size of
1%) and leads to overestimation of VMT. Thus, our VMT
estimation errors range from -55% to 78% as seen in Fig. 3d.
The mean and median errors are 7.9% and 8.3%, respectively.
To understand the impact of device sampling percent (from
Fig. 2c) on errors in VMT estimation (Fig. 3d) we calculated
the Spearman’s Rank correlation between them. We relied on
guidance provided by Mukaka [29] to categorize the strength of
Spearman’s Rank correlation (e.g. negligible, low, moderate,
high, or very high) observed in this study. Overall, we observed
a high positive correlation (Rs=0.80, p-value < 2.2e-16)
between the percent sample of tripmakers and the resulting
error in VMT estimates. This shows that sample size biases
are highly correlated with VMT estimation errors and thus may
influence the relative spatial distribution of VMT when a
top-down approach is used. Here, sample size bias is defined as
deviation from the 1% sampling rate assumed for the study and
could either refer to under-sampling (<1%) or over-sampling (>1%).
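For readers who wish to reproduce this type of comparison, Spearman's rank correlation can be computed as the Pearson correlation of the ranked values. The following is a minimal sketch in plain Python. The input values are hypothetical placeholders rather than the study's data, and the p-value is not computed here.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical sampling rates (%) and VMT errors (%) for five states.
sample_pct = [0.65, 1.2, 0.9, 1.1, 0.8]
error_pct = [-35.0, 25.0, -5.0, 10.0, -12.0]
print(round(spearman(sample_pct, error_pct), 2))
```

Because only the ordering of values matters, this statistic is insensitive to the absolute size of the errors, which is why it is reported alongside absolute error measures.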
We observed a very high positive rank correlation (Rs=0.95,
p-value < 2.2e-16) between VMT estimates (Fig. 3b) and those
from FHWA (Fig. 3c). This suggests that despite the sample
size bias, the ranking of our modeled VMT estimates largely
matches that of FHWA estimates. Thus, a top-down
agent-based modeling approach that uses activity and travel records
derived from a GPS data source may still provide insights that
are of practical significance to estimating the spatial
distribution of travel demand.
Overall, comparison with FHWA estimates shows that our
model is sensitive to spatial differences (i.e. state-level
differences) in activity and travel demand but tends to slightly
overestimate VMT on average (by about 8%). As previously
discussed, these estimation errors are largely a function of
sample size biases between geographies. Nonetheless, it
remains largely unknown if these errors can be mitigated by
employing a bottom-up approach where the activity and travel
demand of each U.S. state or metro area is simulated separately,
and results are combined together to obtain a mosaic of travel
metrics for the entire U.S. However, if the spatial
disaggregation is too fine (e.g. implementing separate
simulations for each U.S. county), then a large percent of the
inter-region (or long-distance) travel may be lost. Considering
the above, there is a need for more studies that use agent-based
frameworks to quantify relative and absolute errors in estimates
of travel metrics. Moreover, such studies can provide guidance
on optimal run parameters that can address sample bias issues
without significantly losing the long-distance portion of travel.
C. State-level validation
To understand the bias in our travel estimates at the roadway
level, we compared the modeled roadway volume data with the
annual average daily traffic (AADT) counts from the PTMS
network for the state of Florida in 2018. We observed a high
Spearman’s Rank correlation (Rs=0.81, p-value < 2.2e-16)
between the extrapolated volumes and observed AADT counts.
This suggests that the proposed simulation framework may be
helpful in understanding the spatial distribution of traffic at a
finer geographic scale.
We computed the normalized root mean square error
(NRMSE) statistic to understand the absolute differences
between our roadway volume estimates and the AADT counts
from PTMS. The NRMSE value for the state of Florida is
53.1% based on comparisons with data from 9,768 PTMS locations.
However, as shown in Fig. 4, the NRMSE is variable across
counties with the lowest for Lafayette County (28.6%) and the
highest for Monroe County (118%). Our NRMSE is slightly
higher than those observed for different modeling regions in
Florida. For example, area wide NRMSE for the Tampa Bay
Regional Planning Model [30] and the Southeast Regional
Planning Model [31] was 31% and 39%, respectively; NRMSE
for the same regions in our implementation is 48% and 45%,
respectively. A previous study that used highly aggregated
passive data (i.e. trip data from an origin-destination matrix
built using CDR data) found NRMSE for Asheville, NC to be
49% [32]. Finally, the Pearson correlation between our
estimated volumes and observed counts was about 0.85 as seen
in Fig. 5.
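The NRMSE statistic can be sketched as follows. This example assumes a convention common in travel demand model validation, in which the root mean square error is divided by the mean observed count and expressed as a percentage. The exact definition used in this study is given in its methods section and may differ (for example, some agencies use n - 1 in the denominator). The function name and sample values are hypothetical.

```python
import math


def nrmse_percent(estimated, observed):
    """Percent NRMSE: RMSE of (estimated - observed) divided by the mean
    observed value. Assumes normalization by the observed mean."""
    if len(estimated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Hypothetical link volumes and AADT counts at three count stations.
volumes = [12000.0, 4500.0, 30000.0]
aadt = [10000.0, 5000.0, 28000.0]
print(f"NRMSE = {nrmse_percent(volumes, aadt):.1f}%")
```

Since the statistic squares each difference, a small number of badly matched stations can dominate the result.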
Although the relative distribution of our volume estimates
is generally consistent with the count data, the absolute
differences are on the higher end compared to those reported
for different model regions in Florida. One of the significant
factors that may have resulted in such high NRMSE is the
incompatibility between the OSM roadway network and the
PTMS. We retained only non-residential roads from OSM data
for computational efficiency before the roadway network was
fed into MATSim. This posed an issue for our validation efforts
against the count data. As outlined in methods (section II.F), we
validated the estimated volumes by identifying the nearest two
links for every count station. However, if the count station was
located near a residential road our search algorithm would
detect the next nearest non-residential road since we excluded
residential roads in our simulation. For example, the top three
relative errors ((AADT-Estimated)/AADT) that we found were
about -136,247%, -115,400%, and -67,444% respectively
because the count stations were located near residential areas,
but the nearest non-residential roads happened to be interstate
highways. Thus, while our NRMSE appears to be high, it is
unclear if this is a result of a naive validation strategy or sampling
bias issues. We intend to investigate and quantify this aspect in
future studies.
Fig. 4. a) Distribution of population across counties in Florida and b) normalized root mean square error (NRMSE) values for each
county in Florida.
Fig. 5. Pearson correlation between estimated roadway volumes and the count data from 2018 PTMS
The strength of our results is affected by uncertainties that
are inherent to assumptions with big data and large-scale
simulation frameworks. First, we used a single extrapolation
factor that ignores spatial variability in sample sizes to estimate
U.S. state-level VMT and roadway link volumes. Although this
approach may serve well to represent the relative spatial
distribution of travel metrics, it poses a challenge to match the
absolute metrics. Future studies can focus on this particular
issue and attempt to conduct a bottom-up simulation where
travel metrics are estimated at smaller geographies and
combined together to produce nationwide assessments. Such
studies can provide guidance on optimal methods for
implementing agent-based simulation using big location data.
Second, we assumed that all travel in this study is
automobile-based. This naive mode scheme may introduce
uncertainties when validating the simulated volumes against
AADT counts. Further, factors such as difference in time scales
(i.e. count data is for 2018 and simulation is for a specific day
in March 2019) and incompatibility between OSM network and
PTMS locations (e.g. matching count locations with wrong
roadway links due to missing residential roads) may exaggerate
the errors in our estimates. Future studies may try to address
these issues by either employing heuristics that estimate travel
mode which can be fed into the simulation or by leveraging the
mode choice modules provided by MATSim to estimate the
travel mode. In addition, the validation approach in this study
can be improved by following a more stringent process to filter
out PTMS locations that are beyond a certain distance threshold
from the nearest link. However, estimating this distance
threshold could be a challenge because it may be region-
specific. For example, urban regions may have a lower distance
threshold compared to rural areas as the road network is
comparatively denser in urban areas.
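The distance-threshold filter described above can be sketched as follows. The example assumes, as a simplification, that each roadway link is represented by a single point (e.g. its midpoint) and that great-circle distance is an adequate proximity measure. The function names, coordinates, and the 100 m threshold are hypothetical and would need to be calibrated by region.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def filter_stations(stations, link_points, threshold_m):
    """Keep IDs of stations whose nearest link point lies within threshold_m."""
    kept = []
    for station_id, lat, lon in stations:
        nearest = min(haversine_m(lat, lon, llat, llon) for llat, llon in link_points)
        if nearest <= threshold_m:
            kept.append(station_id)
    return kept


# Hypothetical count stations (id, lat, lon) and one retained link point.
stations = [("A", 27.9500, -82.4600), ("B", 28.5000, -82.4600)]
link_points = [(27.9505, -82.4600)]
print(filter_stations(stations, link_points, threshold_m=100.0))
```

A region-specific threshold could be supplied by calling the function separately for urban and rural stations with different values of `threshold_m`.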
Third, although we conducted a large-scale simulation to
understand the travel demand for the entire continental U.S., it
was implemented for a single day. Thus, the current study is
incapable of answering questions on the variability of travel
demand across longer spatiotemporal scales (e.g. differences in
travel demand across days and geographies). This is evident
from the high NRMSEs resulting from the validation of our
volume estimates (based on a single day measure) against
AADT (an annualized measure). This warrants future work
using big location data powered large-scale agent-based
simulations across different time and spatial scales.
Finally, this study relied on a heuristic-based data filtering
scheme combined with OSM network data to simulate the daily
activity and travel patterns of agents. The heuristics used in our
study to develop agent plans can further be improved by
leveraging land use data. For example, misclassifications in
activity type associated with each spatial point can be reduced
by using land use data. Although OSM road network data is
generally regarded as of fair quality, it is neither fully complete
nor up to date [33,34]. The discrepancies in the roadway feature
attributes can alter the simulated travel demand. For example, a
single inaccurately coded capacity value on an interstate can
alter traffic flow and cause spillover effects. Moreover, since
we did not account for toll roads in our simulation, our volume
estimates on toll-roads could be exaggerated and vice versa for
non-toll highways. Thus, future efforts may leverage land use
and more accurate road network data (e.g. commercially
sourced maps) to understand the travel demand through
simulation frameworks.
In this study, we implemented a large-scale agent-based
simulation to mimic the daily activity and travel patterns of the
residents of the continental United States using GPS sightings
from approximately 2.7 million smart devices for Wednesday,
March 6, 2019. To our knowledge, the spatial scale for this type
of study is unprecedented. We found that the big location data
samples in our study, which are based on GPS sightings, can
provide meaningful estimates for aggregated travel metrics,
although we observed variability in sample sizes across the
continental U.S. states. Despite the sample size variation, we
found strong correlation between estimated and observed travel
metrics (i.e. VMT and traffic volumes). Our model
performance is comparable to those of regionwide travel
demand models for the state of Florida. However, we also found
evidence for relatively higher errors when comparing the
absolute values of our estimates against observed metrics.
These errors may arise from a multitude of factors
including spatial mismatches in input roadway network and
traffic count stations, a single sampling factor that ignores
spatial variability in sample sizes, and inconsistencies between
the OSM road network and ground truth. Future studies can
address these specific limitations and improve upon the novel
methodology presented in this study to conduct agent-based
simulation using big data sources, thus providing a valuable
resource to estimate the travel demand of the population.
Map data copyrighted OpenStreetMap contributors and
available from
The first three authors are employees with AirSage. AirSage
may have a financial interest in the work presented in this paper.
However, the first three authors state that such financial
considerations did not influence the conclusions drawn in this paper.
[1] National Academies of Sciences, Engineering, and Medicine. Statewide
and megaregional travel forecasting models: Freight and passenger.
Washington, DC: The National Academies Press.
[2] M. Kamargianni, W. Li, M. Matyas, and A. Schäfer, “A critical review of
new mobility services for urban transport,” Transportation Research
Procedia, vol. 14, pp. 3294-3303, 2016.
[3] C. Wagner and S. Shaheen, “Car Sharing and Mobility Management:
Facing new challenges with technology and innovative business
planning,” World Transport Policy and Practice, vol. 4, pp. 39-43, 1998.
[4] C. Vlek and L. Steg, “Human Behavior and Environmental Sustainability:
Problems, Driving Forces, and Research Topics,” Journal of social issues,
vol. 63, no. 1, pp. 1-19, 2007.
[5] B. Nykvist and L. Whitmarsh, “A multi-level analysis of sustainable
mobility transitions: Niche development in the UK and Sweden,”
Technological forecasting and social change, vol. 75, no. 9, pp. 1373-
1387, 2008.
[6] P. Rode, G. Floater, N. Thomopoulos, J. Docherty, P. Schwinger, A.
Mahendra, and W. Fang, “Accessibility in cities: transport and urban
form,” in Disrupting Mobility, G. Meyer and S. Shaheen, Eds. Cham:
Springer, 2017, pp. 239-273.
[7] M. S. Ullah, U. Molakatalla, R. Morocoima-Black, and A.Z. Mohideen,
“Travel demand modeling for the small and medium sized MPOs in
Illinois,” Illinois Center for Transportation, Urbana, USA, Tech. Report.
FHWA-ICT-11-091, September 2011.
[8] H. Wang, F. Calabrese, G. D. Lorenzo, and C. Ratti, “Transportation
mode inference from anonymized and aggregated mobile phone call detail
records,” In Proc. 13th International IEEE Conference on Intelligent
Transportation Systems, 2010, pp. 318-323.
[9] S. Çolak, L. P. Alexander, B. G. Alvim, S. R. Mehndiratta, and M. C.
González, “Analyzing cell phone location data for urban travel: current
methods, limitations, and opportunities,” Transportation Research
Record, vol. 2526, no. 1, pp. 126-135, 2015.
[10] M. S. Iqbal, C.F. Choudhury, P. Wang, and M. C. González,
“Development of origin-destination matrices using mobile phone call
data,” Transportation Research Part C: Emerging Technologies, vol. 40,
pp. 63-74, 2014.
[11] Y. Hasegawa, Y. Sekimoto, T. Kashiyama, and H. Kanasugi,
"Transportation melting pot Dhaka: road-link based traffic volume
estimation from sparse CDR data." In Proc. First International Conference
on IoT in Urban Space, 2014, pp. 105-107.
[12] P. Widhalm, P. Nitsche, and N. Brändle, “Transport mode detection with
realistic smartphone sensor data,” In Proc. 21st International Conference
on Pattern Recognition (ICPR2012), 2012, pp. 573-576.
[13] M. Bierlaire, J. Chen, and J. Newman, “A probabilistic map matching
method for smartphone GPS data,” Transportation Research Part C:
Emerging Technologies, vol. 26, pp. 78-98, 2013.
[14] L. Montini, S. Prost, J. Schrammel, N. Rieser-Schüssler, and K. W.
Axhausen, “Comparison of travel diaries generated from smartphone data
and dedicated GPS devices,” Transportation Research Procedia, vol. 11,
pp. 227-241, 2015.
[15] A. Kuppam, J. Lemp, D. Beagan, V. Livshits, L. Vallabhaneni, and S.
Nippani, “Development of a tour-based truck travel demand model using
truck GPS data,” presented at the Transportation Research Board 93rd
Annual Meeting, Washington, DC, USA, Jan. 12-16, 2014, Paper 14-
[16] A. B. Zanjani, A. R. Pinjari, M. Kamali, A. Thakur, J. Short, V. Mysore,
and S. F. Tabatabaee, “Estimation of statewide origin-destination truck
flows from large streams of GPS data: Application for Florida statewide
model,” Transportation Research Record, vol. 2494, no. 1, pp. 87-96,
[17] 2017 American Community Survey 5-Year Estimates, U.S. Census
Bureau, Feb. 2019. [Online].
[18] A. Horni, K. Nagel, K. W. Axhausen, Eds., The multi-agent transport
simulation MATSim. London: Ubiquity Press, 2016.
[19] S. Gurram, V. Sivaraman, A. L. Stuart, and J. Apple, “An assessment of
population exposure to NOx using GPS data: A case study for Tampa,
FL,” unpublished. 2019. [Online]. Available:
[20] OpenStreetMap contributors, “Planet dump retrieved from [Online]. 2019. Available: 2019.
[21] Osmosis. (2019). OpenStreetMap. [Online]. Available:
[22] U.S. Department of Transportation, Federal Highway Administration,
“2017 National Household Travel Survey,” [Online]. 2017. Available:
[23] A. Neumann, D. Röder, and J. W. Joubert, “Toward a simulation of
minibuses in South Africa,” Journal of Transport and Land Use, vol. 8,
no. 1, pp. 137-154, 2015.
[24] M. Zilske and K. Nagel, “Building a minimal traffic model from mobile
phone data,” In Proc. Third International Conference on the Analysis of
Mobile Phone Datasets (NetMob), 2013, doi:
[25] S. Gurram, A. L. Stuart, and A. R. Pinjari, “Agent-based modeling to
estimate exposures to urban air pollution from transportation: Exposure
disparities and impacts of high-resolution data,” Computers, Environment
and Urban Systems, vol. 75, pp. 22-34, 2019.
[26] U.S. Federal Highway Administration (FHWA), “Travel monitoring
trends: 2019 March,” Department of Transportation, Washington, DC.
[Online]. 2019. Available:
[27] Transportation Data and Analytics Office, “2018 Traffic data,” Florida
Department of Transportation, Tallahassee, FL. [Online]. 2019.
[28] C. Chen, J. Ma, Y. Susilo, Y. Liu, and M. Wang, “The promises of big
data and small data for travel behavior (aka human mobility) analysis,”
Transportation research part C: emerging technologies, vol. 68, pp. 285-
299, 2016.
[29] M. M. Mukaka, “A guide to appropriate use of correlation coefficient in
medical research,” Malawi Medical Journal, vol. 24, no. 3, pp. 69-71,
[30] Florida Department of Transportation, “Tampa Bay Regional Planning
Model v8.0: Validation report,” [Online]. April 2015, Available:
[31] Parsons Brinckerhoff, The Corradino Group, and BCC Engineering,
“Southeast Florida Regional Planning Model (SERPM 7.0): Model
development report,” [Online]. February 2015, Available:
[32] J. Kressner, G. S. Macfarlane, L. F. Huntsinger, R. Donnelly, “Using
passive data to build an agile tour-based model: a case study in
Asheville,” In Proc. 6th Innovations in Travel Modeling Conference,
2016. [Online]. Available:
[33] C. Barrington-Leigh and A. Millard-Ball, “The world’s user-generated
road map is more than 80% complete,” PloS one, vol. 12, no. 8, 2017.
[34] I. Ludwig, A. Voss, and M. Krause-Traudes, “A Comparison of the Street
Networks of Navteq and OSM in Germany,” in Advancing
geoinformation science for a changing world, S. Geertman, W. Reinhardt,
F. Toppen, Eds. Berlin: Springer, 2011, pp. 65-84.