Eur. Phys. J. Special Topics 215, 61–73 (2013)
Spatially Embedded Socio-Technical Complex Networks
© EDP Sciences, Springer-Verlag 2013
DOI: 10.1140/epjst/e2013-01715-5
Regular Article
Understanding the patterns of car travel
L. Pappalardo, S. Rinzivillo, D. Pedreschi, and F. Giannotti
KDDLab ISTI CNR, Pisa, Italy
KDDLab, Department of Computer Science, University of Pisa, Italy
College of Computer and Information Science, Southwest University, Chongqing, China
Received 22 October 2012 / Received in final form 29 November 2012
Published online xx January 2013
Abstract. Are the patterns of car travel different from those of general human mobility? Based on a unique dataset consisting of the GPS trajectories of 10 million trips made by 150,000 cars in Italy, we investigate how known mobility models apply to car travel, and illustrate novel analytical findings. We also assess to what extent the sample in our dataset is representative of overall car mobility, and show how to build an extremely accurate model that, given our GPS data, estimates the real traffic values as measured by road sensors.
1 Introduction
The analysis of human movement has received increasing attention in the last decade, due to the emergence of big mobility data, portraying mobility activity at an unprecedented scale and detail. Wireless technologies, such as the satellite-enabled Global Positioning System (GPS) and mobile phone networks, allow, as a by-product of their normal operations, the sensing and collection of massive repositories of spatio-temporal data, such as call detail records from mobile phones and GPS tracks from navigation devices, delivering society-wide proxies of human mobile activities. This is a brand new social microscope, which promises to help us discover the hidden patterns and models that characterize the trajectories humans follow during their daily activity. This direction of research has recently attracted scientists from diverse disciplines [1–5], notably data mining and network science, also given its importance in domains such as urban planning, sustainability, transportation engineering, public health, and economic forecasting. We focus here on mobility by car, the most popular private means of transportation in today's society. We had access to a unique dataset consisting of the detailed spatio-temporal trajectories of approximately 10 million trips made by more than 150,000 cars in central Italy during May 2011. We reconstructed this information from the anonymous GPS
tracks sensed in that period by Octo Telematics, a company that manages on-board GPS receivers and data collection for the car insurance industry; currently, approximately 2% of the cars circulating in Italy carry this technology. In previous research, we used this form of data to validate many mobility data mining tools and associated analytical models, including the discovery of access patterns to a city [1], of individual profiles or travel routines [6], of the geographical borders of human mobility [7], of clusters of similar mobility behaviors [8], and more.
In this paper, we consider the preferential return model for human mobility introduced by Barabási and others in [4,5], which explains the patterns and laws governing the key physical quantities of human movements, including the length and duration of trips, the radius of gyration of travelers, and the number and frequency of locations they visit. We address the following question: does this model apply to car travel? It should be noted that the models in [4,5] were developed with reference to mobile phone data, which differ from our GPS trajectories in two main respects: mobile phone data pertain to general mobility (while GPS data pertain only to cars), and they are much less detailed than GPS trajectories, which provide precise spatio-temporal records of each trip with exact geolocation and a high sampling rate. In fact, each trajectory in our dataset is a time-ordered sequence of position records $\langle id, x, y, t \rangle$, where $id$ is the anonymized car identifier, $(x, y)$ are the latitude-longitude coordinates, and $t$ is the time of the position.
According to GPS standards, positioning accuracy is within 15 meters in the absence of obstacles and malfunctions.
The sampling time, i.e. the time between two observations, is 30 seconds on average when the car is in motion.
It is therefore legitimate to investigate to what extent the previous models apply to GPS data, which deviations are observed, and which new analytical opportunities are provided by the finer spatio-temporal granularity. On the other hand, it is essential to investigate to what extent our GPS data are representative of overall vehicular mobility, in order to generalize the validity of our findings. To this purpose, we use independent ground-truth measurements of global traffic volumes obtained by sensors placed at a set of locations during the same observational period as our GPS data, and show that the GPS data yield an extremely accurate estimate of the overall volumes at each location, even though they pertain to a 2% sample. Our model is based on standard machine learning techniques, tailored to the peculiarities of trajectory data.
In conclusion, we obtain two intertwined results: first, the known human mobility models can be refined to deal with car mobility, and second, the available GPS data can indeed be used as a faithful proxy of car mobility.
The remainder of this paper is organized as follows. Section 2 sketches the state of the art in human mobility patterns and traffic forecasting. Section 3 presents the experimental results of comparing a massive dataset of human movements by car against the mobility laws proposed in the literature. Section 4 shows that GPS data are a good approximation of actual traffic and presents a method to infer the total traffic volume by exploiting information from a single location. Finally, Section 5 summarizes and discusses the findings of the paper.
2 State of the art
In the past few years, the exploding prevalence of mobile phones, GPS navigators, and other handheld devices has allowed scientists to track human mobility and to test mobility models on fertile ground. This social microscope showed that human movements follow neither Brownian motion nor Lévy flights, but obey their own laws. Brockmann et al. [3] analyzed data collected at a large online bill-tracking website, and found a power law in the probability $P(r)$ of a bank note traversing a distance $r$, $P(r) \sim r^{-\beta}$. González et al. [4] analyzed a massive mobile phone dataset, and also found a power law in the distribution $P(r_g)$ of the radius of gyration $r_g$, the characteristic distance traveled by a user. This means that there is huge variability in the distance traveled by individuals, with most people traveling in close vicinity to their home location, and a small but non-negligible number of people making very long journeys. The number of distinct locations $S(t)$ visited by humans is sublinear in time, well approximated by $S(t) \sim t^{\mu}$ with $\mu = 0.6 \pm 0.02$, which indicates a decreasing tendency of people to visit previously unvisited locations [5]. Moreover, the visitation frequency, that is the probability $f$ that a user visits a given location, is rather uneven, resulting in a Zipf-like visitation frequency distribution $P(f) \sim f^{-(1+1/\zeta)}$, where $\zeta$ is the exponent of the rank-frequency law $f_k \sim k^{-\zeta}$ [5]. Bazzani et al. [9] discussed an exponential law for the distribution of trajectories in an urban road network, using a GPS dataset on private cars similar to ours. In [10], Lee et al. clustered visit points from a GPS dataset to form hot spots, finding that the pause-time distribution in hot spots follows a truncated power law, consistently with [9]. Zignani and Gaito [11] analyzed travel distances, splitting them into distances within locations and distances between locations. In [2], the authors presented a data mining approach applied to trajectory data to analyze human behavior. With the same approach, in [1] they presented an extensive set of analyses on large sets of GPS data. These analyses were performed by means of an integrated framework for mobility analysis, M-Atlas, which has also been used in this study.
Predicting and estimating the number of vehicles is a crucial component of advanced traffic management and information systems, which aim to reduce traffic congestion, improve mobility, and enhance air quality. Among existing techniques, factor approaches are the most popular methods; they are generally implemented by developing a set of factors from historical data and applying them to new data to make predictions [12,13]. Other approaches treat the traffic flow on a road as a time series. Such models assume that the historical values of a variable are related to its future values [14–16]. Locally weighted regression, a variant of regression analysis, is a memory-based algorithm that learns continuous mappings from real-valued input vectors to real-valued output vectors. It assigns each training observation a weight that regulates its influence on the training process. This weight depends on the location of the training point in the input variable space relative to that of the point to be predicted: training observations closer to the prediction point generally receive higher weights [17,18]. Machine learning, a branch of artificial intelligence, is a discipline concerned with the design and development of algorithms that learn hidden processes from historical empirical data. In the traffic forecasting context, artificial neural networks and support vector machines have proved to be among the best alternatives for modeling and predicting traffic counts [19–21].
3 Understanding human mobility by car
Many studies on human mobility are based on the analysis of GSM phone data [4,5]. Even though mobile phones are carried by the same person throughout the daily routine, offering a good proxy to capture individual trajectories, they do not provide accurate spatial information. Indeed, we know users' positions only when they make a call, and we only know the position of the tower serving the area the user is in (an area of about 3 km² on average), not the user's actual geographic location. In contrast, GPS traces provide high-resolution location data, storing the geodetic coordinates with an average sampling interval of a few seconds. These features are ideal for a refined statistical analysis of human mobility patterns. However, relatively few works in the literature are based on GPS data, mainly because it is difficult to obtain complete traces covering movements along the whole day. In this paper, we present a statistical validation of human mobility patterns using a massive real-life dataset D that captures car mobility, obtained from more than 150,000 private vehicles with on-board GPS receivers.

Fig. 1. Spatial distribution of GPS trajectories in the dataset D. The trajectories correspond to car trips performed by vehicles that passed through an area corresponding to central Italy in May 2011.
The dataset stores information on approximately 9.8 million distinct car trips made by 159,000 cars tracked during one month (May 2011) in an area corresponding to central Italy (a 250 km × 250 km square, Figure 1). The GPS traces are collected by Octo Telematics Italia Srl, a company that provides a data collection service for insurance companies. The market penetration of this service varies across the territory, but in general it covers around 2% of the total registered cars. The GPS device is automatically turned on when the car is started, and the global trajectory of a vehicle is formed by the sequence of GPS points that the device transmits every 30 seconds to the server via a GPRS connection. When the vehicle stops, no points are logged or sent.

Fig. 2. Left: the probability density function $P(d)$ of travel distances in kilometers. Two different regimes emerge: an exponential distribution (from 1 to about 20 km) and a power law distribution with exponent $\beta \approx 2.53$ (from about 20 km to about 500 km). Center: the distribution $P(r_g)$ of the radius of gyration measured for the users in the dataset D; the solid line represents a truncated power law fit, with $r_g^0 = 5.54$, $\beta = 1.13$ and $\tau = 39.76$. Right: the correlation between $r_g$ and the distance $d(c_u, L_u)$ between the center of mass and the most frequent location of users.

We exploit these stops to split the global trajectory into several sub-trajectories, which correspond to the trips performed by the vehicle. Clearly, the vehicle may make stops of different durations, corresponding to different activities. To ignore short stops such as gas stations, traffic lights, and pick-up and drop-off activities, we chose a stop duration threshold of 20 minutes: if the time interval between two consecutive observations of the car is larger than 20 minutes, the first observation is considered the end of a trip and the second observation the start of another trip. We also extracted trips using different stop duration thresholds (5, 10, 15, 20, 30, 40 minutes), without finding significant differences in the sample of short trips or in the statistical analysis presented in this paper.
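The 20-minute stop rule can be sketched as follows. `GPSPoint` and `split_into_trips` are hypothetical names; the sketch assumes that timestamps are in seconds and that all points passed in belong to a single car.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GPSPoint:
    """One position record <id, x, y, t>; t in seconds (assumed unit)."""
    car_id: str
    lat: float
    lon: float
    t: float


def split_into_trips(points: List[GPSPoint], gap_s: float = 20 * 60) -> List[List[GPSPoint]]:
    """Split one car's GPS points into trips.

    A new trip starts whenever the time between two consecutive observations
    exceeds gap_s (20 minutes by default): the earlier point closes the
    current trip and the later point opens the next one.
    """
    trips: List[List[GPSPoint]] = []
    current: List[GPSPoint] = []
    for p in sorted(points, key=lambda q: q.t):
        if current and p.t - current[-1].t > gap_s:
            trips.append(current)
            current = []
        current.append(p)
    if current:
        trips.append(current)
    return trips
```

Re-running the function with `gap_s` set to 5, 10, 15, 30 or 40 minutes reproduces the sensitivity check described above: a 25-minute pause, for example, splits a trajectory under the default threshold but not under a 30-minute one.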
Our first measurement is the distribution of travel length, from which two different regimes clearly emerge (Figure 2, left). The first one (from 1 to about 20 km) corresponds to short-range trips, mainly within cities, and is characterized by an exponential distribution. The second regime corresponds to inter-city trips (i.e. trips connecting different urban areas), and is governed by a power law distribution with exponent $\beta = 2.53$. Here, the observed scaling exponent differs from the one observed for GSM data ($\beta = 1.75$) [4] and for bank note dispersal ($\beta = 1.59$) [3], since it is influenced by the reduced range of distances in the GPS dataset. Indeed, both geographical limits (the region considered has a length of about 500 km) and physical limitations (trips longer than 500 km are rare) restrict the range of possible distances.

In order to characterize the human mobility patterns emerging from the available trajectories, we used the radius of gyration as the characteristic travel distance covered by each individual, a measure of how far a car is from its center of mass (mean location) [22]. Formally, the radius of gyration of a user $u$ is defined as

$$r_g^u(t) = \sqrt{\frac{1}{n_u(t)} \sum_{i=1}^{n_u(t)} \left\lVert r_i^u - c_u \right\rVert^2}, \qquad (1)$$

where $r_i^u$ represents the $i = 1, \dots, n_u(t)$ positions recorded for user $u$ up to time $t$, and $c_u = \frac{1}{n_u(t)} \sum_{i=1}^{n_u(t)} r_i^u$ is the center of mass of the trajectory. For each user in the dataset D, we calculated the radius of gyration by taking all points composing the user's sub-trajectories as the $n_u(t)$ recorded positions, where $t$ refers to the full time period covered by the data (from May 1 to May 31). Then, we plotted the distribution of $r_g$, observing a power law with an exponential cutoff,

$$P(r_g) \sim \left(r_g + r_g^0\right)^{-\beta} \exp\left(-r_g / \tau\right),$$

where $r_g^0 = 5.54$, $\beta = 1.13$ and $\tau = 39.76$ (Figure 2, center). This curve agrees with previous results found on GSM data [4], confirming that the majority of users travel within a small distance, while some of them carry out long journeys. The discrepancy between the predicted distribution and the observed behavior for people with low $r_g$ (up to 5 km) is presumably due to the tendency to cover short distances on foot, by bike, or by bus, which makes such trips unlikely to appear in our dataset. Figure 3 shows the spatial distribution of the centers of mass, with the color representing the corresponding $r_g$. People with a high radius of gyration have their centers of mass in the countryside, in the mountains (Apennines), and on the coast, whereas those with lower $r_g$ are mainly located in urban areas.

Fig. 3. Spatial distribution of users' centers of mass on the map of central Italy. The figure clearly shows that vehicles with a small radius of gyration (red points) tend to have their center of mass in the main urban centers of central Italy: Carrara, Pisa, Livorno, Pistoia, Empoli, Siena, Grosseto, Arezzo and the pool of towns composing the conurbation of Florence (Firenze, Prato, Sesto Fiorentino, Scandicci). Conversely, vehicles characterized by a high radius of gyration (green points) are distributed in the countryside and on the coast.

Fig. 4. Spatial distribution of users' most frequent locations on the map of central Italy. The figure shows that the most frequent locations of the vehicles, regardless of their radius of gyration, tend to concentrate in the main urban areas of central Italy, corresponding to the cities of Carrara, Pisa, Livorno, Pistoia, Empoli, Siena, Grosseto, Arezzo and the pool of towns composing the conurbation of Florence (Firenze, Prato, Sesto Fiorentino, Scandicci).

While the center of mass can be interpreted as the central position of a vehicle when it is moving, another interesting characteristic of individual mobility is the most frequent location $L_u$, i.e. the zone where a vehicle can be found with the highest probability when it is stationary, most likely the driver's home or workplace. To estimate $L_u$ for a user $u$, we collected all the locations the user visits by extracting the origin and destination points of their sub-trajectories, without taking into account the time the vehicle spends at each location. Then, we applied the Bisecting K-means clustering algorithm [23] to these points. It is an extension of the basic K-means algorithm that splits the set of all points into two clusters and then recursively bisects the resulting clusters until each has a radius smaller than or equal to a threshold, set in our experiments to 250 meters. The centroid of the cluster with the highest frequency is chosen as $L_u$ for user $u$.
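The two per-user quantities defined above can be sketched in a few lines: the radius of gyration of Eq. (1) and the most frequent location $L_u$ obtained by bisecting K-means with a 250 m radius threshold. This is an illustrative sketch, not the M-Atlas implementation. It assumes positions already projected to planar coordinates in kilometers, defines a cluster's radius as the largest distance from its centroid, and bisects each cluster with a plain 2-means seeded by two far-apart points. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in km (assumed projection)


def centroid(pts: Sequence[Point]) -> Point:
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def radius_of_gyration(pts: Sequence[Point]) -> float:
    """Eq. (1): root-mean-square distance of the positions from their center of mass."""
    c = centroid(pts)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in pts) / len(pts))


def _two_means(pts: List[Point], iters: int = 20) -> Tuple[List[Point], List[Point]]:
    """Split pts into two non-empty groups with a few Lloyd iterations."""
    c = centroid(pts)
    a = max(pts, key=lambda p: math.dist(p, c))  # seed 1: farthest from centroid
    b = max(pts, key=lambda p: math.dist(p, a))  # seed 2: farthest from seed 1
    left: List[Point] = []
    right: List[Point] = []
    for _ in range(iters):
        new_left = [p for p in pts if math.dist(p, a) <= math.dist(p, b)]
        new_right = [p for p in pts if math.dist(p, a) > math.dist(p, b)]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split or converged: keep the last valid partition
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right


def bisecting_kmeans(pts: Sequence[Point], max_radius: float) -> List[List[Point]]:
    """Recursively bisect clusters until every cluster radius is <= max_radius."""
    if max_radius < 0:
        raise ValueError("max_radius must be non-negative")
    done: List[List[Point]] = []
    todo: List[List[Point]] = [list(pts)]
    while todo:
        cluster = todo.pop()
        c = centroid(cluster)
        if len(cluster) < 2 or max(math.dist(p, c) for p in cluster) <= max_radius:
            done.append(cluster)
        else:
            todo.extend(_two_means(cluster))
    return done


def most_frequent_location(stops: Sequence[Point], max_radius: float = 0.25) -> Point:
    """Centroid of the cluster containing the most trip origins/destinations."""
    clusters = bisecting_kmeans(stops, max_radius)
    return centroid(max(clusters, key=len))
```

For a real user, `stops` would hold the origin and destination of every sub-trajectory, while `radius_of_gyration` would receive all GPS points of those sub-trajectories.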
Fig. 5. Left: the number $S(t)$ of distinct locations visited over time for different $r_g$; $S(t)$ grows as $t^{\mu}$, with $\mu \approx 0.3$ for $r_g \le 1$ km and $\mu \approx 0.65 \pm 0.03$ otherwise. Right: the visitation frequency $f_k$ of the $k$th most visited location, for users observed to visit $s = 40, 60, 80, 100$ and $120$ different locations. Empirical data are well approximated by Zipf's law, $f_k \sim k^{-\zeta}$.

The most frequent location does not necessarily coincide with the center of mass, and the distance $d(c_u, L_u)$ tends to grow with the radius of gyration (Figure 2, right). The strong correlation shown in Figure 2 (right) is not obvious, and it is presumably due to the systematic nature of human motion. If a person traveled arbitrarily in any direction from and to the same preferential location, the distance between $c_u$ and $L_u$ would tend to zero, and the radius of gyration would bear no relation to this distance. Instead, since each vehicle makes systematic trips among a few preferred places (see Figure 5, right), these trips pull the center of mass towards the mean point of the frequent locations. Therefore, the farther a vehicle travels from its $L_u$, the farther the center of mass tends to be from the most frequent location. Figure 2 (right) also suggests that people with low $r_g$ have a higher probability of being located near the place where they live or work. In contrast, people traveling long distances tend to be located in distant places, depending on whether they are moving or not. This phenomenon is confirmed by plotting $L_u$ on the map instead of $c_u$, and noting that points corresponding to users with a high radius of gyration move towards urban centers, showing the power of attraction that cities exert on mobility by car (Figure 4). So, the majority of individuals tend to live or work within urban centers, but those characterized by high $r_g$ tend to stay far from these places when they move.
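The argument can be checked on two toy users with hypothetical coordinates in kilometers. For brevity, the sketch uses trip endpoints both for the center of mass and for the most frequent location (found here by exact-coordinate counting instead of clustering), whereas the analysis above uses all GPS points for $c_u$ and bisecting K-means for $L_u$.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]


def center_of_mass(pts: List[Point]) -> Point:
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def radius_of_gyration(pts: List[Point]) -> float:
    c = center_of_mass(pts)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in pts) / len(pts))


def most_frequent(pts: List[Point]) -> Point:
    # Exact-coordinate counting stands in for the clustering step.
    return Counter(pts).most_common(1)[0][0]


def d_cm_to_lf(pts: List[Point]) -> float:
    """Distance between the center of mass c_u and the most frequent location L_u."""
    return math.dist(center_of_mass(pts), most_frequent(pts))


# User A: round trips from a base at the origin to four symmetric destinations.
base = (0.0, 0.0)
spokes = [(5.0, 0.0), (-5.0, 0.0), (0.0, 5.0), (0.0, -5.0)]
radial = [pt for s in spokes for pt in (base, s, s, base)]


def commuter(distance_km: float) -> List[Point]:
    """Hypothetical commuter: three stops at home, two at a workplace distance_km away."""
    home, work = (0.0, 0.0), (distance_km, 0.0)
    return [home, work, home, work, home]
```

For user A, $d(c_u, L_u) = 0$ even though $r_g \approx 3.5$ km. For the commuter, both quantities scale with the commuting distance $D$: $d(c_u, L_u) = 0.4D$ and $r_g = \sqrt{0.24}\,D \approx 0.49D$, so a larger radius of gyration goes together with a larger distance between $c_u$ and $L_u$.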
In order to estimate the tendency of people to visit new distinct locations, we extracted the number of clusters $S(t)$ visited by a user, finding a power law $S(t) \sim t^{\mu}$. For users with a small $r_g$ (within 1 km), the exponent of the power law is $\mu \approx 0.3$, while it grows for users traveling at large distances from the center of mass, $\mu = 0.65 \pm 0.03$ (Figure 5, left). In both cases, $\mu < 1$ indicates a decreasing tendency of users to visit previously unvisited locations.
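A minimal sketch of how $S(t)$ and its exponent $\mu$ can be computed. It assumes that time is measured by the index of the visit in a user's chronologically ordered sequence of visited clusters (rather than in calendar time), and estimates $\mu$ as the least-squares slope in log-log space; `distinct_visits` and `loglog_slope` are hypothetical helpers.

```python
import math
from typing import Hashable, List, Sequence


def distinct_visits(visits: Sequence[Hashable]) -> List[int]:
    """S(t) for t = 1..len(visits): number of distinct locations among the first t visits."""
    seen = set()
    s_t = []
    for loc in visits:
        seen.add(loc)
        s_t.append(len(seen))
    return s_t


def loglog_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of log(y) against log(x); estimates mu in y ~ x**mu."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den
```

Typical use is `s = distinct_visits(visits)` followed by `loglog_slope(range(1, len(s) + 1), s)`. A user who never returns anywhere gives $\mu = 1$; values below 1 reflect the returns discussed above. Plain log-log regression is adequate for a monotone curve such as $S(t)$, but for fitting probability distributions maximum-likelihood estimators are usually preferred.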
Moreover, the visitation frequencies of individuals, which measure to what extent individuals return to the same place over and over, follow a Zipf-like distribution (Figure 5, right), confirming the pattern found in [5]. The Zipf regime is more evident for users with a larger number of visited locations (above 80), whereas the frequency distribution of users with fewer destinations does not allow us to state that it complies with Zipf's law, since it does not even cover an order of magnitude.
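The rank-frequency analysis can be sketched in the same way, assuming a user's visits are given as a sequence of cluster identifiers. `visitation_frequencies` ranks locations by visit count and normalizes the counts into $f_k$; the Zipf exponent $\zeta$ in $f_k \sim k^{-\zeta}$ is estimated as minus the log-log least-squares slope. The helper names are hypothetical, and the sketch does not judge whether a fit is meaningful, which, as noted above, is doubtful when the data span less than an order of magnitude.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def visitation_frequencies(visits: Sequence[Hashable]) -> List[float]:
    """f_k for k = 1, 2, ...: share of visits to the k-th most visited location."""
    counts = sorted(Counter(visits).values(), reverse=True)
    total = sum(counts)
    return [c / total for c in counts]


def zipf_exponent(freqs: Sequence[float]) -> float:
    """Estimate zeta in f_k ~ k**(-zeta) by least squares on log f_k versus log k."""
    lk = [math.log(k) for k in range(1, len(freqs) + 1)]
    lf = [math.log(f) for f in freqs]
    mk, mf = sum(lk) / len(lk), sum(lf) / len(lf)
    slope = sum((a - mk) * (b - mf) for a, b in zip(lk, lf)) / sum((a - mk) ** 2 for a in lk)
    return -slope
```

The normalization in `visitation_frequencies` does not affect the fitted exponent, since it only shifts $\log f_k$ by a constant.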
The results of our analysis substantially confirm and refine the mobility patterns found on GSM data [4], with a difference in the population of very short-range travelers, which is underrepresented with respect to the prediction and has a much slower rate of visiting new places. This suggests, on the one hand, that movements by car represent a significant portion of human travel, serving as a good social microscope that enables us to observe habits, trends and patterns in human mobility behavior, and on the other hand, that the particular tendency of very short-range travelers is worth investigating.
4 Inferring the traffic count by GPS data
GPS data representing the movements of cars traveling within an urban territory could be very useful for urban traffic monitoring and prediction, provided that these data are a trustworthy proxy of the ground truth. This is also important to assess the generality of our analytical results: to what extent is our 2% sample of tracked cars representative of the overall mobility?

Fig. 6. Traffic sensed by a VMP device and GPS traffic volume at one of the twelve entry gates of Pisa. The curves are very similar, suggesting that GPS data are a good approximation of real data.

Fortunately, there are many sensing technologies providing traffic counts on streets, which can be used to evaluate, against ground truth, the ability of GPS data to model local traffic. In this study, we use a dataset of logs collected in May 2011 from twelve Variable Message Panels (VMPs). VMPs are devices situated on the outer belt of the city of Pisa with the purpose of counting all the vehicles passing through the entry gates of the city. Therefore, we can assume that they capture the real number of vehicles passing along the corresponding roads. Each VMP stores the total count of vehicles per hour passing under that device. Exploiting the spatial precision of GPS data, we simulated the number of GPS vehicles crossing a VMP gate by considering a spatial buffer of 30 meters radius around the position of each road sensor, and by aggregating hour by hour the number of GPS vehicles crossing those areas. Clearly, the GPS flow on a road arc can be measured in both directions. We considered only the flow into the city, since the VMPs do not measure outgoing traffic.
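The buffer-based count can be sketched as below. The sketch makes several assumptions not stated in the text: coordinates are planar and in meters, each trip is a list of `(x, y, t)` fixes with `t` in seconds, a crossing is detected when the straight segment between two consecutive fixes passes within the buffer (with fixes roughly 30 seconds apart, individual points rarely fall inside a 30 m circle), and inbound movement is identified by a hypothetical unit vector `inbound` pointing towards the city. Each trip is counted at most once per sensor, in the hour of the segment's first fix.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Fix = Tuple[float, float, float]  # (x [m], y [m], t [s]) -- assumed units


def _segment_distance(p: Tuple[float, float], a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to [0, 1].
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def hourly_inbound_counts(
    trips: Sequence[List[Fix]],
    sensor: Tuple[float, float],
    inbound: Tuple[float, float],
    radius_m: float = 30.0,
) -> Dict[int, int]:
    """Count trips entering the buffer around `sensor` in the inbound direction, per hour."""
    counts: Counter = Counter()
    for trip in trips:
        for (x0, y0, t0), (x1, y1, _) in zip(trip, trip[1:]):
            # Positive projection on the inbound vector means moving towards the city.
            moving_in = (x1 - x0) * inbound[0] + (y1 - y0) * inbound[1] > 0
            if moving_in and _segment_distance(sensor, (x0, y0), (x1, y1)) <= radius_m:
                counts[int(t0 // 3600)] += 1  # hour index since t = 0
                break  # count each trip at most once per sensor
    return dict(counts)
```

Running the function once per VMP position yields, for each entry gate, an hourly GPS series that can be compared with the corresponding VMP counts.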
As the snapshot in Figure 6 suggests, there is a good match between the curves, which essentially differ by a scaling factor.
In order to estimate such scaling, we considered the hourly vehicle count as a discrete signal, i.e. a time series, and analyzed it through a discrete wavelet transform (DWT) [24]. The DWT is a mathematical tool that projects a time series onto a collection of orthonormal basis functions and produces a set of coefficients, capturing information from the time series at different frequencies and distinct times. For a square-summable sequence $x \in \ell^2(\mathbb{Z})$, the decomposition has the following general form:

$$x = \sum_k \langle \varphi_k, x \rangle \, \varphi_k = \sum_k c_k \, \varphi_k, \qquad (2)$$

where $c_k = \langle \varphi_k, x \rangle = \sum_j \varphi_k(j) \, x(j)$ are the coefficients of the transform ($j$ is a time index denoting the instants of the time series), and $\varphi_k$ are the basis functions satisfying the orthonormality constraint $\langle \varphi_k, \varphi_l \rangle = \delta[k - l]$.
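Eq. (2) can be verified numerically on a small example. The sketch below builds the orthonormal Haar basis explicitly, as a stand-in for the Daubechies db5 wavelets used in this section (Haar is simply the shortest orthonormal wavelet), computes $c_k = \langle \varphi_k, x \rangle$, and reconstructs $x = \sum_k c_k \varphi_k$. Practical DWT implementations use fast filter banks instead of explicit basis vectors; `haar_basis`, `analyze` and `synthesize` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def haar_basis(n: int) -> List[List[float]]:
    """Orthonormal Haar basis vectors phi_k of length n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    basis = [[1.0 / math.sqrt(n)] * n]  # constant (scaling) vector
    size = n
    while size >= 2:
        h = 1.0 / math.sqrt(size)
        for start in range(0, n, size):
            v = [0.0] * n
            for j in range(start, start + size // 2):
                v[j] = h  # positive half of the wavelet
            for j in range(start + size // 2, start + size):
                v[j] = -h  # negative half of the wavelet
            basis.append(v)
        size //= 2
    return basis


def inner(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def analyze(x: Sequence[float], basis: List[List[float]]) -> List[float]:
    """Transform coefficients c_k = <phi_k, x>."""
    return [inner(phi, x) for phi in basis]


def synthesize(coeffs: Sequence[float], basis: List[List[float]]) -> List[float]:
    """Reconstruction x = sum_k c_k phi_k."""
    n = len(basis[0])
    return [sum(c * phi[j] for c, phi in zip(coeffs, basis)) for j in range(n)]
```

Because the basis is orthonormal, analysis followed by synthesis returns the original signal up to rounding, which is exactly the content of Eq. (2).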
For each of the twelve entry gates of Pisa, we applied the Daubechies 5 (db5) wavelet [25] to decompose into four layers the signal $r_i$ ($n = 168$, the number of hours in a week) representing the first week of the $i$-th VMP time series:

$$W(r_i) = \left\{ c^{(1)}_{r_i}, \dots, c^{(m)}_{r_i} \right\}, \quad r_i \in \mathbb{R}^n, \qquad (3)$$

where $i = 1, \dots, 12$ and $m = 4$. We repeated the same procedure on the twelve time series $g_i$ representing the signals of the corresponding GPS buffers around the VMPs:

$$W(g_i) = \left\{ c^{(1)}_{g_i}, \dots, c^{(m)}_{g_i} \right\}, \quad g_i \in \mathbb{R}^n. \qquad (4)$$

Figure 7 shows two wavelet decompositions corresponding respectively to the real (left) and sampled (right) traffic count at one of the twelve entry gates of Pisa.

Fig. 7. A graphical representation of two wavelet decompositions corresponding respectively to the VMP (left) and GPS (right) traffic counts at one of the entry gates of Pisa.
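For each location $i$, we divided the corresponding coefficients layer by layer, obtaining four scaling factors for each of the twelve locations of interest:

$$s^{(j)}_i = \frac{c^{(j)}_{r_i}}{c^{(j)}_{g_i}}, \quad j = 1, \dots, m. \qquad (5)$$

Such scaling factors can be used to scale a GPS signal in order to estimate the real traffic volume (i.e. an estimate of the corresponding VMP signal). To this purpose, the decomposition of a GPS signal $g$ is multiplied by the scaling factors, as formalized by the following equation:

$$\hat{r}_i = W^{-1}\left( \left\{ s^{(1)}_i c^{(1)}_{g}, \dots, s^{(m)}_i c^{(m)}_{g} \right\} \right), \qquad (6)$$

where $W^{-1}$ denotes the inverse wavelet transform.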
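Eqs. (3)–(6) can be sketched end to end as follows. Two assumptions should be kept in mind. First, the orthonormal Haar wavelet replaces db5, so the signal length must be divisible by 2 raised to the number of levels (for $n = 168$ this allows at most three levels, whereas libraries implementing db5 handle arbitrary lengths through boundary extension). Second, the layer-by-layer division of Eq. (5) is interpreted coefficient by coefficient, with an arbitrary factor of 1 wherever a GPS coefficient is exactly zero. The list returned by `haar_dwt` holds the final approximation plus one detail array per level; all function names are hypothetical.

```python
import math
from typing import List, Sequence

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: Sequence[float], levels: int) -> List[List[float]]:
    """Multilevel orthonormal Haar DWT -> [approx_L, detail_L, ..., detail_1]."""
    coeffs: List[List[float]] = []
    approx = list(x)
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = range(len(approx) // 2)
        detail = [(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in pairs]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in pairs]
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


def haar_idwt(coeffs: Sequence[Sequence[float]]) -> List[float]:
    """Inverse of haar_dwt."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        out: List[float] = []
        for a, d in zip(approx, detail):
            out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = out
    return approx


def scaling_factors(vmp: Sequence[float], gps: Sequence[float], levels: int) -> List[List[float]]:
    """Eq. (5), read coefficient by coefficient: s = c_vmp / c_gps in every layer."""
    return [
        [r / g if g != 0.0 else 1.0 for r, g in zip(layer_r, layer_g)]  # 1.0: arbitrary fallback
        for layer_r, layer_g in zip(haar_dwt(vmp, levels), haar_dwt(gps, levels))
    ]


def estimate_real_traffic(gps: Sequence[float], factors: List[List[float]]) -> List[float]:
    """Eq. (6): scale the GPS decomposition layer by layer, then invert the transform."""
    coeffs = haar_dwt(gps, levels=len(factors) - 1)
    scaled = [[s * c for s, c in zip(fl, cl)] for fl, cl in zip(factors, coeffs)]
    return haar_idwt(scaled)
```

In the setting of the text, `vmp` and `gps` would be the first-week series $r_i$ and $g_i$ of one entry gate, and `estimate_real_traffic` would then be applied to later GPS weeks. Read coefficient by coefficient, the factors reproduce the training signal exactly, so their quality can only be judged on held-out weeks.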
To show the validity of this approach, we measured the error with respect to the observed VMP traffic counts at all locations. In Figure 8 we plot the real VMP series, the scaled GPS signal, and the measured relative error at a selected VMP location. As can be noted, the error is low when the GPS traffic is high. During the night hours the relative error tends to grow, since there are too few circulating GPS vehicles, but the absolute error is still negligible. However, for a traffic manager it is crucial to have a precise estimate during the rush hours in order to design ad hoc interventions to avoid congestion, a situation for which our reconstruction provides very high accuracy.

Fig. 8. Comparison between the real traffic volume and the scaled GPS signal at a single location. The plot at the bottom shows the relative error with respect to the observed VMP counts.

These results suggest that our sample is highly representative and provides a good approximation of real traffic. We discovered that it is possible to generalize the approach to scale the GPS traffic observed at a single VMP location to the total traffic entering the city, i.e. the total traffic measured by all the VMP sensors. This approach can enable real-time traffic estimation based on the observation of GPS vehicles alone, reducing the need for ad hoc installation of new panels. To generalize the inference to the real-time scenario, we need a model trained on historical data that is capable of giving precise estimates of traffic situations. To this aim, we create the signal $v$, which represents the total real traffic, obtained by summing hour by hour the traffic volume of all VMP devices. Then, we divide the signal $v$ into two sub-signals: $v_{train}$ is the training set, used to learn the model; $v_{test}$ is the test set, used to evaluate the performance of the extracted model. As training set we used the sub-signal corresponding to the first week of our dataset; the remaining weeks are used as test set. Our method consists in learning a model by extracting the scaling factors with equation (5) from the signal $v_{train}$. Equation (6) is then used to estimate the unseen traffic signal, and the resulting series is compared with the real signal $v_{test}$. To evaluate the accuracy of our model, we compared it with two other approaches. The first is based on a Backpropagation Multilayer Feedforward Neural Network [26]. The second is a naive predictor, learned on the training set by extracting a single-day model: the model is computed by averaging, for each hour, the values observed during the training week. Clearly, this trivial model is not robust against local variations; we use it as a baseline, and to show the adaptability of our approach to fluctuations of traffic flows.
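The evaluation protocol and the naive baseline can be sketched as below, assuming `v` is the hourly total-traffic series starting at midnight of the first day. The first 168 hours form $v_{train}$ and the rest $v_{test}$; the baseline predicts each test hour with the mean of the same hour of day over the training week. The DWT model and the neural network are not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def train_test_split(v: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First week for training, remaining weeks for testing."""
    return list(v[:HOURS_PER_WEEK]), list(v[HOURS_PER_WEEK:])


def naive_daily_profile(train: Sequence[float]) -> List[float]:
    """Average, for each hour of the day, the values observed in the training data."""
    sums = [0.0] * HOURS_PER_DAY
    counts = [0] * HOURS_PER_DAY
    for t, value in enumerate(train):
        sums[t % HOURS_PER_DAY] += value
        counts[t % HOURS_PER_DAY] += 1
    return [s / c for s, c in zip(sums, counts)]


def naive_forecast(profile: Sequence[float], horizon: int, start_hour: int = 0) -> List[float]:
    """Repeat the single-day profile over `horizon` hours."""
    return [profile[(start_hour + t) % HOURS_PER_DAY] for t in range(horizon)]


def mean_absolute_error(pred: Sequence[float], real: Sequence[float]) -> float:
    return sum(abs(p - r) for p, r in zip(pred, real)) / len(real)
```

If every day looked the same, the baseline would be exact; as soon as weekends differ from weekdays, its single daily profile mixes the two and the error becomes positive, which is the limitation discussed for the naive approach.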
Figure 9 compares the estimates made by the three approaches. Our DWT method maintains the general shape of the curve but tends to overestimate the real traffic, in particular during daylight hours. The ANN approach, by contrast, provides volumes that are comparable with the VMP observations, but it does not preserve the general shape of traffic. Finally, the naive approach shows that the phenomenon cannot be captured by a static model. Although the shape of the naive curve is similar to the VMP curve, it tends to underestimate the real traffic, in particular the first peak in the morning, corresponding to people going to work. Furthermore, since it is a simple daily mean of the traffic in the observation period, it cannot discriminate between working and non-working days.
Of the three approaches, ours gives the best approximation of the evolution of traffic during the week, capturing the crucial peaks during the rush hours. The slight overestimation is also acceptable in a monitoring scenario where, for example, an alert should be raised when a given threshold is crossed, because the threshold can be set according to the expected overestimation.

Fig. 9. Comparison of real traffic data against our approach (top), a neural network approach (center), and a naive approach (bottom).
5 Conclusions
In this paper we studied the patterns of human mobility by car using a large dataset of GPS traces collected in central Italy. Since the GPS data pertain only to private car movements, we used them to assess whether the general laws of mobility, derived from individual movements observed through GSM data, also hold for car travel. Moreover, we analyzed local behavior and assessed the validity of the dataset by comparing the observations with the ground truth provided by real-traffic sensors. The experiments showed a close correlation between the real traffic volume and the scaled GPS flows obtained by means of a machine learning approach. The final part of the paper introduces a method, based on historical data analysis, to monitor traffic in real time.
The authors wish to thank Lorenzo Gabrielli for his invaluable technical support. We
also acknowledge Octo Telematics Italia Srl for providing the GPS data, and Pisamo
Spa, the mobility agency of Pisa, for providing the traffic count data. The research re-
ported in this article has been partially supported by European FP7 project DATASIM
1. F. Giannotti, M. Nanni, D. Pedreschi, F. Pinelli, C. Renso, S. Rinzivillo, R. Trasarti,
VLDB Journal 20, 695 (2011)
2. F. Giannotti, D. Pedreschi, Mobility, Data Mining and Privacy: Geographic Knowledge Discovery (Springer, 2008)
3. D. Brockmann, L. Hufnagel, T. Geisel, Nature 439, 462 (2006)
4. M.C. González, C.A. Hidalgo, A.L. Barabási, Nature 454, 779 (2008)
5. C. Song, T. Koren, P. Wang, A.L. Barabási, Nat. Phys. 6, 818 (2010)
6. R. Trasarti, F. Pinelli, M. Nanni, F. Giannotti, in Proceedings of KDD, p. 1190 (2011)
7. S. Rinzivillo, S. Mainardi, F. Pezzoni, M. Coscia, D. Pedreschi, F. Giannotti, unstliche
Intelligenz, (to appear) (2012)
8. S. Rinzivillo, D. Pedreschi, M. Nanni, F. Giannotti, N. Andrienko, G. Andrienko,
Information Visualiz. 7, 225 (2008)
9. A. Bazzani, B. Giorgini, S. Rambaldi, R. Gallotti and L. Giovannini, J. Statist. Mechan.
Theory Exper., May 2010
10. K. Lee, S. Hong, S.J. Kim, I. Rhee, S. Chong, Demystifying levy walk patterns in human
walks, NCSU, Technical Report, 2008
11. M. Zignani, S. Gaito, On Wireless Days (WD), 2010 IFIP (October 2010), p. 1
12. N.J. Garber, L.A. Hoel, Traffic and Highway Engineering, (Brooks/Cole Publishing
Company, 1999)
13. N. Stamatiadis, D.L. Allen, J. Transp. Res. Board 1593, 23 (1997)
14. B.M. Williams, L.A. Hoel, J Transp. Eng. 129, 664 (2003)
15. B. Ghosh, B. Basu, M. O’Mahony, J. Transp. Eng. 133, 180 (2007)
16. Z. Wei, Z. Jinfu, in Proceedings of the IEEE International Conference on Grey Systems
and Intelligent Services (GSIS), p. 630 (2009)
17. J.H. Friedman, Intelligent local learning for prediction in high dimensions, International
Conference on Artificial Neural Networks (ICANN 95), Paris, 1995
18. H. Sun, H.X. Liu, H. Xiao, R.R. He, B. Ran, J. Transp. Res. Board 1836, 143 (2003)
19. P. Lingras, S. Sharma, M. Zhong, J. Transp. Res. Board 1805 16 (2002)
20. W.-C. Hong, Neurocomputing 74, 20962107 (2011)
21. J.P. Donate, X. Li, G.G. anchez, A.S. de Migue, Neural Comp. Applic. 10, 1 (2011)
22. M.A. Abramowicz, J.C. Miller Z. Stuchl´ık, Phys. Rev. D 47, 1440 (1993)
23. P.N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, 1 edn (Addison Wesley,
24. G. Strang, SIAM Review 31, 614 (1989)
25. N.M. Temme, Appl. Comput. Harmonic Anal. 4, 414428 (1997)
26. D. Svozil, V. Kvasnickab, J Pospichalb, Chem. Intell. Lab. Syst. 39, 43 (1997)
... Global Positioning System (GPS) traces generated by in-vehicle devices stand as a trade-off between these two extremes. Depending on the provider's market penetration, they can cover a representative fraction of the vehicle fleet [18] and allow the instantaneous speed and acceleration to be computed, which are then used within microscopic models to obtain emissions estimates with high spatiotemporal resolution. GPS traces describe human mobility in great detail [19][20][21][22] and offer an unprecedented tool to implement strategies such as reducing congestion, improving vehicle efficiency and shifting to lower-carbon options [23][24][25][26][27][28][29]. ...
... The trajectories were produced by onboard GPS devices, which automatically turn on when the vehicle starts, transmitting a point every minute to the server via a General Packet Radio Service connection [18][19][20]. When the vehicle stops, no points are logged or sent. ...
... The GPS traces are collected by a company that provides a data collection service for insurance companies. The market penetration of this service is variable, but, in general, it covers at least 2% of the total registered vehicles, and it is representative of the overall number of vehicles circulating in a city [18]. Figure 1a shows a sample of trajectories for 20 vehicles in Rome. ...
Vehicle emissions produce an important share of a city’s air pollution, with a substantial impact on the environment and human health. Traditional emission estimation methods use remote sensing stations, missing the full driving cycle of vehicles, or focus on a few vehicles. We have used GPS traces and a microscopic model to analyse the emissions of four air pollutants from thousands of private vehicles in three European cities. We found that the emissions across the vehicles and roads are well approximated by heavy-tailed distributions and thus discovered the existence of gross polluters, vehicles responsible for the greatest quantity of emissions, and grossly polluted roads, which suffer the greatest amount of emissions. Our simulations show that emissions reduction policies targeting gross polluters are far more effective than those limiting circulation based on an uninformed choice of vehicles. Our study contributes to shaping the discussion on how to measure emissions with digital data.
... Mobility is often estimated using the so-called Radius of Gyration (RoG), which is a measure of mobility volume and indicates the characteristic distance traveled by an individual [1]. In [18], the Radius of Gyration is used to characterize human mobility patterns emerging from available GPS trajectories. In [19], starting from nationwide mobile phone data, the authors investigate the correlations between the Radius of Gyration and external socio-economic indicators. ...
... Before discussing the specific findings, in this section we present the methodology that we have used and the kind of results obtained. This kind of analysis can be seen as borrowed from the field of human mobility, where the length of human travels can be described by a power-law distribution in general [1,39] and by a lognormal distribution when analyzed separately for each transport mode (walk, car, bike, train, airplane) [18,40]. In our analysis, scholars travel among topics, which are modeled as the estimated locations. ...
... As presented in Table 5, the only acceptable distribution fit is the lognormal one, with p-value = 0.03. This result is also interesting considering that several patterns related to human mobility follow the same distribution [18]. Having found that scholars' travels around research topics are well approximated by a lognormal distribution, we try to explain the implications of this result. ...
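For a lognormal model, the maximum-likelihood parameters have a closed form: μ and σ are the mean and the population standard deviation of ln x. The sketch below assumes strictly positive data and does not reproduce the goodness-of-fit test behind the p-value quoted above; the helper names and sample values are hypothetical.

```python
import math


def fit_lognormal(samples):
    """Hypothetical helper: maximum-likelihood (mu, sigma) of a lognormal.

    mu is the mean of ln(x); sigma is the MLE (divide-by-n) standard
    deviation of ln(x). All samples must be strictly positive.
    """
    if any(x <= 0 for x in samples):
        raise ValueError("lognormal fit requires positive values")
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Density of the lognormal distribution at x > 0."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )


jumps = [1.2, 3.5, 0.8, 5.1, 2.2]  # hypothetical jump lengths
mu, sigma = fit_lognormal(jumps)
```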
In the knowledge discovery field of the Big Data domain, the analysis of geographic positioning and mobility information plays a key role. At the same time, in the Natural Language Processing (NLP) domain, pre-trained models such as BERT and word embedding algorithms such as Word2Vec enabled a rich encoding of words that allows mapping textual data into points of an arbitrary multi-dimensional space, in which the notion of proximity reflects an association among terms or topics. The main contribution of this paper is to show how analytical tools, traditionally adopted to deal with geographic data to measure the mobility of an agent in a time interval, can also be effectively applied to extract knowledge in a semantic realm, such as a semantic space of words and topics, looking for latent trajectories that can benefit from the properties of neural network latent representations. As a case study, the Scopus database was queried about works of highly cited researchers in recent years. On this basis, we performed a dynamic analysis, measuring the Radius of Gyration as an index of the mobility of researchers across scientific topics. The semantic space is built from the automatic analysis of the paper abstracts of each author. In particular, we evaluated two different methodologies to build the semantic space and found that Word2Vec embeddings perform better than BERT ones for this task. Finally, the scholars’ trajectories show some latent properties of this model, which also represent new scientific contributions of this work. These properties include (i) the correlation between scientific mobility and the achievement of scientific results, measured through the H-index; (ii) differences in the behavior of researchers working in different countries and subjects; and (iii) some interesting similarities between mobility patterns in this semantic realm and those typically observed in the case of human mobility.
... • Jump Length $\Delta r$, the distance between two consecutive locations visited by an individual [28][29][30]. Formally, $\Delta r = d(r_i, r_{i+1})$ is the geographical distance between two points $r_i$ and $r_{i+1}$ in a trajectory. A truncated power law well approximates the empirical distribution $P(\Delta r)$ within a population of individuals, with the value of the exponent varying slightly depending on the type of data and the spatial scale [28,29]. ...
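A minimal way to compute jump lengths from a time-ordered trajectory of (latitude, longitude) points is sketched below. It assumes that the "geographical distance" $d$ is the haversine great-circle distance on a sphere of mean radius 6371 km; the snippet does not specify which distance is used, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def jump_lengths(trajectory):
    """Hypothetical helper: Delta r_i = d(r_i, r_{i+1}) for consecutive points."""
    return [haversine_km(a, b) for a, b in zip(trajectory, trajectory[1:])]


# One degree of longitude along the equator is about 111.19 km.
print(round(jump_lengths([(0.0, 0.0), (0.0, 1.0)])[0], 2))  # 111.19
```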
... The radius of gyration $r_g$ of individual $u$ is defined as $r_g(u) = \sqrt{\frac{1}{n_u}\sum_{i=1}^{n_u} d(r_i, r_{cm})^2}$, where $n_u$ is the number of points in $T_u$, $r_i \in T_u$, and $r_{cm} = \frac{1}{n_u}\sum_{i=1}^{n_u} r_i$ is the position vector of the centre of mass of the set of points in $T_u$. A truncated power law well approximates the distribution of $r_g$ [29,30]. • Visits per Location $V_l$, the relevance of a location described as its attractiveness at a collective level, indicating the popularity of locations according to how people visit them in geographic space [31,45]. ...
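The definition of $r_g$ above translates almost directly into code. This sketch assumes the points have already been projected to planar coordinates in kilometres, so that $d$ is the Euclidean distance and the centre of mass is a plain arithmetic mean; raw latitude/longitude would instead require a geodesic distance. The function name and sample points are hypothetical.

```python
import math


def radius_of_gyration(points):
    """Hypothetical helper: r_g = sqrt((1/n) * sum_i d(r_i, r_cm)^2)
    for planar (x, y) points."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    # Centre of mass r_cm: arithmetic mean of the position vectors.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared Euclidean distance from the centre of mass.
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(msd)


# Home at (0, 0) and work at (6, 8) km, each visited twice (hypothetical).
print(radius_of_gyration([(0, 0), (6, 8), (0, 0), (6, 8)]))  # 5.0
```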
... • Location Frequency $f(r_i)$, the probability of an individual visiting a location $r_i$ [29], identifying the importance of a location to an individual's mobility: the most visited location (likely home or work) has rank 1, the second most visited location (e.g., school or local shop) has rank 2, and so on. The probability of finding an individual at a location of rank $L$ is well approximated by $P(L) \sim 1/L$ [29,30]. • Waiting Time $\Delta t$: the elapsed time between two consecutive visits of an individual $u$: ...
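Location frequencies and ranks can be estimated by counting visits per location. In this sketch the input is a hypothetical list of visited location labels, and ties between equally visited locations are broken by first appearance, which is an arbitrary choice; the function name is also hypothetical.

```python
from collections import Counter


def location_frequency_by_rank(visits):
    """Hypothetical helper: return [(rank, location, probability)],
    where rank 1 is the most visited location."""
    counts = Counter(visits)
    total = len(visits)
    # most_common() sorts by count; equal counts keep first-seen order.
    ranked = counts.most_common()
    return [(rank, loc, c / total) for rank, (loc, c) in enumerate(ranked, start=1)]


visits = ["home", "work", "home", "shop", "home", "work"]  # hypothetical
for rank, loc, p in location_frequency_by_rank(visits):
    print(rank, loc, round(p, 3))
```

For the sample visits this prints `1 home 0.5`, `2 work 0.333` and `3 shop 0.167`; comparing such empirical frequencies with a normalised $1/L$ curve is one way to check the law quoted above.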
Modelling human mobility is crucial in several areas, from urban planning to epidemic modelling, traffic forecasting, and what-if analysis. Existing generative models focus mainly on reproducing the spatial and temporal dimensions of human mobility, while the social aspect, though it influences human movements significantly, is often neglected. The models that do capture some social perspectives of human mobility rely on trivial and unrealistic spatial and temporal mechanisms. In this paper, we propose the Spatial, Temporal and Social Exploration and Preferential Return model (STS-EPR), which embeds mechanisms to capture the spatial, temporal, and social aspects together. We compare the trajectories produced by STS-EPR with real-world trajectories and with synthetic trajectories generated by two state-of-the-art generative models on a set of standard mobility measures. Our experiments, conducted on an open dataset, show that STS-EPR overall outperforms existing spatial-temporal or social models, demonstrating the importance of adequately modelling sociality to precisely capture all the other dimensions of human mobility. We further investigate the impact of the tile shape of the spatial tessellation on the performance of our model. STS-EPR, which is open-source and tested on open data, represents a step towards the design of a mechanistic data-driven model that comprehensively captures all aspects of human mobility.
... GPS data generated by in-vehicle devices describe human mobility in great detail [44,46,18,19,37] and offer an unprecedented tool to implement strategies such as reducing transport activity and congestion [10,39,56,6], improving vehicle efficiency, encouraging alternative fuels and electrification, and shifting to lower-carbon options [27,34,53,63]. Emissions from vehicles are traditionally studied with two types of data: (i) measured traffic data, coming from sensors [11], official sources [50], or household travel surveys [49]; and (ii) simulated traffic data, such as those generated with driving simulators [69] or traffic simulation models [2,54]. ...
... Our data consist of anonymous GPS trajectories describing 433,272 trips from 14,907 private vehicles moving in Greater London, Rome, and Florence throughout January 2017 (see Table 1). The trajectories are produced by on-board GPS devices, which automatically turn on when the vehicle starts, transmitting a point every minute to the server via a GPRS connection [44,46,18,19]. When the vehicle stops, no points are logged or sent. ...
... The GPS traces are collected by a company that provides a data collection service for insurance companies. The market penetration of this service is variable, but it generally covers at least 2% of the total registered vehicles and is representative of the overall number of vehicles circulating in a city [44]. Figure 1a shows a sample of trajectories for 20 vehicles in Rome. ...
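Since devices send a point every minute while the engine runs and nothing while the vehicle is parked, a long gap between consecutive points can be used to split a device's points into trips. A minimal sketch, in which the function name and the 300-second threshold are hypothetical choices rather than values taken from the data provider:

```python
def split_trips(timestamps, max_gap_s=300):
    """Hypothetical helper: split sorted ping timestamps (seconds) into trips.

    A new trip starts whenever two consecutive pings are more than
    max_gap_s apart; the 300 s default is an assumed threshold.
    """
    trips = []
    current = []
    for t in timestamps:
        if current and t - current[-1] > max_gap_s:
            trips.append(current)
            current = []
        current.append(t)
    if current:
        trips.append(current)
    return trips


# Pings every 60 s, a 40-minute stop, then three more pings.
pings = [0, 60, 120, 180, 2580, 2640, 2700]
print([len(trip) for trip in split_trips(pings)])  # [4, 3]
```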
Vehicles' emissions produce a significant share of cities' air pollution, with a substantial impact on the environment and human health. Traditional emission estimation methods use remote sensing stations, missing vehicles' full driving cycle, or focus on a few vehicles. This study uses GPS traces and a microscopic model to analyse the emissions of four air pollutants from thousands of vehicles in three European cities. We discover the existence of gross polluters, vehicles responsible for the greatest quantity of emissions, and grossly polluted roads, which suffer the greatest amount of emissions. Our simulations show that emissions reduction policies targeting gross polluters are far more effective than those limiting circulation based on an uninformed choice of vehicles. Our study applies to any city and may contribute to shaping the discussion on how to measure emissions with digital data.
... The last decade has witnessed the emergence of massive datasets of digital traces that portray human movements at an unprecedented scale and detail. Examples include tracks generated by GPS devices embedded in personal smartphones (Zheng, Wang, Zhang, Xie, and Ma 2008), private vehicles (Pappalardo, Rinzivillo, Qu, Pedreschi, and Giannotti 2013) or boats (Fernandez Arguedas, Pallotta, and Vespe 2018); call detail records produced as a by-product of the communication between cellular phones and the mobile phone network (González, Hidalgo, and Barabási 2008; Barlacchi et al. 2015); geotagged posts from the most disparate social media platforms (Noulas, Scellato, Lambiotte, Pontil, and Mascolo 2012); and even traces describing the sports activity of amateur or professional athletes (Rossi, Pappalardo, Cintia, Iaia, Fernández, and Medina 2018). The availability of digital mobility data has attracted enormous interest from scientists in diverse disciplines, fueling advances in several applications, from computational health (Tizzoni et al. 2012; Barlacchi, Perentis, Mehrotra, Musolesi, and Lepri 2017) to the estimation of air pollution (Nyhan, Kloog, Britter, Ratti, and Koutrakis 2018; Bohm, Nanni, and Pappalardo 2021), from the design of recommender systems (Wang, Pedreschi, Song, Giannotti, and Barabási 2011) to the optimization of mobile and wireless networks (Karamshuk, Boldrini, Conti, and Passarella 2011; Tomasini, Mahmood, Zambonelli, Brayner, and Menezes 2017), from transportation engineering and urban planning (Zhao, Tarkoma, Liu, and Vo 2016) to the estimation of migratory flows (Simini, González, Maritan, and Barabási 2012; Ahmed et al. 2016) and people's place of residence (Pappalardo, Ferres, Sacasa, Cattuto, and Bravo 2021), from the well-being status of municipalities, regions, and countries (Pappalardo, Vanhoof, Gabrielli, Smoreda, Pedreschi, and Giannotti 2016b; Voukelatou et al. 2020) to the prediction of traffic and future displacements (Zhang, Zheng, and Qi 2017; Rossi, Barlacchi, Bianchini, and Lepri 2019). ...
The last decade has witnessed the emergence of massive mobility datasets, such as tracks generated by GPS devices, call detail records, and geo-tagged posts from social media platforms. These datasets have fostered a vast scientific production on various applications of mobility analysis, ranging from computational epidemiology to urban planning and transportation engineering. One strand of literature addresses data cleaning issues related to raw spatiotemporal trajectories, while a second line of research focuses on discovering the statistical "laws" that govern human movements. Significant effort has also been put into designing algorithms that generate synthetic trajectories able to reproduce the laws of human mobility realistically. Last but not least, a line of research addresses the crucial problem of privacy, proposing techniques to perform the re-identification of individuals in a database. A look at the state of the art reveals that no statistical software supports scientists and practitioners in all of the above aspects of mobility data analysis. In this paper, we propose scikit-mobility, a Python library that aims to provide an environment to reproduce existing research, analyze mobility data, and simulate human mobility habits. scikit-mobility is efficient and easy to use because it extends pandas, a popular Python library for data analysis. Moreover, scikit-mobility provides the user with many functionalities, from visualizing trajectories to generating synthetic data, from analyzing statistical patterns to assessing the privacy risk related to the analysis of mobility datasets.
... Imagery is also a good source for spatial data, as it has been used to analyze cars' parking angles [55] and to analyze unofficial parking [56]. Nonetheless, GPS traces from mobile service providers, known as mobile phone positioning, are often used as a proxy for vehicular journey trajectories [57,58]. ...
Transportation is a spatial activity. A geographic information system (GIS) supports the process of capturing, managing, analyzing, and presenting spatial data. GIS techniques are essential to the study of various aspects of transportation. In this entry, the state of knowledge regarding atomized transportation modes is presented. Atomized transportation modes are defined as transportation modes that carry low numbers of passengers.
... The digitalization of city services and ubiquitous computing provides numerous channels through which up-to-date human mobility data can be aggregated at various temporal and spatial scales (Luca et al., 2020). The ways in which mobility data can be aggregated include GPS trackers embedded in mobile phones (Zheng et al., 2010) or vehicles (Pappalardo et al., 2013), the connection of the mobile phone to the cellular network (González et al., 2008), and geo-tagged posts on social media (Blanford et al., 2015). ...
This article discusses the technology of city digital twins (CDTs) and its potential applications in the policymaking context. The article analyzes the history of the development of the concept of digital twins and how it is now being adopted on a city scale. One of the most advanced projects in the field, Virtual Singapore, is discussed in detail to determine the scope of its potential domains of application and to highlight the challenges associated with it. Concerns related to data privacy, availability, and applicability for predictive simulations are analyzed, and the potential use of synthetic data is proposed as a way to address these challenges. The authors argue that, despite the abundance of urban data, historical data are not always applicable for predicting events for which no data exist, and they discuss the potential privacy challenges of using micro-level individual mobility data in CDTs. A task-based approach to urban mobility data generation is proposed in the last section of the article. This approach suggests that city authorities can establish services responsible for asking people to conduct certain activities in an urban environment in order to create data for possible policy interventions for which no useful historical data exist. This approach can help address the challenges associated with the availability of data without raising privacy concerns, as the data generated through this approach will not represent any real individual in society.
Human mobility pattern analysis has received rising attention. However, little is known about the mobility patterns of private Electric Vehicle (EV) users. In response, this paper characterized mobility patterns of private EV users using a unique one-month dataset containing moving trajectories of 76,774 actual private EVs in January 2018 in Beijing. Specifically, we first explored the diversity, regularity, spatial extent, and uniqueness of EV users’ mobility patterns. The results suggested that most EV users had both regular travel and activity patterns (the mean travel and activity entropies were 2.17 and 1.83, respectively) with special preferences towards some specific activity locations relative to all the locations they visited (the mean number of activity locations visited was 13.57 in one month). Furthermore, they tended to perform activities within a small geographical area (the mean radius of gyration was 7.60 km) and have a short daily travel distance (the mean value was 37.35 km) relative to their electric driving range. Further, we associated EV users’ mobility patterns with the built environment through ordinary least squares and geographically weighted regression models, particularly considering the so-called modifiable areal unit problem (MAUP). Due to the MAUP, most of the statistically significant built environment variables varied across spatial analysis units (SAUs). Gymnasia was the only variable statistically associated with the mobility patterns for all SAUs; while the variables related to residence and workplace were not statistically associated.
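One common way to quantify how regular a user's visits are is the Shannon entropy of the visit distribution over locations. The abstract does not give the exact definitions of travel and activity entropy, so the sketch below (natural logarithm, one label per visit, hypothetical function name) is an assumption for illustration.

```python
import math
from collections import Counter


def visit_entropy(locations):
    """Hypothetical helper: Shannon entropy -sum(p_i * ln p_i) of the
    distribution of visits over distinct locations."""
    counts = Counter(locations)
    n = len(locations)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Two locations visited equally often give ln 2.
print(round(visit_entropy(["A", "A", "B", "B"]), 4))  # 0.6931
```

Higher values mean visits spread over more locations more evenly; a user who always visits the same place has entropy 0.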
Extended floating car data (xFCD) make it possible to determine vehicle noise from data on a vehicle's position, speed, and engine speed, and to evaluate the results visually, among other things. To this end, data from the enviroCar project are used in an exemplary case study in the Mönchengladbach area. The noise calculation comprises the separate computation of the vehicles' rolling noise and propulsion noise and their energetic addition into the total noise. The assumed age of the roads, the temperature, and any precipitation are also taken into account. The results of the spatial noise analysis from these xFCD are compared with "classical" noise maps. The same noise hotspots, such as major roads or intersections, can often be identified, even though the approaches used to create the noise maps differ. In addition, we show by way of example how theoretically fully electrified traffic can be compared with traffic from combustion-engine vehicles in terms of noise emissions. This study shows that xFCD can be used successfully for the analysis of traffic noise.
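The energetic addition mentioned above is the standard rule for combining sound levels in decibels: each level is converted to relative energy, the energies are summed, and the sum is converted back to decibels. A minimal sketch with a hypothetical function name and hypothetical levels:

```python
import math


def energetic_sum_db(*levels_db):
    """Hypothetical helper: L = 10 * log10(sum(10 ** (L_i / 10)))."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


rolling_db, propulsion_db = 70.0, 70.0  # hypothetical levels in dB(A)
print(round(energetic_sum_db(rolling_db, propulsion_db), 2))  # 73.01
```

Two equally loud sources therefore add about 3 dB rather than doubling the decibel value.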
Human mobility models typically produce mobility data that capture human mobility patterns individually or collectively based on real-world observations or assumptions; such data are essential for many use cases in research and practice, e.g., mobile networking, autonomous driving, urban planning, and epidemic control. However, most existing mobility models suffer from practical issues such as unknown accuracy and uncertain parameters in new use cases, because they are normally designed and verified for a particular use case (e.g., mobile phones, taxis, or mobile payments). This makes it challenging for researchers to select a representative human mobility model with appropriate parameters for a new use case. In this paper, we introduce a MObility VERification framework called MOVER to systematically measure the performance of a set of representative mobility models, including both theoretical and empirical models, on a diverse set of use cases with various measures. Based on a taxonomy built upon spatial granularity and temporal continuity, we selected four representative mobility use cases (the vehicle tracking system, the camera-based system, the mobile payment system, and the cellular network system) to verify the generalizability of state-of-the-art human mobility models. MOVER methodically characterizes the accuracy of five different mobility models in these four use cases based on a comprehensive set of mobility measures and provides two key lessons learned: (i) for collective-level measures, the finer the spatial granularity of the use case, the better the theoretical models generalize; (ii) for individual-level measures, in use cases with lower periodic temporal continuity, the theoretical models typically generalize better than the empirical models. The verification results can help the research community select appropriate mobility models and parameters for different use cases.
The Seasonal Autoregressive Integrated Moving Average (SARIMA) model is one of the popular univariate time-series models in the field of short-term traffic flow forecasting. The parameters of the SARIMA model are commonly estimated using classical (maximum likelihood estimate and/or least square estimate) methods. In this paper, instead of using classical inference the Bayesian method is employed to estimate the parameters of the SARIMA model considered for modelling. In Bayesian analysis, Markov chain Monte Carlo method is used to solve the posterior integration problem in high dimension. Each of the estimated parameters from the Bayesian method has a probability density function conditional to the observed traffic volumes. The forecasts from the Bayesian model can better match the traffic behavior of extreme peaks and rapid fluctuation. Similar to the estimated parameters, each forecast has a probability density curve with the maximum probable value as the point forecast. Individual probability density curves provide a time-varying prediction interval unlike the constant prediction interval from classical inference. The time-series data used for fitting the SARIMA model are obtained from
The increased use of equipment having automatic vehicle classification capabilities and weigh-in-motion devices produces a large amount of new data that can provide some insights into understanding traffic patterns more efficiently. These data can be used in determining seasonal fic for highway cost allocation studies, and in predicting traffic volumes for roadways. However, due to practical limitations, it is not possible to are usually performed. Therefore, it is important to understand the relationship of the data obtained in a short-term period to those for the entire year. A study is currently under way that determines these relationships and develops seasonal adjustment factors for the state of Kentucky. The first step of the study was a survey of current uses of vehicle classification data and methods used for data collection throughout the United States. The results indicated that most states use no seasonal adjustment factors nor are they planning to develop any factors in the near future. At the same time, analysis of 2 years of vehicle classification data was used to develop seasonal and daily factors for Kentucky. Currently, the validation of these factors is under way. The preliminary analysis indicated that seasonal adjustment factors are essential in developing accurate estimates of traffic volumes for each vehicle type, and their use can improve the estimation of daily volumes.
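Seasonal adjustment factors are commonly defined as the ratio between the annual average daily traffic (AADT) and the average daily traffic of a given month; a short-term count taken in that month is then multiplied by the factor to estimate AADT. The abstract does not state the formula used in the Kentucky study, so this convention, the function names, and the counts below are assumptions for illustration.

```python
def monthly_factors(daily_counts_by_month):
    """Hypothetical helper: factor_m = AADT / MADT_m.

    daily_counts_by_month maps a month label to the daily counts recorded
    at a permanent station; AADT is taken as the mean of all recorded days.
    """
    all_days = [c for days in daily_counts_by_month.values() for c in days]
    aadt = sum(all_days) / len(all_days)
    return {
        month: aadt / (sum(days) / len(days))
        for month, days in daily_counts_by_month.items()
    }


def adjust_short_count(mean_daily_count, factor):
    """Hypothetical helper: estimate AADT from a short-term count."""
    return mean_daily_count * factor


factors = monthly_factors({"Jan": [800, 820, 780], "Jul": [1200, 1180, 1220]})
print(adjust_short_count(900, factors["Jan"]))  # 1125.0
```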
Selection of appropriate input variables is a crucial step in developing the statistical or neural network model for short-term traffic prediction. Recently, genetic algorithms have provided some success in input variable selection. Extensive experimentation with recreational traffic volume projections from Banff National Park in Alberta, Canada, is reported. Genetic algorithms (GAs) were used to select a set of historical traffic volumes that had higher correlation to the next hourly traffic volume. Universal models developed using GAs were accurate within 10%, on average. Separation of time series for individual hours revealed a linear trend in traffic volumes. Genetically designed regression submodels for individual hours had average prediction errors of less than 1% for the training sets. Even the 95th-percentile errors for the test sets were between 2% and 8%. Many highway agencies expect to deploy an advanced traveler information system (ATIS) for all highway categories. On the basis of such accurate predictions of traffic conditions from an ATIS, recreational drivers will be able to reschedule their travel time as well as routes. Such rescheduling will alleviate stress caused by traffic congestion during recreational travel.
The technologies of mobile communications and ubiquitous computing pervade our society, and wireless networks sense the movement of people and vehicles, generating large volumes of mobility data. This is a scenario of great opportunities and risks: on one side, mining this data can produce useful knowledge, supporting sustainable mobility and intelligent transportation systems; on the other side, individual privacy is at risk, as the mobility data contain sensitive personal information. A new multidisciplinary research area is emerging at this crossroads of mobility, data mining, and privacy. This book assesses this research frontier from a computer science perspective, investigating the various scientific and technological issues, open problems, and roadmap. The editors manage a research project called GeoPKDD, Geographic Privacy-Aware Knowledge Discovery and Delivery, funded by the EU Commission and involving 40 researchers from 7 countries, and this book tightly integrates and relates their findings in 13 chapters covering all related subjects, including the concepts of movement data and knowledge discovery from movement data; privacy-aware geographic knowledge discovery; wireless network and next-generation mobile technologies; trajectory data models, systems and warehouses; privacy and security aspects of technologies and related regulations; querying, mining and reasoning on spatiotemporal data; and visual analytics methods for movement data. This book will benefit researchers and practitioners in the related areas of computer science, geography, social science, statistics, law, telecommunications and transportation engineering.
This article presents the theoretical basis for modeling univariate traffic condition data streams as seasonal autoregressive integrated moving average processes. This foundation rests on the Wold decomposition theorem and on the assertion that a one-week lagged first seasonal difference applied to discrete interval traffic condition data will yield a weakly stationary transformation. Moreover, empirical results using actual intelligent transportation system data are presented and found to be consistent with the theoretical hypothesis. Conclusions are given on the implications of these assertions and findings relative to ongoing intelligent transportation systems research, deployment, and operations.
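For traffic data aggregated into fixed intervals, the one-week lagged seasonal difference is $y_t - y_{t-s}$, where $s$ is the number of intervals in a week (for example, $s = 7 \times 96 = 672$ for 15-minute data). A minimal sketch with a hypothetical function name and hypothetical daily volumes:

```python
def seasonal_difference(series, intervals_per_day, days=7):
    """Hypothetical helper: y_t - y_{t-s} with s = intervals_per_day * days.

    The first s observations have no lagged counterpart and are dropped.
    """
    s = intervals_per_day * days
    return [series[t] - series[t - s] for t in range(s, len(series))]


# Toy example: one interval per day, two weeks of hypothetical volumes.
week1 = [100, 110, 120, 130, 140, 90, 80]
week2 = [105, 112, 118, 135, 150, 95, 78]
print(seasonal_difference(week1 + week2, intervals_per_day=1))
# [5, 2, -2, 5, 10, 5, -2]
```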
Basic definitions concerning multi-layer feed-forward neural networks are given. The back-propagation training algorithm is explained. Partial derivatives of the objective function with respect to the weight and threshold coefficients are derived. These derivatives are valuable for the adaptation process of the considered neural network. Training and generalisation of multi-layer feed-forward neural networks are discussed. Improvements of the standard back-propagation algorithm are reviewed. An example of the use of multi-layer feed-forward neural networks for predicting carbon-13 NMR chemical shifts of alkanes is given. Further applications of neural networks in chemistry are reviewed. Advantages and disadvantages of multi-layer feed-forward neural networks are discussed.
The paper investigates the possibilities of using clustering techniques in visual exploration and analysis of large numbers of trajectories, that is, sequences of time-stamped locations of some moving entities. Trajectories are complex spatio-temporal constructs characterized by diverse non-trivial properties. To assess the degree of (dis)similarity between trajectories, specific methods (distance functions) are required. A single distance function accounting for all properties of trajectories (1) is difficult to build, (2) would require much time to compute, and (3) might be difficult to understand and to use. We suggest the procedure of progressive clustering, where a simple distance function with a clear meaning is applied at each step, which leads to easily interpretable outcomes. Successive application of several different functions enables sophisticated analyses through gradual refinement of earlier obtained results. Besides the advantages from the sense-making perspective, progressive clustering enables a rational work organization where time-consuming computations are applied to relatively small, potentially interesting subsets obtained by means of ‘cheap’ distance functions producing quick results. We introduce the concept of progressive clustering by an example of analyzing a large real data set. We also review the existing clustering methods, describe the method OPTICS suitable for progressive clustering of trajectories, and briefly present several distance functions for trajectories.
The traffic-forecasting model, when considered as a system with historical and current data as inputs and future data as outputs, behaves in a non-linear fashion and varies with the time of day. Traffic data are found to change abruptly during the transitions into and out of peak periods. Accurate, real-time models are needed to approximate the non-linear, time-variant functions between system inputs and outputs from a continuous stream of training data. The proposed local linear regression model was applied to short-term traffic prediction. Its performance was compared with previous results of nonparametric approaches based on local constant regression, such as the k-nearest neighbor and kernel methods, using 32 days of traffic-speed data collected at 5-min intervals on US-290 in Houston, Texas. The local linear methods consistently performed better than the k-nearest neighbor and kernel smoothing methods.
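To make the local constant versus local linear distinction concrete, here is a one-dimensional sketch with a Gaussian kernel and bandwidth `h`; the kernel, bandwidth and toy data are assumptions, not the cited study's setup. On a purely linear trend, the local linear fit reproduces the value at the boundary point exactly, while the kernel (local constant) and k-nearest-neighbour averages are pulled toward interior values, a standard illustration of the boundary bias of local constant smoothers.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def local_constant(xs, ys, x0, h):
    """Nadaraya-Watson (local constant) estimate at x0 with bandwidth h."""
    w = [gaussian_kernel((x - x0) / h) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def local_linear(xs, ys, x0, h):
    """Local linear estimate: kernel-weighted least-squares line y ~ a + b*(x - x0),
    returning the intercept a, i.e. the fitted value at x0."""
    w = [gaussian_kernel((x - x0) / h) for x in xs]
    d = [x - x0 for x in xs]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * yi for wi, yi in zip(w, ys))
    t1 = sum(wi * di * yi for wi, di, yi in zip(w, d, ys))
    # Solve the 2x2 weighted normal equations for the intercept a
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)

def knn_mean(xs, ys, x0, k):
    """k-nearest-neighbour estimate: plain mean of the k closest responses."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

# Toy linear trend, evaluated at the right boundary x0 = 9
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]
ll = local_linear(xs, ys, 9.0, 2.0)    # 19.0 up to rounding
lc = local_constant(xs, ys, 9.0, 2.0)  # biased below 19
kn = knn_mean(xs, ys, 9.0, 3)          # mean of 19, 17, 15 = 17.0
```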
Wavelets are new families of basis functions that yield the representation $f(x) = \sum_{j,k} b_{jk} W(2^j x - k)$. Their construction begins with the solution $\phi(x)$ of a dilation equation $\phi(x) = \sum_k c_k \phi(2x - k)$ with coefficients $c_k$. Then $W$ is built from $\phi$, and the basis is obtained by translation and dilation of $W$. Part 1 shows how conditions on the $c_k$ lead to the approximation and orthogonality properties of the wavelets. Part 2 describes the recursive algorithms (also based on the $c_k$) that decompose and reconstruct $f$. The aim of wavelets is to localize as far as possible in both time and frequency, with efficient algorithms.
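The Haar wavelet is the simplest instance of this construction: with $c_0 = c_1 = 1$ the dilation equation is solved by the box function $\phi$ on [0, 1), and $W(x) = \phi(2x) - \phi(2x - 1)$. The sketch below is an illustration of that special case, not the paper's general algorithm. It implements the recursive (pyramid) decomposition and reconstruction with unnormalized averages and half-differences, and checks that the coarsest average times $\phi$ plus the sum of $b_{jk} W(2^j x - k)$ recovers the sampled $f$. Signal lengths are assumed to be powers of two.

```python
def phi(x):
    """Haar scaling function: the box on [0, 1)."""
    return 1.0 if 0 <= x < 1 else 0.0

def W(x):
    """Haar wavelet built from phi: W(x) = phi(2x) - phi(2x - 1)."""
    return phi(2 * x) - phi(2 * x - 1)

def haar_decompose(signal):
    """Pyramid decomposition. Returns the coarsest average and detail
    coefficients b_jk grouped by level j, coarsest level first."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / 2 for a, b in pairs])  # wavelet coefficients
        approx = [(a + b) / 2 for a, b in pairs]            # scaling coefficients
    return approx[0], details

def haar_reconstruct(mean, details):
    """Invert haar_decompose level by level: a = s + d, b = s - d."""
    approx = [mean]
    for level in details:
        approx = [v for s, d in zip(approx, level) for v in (s + d, s - d)]
    return approx

def haar_eval(mean, details, x):
    """Evaluate mean * phi(x) + sum_{j,k} b_jk W(2^j x - k) on [0, 1)."""
    return mean * phi(x) + sum(b * W(2 ** j * x - k)
                               for j, level in enumerate(details)
                               for k, b in enumerate(level))

signal = [4, 6, 10, 12, 8, 6, 5, 5]
mean, details = haar_decompose(signal)
# mean = 7.0, details = [[1.0], [-3.0, 1.0], [-1.0, -1.0, 1.0, 0.0]]
restored = haar_reconstruct(mean, details)
```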
In this paper we analyze a few GPS-based traces to infer human mobility patterns. We propose a clustering method to extract the main points of interest, called geo-locations, from GPS data. Starting from geo-locations, we propose a definition of community, the geo-community, which captures the relation between a spatial description of human movements and the social context in which users live. A statistical analysis of the principal characteristics of human walks provides the fitting distributions of the distances covered by people within a geo-location and between geo-locations, and of pause times. Finally, we analyze the factors influencing people when they choose their next location.
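A minimal sketch of turning a raw GPS trace into candidate geo-locations, pause times and the distances between consecutive geo-locations. Stay-point detection, a common distance-and-time threshold heuristic, stands in for the paper's clustering method, and the thresholds (200 m, 10 minutes) and the toy trace are arbitrary assumptions.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between (lat, lon) points in degrees."""
    R = 6_371_000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def stay_points(track, max_dist_m=200.0, min_pause_s=600.0):
    """track: list of (t_seconds, lat, lon) sorted by time.
    Returns (lat, lon, pause_seconds) for each place where the user lingers."""
    result = []
    i, n = 0, len(track)
    while i < n:
        j = i + 1
        # Extend the window while points stay within max_dist_m of anchor point i
        while j < n and haversine_m(track[i][1:], track[j][1:]) <= max_dist_m:
            j += 1
        pause = track[j - 1][0] - track[i][0]
        if pause >= min_pause_s:
            pts = track[i:j]
            lat = sum(p[1] for p in pts) / len(pts)
            lon = sum(p[2] for p in pts) / len(pts)
            result.append((lat, lon, pause))
            i = j
        else:
            i += 1
    return result

def jump_lengths(stays):
    """Distances (m) between consecutive geo-locations."""
    return [haversine_m(a[:2], b[:2]) for a, b in zip(stays, stays[1:])]

# Toy trace: linger at one place, travel, linger at another
track = [
    (0, 43.7000, 10.4000), (300, 43.7001, 10.4000),
    (600, 43.7000, 10.4001), (900, 43.7001, 10.4001),
    (1200, 43.7100, 10.4000),  # in transit
    (1500, 43.7200, 10.4000), (1800, 43.7201, 10.4000),
    (2400, 43.7200, 10.4001),
]
stays = stay_points(track)  # two stay points, each with a 900 s pause
jumps = jump_lengths(stays)
```

The collected pause times and jump lengths are the raw samples to which candidate distributions would then be fitted.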