Article

Conflation of OpenStreetMap and Mobile Sports Tracking Data for Automatic Bicycle Routing

Authors:
  • Finnish Geospatial Research Institute / National Land Survey of Finland

Abstract

This article investigates how workout trajectories from a mobile sports tracking application can be used to provide automatic route suggestions for bicyclists. We apply a Hidden Markov Model (HMM)-based method for matching cycling tracks to a “bicycle network” extracted from crowdsourced OpenStreetMap (OSM) data, and evaluate its effective differences in terms of optimal routing compared with a simple geometric point-to-curve method. OSM has quickly established itself as a popular resource for bicycle routing; however, its high level of detail presents challenges for its applicability to popularity-based routing. We propose a solution where bikeways are prioritized in map-matching, achieving good performance; the HMM-based method correctly matched on average 94% of the route length. In addition, we show that the extremely biased nature of the trajectory dataset, which is typical of volunteered user-generated data, can be of high importance in terms of popularity-based routing. Most computed routes diverged depending on whether the number of users or number of tracks was used as an indicator of popularity, which may imply varying preferences among different types of cyclists. Revising the number of tracks by diversity of users to surmount local biases in the data had a more limited effect on routing.

... Topological approaches can overcome some errors of geometric approaches by additionally considering the connectivity of the network. The third group of map-matching algorithms employs advanced methods such as Hidden Markov Models in addition to geometric and topological approaches to generate the most likely route [17,18]. Most papers proposing map-matching algorithms include performance measures in their results (usually from cross-validation), but external evaluations and comparisons of map-matching performance are rare. ...
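The difference between a purely geometric match and an HMM-style match can be illustrated with a small Viterbi sketch. This is not the implementation of any cited study: the Gaussian emission score, the fixed `switch_penalty`, the `sigma` value, and the input layout (`distances`, `adjacent`) are all assumptions chosen for illustration.

```python
import math


def viterbi_match(distances, adjacent, sigma=10.0, switch_penalty=1.0):
    """Most likely link sequence for a track (illustrative sketch).

    distances: one dict per GPS point, mapping candidate link -> distance in metres.
    adjacent: set of (link, link) pairs that are connected in the network.
    """
    def emission(d):
        return -0.5 * (d / sigma) ** 2  # log of an unnormalised Gaussian (assumed)

    def transition(u, v):
        if u == v:
            return 0.0
        if (u, v) in adjacent or (v, u) in adjacent:
            return -switch_penalty  # assumed constant cost for changing link
        return -math.inf  # no jumping between unconnected links

    score = {link: emission(d) for link, d in distances[0].items()}
    back = []
    for obs in distances[1:]:
        new_score, pointers = {}, {}
        for link, d in obs.items():
            prev = max(score, key=lambda u: score[u] + transition(u, link))
            new_score[link] = score[prev] + transition(prev, link) + emission(d)
            pointers[link] = prev
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state.
    link = max(score, key=score.get)
    path = [link]
    for pointers in reversed(back):
        link = pointers[link]
        path.append(link)
    return path[::-1]


# The middle point is closer to link "C", which is not connected to link "A".
distances = [{"A": 2.0, "C": 30.0}, {"A": 12.0, "C": 8.0}, {"A": 3.0, "C": 25.0}]
geometric = [min(d, key=d.get) for d in distances]
hmm = viterbi_match(distances, adjacent=set())
```

Here the nearest-link rule jumps to the unconnected link for the noisy middle point, while the Viterbi path stays on one link because the transition term forbids the jump.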
... Another recent study on road-based navigation systems using GPS records collected at speeds of 5 to 80 km/h concluded that advanced methods (particularly using processed data by Kalman filtering in a topological algorithm) perform better than geometric or topological algorithms [19]. Additional evidence for enhanced performance of advanced algorithms is provided by a study using cycling trajectories that compared a geometric algorithm to a Hidden Markov Model-based advanced technique (with 90% correctly matched links by the advanced technique compared to 70% by the geometric algorithm, both adjusted to prefer cycling facilities) [17]. ...
... Some past studies only report if the map-matching algorithm successfully generates a route for a given trajectory [23], while others use visual inspections [15]. When the ground-truth route is known, studies have evaluated accuracy using a binary variable indicating a complete reproduction of the ground-truth route [14], or a continuous variable indicating the proportion of the ground-truth route correctly reproduced [17]. In the absence of ground-truth routes, studies have compared the generated routes to the GPS trajectory data to evaluate the similarity of attributes such as distance travelled [24]. ...
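The two ground-truth measures mentioned above (a binary complete-reproduction flag and a continuous proportion) can be sketched as follows. This is a minimal illustration under assumptions: routes are taken to be lists of link IDs, and `link_length` is a hypothetical mapping from link ID to length in metres.

```python
def exact_match(matched, truth):
    """Binary measure: True only if the matched link sequence equals the ground truth."""
    return list(matched) == list(truth)


def matched_length_share(matched, truth, link_length):
    """Continuous measure: share of ground-truth length whose links were also matched."""
    total = sum(link_length[link] for link in truth)
    if total == 0:
        raise ValueError("ground-truth route has zero length")
    matched_set = set(matched)
    hit = sum(link_length[link] for link in truth if link in matched_set)
    return hit / total


# Hypothetical example: the last link of the route was matched incorrectly.
link_length = {"a": 100.0, "b": 250.0, "c": 150.0, "d": 120.0}
truth = ["a", "b", "c"]
matched = ["a", "b", "d"]
share = matched_length_share(matched, truth, link_length)
```

A length-weighted share penalises a wrong long link more than a wrong short one; counting links instead of summing lengths would give the link-based variant.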
Article
Full-text available
Global Positioning System (GPS) data on walking and cycling trips can generate useful insights for transportation systems but require substantial processing. One of the key GPS data processing steps is “map‐matching”, or inference of the sequence of network links traversed during travel. The objective of this research is to evaluate the accuracy of existing map‐matching algorithms for GPS data on active travel. A method to flag erroneous map‐matching results without requiring ground‐truth data and improvements for active travel data are also proposed. Six map‐matching algorithms are applied to a sample of 63 trajectories, stratified on network density and average heading change, extracted from a large set of real‐world trips from metropolitan Vancouver, Canada. Results show that the best performing method is PgMapMatch, which can be further improved by adjustments to link costs and allowing wrong‐way travel. Two other algorithms have similarly accurate routes (70–90% accuracy, depending on the measure), but fail to generate routes for about a third of trips. The proposed error detection measure can be used (without ground truth data) to flag matched routes requiring visual inspection, with a recommendation to look for: Wrong‐way travel, missing links in the network data, and parallel facilities on the same street.
... Such large scale GPS data is applied to find popular routes and to route cyclists using the popularity of the links in a digital road graph (e.g. Baker et al. (2016) or Bergman and Oksanen (2016)). However, the available research concentrates on recreational cycling, since GPS data used there is collected mostly on recreational trips. ...
... Hence, data from both recreational as well as everyday trips needs to be included for a more comprehensive study, and it is expected that different route choice behaviour is present in these different situations. Furthermore, the popularity measure used by Bergman and Oksanen (2016) is the link length divided by either the number of unique cyclists, the number of routes passing the link, or a route count combined with Simpson's diversity index. Those measures do not account for the fact that bicycle traffic is typically most dense in the city centre and decreases towards the city borders. ...
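The three length-over-popularity weights described above can be sketched as below. The snippet does not say how the route count and Simpson's diversity index are combined, so the product used here is purely an assumption, as are the function names and the choice to give unused links an infinite weight.

```python
from collections import Counter


def simpson_diversity(user_ids):
    """Gini-Simpson index 1 - sum(p_i^2) over each user's share of the tracks."""
    counts = Counter(user_ids)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def link_weights(length_m, user_ids):
    """Return three candidate weights for one link.

    user_ids holds one entry per track that passed the link.
    """
    n_tracks = len(user_ids)
    n_users = len(set(user_ids))
    # Assumed combination: track count scaled by user diversity.
    adjusted = n_tracks * simpson_diversity(user_ids)

    def weight(popularity):
        # Assumption: links with zero popularity are effectively unusable.
        return length_m / popularity if popularity > 0 else float("inf")

    return {
        "by_users": weight(n_users),
        "by_tracks": weight(n_tracks),
        "by_diverse_tracks": weight(adjusted),
    }


# Hypothetical link: one user recorded 8 of the 10 tracks on this 200 m link.
w = link_weights(200.0, ["u1"] * 8 + ["u2", "u3"])
```

In this example the track-based weight is much lower than the user-based one, because a single prolific user dominates the track count; the diversity adjustment pulls the weight back towards the user-based value.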
Presentation
Good cycling infrastructure is seen as a prerequisite for a high modal share of cycling. However, while there are studies using stated preference data and small sets of GPS trajectories to evaluate cycling infrastructure and to develop route choice models, there is a gap in applying large GPS data sets for these purposes. In this paper we present such a methodology based on a popularity measure that evaluates the popularity of roads compared to alternative roads in the neighbourhood. We show how this measure can be applied to assess cycling infrastructure. We also show how it can be used to improve path size logit (PSL) route choice models and that adding this measure significantly improves the route set generation. Additionally, the popularity measure can be used to replace attributes for which data is unavailable while still achieving a similar model quality.
... This kind of human behavior information can be utilized to evaluate the popularity of the non-motorized transport system. Recently, many studies have focused on big data about people's transport behavior and used these data for urban planning and public service design [5], [6]. In this paper, we attempt to utilize this crowdsourced data and wisdom to evaluate how human-friendly urban roads are. ...
... In paper [5], the authors provide three indicators to evaluate the cycling popularity of road segments. They use the number of users, the number of tracks, and Simpson's diversity index to evaluate the popularity. ...
Preprint
Full-text available
Non-motorized transport is becoming increasingly important in the urban development of cities in China. How to evaluate the non-motorized transport popularity of urban roads is an interesting question to study. The great amount of tracking data generated by smart mobile devices gives us opportunities to solve this problem. This study aims to provide a data-driven method for evaluating the popularity (walkability and bikeability) of the urban non-motorized transport system. This paper defines a p-index to evaluate the degree of popularity of road segments, based on the cycling, running, and walking GPS track data from outdoor activity logging applications. According to the p-index definition, this paper evaluates the non-motorized transport popularity of the urban area in Wuhan city within different temporal periods.
... Although this might a priori seem to be a dismissal of a considerable amount of information, it is similar to the value reported in other studies. Usyukov [35] removed about 10% of the original data; Bergman et al. [36] started with a total of 29,958 tracks but discarded 22% of the original data, resulting in 23,290 tracks. ...
... Fewer problems were related to a loss of signal by the GNSS device (22.98%). Our experience suggests that the number of discarded tracks may be a good indicator for evaluating the quality of a VGI source, and that a threshold of acceptable quality may be at around 20% of discarded tracks [35][36][37]. ...
Article
Full-text available
VGI (Volunteered Geographic Information) refers to spatial data collected, created, and shared voluntarily by users. Georeferenced tracks are one of the most common components of VGI, and, as such, are not free from errors. The cleaning of GNSS (Global Navigation Satellite System) tracks is usually based on the detection and removal of outliers using their geometric characteristics. However, according to our experience, user profile differentiation is still a novelty, and studies delving into the relationship between contributor efficiency, activity, and quality of the VGI produced are lacking. The aim of this study is to design a procedure to filter GNSS traces according to their quality, the type of activity pursued, and the contributor efficiency with VGI. Source data are obtained from Wikiloc. The methodology includes track classification according to mobility types, box plot analysis to identify outliers, bivariate user segmentation according to level of activity and efficiency, and the study of its spatial behavior using kernel-density maps. The results reveal that out of 44,326 tracks, 8096 (18.26%) are considered erroneous, mainly (73.02%) due to contributors’ poor practices and the remaining being due to bad GNSS reception. The results also show a positive correlation between data quality and the author’s efficiency collecting VGI.
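Box plot analysis for outlier detection is commonly implemented with Tukey's fences. The sketch below assumes that convention; the 1.5 multiplier, the choice of mean speed as the inspected attribute, and the sample values are illustrative assumptions rather than details taken from the study.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's box plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical mean speeds (km/h) of cycling tracks; one was likely recorded in a car.
speeds = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
outliers = boxplot_outliers(speeds)
```

Running the rule separately for each mobility type keeps a fast cycling track from being judged against walking speeds.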
... In particular, with the popularity of volunteered geographic information (VGI), geospatial data updating is undergoing a significant transformation from top-down active updating to bottom-up crowd updating [1]. On the one hand, crowdsourced geospatial data play an important role in generating 3D city models or highly precise road maps [2][3][4]. On the other hand, it is also essential to utilize the existing geospatial data to enrich the user-generated building information [5]. In practical applications, different departments or volunteers collect various geospatial datasets with different accuracies, different scales, or different thematic focuses. ...
Article
Full-text available
With the increasingly urgent demand for map conflation and timely data updating, data matching has become a crucial issue in big data and the GIS community. However, non-rigid deviation, shape homogenization, and uncertain scale differences occur in crowdsourced and official building data, causing challenges in conflating heterogeneous building datasets from different sources and scales. This paper thus proposes an automated building data matching method based on relaxation labelling and pattern combinations. The proposed method first detects all possible matching objects and pattern combinations to create a matching table, and calculates four geo-similarities for each candidate-matching pair to initialize a probabilistic matching matrix. After that, the contextual information of neighboring candidate-matching pairs is explored to heuristically amend the geo-similarity-based matching matrix for achieving a contextual matching consistency. Three case studies are conducted to illustrate that the proposed method obtains high matching accuracies and correctly identifies various 1:1, 1:M and M:N matching. This indicates the pattern-level relaxation labelling matching method can efficiently overcome the problems of shape homogeneity and non-rigid deviation, and meanwhile has weak sensitivity to uncertain scale differences, providing a functional solution for conflating crowdsourced and official building data.
... Using the residents' travel habits, they forecast the traffic demand of the city and provide theoretical support for the traffic control department in urban transportation planning. Bergman and Oksanen [37] combined the motion track data and Open Street Map (OSM) and applied them in the automatic travel route planning. Zhang et al. [38] analyzed the various areas of city residents in the travel law and in different periods by processing urban taxi track data in time-sharing segmentation to obtain information from all city residents who commute. ...
Article
Full-text available
The massive urban social management data with geographical coordinates from the inspectors, volunteers, and citizens of the city are a new source of spatio-temporal data, which can be used for the data mining of city management and the evolution of hot events to improve urban comprehensive governance. This paper proposes spatio-temporal data mining of urban social management events (USMEs) based on an ontology semantic approach. First, an ontology model for USMEs is presented to accurately extract effective social management events from non-structured USMEs. Second, an exploratory spatial data analysis method based on “event-event” and “event-place” from spatial and time aspects is presented to mine the information from USMEs for the urban social comprehensive governance. The data mining results are visualized as a thermal chart and a scatter diagram for the optimization of the management resources configuration, which can improve the efficiency of municipal service management and municipal departments for decision-making. Finally, the USMEs of Qingdao City in August 2016 are taken as a case study with the proposed approach. The proposed method can effectively mine the management of social hot events and their spatial distribution patterns, which can guide city governance and enhance the city’s comprehensive management level.
... In order to derive a weighted graph model reflecting the routing preferences of cyclists, Bergman and Oksanen (2016) have chosen a rather pragmatic approach by counting for each road segment the number of users and the number of trajectories using it. Based on these numbers, they have defined three different measures to derive edge weights, all of which are based on the assumption that highly used road segments should receive low weights. ...
Preprint
Full-text available
Understanding the criteria that bicyclists apply when they choose their routes is crucial for planning new bicycle paths or recommending routes to bicyclists. This is becoming more and more important as city councils are becoming increasingly aware of limitations of the transport infrastructure and problems related to automobile traffic. Since different groups of cyclists have different preferences, however, searching for a single set of criteria is prone to failure. Therefore, in this paper, we present a new approach to classify trajectories recorded and shared by bicyclists into different groups and, for each group, to identify favored and unfavored road types. Based on these results we show how to assign weights to the edges of a graph representing the road network such that minimum-weight paths in the graph, which can be computed with standard shortest-path algorithms, correspond to adequate routes. Our method combines known algorithms for machine learning and the analysis of trajectories in an innovative way and, thereby, constitutes a new comprehensive solution for the problem of deriving routing preferences from initially unclassified trajectories. An important property of our method is that it yields reasonable results even if the given set of trajectories is sparse in the sense that it does not cover all segments of the cycle network.
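Once edge weights have been assigned, a minimum-weight path can be found with a standard shortest-path algorithm such as Dijkstra's. The sketch below is a generic textbook version, not the cited authors' code; the adjacency-list layout and the example weights are assumptions.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm over non-negative edge weights; returns (cost, path)."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return float("inf"), []


# Hypothetical directed graph: the direct edge A->B carries a high weight,
# while the detour over C and D uses favoured (low-weight) edges.
graph = {
    "A": [("B", 5.0), ("C", 1.0)],
    "C": [("D", 1.5)],
    "D": [("B", 1.0)],
}
cost, path = shortest_path(graph, "A", "B")
```

Because the weights encode preference rather than pure distance, the returned route can be geometrically longer than the direct connection.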
... The main reason that causes poor accuracy in reporting trajectory points is GPS signal as recorded by a receiver. However, in many cases, such location error at the processing phase can be solved by map matching methods [65]. The main principle behind map matching methods is to minimize the distance between the projected path on the map and the input trajectory [66][67][68]. ...
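The distance-minimising principle can be illustrated with the simplest geometric variant, which snaps each point to its nearest network segment. The sketch assumes planar (projected) coordinates and a hypothetical `segments` mapping from segment ID to its two endpoints.

```python
import math


def project_to_segment(p, a, b):
    """Closest point to p on segment a-b, and its distance (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Clamp the projection parameter so the result stays on the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), math.hypot(px - qx, py - qy)


def snap(p, segments):
    """Return (segment_id, projected_point, distance) for the nearest segment."""
    candidates = ((sid, *project_to_segment(p, a, b)) for sid, (a, b) in segments.items())
    return min(candidates, key=lambda item: item[2])


# Hypothetical parallel street and bikeway, 3 m apart.
segments = {"street": ((0.0, 0.0), (10.0, 0.0)), "bikeway": ((0.0, 3.0), (10.0, 3.0))}
sid, point, dist = snap((4.0, 1.0), segments)
```

With parallel facilities this close together, a positioning error of a metre or two is enough to flip the result, which is one reason purely geometric matching struggles on detailed networks.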
Article
Full-text available
Large volumes of trajectory-based data require development of appropriate data manipulation mechanisms that will offer efficient computational solutions. In particular, identification of meaningful geometric points of such trajectories is still an open research issue. Detecting these critical points involves identifying self-intersecting, turning and curvature points so that specific geometric characteristics that are worth identifying can be denoted. This research introduces an approach called Trajectory Critical Point detection using Convex Hull (TCP-CH) to identify a minimum number of critical points. The results can be applied to large trajectory data sets in order to reduce storage costs and complexity for further data mining and analysis. The main principles of the TCP-CH algorithm include computing convex areas, convex hull curvatures, turning points, and intersecting points. The experimental validation applied to the Geolife trajectory dataset reveals that the proposed framework can identify most of the intersecting points in reasonable computing time. Finally, comparison of the proposed algorithm with other methods, such as the turning function, shows that our approach performs relatively well when considering the overall detection quality and computing time.
... On the other hand, VGI traces have a potential to be used for more advanced purposes such as bicycle routing (Bergman and Oksanen, 2016). The potential is based on three important properties of VGI traces. ...
Thesis
Full-text available
Nowadays, the need for very up-to-date authoritative spatial data has significantly increased. Thus, to fulfill this need, a continuous update of authoritative spatial datasets is a necessity. This task has become highly demanding in both its technical and financial aspects. In terms of road networks, three types of roads are particularly challenging to update continuously: footpaths, tractor roads and bicycle roads. They are challenging due to their intermittent nature (e.g. they appear and disappear very often) and various landscapes (e.g. forest, high mountains, seashore, etc.). Simultaneously, GPS data voluntarily collected by the crowd is widely available in a large quantity. The number of people recording GPS data, such as GPS traces, has been steadily increasing, especially during sport and spare time activities. The traces are made openly available and popularized on social networks, blogs, sport and touristic associations' websites. However, their current use is limited to very basic metric analysis like total time of a trace, average speed, average elevation, etc. The main reasons for that are a high variation of spatial quality from point to point within a trace as well as a lack of protocols and metadata (e.g. precision of the GPS device used). The global context of our work is the use of GPS hiking and mountain bike traces collected by volunteers (VGI traces) to detect potential updates of footpaths, tractor and bicycle roads in authoritative datasets. Particular attention is paid to roads that exist in reality but are not represented in authoritative datasets (missing roads). The approach we propose consists of three phases. The first phase consists of evaluation and improvement of VGI traces quality. The quality of traces was improved by filtering outlying points (machine learning based approach) and points that are a result of secondary human behaviour (activities out of main itinerary).
The remaining points are then evaluated in terms of their accuracy by classifying them as low- or high-accuracy points using rule-based machine learning classification. The second phase deals with detection of potential updates. For that purpose, a growing buffer data matching solution is proposed. The size of buffers is adapted to the results of the GPS point accuracy classification in order to handle the huge variations in VGI traces accuracy. As a result, parts of traces unmatched to the authoritative road network are obtained and considered as candidates for missing roads. Finally, in the third phase we propose a decision method for whether the “missing road” candidates should be accepted as updates or not. This decision method is a multi-criteria process in which potential missing roads are qualified according to their degree of confidence. The approach was tested on multi-sourced VGI GPS traces from the Vosges area. Missing roads in the IGN authoritative database BDTopo® were successfully detected and proposed as potential updates.
... These errors can cause significant errors and may limit the usability of the traces. On the other hand, the precision of these traces is continuously increasing and their interest is shown in the literature for purposes such as bicycle routing (Bergman & Oksanen 2016) or updating or enriching main roads from authoritative data (Al Bakri 2012;Liu et al. 2015;Van Winden et al. 2016). ...
Chapter
Full-text available
Crowdsourced traces collected by GPS devices during sports activities are now widely available on different websites. The goal of this paper is to study the potential of crowdsourced traces coming from GPS devices to highlight updates in authoritative geographic data. To reach this goal, an approach based on two steps is proposed. First, a data matching method is applied to match authoritative data and crowdsourced traces. Second, for the non-matched crowdsourced segments composing a trace, different criteria are defined to decide whether or not non-matched segments should be considered as an alert for update in authoritative data. The proposed approach is tested on crowdsourced traces and on the BDTOPO® authoritative road and path network in a mountain area. The results are promising: 727.1 km of missing paths were found in the test area, which corresponds to 7.7% of the total length of used traces. The discovered missing paths also represent a contribution of 2.4% of the total length of the BDTopo® road and path network in the test area.
... Those devices, including fitness trackers, smart phones, and alike, usually feature positioning sensors such as GPS as well as IMU sensors, such as accelerometers and gyroscopes. Tracking characteristics for sports thereby range from tracking distance with positioning sensors to step detection and counting when walking [4,5]. While extracting certain characteristics about a given sport by those sensors has previously been investigated, automatically extracting information for workout sessions with mixed workout types is more sparse [10]. ...
Chapter
Full-text available
Sports and workout activities have become important parts of modern life. Nowadays, many people track characteristics about their sport activities with their mobile devices, which feature inertial measurement unit (IMU) sensors. In this paper we present a methodology to detect and recognize workouts, as well as to count repetitions done in a recognized type of workout, from a single 3D accelerometer worn at the chest. We consider four different types of workout (pushups, situps, squats and jumping jacks). Our technical approach to workout type recognition and repetition counting is based on machine learning with a convolutional neural network. Our evaluation utilizes data from 10 subjects, who wore a Movesense sensor on their chest during their workout. We thereby find that workouts are recognized correctly on average 89.9% of the time, and the workout repetition counting yields an average detection accuracy of 97.9% over all types of workout.
... Several companies offer sports-oriented platforms that can record the same travel observation data on either a dedicated bicycle computer or a smartphone app (Kitchel and Riordan 2014;Garmin 2018). These data sets provide an opportunity to address shortcomings in transportation data collected by agencies, but the use of the equipment and apps (often expensive) creates a significant sampling (or demographic) bias (Bergman and Oksanen 2016b). A review of the app users' demographics shows the primary representation of male, young, and middle-aged segments of the population (Griffin and Jiao 2015a, b;Boss et al. 2018). ...
Article
Full-text available
Emerging big data resources and practices provide opportunities to improve transportation safety planning and outcomes. However, researchers and practitioners recognise that big data from mobile phones, social media, and on-board vehicle systems include biases in representation and accuracy, related to transportation safety statistics. This study examines both the sources of bias and approaches to mitigate them through a review of published studies and interviews with experts. Coding of qualitative data enabled topical comparisons and reliability metrics. Results identify four categories of bias and mitigation approaches that concern transportation researchers and practitioners: sampling, measurement, demographics, and aggregation. This structure for understanding and working with bias in big data supports research with practical approaches for rapidly evolving transportation data sources.
... A similar mobile application called Sports Tracker was used in a separate study focussed on providing automatic popularity-based routing in Helsinki [89]. In this study, the full route trace of individuals was used, with a public dataset of nearly 30,000 routes from 1994 users. ...
Thesis
Full-text available
University download page: http://hdl.handle.net/11250/2612495 The prioritisation of bicycle-friendly infrastructure is now on the agenda of many policymakers seeking to capitalise on the advantages of cycling for transport. This thesis focusses upon how the improved availability, quality, and connectivity of infrastructure suitable for cycling can influence cycling behaviour at the city and neighbourhood level. Two key elements are necessary to understand the local-scale impact of bicycle infrastructure: the decision to bicycle in preference of other transport modes and the choice of route on the transport network. This thesis first addresses bicycle mode and route choice independently of each other before analysing the interaction between these elements in the context of bicycle infrastructure interventions. This article-based thesis is comprised of five research papers: four empirical studies and one literature review. Three of the empirical cases are based in the Norwegian city of Trondheim and the fourth is based in Oslo. Paper I addresses the modal shift of employees following a workplace relocation. Papers II and III are focused on bicycle route choice – firstly as a review of methods and then in connection with student route preferences. The two final papers, Papers IV and V, integrate both mode and route choice elements for the detailed analysis of neighbourhood scale effects resulting from the installation of bicycle lanes in Trondheim and Oslo respectively. The research uses a mixed methods approach, with a focus on empirical data to address the objectives of the thesis. Before and after travel surveys, web-based maps and GPS are the main means of data collection. Comparative analyses are performed using a Geographic Information System (GIS). Findings suggest that the decision to bicycle is to a significant extent determined by trip and spatial characteristics of the destination (Paper I). 
Route substitution is witnessed in both intervention studies (Papers IV and V), whilst significant changes (p < .05) in the modal share of cyclists are only witnessed in one (Paper IV), suggesting that it is mostly changes of route rather than mode that contribute to an individual intervention street’s change in bicycle volumes. Bicycle-specific infrastructure appears to be generally valued by all types of road users, however, the evidence suggests that public transport users and pedestrians are more willing to change their mode of transport assuming the only changes made are to the bicycle infrastructure (Papers I and IV). This suggests that much of the increase in the use of new bicycle infrastructure is the result of a reduction in the use of other sustainable transport modes. Many of the benefits associated with increased cycling are the result of reduced private car use, but for this to be achieved, it appears that initiatives beneficial for cyclists alone are insufficient. The means by which different transport infrastructure attributes can be researched and are valued by users are discussed by Papers II and III respectively. Paper II is a systematic review summarising the means through which revealed preference bicycle route choice data can be collected whilst Paper III evaluates four different Bicycle Level of Service (BLOS) methods for determining bicycle route choice. The latter study reveals that empirically founded BLOS methods with the most explanatory infrastructural attributes correspond best with actual route choices of university students. Of the tested BLOS methods, the Bicycle Compatibility Index is found to correspond best with actual route choice. Developing an understanding of the impacts of bicycle infrastructure can assist the prioritisation of limited city budgets towards the promotion of sustainable mobility behaviour. 
This research attempts to advance the state of the art for bicycle route choice research whilst also addressing the decision to bicycle for transportation purposes.
... Conflation methods are traditionally focused on road network data [11,12] and snapping of mobile/GPS tracking data to road networks [13,14]. National mapping agencies list conflation as one of the technologies incorporated into map generalization processes [15]. ...
Article
Combining misaligned spatial data from different sources complicates spatial analysis and creation of maps. Conflation is a process that solves the misalignment problem through spatial adjustment or attribute transfer between similar features in two datasets. Even though a combination of digital elevation model (DEM) and vector hydrographic lines is a common practice in spatial analysis and mapping, no method for automated conflation between these spatial data types has been developed so far. The problem of DEM and hydrography misalignment arises not only in map compilation, but also during the production of generalized datasets. There is a lack of automated solutions which can ensure that the drainage network represented in the surface of generalized DEM is spatially adjusted with independently generalized vector hydrography. We propose a new method that performs the conflation of DEM with linear hydrographic data and is embeddable into DEM generalization process. Given a set of reference hydrographic lines, our method automatically recognizes the most similar paths on DEM surface called counterpart streams. The elevation data extracted from DEM is then rubbersheeted locally using the links between counterpart streams and reference lines, and the conflated DEM is reconstructed from the rubbersheeted elevation data. The algorithm developed for extraction of counterpart streams ensures that the resulting set of lines comprises the network similar to the network of ordered reference lines. We also show how our approach can be seamlessly integrated into a TIN-based structural DEM generalization process with spatial adjustment to pre-generalized hydrographic lines as additional requirement. The combination of the GEBCO_2019 DEM and the Natural Earth 10M vector dataset is used to illustrate the effectiveness of DEM conflation both in map compilation and map generalization workflows. 
Resulting maps are geographically correct and are aesthetically more pleasing in comparison to a straightforward combination of misaligned DEM and hydrographic lines without conflation.
... Wang and Zipf [39] used an algorithm to extract the building information in the OSM data for modeling, and the building interior details can be displayed by using the proposed method. In the study of path navigation, Bergman and Oksanen [40] took the OSM data and mobile sport tracking data as research objects, the hidden Markov model (HMM) based as a research method, pointed out that the OSM data has feasibility in bicycle path navigation. In terms of geographic mapping, Rosina et al. [41] took Slovenia and Austria as research objects, and added the OSM data to the Copernicus imperviousness layer to improve the population distribution map, drawing methods of the two countries. ...
Article
The evaluation of urban economies has been a key concern for scholars. In the past, most research methods for urban development assessment have been based on statistical data, and the analysis results have been presented in the form of statistical tables. Moreover, the development of urban road networks reflects the status of urban development, and spatial metrics obtained from the urban road network can be used to evaluate the growth of the urban economy. OpenStreetMap (OSM) data are collected through crowdsourcing, and the OSM road network offers simple and efficient data collection and updating as well as freely available data. Therefore, in this paper, the OSM road network density is used as the main spatial metric to evaluate the economic development of Chinese cities. The experimental results show a significant regression correlation between the OSM road network density and municipal gross domestic product (GDP). Of the 85 selected Chinese cities, 71 (83.53%) have residuals between −0.1 and 0.1, and 79 (92.94%) have residuals between −0.2 and 0.2. Therefore, the OSM road network density can be used as a spatial metric to evaluate the municipal GDP and, as a result, can be used by local governments and scholars to estimate, evaluate, and forecast the urban economic development of China.
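The percentages reported in this abstract follow directly from the city counts. A quick arithmetic check, using only the figures quoted above (the variable names are illustrative):

```python
# Counts quoted in the abstract above.
total_cities = 85
within_01 = 71  # cities with residuals between -0.1 and 0.1
within_02 = 79  # cities with residuals between -0.2 and 0.2

# Share of cities in each residual band, as a percentage rounded to two decimals.
share_01 = round(100 * within_01 / total_cities, 2)
share_02 = round(100 * within_02 / total_cities, 2)

print(share_01, share_02)  # 83.53 92.94
```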
... Volunteered Geographic Information (VGI) has proven highly successful in enabling volunteers to contribute spatial data at very low cost (Goodchild 2007), particularly roads and other linear map features (Barrington-Leigh and Millard-Ball 2017). The availability of large amounts of VGI linear features as well as others has greatly facilitated many applications in routing and navigation (Bergman and Oksanen 2016), road network updates (Manandhar et al. 2019), emergency response (Lin et al. 2020), urban planning (Moeinaddini et al. 2014), and other fields (Fonte et al. 2015, Hong and Yao 2019, Yang et attempted to investigate whether these existing metrics can fully detect quality issues under the reality of the rapid growth in crowdsourcing data. After all, not only does complexity of road networks increase in metropolitan areas, there is also an emergence of data sourced from uploaded driven GNSS tracks, out-of-copyright maps, street view imagery, and overhead photography. ...
Article
The majority of spatial data provided as Volunteered Geographic Information (VGI) are roads and other linear map features. Such data have been widely used in routing and navigation, road network update, emergency response, urban planning and more. Due to the lack of cartographic standards and issues with volunteer credibility, the quality of VGI linear features remains a concern and could seriously hinder the broad application of VGI data. This research proposes a comprehensive quality assessment framework for VGI linear features which adopts factor analysis to integrate two novel quality metrics with six other commonly used metrics, and further examines the spatial autocorrelation and semantic correlation of VGI linear feature quality. The OpenStreetMap road network of Allegheny County, Pennsylvania (USA) was selected as an example to test the proposed framework. Our results suggest that the proposed metrics, Box-counting dimension difference and Link accuracy are feasible for detecting quality issues and are important supplements to the common quality metrics. The findings also show that significant spatial autocorrelation exists in spatial completeness, positional accuracy, and logical consistency. Road type such as Tertiary, Residential, Service and Link has been proven to be a typical indicator of the different quality elements for VGI linear features.
... Oksanen, Bergman, Sainio, and Westerholm (2015) show that privacy-preserving heat maps can be generated from crowdsourced GNSS cycling trajectories, thus providing a way to visually communicate the popularity of different infrastructure with cyclists. Subsequently, Bergman and Oksanen (2016a) present an approach to utilize the data for popularity-based routing. Similarly, Baker et al. (2017) developed a process to model the appreciation of roads in a network as a way to improve routing for cyclists. ...
Article
Mobile activity tracking data, i.e. data collected by mobile applications that enable activity tracking based on the use of the Global Navigation Satellite Systems (GNSS), contains information on cycling in urban areas at an unprecedented spatial and temporal extent and resolution. It can be a valuable source of information about the quality of bicycling in the city. Required is a notion of quality that is derivable from plain GNSS trajectories. In this article, we quantify urban cycling quality by estimating the fluency of cycling traffic using a large set of GNSS trajectories recorded with a mobile tracking application. Earlier studies have shown that cyclists prefer to travel continuously and without halting, i.e. fluently. Our method extracts trajectory properties that describe the stopping behaviour and dynamics of cyclists. It aggregates these properties to segments of a street network and combines them in a descriptive index. The suitability of the data to describe the cyclists' behaviour with street-level detail is evaluated by comparison with various data from independent sources. Our approach to characterizing cycling traffic fluency offers a novel view on the cyclability of a city that could be valuable for urban planners, application providers, and cyclists alike. We find clear indications for the data's ability to estimate characteristics of city cycling quality correctly, despite behaviour patterns of cyclists not caused by external circumstances and the data's inherent bias. The proposed quality measure is adaptable for different applications, e.g. as an infrastructure quality measure or as a routing criterion.
... The geometric measures are the main measures used in matching based on the locations and shapes of objects. The Euclidean distance measures (Bergman and Oksanen 2016) and ratios of overlapping areas (Gösseln and Sester, 2005, Qi et al. 2010, Wang et al. 2013) have been widely employed in matching procedures to determine the matching candidates. Attributes and semantics refer to the non-geometric properties of objects, such as type and name, which are related to their functions (Li et al. 2018). ...
Article
The growth of georeferenced data sources calls for advanced matching methods to improve the reliability of geospatial data processing, such as map conflation. Existing matching methods mainly focus on similarity measures at the entity scale or area scale. A measure that combines entity-scale and area-scale similarities can provide sound matching results under various circumstances. In this paper, we propose a georeferenced-graph model that integrates multiscale similarities for data matching. Specifically, a match of correspondent data objects is identified by the entity-scale measure under the constraint of the area-scale measure. Nodes in the proposed georeferenced graph model represent polygons by their centroids, whereas the links in the graph connect the nodes (i.e. centroids) according to pre-defined rules. Then, we develop an algorithm to identify many-to-many matches. We demonstrate the proposed graph model and algorithm in real-world experiments using OpenStreetMap data. The experimental results show that the proposed georeferenced-graph model can effectively integrate the context and the location-and-form distance of geospatial data matches across different datasets.
... While loop tracks presumably represent recreational and sports cycling, A-to-B tracks are mainly utilitarian cycling, especially commuting (Bergman & Oksanen, 2016a). Third, the tracks are not evenly distributed between the users (Oksanen et al., 2015;Bergman & Oksanen, 2016b). While a small share of users have recorded hundreds of tracks, about 60% of users have recorded at most five tracks. ...
Conference Paper
Mobile sports tracking application data has become an attractive data source for cities seeking to understand patterns of active transportation and physical activity. However, to evaluate and enhance its usability, novel approaches are needed to better understand biases caused by non-random sampling. By investigating the definition of cyclists’ home locations based on their tracking behaviour, this paper provides a basis for future comparison of spatially aggregated home location data and population registry data. Ultimately, the aim is to understand the demographic representativeness of the tracking data, as well as the usability of population data in calibrating, for example, heat maps derived from sports tracking data. Using an interactive visual interface we compare two different rule-based home detection methods to uncover challenges related to episodic and heterogeneous movement data. Having inspected the home candidates of 100 randomly selected users, we could conclude that over 80% of the home locations were correctly detected using an approach based on the maximum number of tracks combined with temporal thresholds. The results emphasise the importance of understanding the characteristics of the data, and tuning the methods accordingly. Adjusting the temporal thresholds, removing tracks that represent mass events, and including information of land use, specifically residential areas, might solve most of the detected problems. In addition, we discuss how personal privacy could be enhanced within the suggested approach.
... On the other hand, they have a huge potential to be used for more advanced purposes. (Bergman and Oksanen, 2016). The aim of our research is to use crowdsourced GPS traces as a potential data source for highlighting updates in a referential road network of the French National Mapping Agency (IGN). ...
Conference Paper
Nowadays, crowdsourced GPS data are widely available in huge amounts. The number of people recording them has been increasing gradually, especially during sport and spare time activities. The traces are made openly available and popularized on social networks, blogs, and the websites of sport and tourist associations. However, their current use is limited to very basic metric analysis such as the total time of a trace, average speed, average elevation, etc. The main reasons for that are the high variation in spatial quality from point to point within a trace and the need for referential data to evaluate their quality. In this paper we present a novel approach for filtering and detection of outliers in crowdsourced GPS traces in order to assess their spatial quality intrinsically and make them more suitable for more advanced uses such as updating the referential road network of the French Mapping Agency – IGN. In addition, we propose a new definition of an outlier in GPS data, adapted to intrinsic assessment of spatial quality.
... A similar mobile application called Sports Tracker was used in a separate study focussed on providing automatic popularity-based routing in Helsinki [89]. In this study, the full route trace of individuals was used, with a public dataset of nearly 30,000 routes from 1994 users. ...
Article
One fundamental aspect of promoting utilitarian bicycle use involves making modifications to the built environment to improve the safety, efficiency and enjoyability of cycling. Revealed preference data on bicycle route choice can assist greatly in understanding the actual behaviour of a highly heterogeneous group of users, which in turn assists the prioritisation of infrastructure or other built environment initiatives. This systematic review seeks to compare the relative strengths and weaknesses of the empirical approaches for evaluating whole journey route choices of bicyclists. Two electronic databases were systematically searched for a selection of keywords pertaining to bicycle and route choice. In total seven families of methods are identified: GPS devices, smartphone applications, crowdsourcing, participant-recalled routes, accompanied journeys, egocentric cameras and virtual reality. The study illustrates a trade-off in the quality of data obtainable and the average number of participants. Future additional methods could include dockless bikeshare, multiple camera solutions using computer vision and immersive bicycle simulator environments.
Article
In many Brazilian cities, the most common procedure for planning cycling networks is using aggregated population data in census tracts, which may not take into account the true origin and destination of trips. It may also not identify potential users of a particular mode of transport. This is particularly important considering that implementing cycling infrastructures should be based on the assumption that they are able to meet the users' needs. Therefore, the aim of this study is to develop and adopt an objective method to design and compare cycling networks based on data-mining of disaggregated origin-destination data, GIS resources and multicriteria analysis techniques. The method follows three steps: a) identifying potential users based on real user profiles, b) designing proposed cycling networks and c) a comparison between the networks proposed in this study and those developed by the municipality selected as a case study, considering real and potential users, as well as cost and benefit criteria. As a positive outcome, using disaggregated data allows for a reasonable estimate of the number of people served by the networks, a detailed analysis of their proximity to the infrastructure, as well as identifying potential users. Comparing cycling networks considering cost and benefit criteria shows that the chosen criteria were effective. It was also determined that the cycling network of the studied city poorly serves bicycle transport users, if compared to the proposed networks. These findings indicate that appropriate methods for planning cycling networks are still needed.
Chapter
In the face of an exploding range of volunteered data initiatives, it is important to maintain good metadata and quality information in order to ensure the appropriate combination and reuse of the resulting data sets. At the same time, there is increasing evidence that validation and quality assessment of data (whether that data be volunteered or ‘official’) can sometimes be usefully crowdsourced, i.e. the required efforts can be distributed to a large number of people. However, as with VGI itself, maintaining the consistency, semantics and reliability of volunteered metadata presents a number of challenges. Initiatives which archive the history of features and tags (e.g. OpenStreetMap) lend themselves to some mapping of disputed features, but among citizen science projects in general there is often limited scope for users to comment on their own or others’ submissions in a consistent way which may be translated to any of the currently accepted geospatial metadata standards. At the same time, platforms which allow the publication of more ‘authoritative’ data sets (e.g. Geonode and ArcGIS Online) have introduced the option of user comments and ratings. Volunteered metadata (on both authoritative and VG information) is potentially of huge value in assessing fitness for purpose, but some form of standardization is required in order to aggregate diverse ‘opinions’ on the content and quality of data sets and extract the maximum value from this potentially vital resource. We discuss major challenges and present a set of examples of current practice which may assist in this aggregation.
Article
Navigation and orientation while walking in urban spaces pose serious challenges for blind pedestrians, sometimes even on a daily basis. Research shows the practicability of computerized weighted network route planning algorithms based on OpenStreetMap mapping data for calculating customized routes for blind pedestrians. While data about pedestrians and vehicle traffic flow at different times throughout the day influence the route choices of blind pedestrians, such data do not exist in OpenStreetMap. Quantifying the correlation between spatial structure and traffic flow could be used to fill this gap. As such, we investigated machine‐learning methods to develop a computerized model for predicting pedestrian traffic flow levels, with the objective of enriching the OpenStreetMap database. This article presents prediction results by implementing six machine‐learning algorithms based on parameters relating to the geometrical and topological configuration of streets in OpenStreetMap, as well as points‐of‐interest such as public transportation and shops. The Random Forest algorithm produced the best results, whereby 95% of the testing data were successfully predicted. These results indicate that machine‐learning algorithms can accurately generate necessary temporal data, which when combined with the available crowdsourced open mapping data could augment the reliability of route planning algorithms for blind pedestrians.
Article
Nowadays, biking is flourishing in many Western cities. While many roads are used for both cars and bicycles, buffered bike lanes are marked for the safety of cyclists. In many cities, segregated paths are built up to have physical separation from motor vehicles. These types of biking ways are regarded as attributes in geographic information system (GIS) data. This information is required and important in the service of route planning, as cyclists may prefer certain types of bikeways. This paper presents a framework for generating networks of bikeways with attribute information from the data collected on the collaborative street view data platform Mapillary. The framework consists of two layers: The first layer focuses on constructing a bikeway road network using Global Positioning System (GPS) information of Mapillary images. Mapillary sequences are classified into walking, cycling, driving (ordinary road), and driving (motorway) trajectories based on the transportation mode with a trained XGBoost classifier. The bikeway road network is then extracted from cycling and driving (ordinary road) trajectories using a raster-based method. The second layer focuses on extracting attribute information from Mapillary images. Cycling-specific information (i.e., bicycle signs/markings) is extracted using a two-stage detection and classification model. A series of quantitative evaluations based on a case study demonstrated the ability and potential of the framework for extracting bikeway road information to enrich the existing OSM cycling road data.
Article
Misclassification of features is a major source of uncertainty in OpenStreetMap (OSM). This study is an automated data-enrichment study whose primary goal is predicting road classes based on multi-classifier systems (MCSs). In this regard, fourteen parameters (thirteen centrality parameters and length) that were assumed to have the highest impact on the classes of the features were calculated for the features. Choosing Tehran, Iran, as the test case, no ground truth was available; therefore, the tags assigned by identified Iranian experts were fed to several classifiers including random forest, decision tree, support vector machine (SVM), Artificial Neural Network (ANN), and naïve Bayes. Using the five-fold cross-validation method, the overall accuracy of the classifiers was 93.55%, 90.19%, 88.50%, 83.06%, and 25.06%, respectively. Gini importance showed that closeness, eccentricity, and length were the most important parameters affecting the classification. To further enhance the accuracy, MCSs were used to fuse the results. The employed methods were weighted majority voting, naïve Bayes, Dempster-Shafer, decision template, and behavior knowledge space (BKS). Experimenting with different fusion methods and different combinations of the input classifiers led to enhanced results, with BKS fed with the combination of SVM and random forest scoring the highest with an accuracy of 97.19%. Also, the methodology was tested on two other major cities of Iran, namely Mashhad and Karaj, and the BKS fusion method resulted in accuracies of 93.18% and 87.86%, respectively.
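Of the fusion methods listed above, weighted majority voting is the simplest to illustrate. The sketch below is a generic version of that technique, not the study's implementation: using each classifier's reported overall accuracy as its weight is an assumption, and the road-class labels and the function name `weighted_majority_vote` are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Fuse one predicted label per classifier into a single label.

    predictions: dict mapping classifier name -> predicted class label
    weights: dict mapping classifier name -> non-negative weight
    Ties are broken by alphabetical label order so the result is deterministic.
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    return max(sorted(scores), key=lambda label: scores[label])


# Weights taken from the accuracies quoted in the abstract (an assumed choice).
weights = {"random_forest": 0.9355, "svm": 0.8850, "naive_bayes": 0.2506}

# Hypothetical predictions for a single road segment.
predictions = {
    "random_forest": "residential",
    "svm": "tertiary",
    "naive_bayes": "tertiary",
}

fused = weighted_majority_vote(predictions, weights)
print(fused)  # tertiary
```

Here the two weaker classifiers together outweigh the random forest (0.8850 + 0.2506 > 0.9355), which shows why the choice of weights matters for this kind of fusion.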
Chapter
Crowd-sourced data of high spatial and temporal resolution can provide a new basis for mobility analyses given that its various types of biases distorting the results are identified and adequately handled. In this paper, trajectory patterns that can affect the validity of mobile fitness app data are examined by means of cycling trajectories (n = 50,524) from the Helsinki Metropolitan Area, in Finland. In addition to mass events and group journeys, we evaluated the biasing effect of routes that have been repeatedly recorded by the same application user. Based on the results, repeatedly recorded commuting routes may skew fitness application data more than group patterns. Many of the changes in the frequencies and length distributions at different temporal granularities before and after extracting the ‘bias patterns’ were statistically significant. Also the skewed distribution of tracks among users (i.e. contribution inequality) became more even. The biases induced by behavioural patterns ought to be considered when evaluating the validity of fitness app data in analyses of general mobility behaviour and when designing value-added applications based on the data. Considering the trade-off between privacy and data accuracy regarding dissemination of sensitive crowd-sourced movement data, the findings emphasise the importance of preserving the possibility to detect individual-level phenomena in order to produce valid analysis results.
Article
Utilization of movement data from mobile sports tracking applications is affected by its inherent biases and sensitivity, which need to be understood when developing value-added services for, e.g., application users and city planners. We have developed a method for generating a privacy-preserving heat map with user diversity (ppDIV), in which the density of trajectories, as well as the diversity of users, is taken into account, thus preventing the bias effects caused by participation inequality. The method is applied to public cycling workouts and compared with privacy-preserving kernel density estimation (ppKDE) focusing only on the density of the recorded trajectories and privacy-preserving user count calculation (ppUCC), which is similar to the quadrat-count of individual application users. An awareness of privacy was introduced to all methods as a data pre-processing step following the principle of k-Anonymity. Calibration results for our heat maps using bicycle counting data gathered by the city of Helsinki are good (R² > 0.7) and raise high expectations for utilizing heat maps in a city planning context. This is further supported by the diurnal distribution of the workouts indicating that, in addition to sports-oriented cyclists, many utilitarian cyclists are tracking their commutes. However, sports tracking data can only enrich official in-situ counts with its high spatio-temporal resolution and coverage, not replace them.
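To make the idea of a user-count heat map with a k-anonymity-style threshold more concrete, here is a simplified sketch. It is not the ppUCC or ppDIV method from the paper: it counts distinct users per grid cell and suppresses cells with fewer than `k` users after counting, whereas the abstract describes privacy handling as a pre-processing step. The function name `user_count_grid`, the input layout, and the parameter values are assumptions.

```python
from collections import defaultdict


def user_count_grid(tracks, cell_size, k):
    """Count distinct users per grid cell and drop cells with fewer than k users.

    tracks: iterable of (user_id, [(x, y), ...]) with projected coordinates
    in metres. A user is counted once per cell regardless of how many
    tracks or points that user has recorded there.
    """
    users_per_cell = defaultdict(set)
    for user_id, points in tracks:
        for x, y in points:
            cell = (int(x // cell_size), int(y // cell_size))
            users_per_cell[cell].add(user_id)
    return {
        cell: len(users)
        for cell, users in users_per_cell.items()
        if len(users) >= k
    }


# Hypothetical tracks: user "a" has recorded two tracks, "b" and "c" one each.
example_tracks = [
    ("a", [(10, 10), (120, 10)]),
    ("b", [(20, 30), (130, 40)]),
    ("a", [(15, 15)]),
    ("c", [(125, 5)]),
]

grid_k2 = user_count_grid(example_tracks, cell_size=100, k=2)
grid_k3 = user_count_grid(example_tracks, cell_size=100, k=3)
print(grid_k2)  # {(0, 0): 2, (1, 0): 3}
print(grid_k3)  # {(1, 0): 3}
```

Counting users rather than tracks means the repeated track by user "a" does not inflate cell (0, 0), which is the participation-inequality effect the abstract refers to.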
Article
User-generated content (UGC) platforms on the Internet have experienced a steep increase in data contributions in recent years. The ubiquitous usage of location-enabled devices, such as smartphones, allows contributors to share their geographic information on a number of selected online portals. The collected information is oftentimes referred to as volunteered geographic information (VGI). One of the most utilized, analyzed and cited VGI-platforms, with an increasing popularity over the past few years, is OpenStreetMap (OSM), whose main goal it is to create a freely available geographic database of the world. This paper presents a comprehensive overview of the latest developments in VGI research, focusing on its collaboratively collected geodata and corresponding contributor patterns. Additionally, trends in the realm of OSM research are discussed, highlighting which aspects need to be investigated more closely in the near future.
Article
Volunteered Geographic Information (VGI) projects and their crowdsourced data have been the focus of a number of scientific analyses and investigations in recent years. Oftentimes the results show that the collaboratively collected geodata of one of the most popular VGI projects, OpenStreetMap (OSM), provides good coverage in urban areas when considering particular completeness factors. However, results can potentially vary significantly for different world regions. In this article, we conduct an analysis to determine similarities and differences in data contributions and community development in OSM between 12 selected urban areas of the world. Our findings showed significantly different results in data collection efforts and local OSM community sizes. European cities provide quantitatively larger amounts of geodata and number of contributors in OSM, resulting in a better representation of the real world in the dataset. Although the number of volunteers does not necessarily correlate with the general population density of the urban areas, similarities could be detected while comparing the percentage of different contributor groups and the number of changes they made to the OSM project. Further analyses show that socio-economic factors, such as income, can have an impact on the number of active contributors and the data provided in the analyzed areas. Furthermore, the results showed significant data contributions by members whose main territory of interest lies more than one thousand kilometers from the tested areas.
Article
As open source volunteered geographic information continues to gain popularity, the user community and data contributions are expected to grow, e.g., CloudMade, Apple, and Ushahidi now provide OpenStreetMap (OSM) as a base layer for some of their mapping applications. This, coupled with the lack of cartographic standards and the expectation to one day be able to use this vector data for more geopositionally sensitive applications, like GPS navigation, leaves potential users and researchers to question the accuracy of the database. This research takes a photogrammetric approach to determining the positional accuracy of OSM road features using stereo imagery and a vector adjustment model. The method applies rigorous analytical measurement principles to compute accurate real world geolocations of OSM road vectors. The proposed approach was tested on several urban gridded city streets from the OSM database with the results showing that the post adjusted shape points improved positionally by 86%. Furthermore, the vector adjustment was able to recover 95% of the actual positional displacement present in the database. To demonstrate a practical application, a head-to-head positional accuracy assessment between OSM, the USGS National Map (TNM), and United States Census Bureau's Topologically Integrated Geographic Encoding Referencing (TIGER) 2007 roads was conducted.
Article
The proliferation of volunteered geographic information (VGI), such as OpenStreetMap (OSM), enabled by technological advancements, has led to large volumes of user-generated geographical content. While this data is becoming widely used, the understanding of the quality characteristics of such data is still largely unexplored. An open research question is the relationship between demographic indicators and VGI quality. While earlier studies have suggested a potential relationship between VGI quality and population density or socio-economic characteristics of an area, such relationships have not been rigorously explored, and mainly remained qualitative in nature. This paper addresses this gap by quantifying the relationship between demographic properties of a given area and the quality of VGI contributions. We study specifically the demographic characteristics of the mapped area and its relation to two dimensions of spatial data quality, namely positional accuracy and completeness of the corresponding VGI contributions with respect to OSM using the Denver (Colorado, US) area as a case study. We use non-spatial and spatial analysis techniques to identify potential associations among demographics data and the distribution of positional and completeness errors found within VGI data. Generally, the results of our study show a lack of statistically significant support for the assumption that demographic properties affect the positional accuracy or completeness of VGI. While this research is focused on a specific area, our results showcase the complex nature of the relationship between VGI quality and demographics, and highlight the need for a better understanding of it. By doing so, we add to the debate of how demographics impact the quality of VGI data and lay the foundation for further work.
Article
I define big data with respect to its size but pay particular attention to the fact that the data I am referring to is urban data, that is, data for cities that is invariably tagged to space and time. I argue that this sort of data is largely being streamed from sensors, and this represents a sea change in the kinds of data that we have about what happens where and when in cities. I describe how the growth of big data is shifting the emphasis from longer term strategic planning to short-term thinking about how cities function and can be managed, although with the possibility that over much longer periods of time, this kind of big data will become a source for information about every time horizon. By way of conclusion, I illustrate the need for new theory and analysis with respect to 6 months of smart travel card data of individual trips on Greater London’s public transport systems.
Conference Paper
The potential for volunteer groups to contribute geographic data to National Mapping Agencies has been widely recognised. Several investigations have been done to determine the geometric accuracy of this data for the purposes of national mapping. Beyond accuracy, from a production perspective National Mapping Agencies will also be interested in the sufficiency and uniformity of the data. This paper investigates whether geographic data currently generated by volunteers is uniform across a country and whether the rate of data production is consistent. For the purpose of the test, changes in OpenStreetMap data for South Africa are analysed for the period 2006 to 2011. Only point and line data are considered. The results generally show that the rate at which data is generated varies in space and time. The results also confirm that volunteers emphasise the capture of certain information and that the capture does not average out as might be expected. The results also show that social events, such as a World Cup, spur the generation of volunteered geographic data. The implication of these results for National Mapping Agencies is that they cannot treat volunteered geographic information as being of a uniform standard. How National Mapping Agencies respond to this will have to be the subject of other investigations.
Article
The emergence and ubiquitous availability of geotechnologies have yielded an explosion of user-generated geographical data, utilized for mapping, modeling, etc. Using a well-mapped German city in OpenStreetMap as an example, this research models the positional accuracy of road junction locations through a statistical comparison with highly precise survey data and commercial Tele Atlas data. The OpenStreetMap and Tele Atlas data showed similar spatial deviations, and neither coincides with the survey data. In particular, OpenStreetMap exhibited spatial heterogeneity in the error distribution, leading to significant clusters of high and low positional accuracy.
Article
The assessment of OpenStreetMap (OSM) data quality has become an interdisciplinary research area over the recent years. The question of whether the OSM road network should be updated through periodic data imports from public domain data, or whether the currency of OSM data should rather rely on more traditional data collection efforts by active contributors, has led to perpetual debates within the OSM community. A US Census TIGER/Line 2005 import into OSM was accomplished in early 2008, which generated a road network foundation for the active community members in the US. In this study we perform a longitudinal analysis of road data for the US by comparing the development of OSM and TIGER/Line data since the initial TIGER/Line import. The analysis is performed for the 50 US states and the District of Columbia, and 70 Urbanized Areas. In almost all tested states and Urbanized Areas, OSM misses roads for motorized traffic when compared with TIGER/Line street data, while significant contributions could be observed in pedestrian related network data in OSM compared with corresponding TIGER/Line data. We conclude that the quality of OSM road data could be improved through new OSM editor tools allowing contributors to trace current TIGER/Line data.
Article
When providing directions to a place, web and mobile mapping services are all able to suggest the shortest route. The goal of this work is to automatically suggest routes that are not only short but also emotionally pleasant. To quantify the extent to which urban locations are pleasant, we use data from a crowd-sourcing platform that shows two street scenes in London (out of hundreds), and a user votes on which one looks more beautiful, quiet, and happy. We consider votes from more than 3.3K individuals and translate them into quantitative measures of location perceptions. We arrange those locations into a graph upon which we learn pleasant routes. Based on a quantitative validation, we find that, compared to the shortest routes, the recommended ones add just a few extra walking minutes and are indeed perceived to be more beautiful, quiet, and happy. To test the generality of our approach, we consider Flickr metadata of more than 3.7M pictures in London and 1.3M in Boston, compute proxies for the crowdsourced beauty dimension (the one for which we have collected the most votes), and evaluate those proxies with 30 participants in London and 54 in Boston. These participants have not only rated our recommendations but have also carefully motivated their choices, providing insights for future work.
Conference Paper
Our proposed system CrowdPath is based on the hypothesis that people know their commute area better than conventional routing services that use traditional digital roadmaps and shortest path algorithms. The knowledge and experiences of drivers reflected in volunteered commute routes may provide better routes. By leveraging such available volunteered geographic information (VGI), our goal is to investigate next-generation routing services to further reduce travel time, fuel consumption, and improve navigation. Previous related work summarizes GPS tracks into a landmark graph which is used for answering routing queries. In contrast, CrowdPath directly queries a collection of map-matched GPS tracks to recommend paths from a source location to a destination. Our evaluation using real GPS tracks illustrates the promise of CrowdPath in significantly reducing travel time compared to routes from common routing providers. In the future, CrowdPath may be extended to adapt route recommendations by start time and provide safe paths using volunteered crime and accident reports.
Article
Understanding travel behaviour is significant in travel demand management as well as in urban and transport planning. Over the past decade, with the advancement of data collection techniques, such as GPS, transit smart cards, and mobile phones, various types of travel trajectory data are increasingly complementing or replacing conventional travel diaries and stated preference data. Other location-aware data are used in studying human movement patterns, such as social network check-in data and banknote dispersal data. Abundance of the emerging trajectory data has driven a new wave of travel behaviour research, and introduced new research problems. This paper provides a state-of-the-art review of the travel behaviour studies categorised by trajectory data types. Based on the literature review, research challenges are discussed and promising research topics in this field are proposed for future studies.
Conference Paper
The inaccuracy of manually created digital road maps is a persistent problem, despite their high economic value. We present CrowdAtlas, which automates map updates based on people's travels, either individually or crowdsourced. Its mobile navigation app detects significant portions of GPS traces that do not conform to the existing map, as determined by state-of-the-art Viterbi map matching. When sufficient evidence has been collected, map inference algorithms can automatically update the map. The CrowdAtlas server aggregates exceptional traces from users with the navigation app as well as from other, large-scale data sources. From these it automatically generates high-quality map updates, which can be propagated to its navigation app and other interested applications. Using the CrowdAtlas app, we mapped out a 4.5 km^2 street block in Shanghai in less than half an hour and built a walking/cycling map of the SJTU campus. Using taxi traces collected from Beijing, we contributed 61 km of missing roads to OpenStreetMap, the first set of completely computer-generated roads for this large, open-source map community.
Conference Paper
In many Intelligent Transportation System (ITS) applications that crowd-source data from probe vehicles, a crucial step is to accurately map the GPS trajectories to the road network in real time. This process, known as map-matching, often needs to account for noise and sparseness of the data because (1) highly precise GPS traces are rarely available, and (2) dense trajectories are costly for live transmission and storage. We propose an online map-matching algorithm based on the Hidden Markov Model (HMM) that is robust to noise and sparseness. We focused on two improvements over existing HMM-based algorithms: (1) the use of an optimal localizing strategy, the variable sliding window (VSW) method, that guarantees the online solution quality under uncertain future inputs, and (2) the novel combination of spatial, temporal and topological information using machine learning. We evaluated the accuracy of our algorithm using field test data collected on bus routes covering urban and rural areas. Furthermore, we also investigated the relationships between accuracy and output delays in processing live input streams. In our tests on field test data, VSW outperformed the traditional localizing method in terms of both accuracy and output delay. Our results suggest that it is viable for low latency applications such as traffic sensing.
Article
Increasingly, location-aware datasets are of a size, variety, and update rate that exceed the capability of spatial computing technologies. This paper addresses the emerging challenges posed by such datasets, which we call Spatial Big Data (SBD). SBD examples include trajectories of cellphones and GPS devices, vehicle engine measurements, temporally detailed road maps, etc. SBD has the potential to transform society via next-generation routing services such as eco-routing. However, the envisaged SBD-based next-generation routing services pose several significant challenges for current routing techniques. SBD magnifies the impact of partial information and ambiguity of traditional routing queries specified by a start location and an end location. In addition, SBD challenges the assumption that a single algorithm utilizing a specific dataset is appropriate for all situations. The tremendous diversity of SBD sources substantially increases the diversity of solution methods. Newer algorithms may emerge as new SBD becomes available, creating the need for a flexible architecture to rapidly integrate new datasets and associated algorithms.
Article
This article defines dual-concept diversity as a two-dimensional construct that holds a central place of study in many fields, including communication. The authors present 12 measures of dual-concept diversity appearing in the literature and assess the differential sensitivity of these measures in capturing the two dimensions. After assessing each measure and eliminating measures that are redundant or computationally intractable, the article compares the remaining measures of diversity in a time series of 30 years of network radio programming. Graphic and statistical interrelationships are presented to facilitate comparison and choice between measures in future research.
Article
Zipf's law governs many features of the Internet. Observations of Zipf distributions, while interesting in and of themselves, have strong implications for the design and function of the Internet. The connectivity of Internet routers influences the robustness of the network while the distribution in the number of email contacts affects the spread of email viruses. Even web caching strategies are formulated to account for a Zipf distribution in the number of requests for webpages.
Article
Setting the scene: The successful collection of information by masses of volunteering individuals enabled by Web technology (otherwise referred to as Web 2.0) does not halt before the realm of geographic information. Information resources available, for instance, in the online encyclopedia Wikipedia or the photo-sharing platform Flickr are currently being extended with geographic information or geotags at an impressive rate. Even more remarkable in the given context, several projects concentrating solely on the collection of geographic information have formed. Goodchild (2007) gives an overview of these global collaborations and calls the phenomenon Voluntary Geographic Information (VGI). One of the most striking and sophisticated examples of VGI is the OpenStreetMap (OSM) project, started in 2004. It aims at creating and collecting free vector geodata covering the whole planet. Its means are ordinary citizens equipped with GPS devices logging coordinates, out-of-copyright maps, and aerial imagery provided by OSM-friendly companies (like Yahoo! Inc.). Geodata is then derived from these data sources. At the time of writing, OSM counts ~60,000 registered users, ~7,000 of whom have created or updated nodes and ~3,000 of whom have uploaded GPX tracks. Altogether the OSM dataset currently consists of roughly 270 million nodes, partly constituting 30 million ways. Haklay (2008) analysed the quality of OSM data in England. One outcome of his analysis is that, against common expectation, only very little quality assurance is carried out upon the OSM data: dividing England into grid cells of 1 km², it turns out that 50% of the area of England has been mapped by individual persons and 89.5% by only up to three individuals. Due to this, and also due to its lack of completeness, the dataset would not (yet) be suitable for more sophisticated purposes than 'cartographic products that display central areas of cities' (p. 24).
Article
Planning travel to unfamiliar regions is a difficult task for novice travelers. The burden can be eased if a resident of the area offers to help. In this paper, we propose a social itinerary recommendation by learning from multiple user-generated digital trails, such as GPS trajectories of residents and travel experts. In order to recommend a satisfying itinerary to users, we present an itinerary model in terms of attributes extracted from user-generated GPS trajectories. On top of this itinerary model, we present a social itinerary recommendation framework to find and rank itinerary candidates. We evaluated the efficiency of our recommendation method against baseline algorithms with a large set of user-generated GPS trajectories collected from Beijing, China. First, systematically generated user queries are used to compare the recommendation performance at the algorithmic level. Second, a user study involving current residents of Beijing is conducted to compare user perception of and satisfaction with the recommended itinerary. Third, we compare a mobile-only approach with a Mobile+Cloud architecture for practical mobile recommender deployment. Lastly, we discuss personalization and adaptation factors in social itinerary recommendation throughout the paper.
Keywords: Spatio-temporal data mining; GPS trajectories; Itinerary recommendation; Social recommendation
Article
In the US, the rise in motorized vehicle travel has contributed to serious societal, environmental, economic, and public health problems. These problems have increased the interest in encouraging non-motorized modes of travel (walking and bicycling). The current study contributes toward this objective by identifying and evaluating the importance of attributes influencing bicyclists’ route choice preferences. Specifically, the paper examines a comprehensive set of attributes that influence bicycle route choice, including: (1) bicyclists’ characteristics, (2) on-street parking, (3) bicycle facility type and amenities, (4) roadway physical characteristics, (5) roadway functional characteristics, and (6) roadway operational characteristics. The data used in the analysis is drawn from a web-based stated preference survey of Texas bicyclists. The results of the study emphasize the importance of a comprehensive evaluation of both route-related attributes and bicyclists’ demographics in bicycle route choice decisions. The empirical results indicate that travel time (for commuters) and motorized traffic volume are the most important attributes in bicycle route choice. Other route attributes with a high impact include the number of stop signs, red lights, and cross-streets; speed limits; on-street parking characteristics; and whether there exists a continuous bicycle facility on the route.
Conference Paper
Bicycling is an affordable, environmentally friendly alternative transportation mode to motorized travel. A common task performed by bikers is to find good routes in an area, where the quality of a route is based on safety, efficiency, and enjoyment. Finding routes involves trial and error as well as exchanging information between members of a bike community. Biketastic is a platform that enriches this experimentation and route sharing process, making it both easier and more effective. Using a mobile phone application and online map visualization, bikers are able to document and share routes, ride statistics, sensed information to infer route roughness and noisiness, and media that documents ride experience. Biketastic was designed to ensure the link between information gathering, visualization, and bicycling practices. In this paper, we present architecture and algorithms for route data inferences and visualization. We evaluate the system based on feedback from bicyclists provided during a two-week pilot.
Conference Paper
Traffic delays and congestion are a major source of inefficiency, wasted fuel, and commuter frustration. Measuring and localizing these delays, and routing users around them, is an important step towards reducing the time people spend stuck in traffic. As others have noted, the proliferation of commodity smartphones that can provide location estimates using a variety of sensors (GPS, WiFi, and/or cellular triangulation) opens up the attractive possibility of using position samples from drivers' phones to monitor traffic delays at a fine spatiotemporal granularity. This paper presents VTrack, a system for travel time estimation using this sensor data that addresses two key challenges: energy consumption and sensor unreliability. While GPS provides highly accurate location estimates, it has several limitations: some phones don't have GPS at all, the GPS sensor doesn't work in "urban canyons" (tall buildings and tunnels) or when the phone is inside a pocket, and the GPS on many phones is power-hungry and drains the battery quickly. In these cases, VTrack can use alternative, less energy-hungry but noisier sensors like WiFi to estimate both a user's trajectory and travel time along the route. VTrack uses a hidden Markov model (HMM)-based map matching scheme and travel time estimation method that interpolates sparse data to identify the most probable road segments driven by the user and to attribute travel times to those segments. We present experimental results from real drive data and WiFi access point sightings gathered from a deployment on several cars. We show that VTrack can tolerate significant noise and outages in these location estimates, and still successfully identify delay-prone segments, and provide accurate enough delays for delay-aware routing algorithms. We also study the best sampling strategies for WiFi and GPS sensors for different energy cost regimes.
Article
Ubiquitous mobile devices, such as smartphones, led to an increased popularity of pedestrian-related routing applications over the past few years. Because pedestrians typically aim to minimize their walking distance, especially in nonrecreational and multimodal trips, pedestrian routing systems will be fully used only if they can find the correct shortest path and thus help to avoid unnecessary detours. The standard equipment of car navigation systems based on the Global Positioning System several years ago led to the availability of accurate street network data for car-based routing applications. However, pedestrian routing applications should consider pedestrian-related network segments besides those used by motorized traffic, including footpaths and pedestrian bridges. The authors of this paper performed a shortest-path analysis of pedestrian routes for cities in Germany and the United States. For a set of 1,000 randomly generated origin-destination pairs, the authors compared the lengths of pedestrian routes that were computed by different freely available network sources, such as OpenStreetMap and TIGER/Line data, and proprietary data sets, such as TomTom, NAVTEQ, and ATKIS. The results showed that freely available data sources such as OpenStreetMap provided a relatively comprehensive option for cities in which commercial pedestrian data sets were not yet available.
Article
Problem, research strategy, and findings: Planners need a clear understanding of what influences walking and bicycling behavior to develop effective strategies to increase use of those modes. Transportation practitioners have largely focused on infrastructure and the built environment, although researchers have found that attitudes are also very important. The theory of planned behavior (TPB) suggests that behavior such as active transportation results from a mixture of personal attitudes toward these modes, subjective norms, and a person's perceived behavioral control, giving us a way to conceptualize psychological factors that influence travel behavior. Using data from a random phone survey of three neighborhoods in Portland (OR), we test whether TPB explains the possible causal relationships among the built environment, socio-demographics, and active transportation. We find that both the built environment and demographics influence cycling and walking, although indirectly, by influencing attitudes and perceived behavioral control. Moreover, it is important to look at bicycle-specific infrastructure separately from other environmental characteristics. For example, relatively flat neighborhoods with well-connected, low-traffic streets and multiple destinations were associated with more frequent bicycling, but striped bike lanes were not.
Takeaway for practice: Practitioners cannot rely solely on changing the environment to increase bicycling. Programs such as public events and individualized marketing that influence attitudes may be necessary to reinforce positive environmental features. This is particularly true for women and older adults. Moreover, adding bike lanes to an otherwise poor bicycling environment may not increase bicycling in any significant way.
Article
To better understand bicyclists’ preferences for facility types, GPS units were used to observe the behavior of 164 cyclists in Portland, Oregon, USA for several days each. Trip purpose and several other trip-level variables were recorded by the cyclists, and the resulting trips were coded to a highly detailed bicycle network. The authors used the 1449 non-exercise, utilitarian trips to estimate a bicycle route choice model. The model used a choice set generation algorithm based on multiple permutations of path attributes and was formulated to account for overlapping route alternatives. The findings suggest that cyclists are sensitive to the effects of distance, turn frequency, slope, intersection control (e.g. presence or absence of traffic signals), and traffic volumes. In addition, cyclists appear to place relatively high value on off-street bike paths, enhanced neighborhood bikeways with traffic calming features (aka “bicycle boulevards”), and bridge facilities. Bike lanes more or less exactly offset the negative effects of adjacent traffic, but were no more or less attractive than a basic low traffic volume street. Finally, route preferences differ between commute and other utilitarian trips; cyclists were more sensitive to distance and less sensitive to other infrastructure characteristics for commute trips.
Article
Volunteered geographic information (VGI) data-sets are characterised by heterogeneity due to influences from technical, social, environmental or economic factors. As a result, mapping progress follows neither a spatially nor a temporally equal distribution, and thus can hardly be measured or predicted. Positively stated, heterogeneity leads to interesting VGI data-sets revealing regional peculiarities such as diverse community activities. This work proposes an approach for identifying regionally and temporally different developments with respect to mapping progress. Regional mapping progress is measured with a modified version of a previously proposed model for classifying activity stages, which has been used as the foundation for a massive spatial and temporal analysis of the worldwide OpenStreetMap contributions between the years 2006 and 2013. It also allows the evaluation of rural and unpopulated areas. Results reveal that regional mapping progress heavily depends on a number of distinct influences such as geographical or legal borders, data imports, unexpected events or diverse community developments. The work highlights regions with distinct results by revealing individual mapping stories.
Article
Expanding our knowledge about human mobility is essential for building efficient wireless protocols and mobile applications. Previous human mobility studies have typically been built upon empirical single-source data (e.g., cellphone or transit data), which inevitably introduces a bias against residents not contributing this type of data, e.g., call detail records cannot be obtained from the residents without cellphone activities, and transit data cannot cover the residents who walk or ride private vehicles. To address this issue, we propose and implement a novel architecture mPat to explore human mobility using multi-source data. A reference implementation of mPat was developed at an unprecedented scale upon the urban infrastructures of Shenzhen, China. The novelty and uniqueness of mPat lie in its three layers: (i) a data feed layer consisting of real-time data feeds from 24 thousand vehicles, 16 million smart cards and 10 million cellphones; (ii) a mobility abstraction layer exploring the correlation and divergence among the multi-source data to analyze and infer human mobility; and (iii) an application layer to improve urban efficiency based on the human mobility findings of the study. The evaluation shows that mPat achieves a 75% inference accuracy, and that its real-world application reduces passenger travel time by 36%.
Article
Current navigation systems/services allow drivers to keep track of their precise whereabouts and provide optimal routes to reach specified locations. A reliable map-matching algorithm is an indispensable and integral part of any land-based navigation system/service. This paper reviews existing map-matching algorithms with the aim of highlighting their qualities as well as unfolding their unresolved issues as a means to provide directions for future studies in this field. Existing map-matching algorithms are compared and contrasted with respect to positioning sensors, map qualities, assumptions and accuracies. The results of these comparisons provide interesting insights into the workings of existing algorithms and the issues they must address for improving their performance. Example findings are: (a) not all map-matching algorithms pay sufficient attention to topology of networks, directionality of roads or turn-restrictions; (b) most map-matching algorithms make an unbalanced trade-off between performance and accuracy; and (c) weight-based map-matching algorithms balance simplicity and accuracy and advanced map-matching algorithms provide high accuracy but with low performance. Based on the findings, suggestions are made to improve existing algorithms.
Conference Paper
This paper describes a map matching program submitted to the ACM SIGSPATIAL Cup 2012. We first summarize existing map matching algorithms into three categories, and compare their performance thoroughly. In general, global max-weight methods using the Viterbi dynamic programming algorithm are the most accurate, but the accuracy varies at different sampling intervals using different weight functions. Our submission selects a hybrid that improves upon the best two weight functions such that its accuracy is better than both and the performance is robust against varying sampling rates. In addition, we employ many optimization techniques to reduce the overall latency, as the scoring heavily emphasizes speed. Using the training dataset with manually corrected ground truth, our Java-based program matched all 14,436 samples in 5 seconds on a dual-core 3.3 GHz iCore 3 processor, and achieved 98.9% accuracy.
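As a rough illustration of what a global max-weight method with Viterbi dynamic programming involves, the following minimal sketch matches a handful of GPS points to a toy network. It is not the program described in the abstract: the function names, the Gaussian emission weight, the fixed `jump_penalty` transition weight, and the toy `segments` and `neighbours` data are all assumptions made for this example.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def viterbi_match(points, segments, neighbours, sigma=5.0, jump_penalty=20.0):
    """Return the max-weight sequence of segment ids for the GPS points.

    Emission weight: Gaussian log-likelihood of the point-to-segment distance.
    Transition weight: 0 for staying on a segment or moving to a neighbour,
    -jump_penalty otherwise (an illustrative choice, not a tuned one).
    """
    ids = list(segments)

    def emission(p, s):
        return -0.5 * (point_segment_distance(p, *segments[s]) / sigma) ** 2

    def transition(s, t):
        return 0.0 if s == t or t in neighbours.get(s, ()) else -jump_penalty

    score = {s: emission(points[0], s) for s in ids}
    back = []
    for p in points[1:]:
        new_score, pointers = {}, {}
        for t in ids:
            best = max(ids, key=lambda s: score[s] + transition(s, t))
            new_score[t] = score[best] + transition(best, t) + emission(p, t)
            pointers[t] = best
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state.
    path = [max(ids, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical toy network: a street, a disconnected parallel path, a side street.
segments = {
    "main": ((0.0, 0.0), (100.0, 0.0)),
    "parallel": ((0.0, 8.0), (100.0, 8.0)),
    "side": ((100.0, 0.0), (100.0, 100.0)),
}
neighbours = {"main": {"side"}, "side": {"main"}}
points = [(10.0, 3.0), (50.0, 5.0), (90.0, 3.0), (100.0, 40.0), (99.0, 80.0)]

matched = viterbi_match(points, segments, neighbours)
# Purely geometric baseline: nearest segment for each point independently.
nearest = [
    min(segments, key=lambda s: point_segment_distance(p, *segments[s]))
    for p in points
]
print(matched)
print(nearest)
```

In this toy case the second point lies slightly closer to the disconnected parallel path, so the point-by-point baseline switches segments there, while the Viterbi path stays on the connected street because the jump would cost more than the small emission gain.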
Article
Current advancements in pervasive technologies allow users to create and share an increasing amount of whereabouts data. Thus, some rich datasets on human mobility are becoming available on the web. In this paper we extracted approximately 790,000 mobility traces from a web-based repository of GPS tracks—the Nokia Sports Tracker Service. Using data mining mechanisms, we show that this data can be analyzed to uncover daily routines and interesting schemes in the use of public spaces. We first show that our approach supports large-scale analysis of people’s whereabouts by comparing behavioral patterns across cities. Then, using Kernel Density Estimation, we present a mechanism to identify popular sport areas in individual cities. This kind of analysis allows us to highlight human-centered geographies that can support a wide range of applications ranging from location-based services to urban planning.
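To make the kernel density step concrete, here is a minimal sketch of a two-dimensional Gaussian kernel density estimate evaluated on a grid, with the densest cell taken as the "popular area". The function names, the bandwidth, the grid, and the sample coordinates are hypothetical and chosen only for illustration; the abstract does not state which kernel or bandwidth was used.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates point density at (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))

    def density(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
            for px, py in points
        )

    return density

def densest_cell(points, bandwidth, grid):
    """Return the grid cell (x, y) with the highest estimated density."""
    density = gaussian_kde_2d(points, bandwidth)
    return max(grid, key=lambda cell: density(*cell))

# Hypothetical track points: a cluster near (2, 2) and one stray point.
track_points = [(2.0, 2.1), (1.8, 2.0), (2.2, 1.9), (2.1, 2.2), (8.0, 8.0)]
grid = [(x, y) for x in range(10) for y in range(10)]

hotspot = densest_cell(track_points, bandwidth=1.0, grid=grid)
print(hotspot)
```

The bandwidth controls how far each point's influence spreads: a larger value merges nearby clusters into one broad hotspot, while a smaller value isolates individual points.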