Preprint

Synthetic population and travel demand for Paris and Île-de-France based on open and publicly available data

Abstract

Synthetic populations of travelers and their detailed mobility behavior are an important basis for agent-based transport simulations, which are increasingly used in transport planning and research today. To date, research based on such simulations is rarely replicable as it is based on proprietary data and tools. To foster the discussion and steer research towards reproducible transport simulations, this paper introduces a process for generating a synthetic travel demand with individual households, persons, and their daily activity chains for Paris and its surrounding region Île-de-France — entirely based on open data and open software and replicable by any researcher. The resulting travel demand is published for others to use as a comprehensive data basis for agent-based transport simulations and as a test bed for population and demand synthesis algorithms. Furthermore, it is discussed how implicit correlation structures impact the potential use cases of the synthetic travel demand for simulation and analysis purposes and how the common practice of using population samples for downstream simulations affects the results.
The published version can be obtained as open access from the following link:
https://www.sciencedirect.com/science/article/pii/S0968090X21003016?via%3Dihub
... In order to prepare the MATSim simulation inputs, and more importantly a simplified microscopic representation of the actual population, the Eqasim project, as developed by Hörl and Balac (2020), can be used. This project consists of a modeling pipeline of data consumption and processing steps, making the transition from available public census data and other open source databases to a fully functional MATSim simulation in the French context. ...
... generate a schedule, i.e. the sequence of activities that the agents will try to follow during the day, for every agent. Applying a procedure inspired by statistical matching algorithms, trip chains from household travel survey data are attached to the synthetic population (Hörl and Balac, 2020). ...
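The matching idea in this excerpt can be illustrated with a minimal sketch. It is not the pipeline of Hörl and Balac (2020): the attribute names (`age_class`, `employed`), the survey weights, and the fallback rule that drops matching attributes one at a time are all assumptions made for illustration.

```python
import random

def attach_trip_chains(persons, survey, keys=("age_class", "employed"), seed=0):
    """Copy a trip chain from a matching survey respondent to each synthetic person.

    Matching attributes in `keys` are hypothetical. If no respondent matches
    on all of them, the last attribute is dropped and matching is retried.
    """
    rng = random.Random(seed)
    result = []
    for person in persons:
        n = len(keys)
        while True:
            # Respondents that agree with the person on the first n attributes
            donors = [r for r in survey
                      if all(r[k] == person[k] for k in keys[:n])]
            if donors or n == 0:
                break
            n -= 1  # relax the match by one attribute
        # Draw one donor, proportionally to the (assumed) survey weight
        weights = [r.get("weight", 1.0) for r in donors]
        donor = rng.choices(donors, weights=weights, k=1)[0]
        result.append({**person, "chain": list(donor["chain"])})
    return result

# Made-up example data
survey = [
    {"age_class": "18-64", "employed": True, "weight": 2.0,
     "chain": ["home", "work", "home"]},
    {"age_class": "65+", "employed": False, "weight": 1.0,
     "chain": ["home", "shop", "home"]},
]
persons = [
    {"id": 1, "age_class": "18-64", "employed": True},
    {"id": 2, "age_class": "65+", "employed": False},
    {"id": 3, "age_class": "0-17", "employed": False},  # no matching age class
]
matched = attach_trip_chains(persons, survey)
```

Person 3 has no respondent in the same age class, so the sketch falls back to drawing from the whole survey. A real pipeline would need a more careful fallback, since an unrelated donor can carry an implausible chain.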
... As regards the multi-agent traffic simulation component, Zughe et al. (2019) showed the population scaling factor to be the most influential parameter in MATSim simulations. Some authors (Hörl and Balac, 2020; Ben-Dor et al., 2020; Zughe et al., 2019) ... Lastly, a few limitations are specific to the Eqasim Project when using it to compute noise emissions, namely: ...
Article
In light of the growing concern about the adverse effects of noise pollution on health, a better understanding is needed of the relationships between urban transport and individual exposure. To improve the scientific community's modeling capabilities specific to this issue, we propose a noise exposure modeling framework that uses agent-based activity, multi-agent travel simulation and a European standardized noise emission and propagation model. Based on two open source software packages, MATSim and NoiseModelling, this framework aims to simulate the spatiotemporal distributions of daily individual activity and road traffic noise. The proposed approach makes it possible to use the tools and methods provided by the NoiseModelling software by importing MATSim outputs, thereby taking full advantage of the development work carried out within the two communities. As such, it enables both characterizing the individual exposure to road traffic-related noise and investigating noise exposure inequality problems based on the attributes of individuals and their activities.
... For this study, the following three pre-existent scenarios are used: Munich Metropolitan Region (hereinafter abbreviated as MUC), Île-de-France (PAR), and San Francisco Bay Area (SFO), as listed in Table 1. While the MUC scenario has been authored by Moeckel et al. [36], PAR and SFO originate from Hörl and Balać [37], Balać and Hörl [38], with both sets of authors using different methods for generating and calibrating their respective scenarios. ...
... Hörl and Balać [37] developed and calibrated an agent-based scenario for Île-de-France. Their scenario was built using only publicly-accessible data from sources such as population census, national and local household travel surveys, and tax registries to form a synthetic population with activity plans; and general transit feed specification (GTFS) schedules and OpenStreetMap to generate a multi-modal transport network. ...
... Their scenario was built using only publicly-accessible data from sources such as population census, national and local household travel surveys, and tax registries to form a synthetic population with activity plans; and general transit feed specification (GTFS) schedules and OpenStreetMap to generate a multi-modal transport network. By using their self-developed framework eqasim [42,43], which builds on MATSim's functionality, but replaces plan scoring with discrete mode-choice models [44,45], Hörl and Balać [37] obtained each agent's mobility choices by applying a multinomial logit model. The calibrated PAR scenario includes trips towards six activity types: home (41%), leisure (13%), work (13%), errand (13%), shop (11%), and education (8%), using four distinct modes of transportation: walk (43%), car (33%), PT (22%), and bike (1%). ...
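As a brief illustration of the multinomial logit model mentioned in this excerpt, the sketch below computes choice probabilities P(m) = exp(V_m) / Σ_k exp(V_k) for the four modes. The utility values are invented for the example and are not the calibrated eqasim parameters; the function name is likewise hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Return multinomial logit choice probabilities for {mode: utility}."""
    # Subtract the maximum utility before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    v_max = max(utilities.values())
    exp_v = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(exp_v.values())
    return {mode: e / total for mode, e in exp_v.items()}

# Hypothetical systematic utilities for a single trip (illustrative only)
utilities = {"walk": -1.2, "car": -1.5, "pt": -1.9, "bike": -5.0}
probs = mnl_probabilities(utilities)
best_mode = max(probs, key=probs.get)
```

In a simulation, each agent's mode would typically be sampled from `probs` rather than set to `best_mode`, so that aggregate mode shares reflect the full probability distribution.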
Article
The advent of electrified, distributed propulsion in vertical take-off and landing (eVTOL) aircraft promises aerial passenger transport within, into, or out of urban areas. Urban air mobility (UAM), i.e., the on-demand concept that utilizes eVTOL aircraft, might substantially reduce travel times when compared to ground-based transportation. Trips of three, pre-existent, and calibrated agent-based transport scenarios (Munich Metropolitan Region, Île-de-France, and San Francisco Bay Area) have been routed using the UAM-extension for the multi-agent transport simulation (MATSim) to calculate congested trip travel times for each trip’s original mode—i.e., car or public transport (PT)—and UAM. The resulting travel times are compared and allow the deduction of potential UAM trip shares under varying UAM properties, such as the number of stations, total process time, and cruise flight speed. Under base-case conditions, the share of motorized trips for which UAM would reduce the travel times ranges between 3% and 13% across the three scenarios. Process times and number of stations heavily influence these potential shares, where the vast majority of UAM trips would be below 50 km in range. Compared to car usage, UAM’s (base case) travel times are estimated to be competitive beyond the range of a 50-minute car ride and are less than half as much influenced by congestion.
... Currently, the main benefit of helicopter transport services seems to be their relatively short travel times that generally seem unaffected by conventional ground-based traffic congestion. This study aims at exploring the potential travel time savings that various UAM implementations might allow. The main objectives of this study are to provide answers to the following three key research questions: • ...
... but replaces plan scoring with discrete mode-choice models [41,42], Hörl and Balać [36] obtained each agent's mobility choices by applying a multinomial logit model. The Table 1). ...
Preprint
The advent of electrified, distributed propulsion in vertical take-off and landing (eVTOL) aircraft promises aerial passenger transport within, into, or out of urban areas. Urban air mobility (UAM), i.e. the on-demand concept that utilizes eVTOL aircraft, might substantially reduce travel times when compared to ground-based transportation. Trips of three, pre-existent, and calibrated agent-based transport scenarios (Munich Metropolitan Region, Île-de-France, and San Francisco Bay Area) have been routed using the UAM-extension for the multi-agent transport simulation (MATSim) to calculate congested trip travel times for each trip's original mode - i.e. car or public transport (PT) - and UAM. The resulting travel times are compared and allow the deduction of potential UAM trip shares under varying UAM properties, such as the number of stations, total process time, and cruise flight speed. Under base case conditions, the share of motorized trips for which UAM would reduce the travel times ranges between 3% and 13% across the three scenarios. Process times and number of stations heavily influence these potential shares, where the vast majority of UAM trips would be below 50 km in range. Compared to car usage, UAM's (base case) travel times are estimated to be competitive beyond the range of a 50-minute car ride and are less than half as much influenced by congestion.
... Ziemke et al. [100] describe the process of initial demand generation for the Open Berlin scenario based on census data and commuter statistics. Hörl and Balac [101] describe an open-source workflow for generating fully replicable synthetic populations in the context of agent-based transport simulation. Another option is to use data that is generated by a privacy-compliant synthesization process from mobile phone data [102]. ...
Article
This paper presents a new methodology to derive and analyze strategies for a fully decarbonized urban transport system which combines conceptual vehicle design, a large-scale agent-based transport simulation, operational cost analysis, and life cycle assessment for a complete urban region. The holistic approach evaluates technical feasibility, system cost, energy demand, transportation time, and sustainability-related impacts of various decarbonization strategies. In contrast to previous work, the consequences of a transformation to fully decarbonized transport system scenarios are quantified across all traffic segments, considering procurement, operation, and disposal. The methodology can be applied to arbitrary regions and transport systems. Here, the metropolitan region of Berlin is chosen as a demonstration case. The first results are shown for a complete conversion of all traffic segments from conventional propulsion technology to battery electric vehicles. The transition of private individual traffic is analyzed regarding technical feasibility, energy demand and environmental impact. Commercial goods, municipal traffic and public transport are analyzed with respect to system cost and environmental impacts. We can show a feasible transition path for all cases with substantially lower greenhouse gas emissions. Based on current technologies and today’s cost structures our simulation shows a moderate increase in total systems cost of 13–18%.
... Ziemke et al. [99] describe the process of initial demand generation for the Open Berlin scenario based on census data and commuter statistics. Hörl and Balac [100] describe an open-source workflow for generating fully replicable synthetic populations in the context of agent-based transport simulation. Another option is to use data that is generated by a privacy-compliant synthesization process from mobile phone data [101]. ...
Preprint
This paper presents a new methodology to derive and analyze strategies for a fully decarbonized urban transport system which combines conceptual vehicle design, a large-scale agent-based transport simulation, operational cost analysis, and life cycle assessment for a complete urban region. The holistic approach evaluates technical feasibility, system cost, energy demand, transportation time and sustainability-related impacts of various decarbonization strategies. In contrast to previous work, the consequences of a transformation to fully decarbonized transport system scenarios are quantified across all traffic segments, considering procurement, operation and disposal. The methodology can be applied to arbitrary regions and transport systems. Here, the metropolitan region of Berlin is chosen as a demonstration case. First results are shown for a complete conversion of all traffic segments from conventional propulsion technology to battery electric vehicles. The transition of private individual traffic is analyzed regarding technical feasibility, energy demand and environmental impact. Commercial goods, municipal traffic and public transport are analyzed with respect to system cost and environmental impacts. We can show a feasible transition path for all cases with substantially lower greenhouse gas emissions. Based on current technologies and today's cost structures our simulation shows a moderate increase in total systems cost of 13-18%.
Conference Paper
This paper describes the steps to create a synthetic population of any region in California. By using only open data, and an open-source population synthesis pipeline, we ensure that the whole process can be easily repeated by others. This not only ensures reproducibility and transparency of the synthesis process, but also allows studies using this population to be easily replicated. As agent-based models are gaining in popularity in recent times due to the rapid developments in the transportation sector, the need for convenient ways to generate synthetic individuals and their daily patterns has grown as well. We present our approach for two regions: the nine-county San Francisco Bay Area and San Diego County. The validation results show that the methodology used is suitable to replicate socio-demographics and activity patterns of the population. However, it also points to some limitations due to the lack of data and the methods used. Nevertheless, the approach could be a good complement to the local and regional transportation models, as it allows easy access and can be readily used in agent-based models.
Article
With the rise of smart cities, a number of new mobility services have emerged to drive changes in built environment policies. Up-to-date demand models are needed to capture the impact of these policies on emerging mobility-enabled travel patterns. The study explores modeling requirements to assess the impact of such built environment policies. A synthetic population of New York City with a tour-based mode choice model was developed with accessibility to bikesharing and ride hail services via smartphone ownership. The model results suggest Manhattanites have a value of time of $29/h, consistent with the literature. Smartphone ownership is positively influenced by income and negatively influenced by age, and in turn negatively impacts Citi Bike ridership relative to other modes available. The synthetic population is also applied to analyze two city-scale built environment scenarios: a hypothetical Amazon headquarters deployment and a Citi Bike service expansion. If Amazon had succeeded in Long Island City, it would have increased the number of trips to/from that neighborhood by 239%, of which FHVs would grow by over 441% and transit by 294%. It would have led to an increase of peak morning trips from 5000 to 8000. Citi Bike's expansion plan would grow ridership by 92%, and if they were able to expand efficiently throughout NYC this would grow further to 210% over the baseline.
Article
While activity-based travel demand generation has improved over the last few decades, the behavioural richness and intuitive interpretation remain challenging. This paper argues that it is essential to understand why people travel the way they do and not only be able to predict the overall activity patterns accurately. If one cannot understand the “why?” then a model’s ability to evaluate the impact of future interventions is severely diminished. Bayesian networks (BNs) provide the ability to investigate causality and are showing value in recent literature for generating synthetic populations. This paper is novel in extending the application of BNs to daily activity tours. Results show that BNs can synthesise both activity and trip chain structures accurately. The approach outperforms a frequentist approach and can cater for infrequently observed activity patterns, and patterns unobserved in small sample data. It can also account for temporal variables like activity duration.
Technical Report
This paper presents an open-source generalized pipeline for the creation of a synthetic population for the Greater São Paulo Metropolitan Region, entirely based on open data. The pipeline first developed for and applied to the Île-de-France region is used as a baseline. Using data-driven algorithms, the pipeline creates a path from raw data to the synthetic population and, further, to the final mobility scenario. A definite advantage of this approach is that it makes it easy to reproduce not only synthetic populations but also transportation studies. The São Paulo synthetic population, which comprises as many agents as there are inhabitants in this area, is created using this framework and then analyzed. All considered indicators suggest that this approach is able to model the population on a high level, even if certain gaps could be filled with additional information.
Article
A large and growing body of evidence suggests fundamental changes are needed in transport systems, to tackle issues such as air pollution, physical inactivity and climate change. Transport models can play a major role in tackling these issues through the transport planning process, but they have historically been focussed on motorised modes (especially cars) and available only to professional transport planners working within the existing paradigm. Building on the principles of open access software, first developed in the context of geographic information systems, this paper develops and discusses the concept of open access transport models, which we define as models that are both developed using open source software and are available to be used by the public without the need for specialist training or the purchase of software licences. We explore the future potential of open access transport models to support the transition away from fossil fuels in the transport sector. We do this with reference to the literature on the use of tools in the planning process, and by exploring an example that is already in use: the ‘Propensity to Cycle Tool’. We conclude that open access transport models can be a leverage point in the planning process due to their ability to provide robust, transparent and actionable evidence that is available to a range of stakeholders, not just professional transport planners. Open access transport models represent a disruptive technology deserving further research and development, by planners, researchers and citizen scientists, including open source software developers and advocacy groups but, in order to fulfil their potential, they will require both financial and policy support from government bodies.
Article
Population synthesis is concerned with the generation of agents for agent-based modelling in many fields, such as economics, transportation, ecology and epidemiology. When the number of attributes describing the agents and/or their level of detail becomes large, survey data cannot densely support the joint distribution of the attributes in the population due to the curse of dimensionality. It leads to a situation where many attribute combinations are missing from the sample data while such combinations exist in the real population. In this case, it becomes essential to consider methods that are able to impute such missing information effectively. In this paper, we propose to use deep generative latent models. These models are able to learn a compressed representation of the data space, which when projected back to the original space, leads to an effective way of imputing information in the observed data space. Specifically, we employ the Wasserstein Generative Adversarial Network (WGAN) and the Variational Autoencoder (VAE) for a large-scale population synthesis application. The models are applied to a Danish travel survey with a feature-space of more than 60 variables and trained and tested using cross-validation. A new metric that applies to the evaluation of generative models in an unsupervised setting is proposed. It is based on the ability to generate diverse yet valid synthetic attribute combinations by checking whether the models can recover missing combinations (sampling zeros) while keeping truly impossible combinations (structural zeros) at a minimum. For a low-dimensional experiment, the VAE, the marginal sampler and the fully random sampler generate 5%, 21% and 26% more structural zeros per sampling zero when compared to the WGAN. For a high dimensional case, these figures increase to 44%, 2217% and 170440% respectively.
This research directly supports the development of agent-based systems and in particular cases where detailed socio-economic or geographical representations are required.
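The distinction between sampling zeros and structural zeros in the abstract above can be made concrete with a small counting sketch. It is a simplified reading of the idea, not the metric as defined in the paper: it assumes that any combination absent from a reference population is a structural zero, and the function name and example data are hypothetical.

```python
def zero_counts(generated, training, reference):
    """Count recovered sampling zeros and generated structural zeros.

    Each argument is an iterable of attribute-combination tuples.
    Assumption: a combination missing from `reference` is impossible.
    """
    training, reference = set(training), set(reference)
    unseen = set(generated) - training   # combinations not in the training sample
    sampling = unseen & reference        # exist in the reference population
    structural = unseen - reference      # treated as impossible combinations
    return len(sampling), len(structural)

# Made-up example with two attributes: (age group, driving licence)
training = [("child", "no_licence"), ("adult", "licence")]
reference = training + [("adult", "no_licence")]
generated = [("adult", "licence"), ("adult", "no_licence"), ("child", "licence")]

n_sampling, n_structural = zero_counts(generated, training, reference)
ratio = n_structural / n_sampling if n_sampling else float("inf")
```

A lower `ratio` means the generator recovers more valid unseen combinations for each impossible one it produces, which is the trade-off the abstract describes.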
Article
Over the last decades, technological advances have allowed the capturing of travel behaviour at large scale. Despite the unprecedented volume and the variety of personal mobility data, aggregate Origin-Destination (OD) matrices are still the most widespread means to organise and represent travel demand. Nonetheless, standard ODs cannot adequately capture significant elements affecting travel behaviour such as trip-interdependency and trip-chaining, therefore they are not particularly suitable for travel behaviour analysis at person-level. The currently presented modelling framework enables the in-depth study of personal mobility by firstly combining the trips present in OD matrices into home-based trip-chains (i.e. tours) and subsequently into sequences of activities (activity schedules). The above-mentioned process is completed based on advanced graph-theoretical and combinatorial optimisation concepts. The applicability of the methodology is meticulously verified through a large-scale test case where a set of multi-period, purpose-dependent ODs is converted into realistic activity schedules able to incorporate more than 99% of the input travel demand. The accurate and highly detailed results showcase the significant potential of the proposed methodology to support the comprehensive analysis of travel behaviour at person level.
Article
An increasing amount of research is dedicated to the consideration of tour formation in freight transportation demand models. While empirical tour formation models so far have been starting from limiting assumptions about the resulting trips, we develop a generalized shipment-based model. We formulate a random utility model embedded in an iterative algorithm to construct tours through the incremental allocation of shipments. It considers different objectives and constraints and acknowledges the difference between commodity, vehicle and location types. Parameters are estimated on a large and comprehensive shipment database. The model reproduces observed tour statistics well for the given set of shipments.
Article
The classical monocentric city model suggests that property prices decrease and transport costs rise with distance to the urban centre, implying that employees face a trade-off between long commutes and high housing costs when making location decisions. Accordingly, some commuters might be forced to take on longer commutes due to rising rents in central locations. In this study, we investigate empirically whether the rental differential between employment centres and residential areas predicts changes in average commuting times. To this end, we consider a gravity model of commuting flows for Ireland over 2011–2016. We present results for Ireland and the metropolitan area of Dublin, which constitutes the largest commuting region in Ireland. The results imply that a 10% rise in rents in employment centres is associated with an up to 0.6 minute rise in one-way daily average commuting times nationally (about 2.2% of the average commute duration).