Article

Applying machine learning approaches to analyze the vulnerable road-users' crashes at statewide traffic analysis zones


Abstract

Introduction: In this paper, we present machine learning techniques to analyze pedestrian and bicycle crashes by developing macro-level crash prediction models. Methods: We collected 2010-2012 Statewide Traffic Analysis Zone (STAZ) level crash data and developed a rigorous machine learning approach (i.e., decision tree regression (DTR) models) for both pedestrian and bicycle crash counts. To our knowledge, this is the first application of DTR models in the burgeoning macro-level traffic safety literature. Results: The DTR models uncovered the most significant predictor variables for both response variables (pedestrian and bicycle crash counts) in terms of three broad categories: traffic, roadway, and socio-demographic characteristics. Additionally, spatial predictor variables of neighboring STAZs were considered along with those of the targeted STAZ in both DTR models. The DTR model with spatial predictor variables (spatial DTR model) was compared with the DTR model without them (aspatial DTR model), and the comparison showed that the spatial DTR model achieved better prediction accuracy than the aspatial DTR model. Finally, the current research effort contributed to the safety literature by applying several ensemble techniques (i.e., bagging, random forest, and gradient boosting) to improve the prediction accuracy of the DTR models (weak learners) for macro-level crash counts. The study revealed that all the ensemble techniques performed slightly better than the DTR model and that the gradient boosting technique outperformed the other competing ensemble techniques in macro-level crash prediction models.
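The comparison described in the abstract (a single regression tree against bagging, random forest, and gradient boosting) can be sketched with scikit-learn. This is a minimal illustration, not the authors' pipeline: the three predictors, the synthetic Poisson counts, the tree depths, and the function name compare_models are all assumptions introduced here, and the full text may use different settings and accuracy measures.

```python
import numpy as np
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(seed=0, n_zones=600):
    """Fit a single tree and three ensembles on synthetic zone-level counts.

    Returns a dict mapping model name to test-set RMSE.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical zone-level predictors (NOT the study's variables).
    X = rng.uniform(0.0, 1.0, size=(n_zones, 3))
    # Synthetic crash counts: Poisson draws around a nonlinear rate.
    rate = np.exp(0.5 + 1.2 * X[:, 0] + 0.8 * X[:, 1] * X[:, 2])
    y = rng.poisson(rate)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    models = {
        "decision_tree": DecisionTreeRegressor(max_depth=4, random_state=seed),
        "bagging": BaggingRegressor(
            DecisionTreeRegressor(max_depth=4),
            n_estimators=100,
            random_state=seed,
        ),
        "random_forest": RandomForestRegressor(
            n_estimators=100, max_depth=4, random_state=seed
        ),
        "gradient_boosting": GradientBoostingRegressor(
            n_estimators=100, max_depth=2, random_state=seed
        ),
    }

    rmse = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
    return rmse
```

Calling compare_models() returns one RMSE per model. On synthetic data the ranking of the models depends on the seed and hyperparameters, so it should not be read as confirming or refuting the study's finding.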


... The applicability of the TAZ scheme significantly affects the results of passenger demand analysis and forecasting [2,3], as well as traffic safety [4,5] and the scientific reliability and rationality of transportation planning [6,7]. An effective TAZ division scheme aids in the development of macroscopic traffic prediction models [1,7-11]. However, a unified standard and relevant regulations for TAZ division are absent both domestically and internationally. ...
... (3) Cell area A: This describes the space area of each cell, where K is the number of cells, as shown in Equation (8). Since square isometric grids are adopted in this study, the area of all cells is equal. ...
Article
Urban rail transit passenger flow forecasting often relies on the traditional “four-step” method, where the division of traffic analysis zones (TAZs) is critical to ensuring prediction accuracy. As the fundamental units for describing trip origins and destinations, TAZs also encompass socioeconomic attributes such as land use, population, and employment. However, traditional TAZs, typically based on administrative boundaries, fail to reflect evolving urban travel behavior, particularly when transit stations are located near TAZ boundaries. Additionally, the emergence of urban big data allows for more refined spatial analyses based on individual travel patterns, addressing the limitations of administrative divisions. This study proposes an innovative TAZ aggregation model based on travel similarity, integrating public transit smart-card data and GIS data from bus networks. First, individual spatiotemporal travel patterns are mapped and discretized in both the spatial and temporal dimensions. Travel characteristic data are then extracted for spatial grid units. The TAZ division problem is defined as a multiobjective optimization problem, including factors such as travel similarity, the homogeneity of travel intensity, the statistical accuracy of the area, geographic information preservation, travel ratio constraints, and shape constraints. Multiple TAZ division schemes are produced and assessed using the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), resulting in the selection of the optimal scheme. The proposed method is implemented on bus passenger travel data in Beijing, showing that the optimized scheme significantly reduces the number of zones with travel ratios exceeding 10%. Compared with existing schemes, the optimized division yields more uniform distributions of travel ratios, area, and travel density, while significantly minimizing the number of zones with a high travel concentration. 
These results demonstrate that the proposed method better reflects residents’ actual travel behaviors, offering a notable improvement over traditional approaches. This research provides a novel and practical framework for data-driven TAZ optimization.
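The TOPSIS step mentioned in this abstract ranks candidate schemes by their closeness to an ideal solution. The sketch below shows the standard procedure in plain Python. The criteria, weights, and scores in the usage example are hypothetical and are not taken from the paper. The function assumes that no criterion column is all zeros and that the alternatives are not all identical.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) against criteria (columns) with TOPSIS.

    matrix  -- list of rows, one per alternative
    weights -- one weight per criterion
    benefit -- True where larger is better, False where smaller is better
    Returns one closeness score in [0, 1] per alternative (higher is better).
    """
    n_cols = len(matrix[0])
    # 1. Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix
    ]
    # 2. Build the ideal and anti-ideal points criterion by criterion.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Relative closeness: distance to the worst over the total distance.
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical schemes: (travel similarity, zones with travel ratio above 10%).
schemes = [[0.9, 2], [0.6, 8], [0.7, 5]]
scores = topsis(schemes, weights=[0.5, 0.5], benefit=[True, False])
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy input the first scheme is best on both criteria, so it coincides with the ideal point and receives the top score.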
... According to Di Pasquale, Fruggiero, Iannone and Miranda (2017), who studied the accident trend over the last three decades, while technical and system reliability has improved over time, human error, the underlying cause of technical failures, has remained unchanged. The organization can minimize disruptions caused by mechanical or technical risk factors by identifying the root cause, performing preventive maintenance, holding toolbox talks, and maintaining asset-operational integrity (Rahman, Abdel-Aty, Hasan, & Cai, 2019; Gould, Ringstad, & van de Merwe, 2012). ...
... In the petroleum industry, several operational issues such as lack of safety inspection (OP1), work under pressure (OP8), unsafe acts (OP3), and improper communication are major concerns. The majority of accidents in the petroleum industry result from unsafe acts caused by human error (Rahman et al., 2019; Gould et al., 2012), and because human behavior is unpredictable, expanding training programs and putting up safety barriers can help to reduce human errors. Work under extreme stress (Gangadhari, Murty, & Khanzode, 2020), issues emerging from complex working conditions (Berx, Decré, Morag, Chemweno, & Pintelon, 2022), insufficient standard operating procedures, non-compliance with prescribed procedures (Verma, Khan, Maiti, & Krishna, 2014; Kontogiannis, 2011), and poor communication between workers, supervisors, and management (Hon, Randhawa, Lun, Fairclough, & Rothman, 2023) continue to be major issues for petroleum companies. ...
Article
Introduction: Workplace accidents in the petroleum industry can cause catastrophic damage to people, property, and the environment. Earlier studies in this domain indicate that the majority of the accident report information is available in unstructured text format. Conventional techniques for the analysis of accident data are time-consuming and heavily dependent on experts' subject knowledge, experience, and judgment. There is a need to develop a machine learning-based decision support system to analyze the vast amounts of unstructured text data that are frequently overlooked due to a lack of appropriate methodology. Method: To address this gap in the literature, we propose a hybrid methodology that uses improved text-mining techniques together with an unbiased group decision-making framework to combine objective weights (based on text mining) and subjective weights (based on expert opinion) of risk factors in order to prioritize them. Based on the contextual word embedding models and term frequencies, we extracted five important clusters of risk factors comprising more than 32 risk sub-factors. A heterogeneous group of experts and employees in the petroleum industry were contacted to obtain their opinions on the extracted risk factors, and the best-worst method was used to convert their opinions to weights. Conclusions and Practical Applications: The applicability of our proposed framework was tested on data compiled from the accident reports released by the petroleum industries in India. Our framework can be extended to accident data from any industry, to reduce analysis time and improve the accuracy in classifying and prioritizing risk factors.
... With the rapid development of machine learning techniques and the growing accumulation of data, applying machine learning to transportation-related problems has become popular. Compared to conventional statistical and econometric DCMs, machine learning methods, being non-parametric, offer two important advantages: fewer requirements on pre-defined assumptions about the relationships between injury severity outcomes and contributing factors, and higher prediction accuracy (Gong et al., 2019; Rahman et al., 2019). Spatial correlation and temporal variation have attracted growing interest in crash studies (Wang et al., 2016; Xu et al., 2020; Zeng et al., 2021). ...
... It should be pointed out that traffic safety studies that identify contributing factors in traffic accidents are essentially multiclass classification problems. Among all the machine learning methods implemented in the field of transportation safety research, decision/binary tree-based models would be the most popular and appropriate techniques (Pande et al., 2010; Chang & Chien, 2013; Rahman et al., 2019). Hence, applying tree-based machine learning methods to explore pedestrian safety issues within the transportation system is worth studying. ...
Article
Pedestrians might face more dangers and sustain more severe injuries in crashes than others. Also, crash data has inherent patterns related to both space and time. Crashes that happened in locations with highly aggregated uptrend patterns are worth exploring to examine the most recently deteriorative factors affecting pedestrian-injury severities in crashes. Therefore, proper modeling approaches are needed to identify the causes of pedestrian-vehicle crashes and improve pedestrian safety. In this study, an emerging hotspot analysis is first used to identify the most targeted hotspots, followed by a proposed XGBoost model that analyzes the most recently deteriorative factors affecting pedestrian injury severities. The overall accuracy of the best model on the hotspot dataset is 94.49%, which shows a relatively high performance compared to conventional models. Seven factors are identified to increase the likelihood of fatal injury, including “land development: farm, wood and pasture” (FWP), “interstate”, “US route”, “hit and run”, “alcohol-impaired driver” (AID), “urban”, and “alcohol-impaired-pedestrian”. For incapacitating injury, there are five significant factors: “work zone”, “interstate”, “US route”, “curved roadway”, and “alcohol-impaired-pedestrian”. The results of this research could give policymakers and researchers a solid reference for identifying the contributing factors affecting pedestrian-injury severities.
... The study developed an improved clustering algorithm that significantly improved prediction accuracy [48]. However, the application of machine learning approaches in analysing crash severity outcomes for VRU is limited, and very few studies have focused on VRU crash severity analysis [35,36,49]. These studies were also limited by the number of explanatory variables considered in existing machine learning-based crash severity modelling techniques. ...
... VRU crash severity was predicted in a study using decision tree and ensemble prediction models for bicyclists and pedestrians, where the study found significant prediction improvement from ensemble techniques [49]. For motorcyclist crash severities in Ghana, an analogous machine learning algorithm was demonstrated and compared with a multinomial logit model. ...
Article
Road crash fatality is a universal problem of the transportation system. Road crashes cause a massive death toll every year, and vulnerable road users (VRUs) are particularly exposed to high crash severity. This paper focuses on employing machine learning-based classification approaches for modelling injury severity of vulnerable road users: pedestrians, bicyclists, and motorcyclists. Specifically, this study aims to analyse critical features associated with different VRU groups: pedestrians, bicyclists, motorcyclists, and all VRU groups together. The critical factors of crash severity outcomes for these VRU groups are estimated to identify the similarities and differences across the important features associated with each group. The crash data for the study is sourced from the state of Queensland in Australia for the years 2013 through 2019. The supervised machine learning algorithms considered for the empirical analysis include the K-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Random Forest (RF). In these models, 17 distinct road crash parameters are considered as input features to train models, which originate from road user characteristics, weather and environment, vehicle and driver condition, period, road characteristics and regions, traffic, and speed jurisdiction. These classification models are separately trained and tested for individual and unified VRU to assess crash severity levels. Afterwards, model performances are compared with each other to identify the best classifier; the Random Forest classification models for all VRU modes are found to be comparatively robust in test accuracy (motorcyclist: 72.30%, bicyclist: 64.45%, pedestrian: 67.23%, unified VRU: 68.57%). Based on the Random Forest model, the road crash features are ranked and compared according to their impact on crash severity classification. 
Furthermore, a model-based partial dependency of each road crash parameter on the severity levels is plotted and compared for each individual and unified VRU. This clarifies the tendency of road crash parameters to vary with different VRU crash severity. Based on the outcome of the comparative analysis, motorcyclists are found to be more likely exposed to higher crash severity, followed by pedestrians and bicyclists.
... For instance, a wide range of bicycle safety analyses have been conducted using GBDTs and RFs due to their outstanding predictive performance and generalizability (Li et al., 2020; Bi et al., 2023). Rahman et al. (2019) used GBDT to predict pedestrian and bicycle crashes at a macro level. Using the RFs method, Zhu et al. (2023) explored the contributing factors to bicycle-involved crashes. ...
Article
Statistical modeling and data-driven studies on bicycle accidents are widespread; however, explanations of the underlying mechanisms remain limited, particularly regarding the impact of key risk factors on the bicycle crash frequency across different crash severities. This study aims to examine the effects of various risk factors on the frequency of bicycle crashes using Random Forest and Shapley Additive Explanations (RF-SHAP), taking into account the different crash severity levels. Three years of London crash data (2017 to 2019) are used, together with population demographics, land use, road infrastructure, and traffic flow data collected for Greater London. In addition to providing superior predictive accuracy, our proposed method identified critical risk factors at different levels of severity associated with bicycle crashes. The distinct contribution of this study is the identification of the primary factors influencing the severity of bicycle collisions in London through the use of RF-SHAP. The study quantifies both the main and interactive effects of various severity risk factors on bicycle collisions. Results suggest that the proportion of building areas and population density are most critical to bicycle crash numbers in different severity levels. Also, the interaction effects of the risk factors on bicycle crashes are revealed. Specifically, results reveal a negative correlation between traffic flow and overall bicycle crash frequency when the average road network connectivity is below 2.25. After controlling for population density, the proportion of residential areas shows a three-stage pattern of influence on the slight injury crash frequency. Furthermore, a boundary value of 6.3 is identified for the safety impact of road density on fatal and severely-injured bicycle crashes. Study findings should provide insights into cost-effective safety countermeasures for bicycle infrastructures, traffic controls, and safety education. 
Bicycle safety can be improved through these measures over the long term.
... The study was conducted on the FARS dataset during 2007. Rahman et al. (2019) used machine learning techniques to analyze the risk of pedestrian and bicycle accidents by considering various traffic, roadway, and socio-demographic characteristics. They applied DTR models with spatial predictor variables from neighboring STAZs to improve prediction accuracy. ...
Article
This study aims to explore the use of machine learning algorithms in predicting the likelihood of road traffic accidents in Katsina State, Nigeria, with the goal of reducing the high rate of road traffic accidents and associated loss of lives and properties. The study collects and analyzes data on road traffic accidents in Katsina State, including the number of accidents, location, time of day, and the type of vehicles involved. Machine learning algorithms such as Decision Trees, Random Forest, and K-Nearest Neighbors are trained using the data to predict the likelihood of road traffic accidents. The study also identifies the factors that contribute to road traffic accidents in Katsina State by using techniques such as feature selection and correlation analysis to identify the most important variables. The findings can be used by stakeholders, including the government, law enforcement agencies, and road safety organizations, to develop and implement effective strategies to reduce road traffic accidents in Katsina State. The study utilizes data collected from the Federal Road Safety Corps database in Katsina, Nigeria, and employs data cleaning and feature selection techniques to improve data quality. The Random Forest algorithm achieved a predictive value of 85%, while the K-Nearest Neighbors (KNN) and Decision Tree algorithms yielded 17% and 42%, respectively.
... Cycling safety is a multifaceted issue that requires innovative approaches. Commonly used methods like surveys and accident data analysis may have several limitations in addressing all cycling safety issues (Rahman et al., 2019). In contrast, the implementation of simulators in research offers the opportunity to study several traffic scenarios safely and under controlled conditions (Godley et al., 2002; Meuleners and Fraser, 2015; O'Hern et al., 2017). ...
... Count-based regression has been implemented for modelling crash counts, for instance by applying decision tree regression (DTR) in VRU crashes using data from Florida, USA [48]. In addition to traditional predictor features, spatial features of nearby traffic analysis zones were incorporated, improving the tree performance. ...
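One simple way to add spatial features from nearby zones is to append, for each predictor, the mean value over a zone's adjacent zones. The sketch below illustrates that idea only. The snippet does not say how the cited study constructed its spatial predictors, so the averaging rule, the nbr_ prefix, the function name, and the toy zone data are all assumptions. The function also assumes every zone carries the same predictor names.

```python
def add_neighbor_averages(features, neighbors):
    """Append neighbour-averaged predictors to each zone.

    features  -- dict: zone id -> dict of predictor name -> value
    neighbors -- dict: zone id -> list of adjacent zone ids
    Returns a new dict; each zone keeps its own predictors and gains one
    'nbr_<name>' entry per predictor (0.0 when the zone has no neighbours).
    """
    augmented = {}
    for zone, values in features.items():
        row = dict(values)  # copy so the input is left untouched
        adjacent = [n for n in neighbors.get(zone, []) if n in features]
        for name in values:
            if adjacent:
                total = sum(features[n][name] for n in adjacent)
                row["nbr_" + name] = total / len(adjacent)
            else:
                row["nbr_" + name] = 0.0
        augmented[zone] = row
    return augmented


# Hypothetical three-zone example.
zones = {"A": {"pop": 100}, "B": {"pop": 300}, "C": {"pop": 200}}
adjacency = {"A": ["B", "C"], "B": ["A"], "C": []}
spatial = add_neighbor_averages(zones, adjacency)
```

The augmented rows can then be passed to a tree model in place of the original predictors, which is the sense in which a "spatial" model differs from an "aspatial" one.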
Chapter
The various rapid advances in computer systems and new algorithms have applications in all aspects of transport, and road safety could not be an exception. Artificial Intelligence (AI)-based modelling has become more accessible and approachable, and road safety experts have been acquiring new knowledge on how to enhance road safety in the examined networks using increasingly sophisticated algorithms. Simultaneously, data collection is becoming more affordable, seamless, voluminous and multifaceted, departing from traditional, rare road safety indicators, such as crashes and casualties, to an array of surrogate safety measures with high recording frequency, such as harsh events, Time-To-Collision or a wealth of driver behavior data. The aim of the present chapter is to provide an overview of current progress. Specifically, aspects of AI for (i) modelling road user behavior, (ii) modelling network-level performance and (iii) conducting in-depth crash analysis are discussed. Promising research directions to be explored in the imminent future are presented as well, in the form of high-impact feature engineering, crash and injury causality analyses and ethical AI applications.
... In addition, the importance of factors is crucial for injury severity, as it can inform policy prioritization [15]. For instance, Panicker and Ramadurai [16] used RF and Conditional Inference Forest to model the injury severities of two-wheeler crashes and explore the crucial factors. ...
Article
Road traffic safety is an essential component of public safety and a globally significant issue. Pedestrians, as crucial participants in traffic activities, have always been a primary focus with regard to traffic safety. In the context of the rapid advancement of intelligent transportation systems (ITS), it is crucial to explore effective strategies for preventing pedestrian fatalities in pedestrian–vehicle crashes. This paper aims to investigate the factors that influence pedestrian injury severity based on pedestrian-involved crash data collected from several sensor-based sources. To achieve this, a hybrid approach of a random parameters logit model and random forest based on the SHAP method is proposed. Specifically, the random parameters logit model is utilized to uncover significant factors and the random variability of parameters, while the random forest based on SHAP is employed to identify important influencing factors and feature contributions. The results indicate that the two parts of the hybrid approach both corroborate and complement each other's conclusions. Eight significant influencing factors were identified, with seven of the factors identified as important by the random forest analysis. However, it was found that the factors “Workday or not” (Not), “Signal control mode” (No signal and Other security facilities), and “Road safety attribute” (Normal Road) are not considered significant. It is important to note that focusing solely on either significant or important factors may lead to overlooking certain conclusions. The proposed strategies for ITS have the potential to significantly improve pedestrian safety levels.
... Most of the published studies have focused on analysis of macroscopic characteristics (e.g. [2]) and classification of pedestrian behaviour (e.g. [3]). ...
Article
Pedestrians comprise a major component of the transport system and their incorporation in the design of future cities is of vital importance. The development of C-ITS systems requires real-time information concerned with the accurate position and kinematics of pedestrians. Therefore, it necessitates the existence of reliable pedestrian simulation models. The objective of this study is the exploration of the existing tools considering modeling techniques and data collection technologies for pedestrian simulation under normal conditions and their interrelations within the framework of pedestrian simulation towards the design and implementation of future simulation approaches. To achieve this, an overview of the main microscopic modeling techniques and data collection methods i.e., emerging positioning technologies is presented, with their potentials and limitations being demonstrated through the existing state-of-the-art. Pedestrian simulation is then approached employing two core elements: conceptual and operational factors. The challenges that these factors entail considering modeling techniques and positioning technologies are elaborated. Last, simulation performance is discussed and future prospects relevant to research needs considering both pedestrian modeling techniques and emerging positioning technologies are proposed.
... To this end, researchers have used machine learning models such as Decision Tree (DT) models, BN models, Neural Network (NN) models, and Support Vector Machines (SVM) for traffic accident rate prediction and analysis and accident risk evaluation, taking advantage of the strengths of machine learning models in dealing with small-sample, nonlinear, and high-dimensional traffic accident data [16-18]. Although these models have strong predictive and analytical capabilities, they are weak in explaining the differences between risk factors, ignore the degree of influence of different risk factors on accident risk evaluation, and may also lead to biased estimates in traffic accident risk evaluation. ...
Article
Bicycle safety has emerged as a pressing concern within the vulnerable transportation community. Numerous studies have been conducted to identify the significant factors that contribute to the severity of cyclist injuries, yet the findings have been subject to uncertainty due to unobserved heterogeneity and class imbalance. This research aims to address these issues by developing a model to examine the impact of key factors on cyclist injury severity, accounting for data heterogeneity and imbalance. To incorporate unobserved heterogeneity, a total of 3,895 bicycle accidents were categorized into three homogeneous sub-accident clusters using Latent Class Cluster Analysis (LCA). Additionally, five over-sampling techniques were employed to mitigate the effects of data imbalance in each accident cluster category. Subsequently, Bayesian Network (BN) structure learning algorithms were utilized to construct 32 BN models after pairing the accident data from the four accident cluster types before and after sampling. The optimal BN models for each accident cluster type provided insights into the key factors associated with cyclist injury severity. The results indicate that the key factors influencing serious cyclist injuries vary heterogeneously across different accident clusters. Female cyclists, adverse weather conditions such as rain and snow, and off-peak periods were identified as key factors in several subclasses of accident clusters. Conversely, factors such as the week of the accident, characteristics of the trafficway, the season, drivers failing to yield to the right-of-way, distracted cyclists, and years of driving experience were found to be key factors in only one subcluster of accident clusters. Additionally, factors such as the time of the crash, gender of the cyclist, and weather conditions exhibit varying levels of heterogeneity across different accident clusters, and in some cases, exhibit opposing effects.
... In a study of bicyclists and pedestrians, vulnerable road users' crash severity was predicted using decision tree and ensemble prediction models. The study demonstrated that ensemble approaches improved performance significantly (Rahman et al. 2019). An analogous machine learning approach was developed and compared to a multinomial logit model for motorcycle collision severity in Ghana. ...
Article
Objective: Motorcycle crashes often result in severe injuries on roads that affect people’s lives physically, financially, and psychologically. These injuries could be notably harmful to drivers of all age groups. The main objective of this study is to investigate the risk factors contributing to the severity of crash injuries in different age groups. Methods: This objective is achieved by developing accurate machine learning (ML) based prediction models. This research examines the relationship between potential risk factors of motorcycle-associated crashes using ML and the Shapley Additive Explanations (SHAP) technique. The SHAP technique further helps interpret ML methods for traffic injury severity prediction. It indicates the significant non-linear interactions between dependent and independent variables. The data for this study was collected from the Provincial Emergency Response Service RESCUE 1122 for the Rawalpindi region (Pakistan) over three years (from 2017 to 2020). The Synthetic Minority Oversampling Technique (SMOTE) is employed to balance injury severity classes in the pre-processing phase. Results: The results demonstrate that age, gender, posted speed limit, the number of lanes, and month of the year are positively associated with severe and fatal injuries. This research also assesses how the modeling framework varies between the ML and classical statistical methods. The predictive performance of the proposed ML models was assessed using several evaluation metrics, and CatBoost was found to outperform the XGBoost, Random Forest (RF), and Multinomial Logit (MNL) models. Conclusion: The findings of this study will assist road users, road safety authorities, stakeholders, policymakers, and decision-makers in obtaining substantial and essential guidance for reducing the severity of crash injuries in Pakistan and other countries with prevailing conditions.
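SMOTE balances classes by creating synthetic minority samples on the line segments between a minority point and one of its nearest minority neighbours. The sketch below is a simplified plain-Python version of that interpolation idea. It is not the implementation used in the study, and the function name, the choice of k, and the toy data are assumptions. It needs at least two minority samples.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    minority -- list of feature vectors (lists of floats) from the rare class
    k        -- number of nearest minority neighbours to choose from
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base-neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority class: the four corners of the unit square.
rare = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(rare, n_new=10)
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only.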
... As cities continue to grow and transportation demands increase, the potential impact of traffic signal failures becomes even more significant [1]. This study aims to shed light on the multifaceted issues surrounding traffic signal failures and to provide insights into how urban planners, traffic engineers, and policymakers can better prepare for and respond to such events. ...
Preprint
Traffic congestion is a persistent and challenging problem in urban areas, leading to increased travel times, fuel consumption, and environmental pollution. Signalized intersections play a pivotal role in regulating traffic flow, and their efficiency has a direct impact on the overall traffic performance of a city. This study investigates the effect of traffic signals in managing traffic volume and reducing congestion and delays at signalized intersections through a comprehensive analysis of existing research, data collection, and simulations. The research begins by analyzing the traffic characteristics recorded by a LiDAR sensor installed at the E Cold Spring Ln – Hillen Rd intersection in Baltimore City, MD. When the signal at this intersection stopped working for some hours during a working day in September 2023, the LiDAR recorded vehicle and pedestrian counts, vehicle-vehicle and vehicle-pedestrian conflicts, and jaywalking events. The research aims to assess the impact of traffic signal failures on traffic flow, congestion, safety (V2V and V2P conflicts), and the frequency of jaywalking events before, during, and after improper performance of the traffic signal. Furthermore, this study explores the factors influencing traffic signal performance, including traffic demand, geometric layout, pedestrian interactions, and the integration of emerging technologies. The analysis results highlighted the importance of having a signal control system at this intersection that can adjust signal timing in response to changing real-time traffic conditions. Reduced congestion, minimized delays, and enhanced traffic flow are observed outcomes, contributing to a more sustainable and efficient urban transportation system. However, it is crucial to consider the trade-offs and challenges associated with traffic signal optimization, such as the potential for increased travel times for certain modes of transportation and the need for ongoing maintenance and updates. 
In conclusion, this study underscores the pivotal role of traffic signals in managing traffic volume and reducing congestion and delays at signalized intersections. Through evidence-based analysis and innovative signal control strategies, urban planners and transportation authorities can work towards creating more efficient, sustainable, and less congested transportation networks. The insights derived from this research can inform policy decisions and guide the development of future traffic management solutions, ultimately leading to improved quality of life in urban areas.
... As cities continue to grow and transportation demands increase, the potential impact of traffic signal failures becomes even more significant [1]. This study aims to shed light on the multifaceted issues surrounding traffic signal failures and to provide insights into how urban planners, traffic engineers, and policymakers can better prepare for and respond to such events. ...
Preprint
Traffic congestion is a persistent and challenging problem in urban areas, leading to increased travel times, fuel consumption, and environmental pollution. Signalized intersections play a pivotal role in regulating traffic flow, and their efficiency has a direct impact on the overall traffic performance of a city. This study investigates the effect of traffic signals on managing traffic volume and reducing congestion and delays at signalized intersections through a comprehensive analysis of existing research, data collection, and simulations. The research begins by analyzing traffic characteristics recorded by a LiDAR sensor installed at the E Cold Spring Ln-Hillen Rd intersection in Baltimore City, MD. When the signal at this intersection stopped working for some hours during a working day in September 2023, the LiDAR recorded vehicle and pedestrian counts, vehicle-vehicle and vehicle-pedestrian conflicts, and jaywalking events. The research aims to assess the impact of traffic signal failures on traffic flow, congestion, safety (V2V and V2P conflicts), and the frequency of jaywalking events before, during, and after the signal malfunction. Furthermore, this study explores the factors influencing traffic signal performance, including traffic demand, geometric layout, pedestrian interactions, and the integration of emerging technologies. The analysis results highlighted the importance of having signal control systems at this intersection that can adjust signal timing in response to changing real-time traffic conditions. Reduced congestion, minimized delays, and enhanced traffic flow are observed outcomes, contributing to a more sustainable and efficient urban transportation system. However, it is crucial to consider the trade-offs and challenges associated with traffic signal optimization, such as the potential for increased travel times for certain modes of transportation and the need for ongoing maintenance and updates.
In conclusion, this study underscores the pivotal role of traffic signals in managing traffic volume and reducing congestion and delays at signalized intersections. Through evidence-based analysis and innovative signal control strategies, urban planners and transportation authorities can work towards creating more efficient, sustainable, and less congested transportation networks. The insights derived from this research can inform policy decisions and guide the development of future traffic management solutions, ultimately leading to improved quality of life in urban areas.
... According to the diurnal distribution of bicyclist deaths in 2020, approximately 40% of fatalities occurred after 6 PM, when it gets dark. Considering the importance of the safety of bicyclists as vulnerable road users [15-18], it is important to accurately investigate the interaction between motorized vehicles and bicyclists on different types of roads. In recent years, various technologies have been used to collect traffic data on vehicle-bike interaction, e.g., bike or car simulators [1, 19-21], video recording or CCTVs [22-24], and LiDAR sensors [8, 25-27]. ...
Article
Light Detection and Ranging (LiDAR) sensors are capable of recording traffic data including the number of passing vehicles and bicyclists, the speed of vehicles and bicyclists, and the number of conflicts between the two groups of road users. In order to collect real-time traffic data and investigate the safety of different road users, a LiDAR sensor was installed at the Cold Spring Ln-Hillen Rd intersection in Baltimore City. The frequency and severity of the conflicts collected in real time were analyzed, and the results highlighted that 122 conflicts were recorded over a 10-month time interval from May 2022 to February 2023. By using an innovative image-processing algorithm, a new safety Measure of Effectiveness (MOE) was proposed to identify the critical zones for bicyclists entering each zone. Considering the trajectory of conflicts, the results of the analysis demonstrated that conflicts in the northern approach (zone N) are more frequent and severe. Additionally, sunny weather is more likely to cause severe vehicle-bike conflicts.
... The road safety of VRUs has also been studied in the past. Crash prediction models were one of the most commonly used methodologies to reveal patterns and risk factors associated with VRU crashes [12,13,14,15]. Factors such as pedestrians' age and gender, crash time and location, lighting condition, and vehicle maneuver as well as type were indicated as significant contributors to severe VRU crashes [16,17,18]. ...
Conference Paper
Motor vehicle crashes involving child Vulnerable Road Users (VRUs) remain a critical public health concern in the United States. While previous studies successfully utilized the crash scenario typology to examine traffic crashes, these studies focus on all types of motor vehicle crashes, so the method might not apply to VRU crashes. Therefore, to better understand the context and causes of child VRU crashes on U.S. roads, this paper proposes a multi-step framework to define crash scenario typology based on the Fatality Analysis Reporting System (FARS) and the Crash Report Sampling System (CRSS). A comprehensive examination of the data elements in FARS and CRSS was first conducted to determine elements that could facilitate crash scenario identification from a systematic perspective. A follow-up context description depicts the typical behavioral, environmental, and vehicular conditions associated with an identified crash scenario. In addition, hypothesis tests are used to reveal over-represented element conditions that separate a specific crash scenario from others. A case study is given on fatal crashes with a single vehicle and a single child pedestrian to demonstrate the proposed framework. Insights are obtained on the similarities and, more interestingly, the differences in the context among crash scenarios. For example, compared to crashes noted with “Non-Motorist Contributing Factors” (actions and/or circumstances that may have contributed to the crash) for child pedestrians, crashes without this type of factor noted were associated with a significantly higher proportion of driver violations charged and/or driving under the influence. When involved in a crash, child pedestrians who failed to yield the right-of-way were significantly more likely to be young teens (13-14 years), while those in the roadway improperly were more likely to be playing toddlers (1-3 years).
We expect the work to serve as a fundamental and practical tool for further examination of crash context and causation, especially for crashes involving children, and to improve their safety when traveling on the road.
... Commonly used SUs in the study of BE and TSS interactions include the TAZ [19,20], buffer zone [21], and grid [22]. Considering the need for multiscale design of the SU scale and types of this study, we note that once the traditional TAZ division has been completed, it is difficult to change. ...
Article
Spatially aggregated data are prone to the effects of the modifiable areal unit problem (MAUP), which applies to built environments and traffic data. Although various studies have been carried out to explore the impact of built environment factors on traffic systems, few have considered MAUPs, which may result in statistical inconsistency. The purpose of this study is to assess the effects of MAUPs on statistical variables and geographically weighted regression (GWR) results when evaluating the influence of the built environment on the traffic system state. Fifty sets of spatial configurations were created using the different aggregation criteria. The variance inflation factor and spatial autocorrelation of the variables, as well as the R² and root mean squared error of the GWR model, were used to assess the MAUP effect. The results show that the index variation is more dependent on the scale of the spatial unit than on zoning type. In the case study presented, based on the available dataset, the optimal spatial unit size for analyzing the influence of the built environment on Jinan’s traffic system was 900 m × 900 m.
... Several studies have explored the various factors of road traffic accidents and their impact on the risk of fatal traffic accidents [5,6], and most used statistical methods, such as polynomial logistic [7] and logistic regression models [8]. Statistical methods offer an advantage in that they can assess the correlations between potential factors and accident severity levels, but the statistical regression model must follow certain assumptions, such as normal distribution. ...
Article
Road accidents are one of the primary causes of death worldwide; hence, they constitute an important research field. Taiwan is a small country with a high-density population. It particularly has a considerable number of locomotives. Furthermore, Taiwan’s traffic accident fatality rate increased by 23.84% in 2019 compared with 2018, primarily because of human factors. Road safety has long been a challenging problem in Taiwanese cities. This study collected public data pertaining to traffic accidents from the Taoyuan city government in Taiwan and generated six datasets based on the various accident frequencies at the same location. To find key attributes, this study proposes a three-stage dimension reduction to filter attributes, which includes removing multicollinear attributes, the integrated attribute selection method, and statistical factor analysis. We applied five rule-based classifiers to classify six different frequency datasets and generate the rules of accident severity. The order of top ten key attributes was hit vehicle > certificate type > vehicle > action type > drive quality > escape > accident type > gender > job > trip purposes in the maximum accident frequency CF ≥ 10 dataset. When locomotives, bicycles, and people collide with other locomotives or trucks, injury or death can easily occur, and the motorcycle riders are at the highest risk. The findings of this study provide a reference for governments and stakeholders to reduce the road accident risk factors.
... There is a growing body of research on the analysis of road crash severity using ML techniques. In these studies, ML techniques are mainly used for the prediction of crash severity, including artificial neural networks [10-14], tree-based models [4, 5, 8, 15-25], Support Vector Machines (SVM), K-Nearest Neighbour (KNN), clustering methods, Naïve Bayes (NB), and hybrid models [3, 20, 26-30]. Heuristic and metaheuristic methods are also used in a few studies, and their results have been compared with machine learning-based models [31, 32]. ...
Article
Crash severity models play a crucial role in evaluating the factors that influence the severity of traffic crashes. In this study, Extremely Randomised Tree (ERT) is used as a machine learning technique to analyse the severity of crashes. The crash data in the province of Khorasan Razavi, Iran, for a period of 5 years from 2013 to 2017, is used for crash severity model development. The dataset includes traffic-related variables, vehicle specifications, vehicle movement, land use characteristics, temporal characteristics, and environmental variables. In this paper, Feature Importance Analysis (FIA), Partial Dependence Plots (PDP), and Individual Conditional Expectation (ICE) plots are utilised to analyse and interpret the results. According to the results, the involvement of vulnerable road users such as motorcyclists and pedestrians, alongside traffic-related variables, is among the most significant variables in crash severity. Results show that the presence of motorcycles can increase the probability of injury crashes by around 30% and almost double the probability of fatal crashes. Analysing the interaction of PDPs shows that driving speeds above 60 km/h in residential areas raise the probability of injury crashes by about 10%. In addition, at speeds higher than 70 km/h, the presence of pedestrians increases the probability of fatal crashes by approximately 6%.
... Overall, traffic safety research has been addressed following statistical approaches. However, different authors have shown that statistical models may lead to inaccurate predictions of fatality probabilities if the prespecified model hypotheses and the underlying correlation between the independent and dependent features of the models are not valid (Chang & Chen, 2005; Rahman et al., 2019). ML techniques stand out as an alternative to statistical methods because, unlike statistical models, ML models are nonparametric approaches that do not need any predefined causal relationships between independent and dependent features (Gong et al., 2019). ...
Article
Purpose: One of the leading causes of violent fatalities around the world is road traffic collisions, and pedestrians are among the most vulnerable road users with respect to such incidents. Since walking is highly promoted in urban areas to alleviate motor-vehicle externalities, it is paramount to understand the causes associated with vehicle-pedestrian collisions and their severity to provide safe environments. Although traffic enforcement cameras can address vehicle-vehicle collisions, little is known about their effectiveness with respect to vehicle-pedestrian incidents. Methodology: In this study, we trained a set of machine learning models to forecast if a vehicle-pedestrian collision will turn into an injury or fatality, and the most suitable model was used to investigate the contributing features associated with such events with emphasis on the impact of traffic enforcement cameras. In addition to traffic enforcement camera proximity, features associated with the collision, weather, vehicle, victim, and infrastructure are included in the model to reduce unobserved heterogeneity. Results: Results show that a Linear Discriminant Analysis model surpasses other machine learning models considering the evaluation metrics. Results reveal that the age and gender of the victim, the involvement of larger vehicles in the collision, and the quality of the illumination are the causes associated with pedestrian fatalities. On the other hand, involvement of motorcycles and collisions that occurred in densely populated locations are the causes associated with pedestrian injuries. Conclusions: This investigation demonstrates how to articulate machine learning into a vehicle-pedestrian crash analysis to understand the direction and magnitude of covariates in the corresponding severity outcome. Furthermore, it highlights the remarkable effect that traffic enforcement cameras and other features have on vehicle-pedestrian crash severity. 
These results provide actionable guidance for educational campaigns, enhanced traffic engineering, and infrastructure improvements that could be implemented in the analyzed region to provide safer transportation.
... It should be noted, however, that in many cases the research was limited by whether detailed or specific data could be obtained for the analysis. On the other hand, taking into account the mathematical apparatus used in the analysis, it should be stated that despite the research work using machine learning models to identify significant factors affecting bicyclists' injuries [26-28], the dominant works are those in which the authors used discrete choice models in their analyses (i.e., logit models [29,30], mixed logit models [16,31], probit models [7], or polynomial logit models [32-35]). Discrete choice models are more often selected for analysis of factors affecting bicyclists' injury, as the model inputs, as well as the outputs, are usually discrete. ...
Article
Transportation and technological development have for centuries strongly influenced the shaping of urbanized areas. On the one hand, this development undoubtedly brings many benefits to residents. On the other hand, it also has a negative impact on urban areas and their surroundings. Many transportation and technological solutions lead, for example, to increased levels of pollution, noise, and excessive energy use, as well as to traffic accidents in cities. It is therefore important to ensure safe urban development and sustainability in all aspects of the city, including road transport safety. Under the long-term policy of sustainable transport development, cycling is promoted, which increases the number of bicyclists using the transport network for short-distance trips. Cycling has a positive effect on bicyclists’ health and on environmental conditions; however, a major problem is the increase in the number of serious injuries and fatalities among bicyclists involved in road incidents with motor vehicles. This study aims to identify factors that influence the occurrence and severity of bicyclist injury in bicyclist-vehicle crashes. It has been observed that the factors increasing the risk of serious injuries and deaths of bicyclists are: vehicle driver gender and age, driving under the influence of alcohol, exceeding the speed limit by the vehicle driver, bicyclist age, cycling under the influence of alcohol, speed of the bicyclist before the incident, vehicle type (truck), incident place (road), time of the day, and incident type. The obtained results can be used for activities aimed at improving bicyclists’ safety in road traffic in the area of analysis.
... Beshah and Hill (2010) compared different classification models and concluded that CART models provide both theoretical and applied advantages over parametric models. All of these studies indicate the usefulness of data driven and data mining techniques in transportation studies (Pande & Abdel-Aty, 2009;Montella et al., 2012;De Oña et al., 2013;Khan et al., 2017;Rahman et al., 2019;Mokhtari et al., 2021;Morshed et al., 2021;Saha et al., 2020). ...
Article
Wrong-way driving (WWD) crashes result in more fatalities per crash, involve more vehicles, and cause extended road closures compared to other types of crashes. Previous studies have used descriptive and parametric statistical models to identify factors that affect WWD crash severity on limited access facilities. This study adopted a combination of non-parametric data mining techniques aiming to recognize the pattern of contributing factors that affect the WWD crash severity on non-limited access facilities. These non-parametric methods can handle heterogeneity in crash datasets well. In this study, hierarchical clustering was used to divide the crash dataset into homogeneous clusters. A random forests analysis was used to select important variables, and decision trees and decision rules were generated to show the underlying pattern and interactions between different factors that affect WWD crash severity. The analysis was based on 1,475 WWD crashes that occurred on arterial streets from 2012-2016 in Florida. Results show that head-on collisions, weekend days, high-speed facilities, crashes involving vehicles entering from a driveway, dark-not lighted roadways, older drivers, and driver impairment are important factors that play a crucial role in WWD crash severity on non-limited access facilities.
... In the area of transportation, the CART method has been applied to study the utility factors of plug-in hybrid electric vehicles [57], to explore causes and effects of automated vehicle disengagement [58], and in the development of models for vehicular traffic noise prediction [59]. It has also been widely used to study road safety, as shown in the summary presented by [60], which cites 14 studies related to traffic accidents. ...
Article
Knowledge of the kilometers traveled by vehicles is essential in transport and road safety studies as an indicator of exposure and mobility. Its application in the determination of user risk indices in a disaggregated manner is of great interest to the scientific community and the authorities in charge of ensuring road safety on highways. This study used a sample of the data recorded during passenger vehicle inspections at Vehicle Technical Inspection stations and housed in a data warehouse managed by the General Directorate for Traffic of Spain. This study has three notable characteristics: (1) a novel data source is explored, (2) the methodology developed applies to other types of vehicles, with the level of disaggregation the data allows, and (3) pattern extraction and the estimate of mobility contribute to the continuous and necessary improvement of road safety indicators and are aligned with goal 3 (Good Health and Well-Being: Target 3.6) of The United Nations Sustainable Development Goals of the 2030 Agenda. An Operational Data Warehouse was created from the sample received, which helped in obtaining inference values for the kilometers traveled by Spanish fleet vehicles with a level of disaggregation that, to the knowledge of the authors, was unreachable with advanced statistical models. Three machine learning methods, CART, random forest, and gradient boosting, were optimized and compared based on the performance metrics of the models. The three methods identified the age, engine size, and tare weight of passenger vehicles as the factors with greatest influence on their travel patterns.
Article
This study applied 2019 macro-level data from DATASUS to model traffic fatalities at the scene. Ordinary least squares (OLS) and censored regression models (TOBIT) were the methodologies used to identify the significant variables explaining the occurrence of deaths on public roads due to crashes. The number of fatalities on public roadways was then modeled using a multilayer perceptron artificial neural network employing the significant variables as predictors according to the generalization capacity of complex predictive models. The OLS and TOBIT findings indicated that the variables motorcycles and scooters per capita, municipal human development index, and number of SUS emergency units were the most important for modeling traffic fatalities at the scene at the national and regional levels. Applying these variables, the neural network's best results achieved a hit rate of 88% for Brazil and 95% for the Northeast model. The contribution of this study is providing an approach combining various methods and considering a range of variables influencing traffic fatalities at the scene. The findings offer insights for policymakers, researchers, and practitioners involved in road safety initiatives, mainly where crash data are scarce, and macro-level analysis is necessary.
Conference Paper
Road crashes represent a significant public health concern, causing immense human suffering and economic losses worldwide. Vulnerable road users, such as pedestrians, cyclists, three-wheelers and motorcyclists, account for half of all deaths on roadways. Among them, pedestrians alone contribute to more than half of all fatalities, highlighting the severity of the dangers they face. Crash prediction studies play an important role in identifying high-risk areas and implementing measures to mitigate pedestrian crashes, thus actively contributing to the reduction of pedestrian fatalities and injuries on roadways. Through a comprehensive literature review, it was found that during the 1960s, pedestrian safety studies gained prominence, indicating a significant shift in recognizing the importance of pedestrian safety. Statistical models such as Multiple Linear Regression, Poisson Regression, and Negative Binomial Regression were introduced to analyze the relationship between pedestrian crashes and various factors, including traffic volume, road characteristics, and environmental variables. Moreover, recent advancements in machine learning have enhanced the accuracy and predictive capabilities of pedestrian crash prediction models. The present review paper explains a comparative analysis between statistical modeling and machine learning modeling as applied to crash prediction modeling. Machine learning models, adept at handling large and complex datasets, generally offer higher predictive accuracy compared to statistical models, which are more suitable for smaller datasets and understanding causal relationships. In both statistical modeling and machine learning approaches, common parameters such as traffic volume, road characteristics, and environmental variables are utilized for pedestrian crash prediction, while the consideration of pedestrian behaviour factors remains limited.
Article
Pedestrian crashes represent a critical traffic safety issue, often resulting in fatal outcomes and raising significant equity concerns. This study analyzed detailed records of pedestrian-involved crashes in California from 2018 to 2021, employing a novel clustering framework enhanced by the SHapley Additive exPlanations approach. The proposed method significantly enhanced interpretability by effectively capturing complex non-linear relationships and interactions among features. The results indicate that impairment status and lighting conditions are pivotal in severe crash outcomes, while broader societal and demographic factors are more substantially associated with less severe cases. Non-injury pedestrian crashes tend to occur in less underserved, more resilient communities, whereas fatal crashes are more common in underserved communities with poor lighting and incomplete pedestrian infrastructure, particularly when pedestrians are under the influence of drugs or alcohol. The findings underscore the necessity for developing comprehensive safety measures that not only address situational risks but also consider broader societal conditions.
Article
Rear-end crashes are a major and frequent type of traffic crash, leading to a large number of injuries and fatalities each year around the world. Examining overtaking behaviors and predicting collision risk probability are essential for preventing rear-end collisions and improving road safety. To this end, we propose a rear-end collision model to examine the risks associated with lagging vehicle (LAV) movements. Specifically, this research aims to develop a model for assessing rear-end collision risks by considering different braking reaction times (BRTs) of the LAV. Firstly, we introduce a deep neural network (DNN) to learn the movements of the LAV. Then, a collision-free model based on the deep reinforcement model (DRM) is proposed to mitigate the collision risks that LAV movements pose to nearby vehicles, thus improving traffic safety. Finally, we combine the generalized linear model (GLM)-based BRT with the driver’s driving behavior, aiming to identify driver distraction under different vehicle movements. Various performance metrics, such as modified time to collision (MTTC), deceleration rate to avoid collision (DRAC), and post-collision change in velocity (Delta-V), are used to identify rear-end conflicts and to demonstrate the effectiveness of the developed model. The simulation results indicated that the developed model could reduce rear-end collision risks with the leading vehicle (LEV) based on different LAV speeds. Furthermore, North Carolina (NC) real-time traffic crash data are used to demonstrate the efficacy of the developed model. The results indicated that different traffic conditions, such as driver behaviors and road and climate conditions, influence the severity of rear-end crashes.
Article
This study used a detailed explainable AI for automatic machine learning, AutoGluon, to predict pedestrian injury severity using data collected over five years (2016-2021) in Louisiana. The final dataset includes forty variables related to pedestrian characteristics, environmental circumstances, and vehicle specifications. Pedestrian injury severity was divided into three categories: fatal, injury, and no injury. The novelty of this approach lies in the application of explainable AI (XAI), specifically SHAP (SHapley Additive exPlanations) values, to interpret the AutoML model’s predictions. This combination not only addressed the opaqueness typically associated with AI “black box” models but also illuminated the critical variables influencing pedestrian injury severity outcomes. The results revealed that the weighted ensemble model emerged as the top performer, showing high accuracy with minimal prediction times and demonstrating the potential of ensemble methods to improve prediction outcomes by integrating the strengths of various individual models. Furthermore, the global and local explainability analyses provided by SHAP values afforded us an in-depth understanding of the variables influencing pedestrian injury severity. This dual-level explanation offered valuable insights into the complex dynamics at play, ranging from pedestrian impairment and driver condition to environmental variables like lighting and weather conditions. These findings underscore the importance of specific variables in crash outcomes, offering actionable intelligence for targeted interventions.
Article
Traditionally, road safety studies have been conducted independently, either at microscopic or macroscopic levels. This study synthesizes existing literature on road safety research conducted at microscopic, macroscopic, and mesoscopic levels using a Systematic Literature Review (SLR). The objective of this research is to examine the advancement in crash prediction methodologies, crash analysis, and the integration of microscopic, macroscopic, and mesoscopic studies over the past two decades to understand the multiscale dynamics of crash occurrence. In addition, bibliometric analysis helps to map social, conceptual, and intellectual collaborations among sources, authors, and institutions. The comprehensive review of the existing literature shows that some analytical advancements in statistical approaches, as well as Machine Learning (ML) and Deep Learning (DL) approaches, have facilitated them to address data complexity issues. In the latter decade, researchers have started to integrate microscopic and macroscopic approaches to have a nuanced and cohesive understanding of the intrinsic relationships among crash contributing factors and to assess the impact of an integrated approach on the model's predictive performance. The bibliometric analysis of published literature revealed distinct clusters, each providing a unique perspective on road safety. The major gaps observed in the systematic review of studies are the lack of consideration of behavioural aspects of road users, the transferability of models between two independent frameworks, as well as across the integrated modelling methodologies. Another significant gap is the lack of a scale of adjacent street networks in mesoscopic studies. Overall, this review provided critical insights into safety studies that focus on distinct resolutions, analytical advancements in modelling methodologies, mapping of scientific collaborations and identifications of research gaps.
Article
Delineating urban land use patterns and their close relation to transport has long been a core research topic, most often analyzed at the aggregate level from a macroscopic perspective. This study aimed to create a new homogeneity-related zone system from basic grid cells and to examine whether the improved system outperforms traditional zone systems in aggregate transportation analysis. Within ride-hailing schemes, the selection process of drop-off locations of each trip is fully recorded via big data technologies, which offers a promising way of addressing the spatial aggregation issue. Specifically, on the basis of destination retrieval record (DRR) data, this study explored the potential relevance among them and built homogeneity pairs to aggregate basic grid cells into homogeneous traffic analysis zones (HTAZs) that are high in inter-zone consistency. The third ring area of Chengdu City, as a case study, was divided into 260 HTAZs using the proposed partition approach. The newly developed homogeneity-related zone system performs better than traditional systems in terms of geometry and internal consistency. Findings from this study also suggest that the zone partition should maintain internal consistency, namely homogeneity, as much as possible so that the aggregate-level spatial units can reflect and summarize the features of intra-zone individuals. This study demonstrates the mining potential of trip record data within transport systems, and provides a more reliable spatial structure to reveal travel patterns and conduct transportation design.
Article
During rapid growth in non-motorized vehicle (NMV) ownership, crash-oriented assessment methods produce biased identification of key traffic safety management areas, leading to unclear analysis of safety problems and limited improvement. To improve NMV regional safety, this study developed an approach to identify hazardous crash and crash-involved rider (CIR) areas and explore the mechanisms of primary macro-level contributing factors by jointly modeling crashes and CIRs. Socio-economic, road network, traffic enforcement, and land-use intensity data were collected as independent variables in 115 towns in Suzhou, China. A Poisson lognormal bivariate conditional autoregressive model (PLN-BCAR) and a proposed four-quadrant classification method based on the potential for safety improvement (PSI) density were developed to identify crash-prone and CIR-prone towns. XGBoost and SHAP (SHapley Additive exPlanations) were applied to examine the importance and effects of contributing factors. Results showed that 49.6% of NMV crashes occurred outside the CIRs’ residence areas. The four-quadrant classification method identified crash-prone and CIR-prone areas more accurately than crash-determined hot zone identification methods. There were nonlinear relationships between primary contributing factors and key areas. Differences in the importance and effects of the contributing factors across areas provided important insights into reducing crashes and CIRs in those areas; for example, NMV crashes and CIRs were higher in areas with low gross domestic product and high population density, and such areas should be selected for safety improvements such as traffic safety education at the macro level. The proposed approach can help traffic administrators identify the key areas and contributing factors and provide guidelines for improvement.
Article
Predicting the severity of injuries caused by traffic accidents is an important undertaking because it may lead to establishing regulations increasing road-user safety. Bicyclists are a particularly susceptible category of road users, which is especially troubling considering the environmental, financial, and health benefits of this mode of transportation. As a result, this study aims to apply machine learning to identify risk variables that may result in serious biker injuries in the case of an accident. Machine-learning models make no assumptions about the connections between variables. Hence, it has been argued that machine-learning approaches produce better outcomes than statistical procedures. This study selects the “best” machine-learning classification system from a vast pool of similar algorithms to predict the severity of bicycling injuries. Machine learning allows the system to learn from experience and improve without being explicitly programmed. We first use a variety of feature selection algorithms to identify a list of features related to the accident and the environment that have the greatest impact on the severity of bicyclist injuries. This feature list is then used as input data to various machine-learning algorithms that predict the class of bicyclist injury severity at one of three levels (fatal, serious, and slight). The “best” machine-learning algorithm is identified on the basis of having the highest levels of accuracy, precision, recall, and F1 score. The current models were developed and trained based on Israeli road-traffic accident data from 2009 to 2019, meaning that new models would need to be developed for other geographical locations. In addition, the models would need to be updated to take account of the changing relationships between motorists, bicyclists, and the environment. Nevertheless, the proposed methodology has universal applicability.
Article
This paper studies and predicts the frequency of traffic crashes at intersections across Texas by employing Zero-inflated Negative Binomial (ZINB) and Negative Binomial-Lindley (NB-L) generalized linear models, as well as various tree-based machine learning (ML) methods, namely Random Forests (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Bayesian Additive Regression Trees (BART). Official crash reports from 2010 through 2019 were linked to Texas' over 700,000 intersections. RF provided the best prediction performance (using R-square and Root Mean Square Error metrics) while serving well for highly imbalanced crash data (with many zero cases). Sensitivity analysis highlights the practical significance of signalized intersections, annual average daily traffic, number of lanes at the intersection approach, and other covariates.
Preprint
Full-text available
Perception time (PT) is a major factor affecting a driver’s ability to detect and recognize a bicycle. Researchers have attempted to enhance the PT and gain insights into why drivers fail to see bicyclists before collisions despite looking, even when they are at a safe driving distance from the bicycle. Previous studies have focused on the detection distance and recognition distance of bicycles within 600 milliseconds (ms). PT is the main factor for avoiding collisions; however, it has been shown that a control bicycle as well as treated bicycles can be detected from greater distances. This study aims to evaluate the early detection and recognition of bicycles owing to the impact of conspicuity treatments such as white stripes on a red background (WRED), a high-visibility jacket (HVJ), reflective tape (RT), and their combinations in order to achieve longer detection and recognition distances under day/night conditions. The detection and recognition distances of WRED tire treatment were compared with those of an HVJ, RT, and their combinations, based on PTs of 250 and 600 ms. The same treatments were applied and compared at the required PT for the safe driving distance of a bicycle. The respondents provided their perceptions based on video surveillance data presented on a computer screen. The detection and recognition distance of WRED treatment combined with an HVJ was significantly greater under all conditions except twilight with car headlights and nighttime with car headlights for a PT of 600 ms. Furthermore, for this combination, the PT was significantly shorter under all conditions except nighttime with car headlights. The effects of gentle self-signaling of a bicycle via the combination of WRED treatment and an HVJ can reduce the PT for detecting a bicycle and increase the detection and recognition distance under all lighting conditions. 
Passive safety measures based on these results can support drivers, who might otherwise look but fail to see bicyclists in time. In summary, the combination of WRED treatment with an HVJ is strongly recommended to achieve cost-effective self-signaling of a bicycle.
Article
Extreme value theory is the state-of-the-art modelling technique for estimating crash risk from traffic conflicts, with two different sampling techniques, i.e. block maxima and peak-over-threshold, at its core. However, the uncertainty associated with the estimates obtained by these sampling techniques has been too large to enable its widespread practical use. A fundamental reason for this issue is the improper selection of extreme values and a lack of a suitable and efficient sampling mechanism. This study proposes a hybrid modelling framework of machine learning and extreme value theory to estimate crash risk from traffic conflicts with an efficient sampling technique for identifying extremes. More specifically, a machine learning approach replaces the conventional sampling techniques with anomaly detection techniques since an anomaly is a data point that does not conform with the rest of the data, making it very similar to the definition of an extreme value. Six representative machine learning-based unsupervised anomaly detection algorithms have been tested in this study. They include iforest, minimum covariance determinant, one-class support vector machine, k-nearest neighbours, local outlier factor, and connectivity-based outlier factor. The extremes identified by these algorithms are then fitted to extreme value distributions for both univariate and bivariate frameworks. These algorithms were tested on a large set of traffic conflict data collected for four weekdays (6 am to 6 pm) from three four-legged intersections in Brisbane, Australia. Results indicate that the proposed hybrid models consistently outperform the conventional extreme value models, which use block maxima and peak-over-threshold as the underlying sampling technique. Among the sampling algorithms, iforest has been found to perform better than other algorithms in estimating crash risks from traffic conflicts. 
The proposed hybrid modelling framework represents a methodological advancement in traffic conflict-based crash estimation models and opens new avenues for exploring the possibility of utilising machine learning techniques within the existing traffic conflict techniques.
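The two conventional sampling schemes named in this abstract, block maxima and peak-over-threshold, can be sketched in a few lines of Python. This is a minimal illustration only, not the paper's anomaly-detection sampling; the function names, the use of negated time-to-collision values as the conflict indicator, and the example numbers are all assumptions made for this sketch.

```python
def block_maxima(values, block_size):
    """Return the maximum of each consecutive, non-overlapping block.

    The final block may be shorter than block_size if the series
    length is not a multiple of it.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [max(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def peaks_over_threshold(values, threshold):
    """Return the exceedances (value - threshold) of all values above the threshold."""
    return [v - threshold for v in values if v > threshold]


# Hypothetical conflict indicator: negated time-to-collision in seconds,
# so that larger values correspond to more severe conflicts.
negated_ttc = [-3.2, -1.1, -4.0, -0.6, -2.5, -2.9]

maxima = block_maxima(negated_ttc, block_size=3)
exceedances = peaks_over_threshold(negated_ttc, threshold=-1.5)
```

Block maxima keeps exactly one extreme per block regardless of how severe it is, while peak-over-threshold keeps every observation beyond a fixed cut-off; either sample would then be fitted to an extreme value distribution.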
Article
Full-text available
Introduction: Walking is an active mode of travel, but in recent years there have been more pedestrian casualties in traffic, especially in developing countries such as Serbia. Macro-level road safety studies enable the identification of influential factors that play an important role in creating pedestrian safety policies. Method: This study analyzes the impact of traffic and infrastructure characteristics on pedestrian accidents at the level of traffic analysis zones. The study applied a geographically weighted regression approach to identify and localize all factors that contribute to the occurrence of pedestrian accidents. Taking into account the spatial correlations between the zones and the frequency distribution of accidents, the geographically weighted Poisson model showed the best predictive performance. Results: This model showed 10 statistically significant factors influencing pedestrian accidents. In addition to exposure measures, a positive relationship with pedestrian accidents was identified for the length of state roads (class I), the length of unclassified streets, as well as the number of bus stops, parking spaces, and object units. However, a negative relationship was recorded with the total length of the street network and the total length of state roads passing through the analyzed area. Conclusion: These results indicate the importance of determining the categorization and function of roads in places where pedestrian flows are pronounced, as well as the perception of pedestrian safety near bus stops and parking spaces. Practical applications: The results of this study can help traffic safety engineers and managers plan infrastructure measures for future pedestrian safety planning and management in order to reduce pedestrian casualties and increase their physical activity.
Article
Millions of car crashes occur annually in the US, leaving tens of thousands of deaths and many more severe injuries and debilitations. Thus, understanding the most impactful contributors to severe injuries in automobile crashes and mitigating their effects are of great importance in traffic safety improvement. This paper develops a hybrid framework involving predictive analytics, explainable AI, and heuristic optimization techniques to investigate and explain the injury severity risk factors in automobile crashes. First, our framework examines various machine learning models to identify the one with the best prediction performance as the base model. Then, it utilizes two popular state-of-the-art explainable AI techniques from the literature (i.e., leave-one-covariate-out and TreeExplainer) and our proposed explanation method based on the variable neighborhood search procedure to construe the importance of the variables. Finally, by applying an information fusion technique, our approach identifies a unified ranking list of the most important variables contributing to severe car crash injuries. Transportation safety planners and policymakers can use our findings to reduce the severity of car accidents, improve traffic safety, and save many lives.
Article
Full-text available
Road traffic crashes cause social, economic, physical and emotional losses. They also reduce operating speed and road capacity and increase delays, unreliability, and productivity losses. Previous crash duration research has concentrated on individual crashes, with the contributing elements extracted directly from the incident description and records. As a result, the explanatory variables were more regional, and the effects of broader macro-level factors were not investigated. This is in contrast to crash frequency studies, which normally collect explanatory factors at a macro-level. This study explores the impact of various factors and the consistency of their effects on vehicle crash duration and frequency at a macro-level. Along with the demographic, vehicle utilisation, environmental, and responder variables, street network features such as connectedness, density, and hierarchy were added as covariates. The dataset contains over 95,000 vehicle crash records over 4.5 years in Greater Sydney, Australia. Following a dimension reduction of independent variables, a hazard-based model was estimated for crash duration, and a Negative Binomial model was estimated for frequency. Unobserved heterogeneity was accounted for by latent class models for both duration and frequency. Income, driver experience and exposure are considered to have both positive and negative impacts on duration. Crash duration is shorter in regions with a dense road network, but crash frequency is higher. Highly connected networks, on the other hand, are associated with longer duration but lower frequency.
Article
Understanding locally heterogeneous physical contexts in the built environment is of great importance in developing preemptive countermeasures to mitigate pedestrian fatality risks. In this study, we aim to investigate the non-linear relationship between physical factors and pedestrian fatality at a location-specific level using a machine learning approach. The state-of-the-art machine learning algorithm, eXtreme Gradient Boosting (XGBoost), is employed for a binary classification problem, in which nationwide locations where fatal pedestrian accidents occurred for the years from 2012 to 2019 in Korea serve as positive samples (np = 13,366). For negative samples, locations with no pedestrian accidents are selected randomly to a size that is 10 times larger (nn = 133,660) than the positive samples. Fifteen features under the categories of road conditions, road facilities, road networks, and land uses are assigned to both the positive and negative sample locations using a Geographic Information System (GIS). A method is proposed to avoid the class imbalance problem, and a final unbiased model is utilized to predict fatal pedestrian risks at the negative sample locations. In addition, Shapley Additive Explanations (SHAP) is introduced to provide a robust interpretation of the XGBoost prediction results. It is shown that 21.6% of the negative sample locations have a probability of fatal pedestrian accidents greater than 0.5 (or 78.4% accuracy). Generally, a road segment that lies on many of the shortest routes in a dense residential area with many lively activities from aligned buildings is a potential spot for fatal pedestrian accidents. However, based on the SHAP interpretation, the relationships between the features and pedestrian fatality are found to be nonlinear and locally heterogeneous. We discuss the implications this result has for drafting policy recommendations to reduce pedestrian fatalities.
Article
This study aims to explore the effects of drivers’ visual environment on speeding crashes by using different machine learning techniques. To obtain the data of drivers’ visual environment in the real world, a framework was proposed to obtain Google street view (GSV) images. Deep neural network and computer vision technologies were applied to obtain the clustering and depth information from the GSV images. To reflect drivers’ visual environment in the real world, a coordinate transformation was conducted, and several visual measures were proposed and calculated. Three different tree-based ensemble models (i.e., random forest, adaptive boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost)) were applied to estimate the number of speeding crashes, and the comparison results showed that XGBoost could provide the best data fit. Explainable machine learning methods were applied to explore the effects of drivers’ visual environment and other features on speeding crashes. The results validated the visual environment data obtained by the proposed method for the speeding crash analysis. It was suggested that the proportion of trees in the drivers’ view and the proportion of road length with trees could reduce speeding crashes. In addition, the complexity level of drivers’ visual environment was found to increase the crash occurrence. This study provided new insights into obtaining detailed information from GSV images for traffic safety analysis. The findings based on explainable machine learning could also provide road planners and engineers with clear suggestions for selecting appropriate countermeasures to enhance traffic safety.
Article
This paper presents a study that evaluates the nature of the associations (i.e., linear or non-linear) between built environment variables and pedestrian crash frequency at the census block group level. A machine learning approach, called the componentwise model-based gradient boosting algorithm, was implemented to estimate the nature and effects of sociodemographic, land use, road network, and traffic attributes on pedestrian crashes from Broward and Miami-Dade Counties in Florida. The algorithm provides the flexibility to use different types of base-learners, including but not limited to decision tree (DT), generalized additive model (GAM), and Markov Random Field (MRF). While gradient boosting with DT base-learner has widely been used in safety studies, other base-learners and their performances in crash frequency predictions are yet to be explored. This study compared the performance of DT and GAM base-learners, with an MRF base-learner to account for spatial correlation among analysis units. Models fitted with GAM base-learner were found to perform better than the models fitted with DT base-learner, with several variables showing non-linear and several showing linear or approximately linear correlations with pedestrian crash frequency. The study provides useful insights on how the results can help urban planners and policy makers to optimize pedestrian safety measures.
Article
Full-text available
This study evaluated the effectiveness of connected vehicle (CV) technologies in adverse visibility conditions using microscopic traffic simulation. Traffic flow characteristics deteriorate significantly in reduced visibility conditions, resulting in high crash risks. This study applied CV technologies on a segment of Interstate I-4 in Florida to improve traffic safety under fog conditions. Two types of CV approaches (i.e., connected vehicles without platooning (CVWPL) and connected vehicles with platooning (CVPL)) were applied to reduce the crash risk in terms of three surrogate measures of safety: the standard deviation of speed, the standard deviation of headway, and the rear-end crash risk index (RCRI). This study implemented vehicle-to-vehicle (V2V) communication technologies of CVs to acquire real-time traffic data using the microsimulation software VISSIM. A car-following model for both CV approaches was used with an assumption that the CVs would follow this car-following behavior in fog conditions. The model performances were evaluated under different CV market penetration rates (MPRs). The results showed that both CV approaches improved safety significantly in fog conditions as MPRs increase. To be more specific, the minimum MPR should be 30% to provide significant safety benefits in terms of surrogate measures of safety for both CV approaches over the base scenario (non-CV scenario). In terms of surrogate safety measures, CVPL significantly outperformed CVWPL when MPRs were equal to or higher than 50%. The results also indicated a significant improvement in the traffic operation characteristics in terms of average speed.
Thesis
Full-text available
This thesis presents different data mining/machine learning techniques to analyze vulnerable road users’ (i.e., pedestrian and bicycle) crashes by developing crash prediction models at the macroscopic level. In this study, we developed data mining approaches (i.e., decision tree regression (DTR) models) for both pedestrian and bicycle crash counts. To the best of the author’s knowledge, this is the first application of DTR models in the growing traffic safety literature at the macro-level. The empirical analysis is based on the Statewide Traffic Analysis Zones (STAZ) level crash count data for both pedestrians and bicyclists from the state of Florida for the years 2010 to 2012. The model results highlight the most significant predictor variables for pedestrian and bicycle crash counts in terms of three broad categories: traffic, roadway, and socio-demographic characteristics. Furthermore, spatial predictor variables of neighboring STAZs were utilized along with the targeted STAZ variables in order to improve the prediction accuracy of both DTR models. The DTR model considering spatial predictor variables (spatial DTR model) was compared with the model without spatial predictor variables (aspatial DTR model), and the comparison results showed that the spatial DTR model is superior to the aspatial DTR model in terms of prediction accuracy. Finally, this study contributed to the safety literature by applying three ensemble techniques (Bagging, Random Forest, and Boosting) in order to improve the prediction accuracy of the weak learner (DTR models) for macro-level crash counts. The estimation results revealed that all the ensemble techniques performed better than the DTR model and that the gradient boosting technique outperformed the other competing ensemble techniques in macro-level crash prediction models.
Conference Paper
Full-text available
This paper examines the safety benefit of connected vehicles (CV) and connected vehicles with lower level automation (CVLLA) on arterials using micro-simulation. Examining the lower level of automation is more realistic in the foreseeable future. This study considered two automated features, automated braking and lane keeping assistance, which are available in the market. Driving behaviors of CV and CVLLA were proposed by considering car following models that approximate the decision processes of CV and CVLLA using the C++ programming interface in VISSIM. The safety impacts on both segment and intersection crash risks were quantified under various market penetration rates (MPRs) of CV and CVLLA based on surrogate safety assessment techniques. Two surrogate measures of safety, Time Exposed Time-to-Collision (TET) and Time Integrated Time-to-Collision (TIT), were considered to explore the segment crash risk, and the results suggest that both CV and CVLLA reduce segment crash risk significantly compared to the baseline scenario. Furthermore, the intersection crash risk was evaluated through the number of conflicts extracted from microsimulation using the Surrogate Safety Assessment Model (SSAM). Towards that end, a binary logistic regression model was developed to quantify the crash risk in terms of observed conflicts obtained in the intersection influence areas. The logistic regression results clearly showed that both CV and CVLLA have lower intersection crash risks compared to the base scenario (non-CV scenario). Finally, in terms of both segment and intersection crash risks, CVLLA significantly outperformed CV when MPRs were equal to or higher than 60%.
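As a rough illustration of the two segment-level surrogate measures named in this abstract, the sketch below computes TET and TIT for one vehicle's time-to-collision (TTC) series. It assumes the commonly used definitions: TET accumulates the time spent with TTC below a threshold, and TIT accumulates the shortfall (threshold minus TTC) over that same time. The function name, the 3.0 s threshold, the 0.1 s time step, and the sample values are hypothetical choices for this sketch; the paper's exact formulation may differ.

```python
def tet_tit(ttc_series, threshold, dt):
    """Return (TET, TIT) for one vehicle's TTC series.

    ttc_series: TTC value per time step in seconds; None means no
                conflicting leader (TTC undefined) at that step.
    threshold:  TTC threshold in seconds below which a step counts as exposed.
    dt:         length of one time step in seconds.
    """
    tet = 0.0
    tit = 0.0
    for ttc in ttc_series:
        if ttc is not None and 0 < ttc < threshold:
            tet += dt                      # time exposed to a critical TTC
            tit += (threshold - ttc) * dt  # severity-weighted exposure
    return tet, tit


# Hypothetical TTC trace sampled every 0.1 s.
ttc_trace = [5.0, 2.0, 1.0, None, 3.5]
tet, tit = tet_tit(ttc_trace, threshold=3.0, dt=0.1)
```

TET treats every critical step equally, whereas TIT weights each step by how far TTC falls below the threshold, so two scenarios with the same TET can still differ in TIT.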
Conference Paper
Full-text available
This paper presents different data mining techniques to analyze vulnerable road user (i.e., pedestrian and bicycle) crashes by developing crash prediction models at the macro-level. In this study, we developed a data mining approach (i.e., decision tree regression (DTR) models) for both pedestrian and bicycle crash counts. To the authors' knowledge, this is the first application of DTR models in the growing traffic safety literature at the macro-level. The empirical analysis is based on the Statewide Traffic Analysis Zones (STAZ) level crash count data for both pedestrians and bicyclists from the state of Florida for the years 2010 to 2012. The model results highlight the most significant predictor variables for pedestrian and bicycle crash counts in terms of three broad categories: traffic, roadway, and socio-demographic characteristics. Furthermore, spatial predictor variables of neighboring STAZs were utilized along with the targeted STAZ variables in order to improve the prediction accuracy of both DTR models. The DTR model considering spatial predictor variables (spatial DTR model) was compared with the model without spatial predictor variables (aspatial DTR model), and the comparison results showed that the spatial DTR model is superior to the aspatial DTR model in terms of prediction accuracy. Finally, this study contributed to the safety literature by applying three ensemble techniques (Bagging, Random Forest, and Boosting) in order to improve the prediction accuracy of the weak learner (DTR models) for macro-level crash counts. The estimation results revealed that all the ensemble techniques performed better than the DTR model and that the gradient boosting technique outperformed the other competing ensemble techniques in macro-level crash prediction models.
Article
Full-text available
With the challenges of increasing traffic congestion, the concept of managed lanes (MLs) has been gaining popularity recently as a means to effectively improve traffic mobility. MLs are usually designed to be left-lane concurrent with an at-grade access/exit. Such a design forms weaving segments since it requires vehicles to cross multiple general purpose lanes (GPLs) to enter or exit the ML. The weaving segments could have a negative impact on traffic safety in the GPLs. This study provides a comprehensive investigation of the safety impact of different lengths for each lane change maneuver on GPL weaving segments close to the ingress and egress of MLs through two simulation approaches: VISSIM microsimulation and a driving simulator. The two simulation studies are developed based on traffic data collected from freeway I-95 in Miami, Florida. The results from the two simulation studies support each other. Based on the two simulation studies, it is recommended that 1,000 feet be used as the optimal length per lane change on GPL weaving segments with MLs. The safety impacts of traffic volume, variable speed limit control strategies, and drivers' gender and age characteristics are also explored. This study can provide valuable insight for evaluating the traffic performance of freeway weaving segments with the presence of concurrent GPLs and MLs in a highway safety context. It also provides guidelines for future conversion of freeways to include MLs.
Article
Full-text available
Connected vehicles (CV) technology has recently drawn increasing attention from governments, vehicle manufacturers, and researchers. One of the biggest issues facing the popularization of CVs is the market penetration rate (MPR). Full market penetration of CVs might not be accomplished in the near future. Therefore, traffic flow will likely be composed of a mixture of conventional vehicles and CVs. In this context, the study of CV MPR is worthwhile in the CV transition period. The overarching goal of this study was to evaluate the longitudinal safety of CV platoons by comparing the implementation of managed-lane CV platoons and all-lanes CV platoons (with the same MPR) against a non-CV scenario. This study applied the CV concept on a congested expressway (SR408) in Florida to improve traffic safety. The Intelligent Driver Model (IDM) along with the platooning concept were used to regulate the driving behavior of CV platoons with an assumption that the CVs would follow this behavior in the real world. A high-level control algorithm of CVs in a managed lane was proposed in order to form platoons with three joining strategies: rear join, front join, and cut-in join. Five surrogate safety measures, namely standard deviation of speed, time exposed time-to-collision (TET), time integrated time-to-collision (TIT), time exposed rear-end crash risk index (TERCRI), and sideswipe crash risk (SSCR), were utilized as indicators for safety evaluation. The results showed that both CV approaches (i.e., managed-lane CV platoons and all-lanes CV platoons) significantly improved the longitudinal safety in the studied expressway compared to the non-CV scenario. In terms of surrogate safety measures, the managed-lane CV platoons significantly outperformed all-lanes CV platoons with the same MPR. Different time-to-collision (TTC) thresholds were also tested and showed similar results on traffic safety. Results of this study provide useful insight for the management of CV MPR as managed-lane CV platoons.
Article
Full-text available
Safe, trusted drinking water is fundamental to society. Discolouration is a key aesthetic indicator visible to customers. Investigations to understand discolouration and iron failures in water supply systems require assessment of large quantities of disparate, inconsistent, multidimensional data from multiple corporate systems. A comprehensive data matrix was assembled for a seven year period across the whole of a UK water company (serving three million people). From this a novel data driven tool for assessment of iron risk was developed based on a yearly update and ranking procedure, for a subset of the best quality data. To avoid a ‘black box’ output, and provide an element of explanatory (human readable) interpretation, classification decision trees were utilised. Due to the very limited number of iron failures, results from many weak learners were melded into one high-quality ensemble predictor using the RUSBoost algorithm which is designed for class imbalance. Results, exploring simplicity vs predictive power, indicate enough discrimination between variable relationships in the matrix to produce ensemble decision tree classification models with good accuracy for iron failure estimation at District Management Area (DMA) scale. Two model variants were explored: ‘Nowcast’ (situation at end of calendar year) and ‘Futurecast’ (predict end of next year situation from this year’s data). The Nowcast 2014 model achieved 100% True Positive Rate (TPR) and 95.3% True Negative Rate (TNR), with 3.3% of DMAs classified High Risk for un-sampled instances. The Futurecast 2014 achieved 60.5% TPR and 75.9% TNR, with 25.7% of DMAs classified High Risk for un-sampled instances. The output can be used to focus preventive measures to improve iron compliance.
Article
Full-text available
Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. The training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces algorithms frequently used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
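The core step of growing a tree, choosing where to split a node, can be illustrated with a short Python sketch. It is a simplified, CART-style search over a single numeric feature that picks the threshold minimising the summed squared error of the two child nodes. The function names and the toy data are assumptions for this sketch; it is not the procedure of any specific package mentioned in the abstract.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split on one feature.

    The threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data: the response jumps from 1.0 to 5.0 between x = 3 and x = 10.
threshold, error = best_split([1, 2, 3, 10, 11, 12],
                              [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
```

A full tree applies this search recursively to each child node; the validation dataset is then used to decide how deep the recursion should be allowed to go.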
Article
The concept of boosting emerged from the field of machine learning. The basic idea is to boost the accuracy of a weak classifying tool by combining various instances into a more accurate prediction. This general concept was later adapted to the field of statistical modelling. This review article attempts to highlight this evolution of boosting algorithms from machine learning to statistical modelling. We describe the AdaBoost algorithm for classification as well as the two most prominent statistical boosting approaches, gradient boosting and likelihood-based boosting. Although both approaches are typically treated separately in the literature, they share the same methodological roots and follow the same fundamental concepts. Compared to the initial machine learning algorithms, which must be seen as black-box prediction schemes, statistical boosting results in statistical models which offer a straightforward interpretation. We highlight the methodological background and present the most common software implementations.
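The idea of combining weak classifiers into a stronger one can be sketched with a small AdaBoost example. This is an illustrative sketch under stated assumptions, not code from the review: the weak learner is a one-feature threshold rule (a decision stump), labels are coded as +1/-1, and the ten-point toy dataset and function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Predict `sign` at or below the threshold, `-sign` above it.
                preds = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost(X, y, rounds):
    """Fit up to `rounds` stumps, reweighting misclassified points each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-12)  # guard against log of zero
        if err >= 0.5:
            break  # weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase weights of misclassified points, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy data (an assumption): no single threshold separates these labels.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost(X, y, rounds)
    correct = sum(predict(model, row) == yi for row, yi in zip(X, y))
    print(rounds, "round(s):", correct, "of", len(y), "training points correct")
```

Each round fits the stump to a reweighted sample, so later stumps concentrate on the points that earlier ones misclassified; the final vote weights each stump by its accuracy.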
Article
Pedestrian safety has been a major concern for megacities such as New York City. Although pedestrian fatalities show a downward trend, these fatalities constitute a high percentage of overall traffic fatalities in the city. Data from New York City were used to study the factors that influence the frequency of pedestrian crashes. Specifically, a random parameter, negative binomial model was developed for predicting pedestrian crash frequencies at the census tract level. This approach allows the incorporation of unobserved heterogeneity across the spatial zones in the modeling process. The influences of a comprehensive set of variables describing the sociodemographic and built-environment characteristics on pedestrian crashes are reported. Several parameters in the model were found to be random, which indicates their heterogeneous influence on the numbers of pedestrian crashes. Overall, these findings can help frame better policies to improve pedestrian safety.
Article
Given the current number of road accidents, the aim of many road safety analysts is to identify the main factors that contribute to crash severity. To pinpoint those factors, this paper shows an application that applies some of the methods most commonly used to build decision trees (DTs), which have not been applied to the road safety field before. An analysis of accidents on rural highways in the province of Granada (Spain) between 2003 and 2009 (both inclusive) showed that the methods used to build DTs serve our purpose and may even be complementary. Applying these methods has enabled potentially useful decision rules to be extracted that could be used by road safety analysts. For instance, some of the rules may indicate that women, contrary to men, increase their risk of severity under bad lighting conditions. The rules could be used in road safety campaigns to mitigate specific problems. This would enable managers to implement priority actions based on a classification of accidents by types (depending on their severity). However, the primary importance of this proposal is that other databases not used here (i.e. other infrastructure, roads and countries) could be used to identify unconventional problems in a manner easy for road safety managers to understand, as decision rules.
Article
Gaining a better understanding of the factors that affect the likelihood of a vehicle crash has been an area of research focus for many decades. However, in the absence of detailed driving data that would help improve the identification of cause and effect relationships with individual vehicle crashes, most researchers have addressed this problem by framing it in terms of understanding the factors that affect the frequency of crashes – the number of crashes occurring in some geographical space (usually a roadway segment or intersection) over some specified time period. This paper provides a detailed review of the key issues associated with crash-frequency data as well as the strengths and weaknesses of the various methodological approaches that researchers have used to address these problems. While the steady march of methodological innovation (including recent applications of random parameter and finite mixture models) has substantially improved our understanding of the factors that affect crash-frequencies, it is the prospect of combining evolving methodologies with far more detailed vehicle crash data that holds the greatest promise for the future.
Article
This paper presents an empirical inquiry into the applicability of zero-altered counting processes to roadway section accident frequencies. The intent of such a counting process is to distinguish sections of roadway that are truly safe (near zero-accident likelihood) from those that are unsafe but happen to have zero accidents observed during the period of observation (e.g. one year). Traditional applications of Poisson and negative binomial accident frequency models do not account for this distinction and thus can produce biased coefficient estimates because of the preponderance of zero-accident observations. Zero-altered probability processes such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) distributions are examined and proposed for accident frequencies by roadway functional class and geographic location. The findings show that the ZIP structure models are promising and have great flexibility in uncovering processes affecting accident frequencies on roadway sections observed with zero accidents and those with observed accident occurrences. This flexibility allows highway engineers to better isolate design factors that contribute to accident occurrence and also provides additional insight into variables that determine the relative accident likelihoods of safe versus unsafe roadways. The generic nature of the models and the relatively good power of the Vuong specification test used in the non-nested hypotheses of model specifications offer roadway designers the potential to develop a global family of models for accident frequency prediction that can be embedded in a larger safety management system.
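The two-state idea behind the ZIP distribution can be made concrete with a short sketch of its probability mass function. The parameter names `lam` (Poisson mean in the accident-prone state) and `pi` (probability of the structurally safe, always-zero state) and the numeric values are illustrative assumptions, not estimates from the paper.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    With probability pi the section is in the always-zero state;
    otherwise its count follows Poisson(lam).
    """
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

# Illustrative values only (assumptions).
lam, pi = 2.0, 0.3
print("P(0) Poisson:", round(poisson_pmf(0, lam), 4))
print("P(0) ZIP:    ", round(zip_pmf(0, lam, pi), 4))
```

A zero under this model can come from either state, which is why the ZIP assigns more probability to zero than a plain Poisson with the same `lam`, and why its mean shrinks to `(1 - pi) * lam`.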
Article
During the last two decades, changes in vehicle design and an increase in the number of light truck vehicles (LTVs) and vans have led to changes in the pedestrian injury profile. Due to the dynamic nature of pedestrian crashes, biomechanical aspects of collisions can be better evaluated in field studies. The Pedestrian Crash Data Study, conducted from 1994 to 1998, provided a solid database upon which the details and mechanisms of pedestrian crashes can be investigated. From 552 recorded cases in this database, 542 patients had complete injury related information, making a meaningful study of pedestrian crash characteristics possible. Pedestrians struck by LTVs had a higher risk (29%) of severe injuries (abbreviated injury scale ≥4) compared with passenger vehicles (18%) (p = 0.02). After adjustment for pedestrian age and impact speed, LTVs were associated with 3.0 times higher risk of severe injuries (95% confidence interval (CI) 1.26 to 7.29, p = 0.013). Mortality rate for pedestrians struck by LTVs (25%) was two times higher than that for passenger vehicles (12%) (p<0.001). Risk of death for LTV crashes after adjustment for pedestrian age and impact speed was 3.4 times higher than that for passenger vehicles (95% CI 1.45 to 7.81, p = 0.005). Vehicle type strongly influences risk of severe injury and death to pedestrians. This may be due in part to the front end design of the vehicle. Hence vehicle front end design, especially for LTVs, should be considered in future motor vehicle safety standards.
Article
Managed lanes (MLs) have been implemented as a vital strategy for traffic management and traffic safety improvement. The majority of previous studies involving MLs have adopted a limited scope of examining the effect of MLs segments as a whole, without considering the safety and operational effects of the design of access to the MLs. In the study, several scenarios were tested using microscopic traffic simulation to determine the optimal access design while taking into consideration accessibility levels and weaving lengths. The studied accessibility levels varied from one to three along the studied network. Both safety (i.e., speed standard deviation, time-to-collision, and conflict rate) and operation (i.e., level of service, average speed, average delay) performance measures were included in the analysis. Tobit models were developed for investigating the factors that affect the safety measures. ANOVA and LOS calculations were used to evaluate traffic operation. The results of the safety and operational analysis suggested that one accessibility level is the optimal option in the nine-mile network. A weaving length between 1,000 feet and 1,400 feet per lane change was suggested based on the safety analysis. In addition, from the operation perspective, a weaving length between 1,000 feet and 2,000 feet per lane change was recommended. The results also showed that off-peak periods had better safety and operational performance (e.g., lower conflict rate, less delay) than peak periods. This study has major implications for improving MLs by recommending the optimal accessibility level and weaving length near access zones.
Article
This paper aims to investigate the safety impact of connected vehicles and connected vehicles with the lower level of automation features under vehicle-to-vehicle (V2V) and infrastructure-to-vehicle (I2V) communication technologies. Examining the lower level of automation is more realistic in the foreseeable future. This study considered two automated features, automated braking and lane keeping assistance, which are widely available in the market with low penetration rates. Driving behavior of connected vehicles (CV) and connected vehicles with lower level automation (CVLLA) was modeled in the C++ programming language, considering realistic car following models in VISSIM. To this end, safety impacts on both segment and intersection crash risks were explored through surrogate safety assessment techniques under various market penetration rates (MPRs). Segment crash risk was analyzed based on both time proximity-based and evasive action-based surrogate measures of safety: time exposed time-to-collision (TET), time integrated time-to-collision (TIT), time exposed rear-end crash risk index (TERCRI), lane changing conflicts (LCC), and number of critical jerks (NCJ). However, the intersection crash risk was evaluated through the number of conflicts extracted from micro-simulation (VISSIM) using the Surrogate Safety Assessment Model (SSAM). A logistic regression model was also developed to quantify the crash risk in terms of observed conflicts obtained in the intersection influence areas. The results suggest that both CV and CVLLA reduce segment crash risk significantly in terms of the five surrogate measures of safety. Furthermore, the logistic regression results clearly showed that both CV and CVLLA have lower intersection crash risks compared to the base scenario. In terms of both segment and intersection crash risks, CVLLA significantly outperforms CV when MPRs are 60% or higher.
Thus, the results indicate a significant safety improvement resulting from implementing CV and CVLLA technologies at both segments and intersections on arterials.
Article
Safety issues at school zones have been an important topic in the traffic safety field. This study assesses the safety effects of different roadway countermeasures at school zones. Although several studies have evaluated the effectiveness of various traffic control devices (e.g., sign, flashing beacon), there is a lack of studies proposing innovative operation and engineering countermeasures, which might significantly improve safety at school zones. In this study, the most crash-prone school zone is identified based on crash rates in Orange and Seminole Counties in Florida. Afterward, a microsimulation network is built to evaluate different safety countermeasures. Three different countermeasures, i.e., two-step speed reduction, decreasing the number of driveways, and replacing the two-way left-turn lane (TWLTL) with a raised median, are implemented in the microsimulation. Multiple surrogate safety measures are utilized as indicators for safety evaluation. The results show that both two-step speed reduction and decreasing driveway access significantly reduce crash risks compared with the base condition. Moreover, the combination of these two countermeasures outperforms their individual effectiveness. On the other hand, when the TWLTL is replaced with a raised median, the crash risk is higher than in the base condition. The results of this study could help transportation planners and decision makers to understand the effect of these countermeasures prior to implementing them in the field.
Article
Keywords: Multinomial logit fractional split model; Traffic crash analysis; Macroscopic crash analysis; Traffic analysis zones; Vehicle type; Screening
In traffic safety literature, crash frequency variables are analyzed using univariate count models or multivariate count models. In this study, we propose an alternative approach to modeling multiple crash frequency dependent variables. Instead of modeling the frequency of crashes, we propose to analyze the proportion of crashes by vehicle type. A flexible mixed multinomial logit fractional split model is employed for analyzing the proportions of crashes by vehicle type at the macro-level. In this model, the proportion allocated to an alternative is probabilistically determined based on the alternative propensity as well as the propensity of all other alternatives. Thus, exogenous variables directly affect all alternatives. The approach is well suited to accommodate a large number of alternatives without a sizable increase in computational burden. The model was estimated using crash data at Traffic Analysis Zone (TAZ) level from Florida. The modeling results clearly illustrate the applicability of the proposed framework for crash proportion analysis. Further, the Excess Predicted Proportion (EPP), a screening performance measure analogous to the Highway Safety Manual (HSM) Excess Predicted Average Crash Frequency, is proposed for hot zone identification. Using EPP, a statewide screening exercise by the various vehicle types considered in our analysis was undertaken. The screening results revealed that the spatial pattern of hot zones is substantially different across the various vehicle types considered.
Article
A wide array of spatial units has been explored in macro-level modeling. With the advancement of Geographic Information System (GIS), analysts are able to analyze crashes for various geographical units. However, a clear guideline on which geographic entity should be chosen is not present. Macro level safety analysis is at the core of transportation safety planning (TSP), which in turn is key to many aspects of policy and decision making of safety investments. The preference of spatial unit can vary with the dependent variable of the model. Alternatively, for a specific dependent variable, models may be invariant to multiple spatial units by producing similar goodness of fit. In this study three different crash models were investigated for traffic analysis zones (TAZs), block groups (BGs) and census tracts (CTs) of two counties in Florida. The models were developed for the total crashes, severe crashes and pedestrian crashes in this region. The primary objective of the study was to explore and investigate the effect of zonal variation (scale and zoning) on these specific types of crash models. These models were developed based on various roadway characteristics and census variables (e.g., land use, socioeconomic, etc.). It was found that the significance of explanatory variables is not consistent among models based on different zoning systems. Although the difference in variable significance across geographic units was found, the results also show that the signs of the coefficients are reasonable and explainable in all models. Key findings of this study are, first, signs of coefficients are consistent if these variables are significant in models with the same response variables, even if geographic units are different. Second, the number of significant variables is affected by response variables and also geographic units.
Admittedly, TAZs are now the only traffic related zone system, thus TAZs are being widely used by transportation planners and frequently utilized in research related to macroscopic crash analysis. Nevertheless, considering that TAZs are not delineated for traffic crash analysis but were designed for long range transportation plans, TAZs might not be the optimal zone system for traffic crash modeling at the macroscopic level. Therefore, it is recommended that other zone systems be explored for crash analysis as well.
Article
Introduction: Macro-level traffic safety analysis has been undertaken at different spatial configurations. However, clear guidelines for the appropriate zonal system selection for safety analysis are unavailable. In this study, a comparative analysis was conducted to determine the optimal zonal system for macroscopic crash modeling considering census tracts (CTs), state-wide traffic analysis zones (STAZs), and a newly developed traffic-related zone system labeled traffic analysis districts (TADs). Method: Poisson lognormal models for three crash types (i.e., total, severe, and non-motorized mode crashes) are developed based on the three zonal systems without and with consideration of spatial autocorrelation. The study proposes a method to compare the modeling performance of the three types of geographic units at different spatial configurations through a grid based framework. Specifically, the study region is partitioned into grids of various sizes and the model prediction accuracy of the various macro models is considered within these grids of various sizes. Results: These model comparison results for all crash types indicated that the models based on TADs consistently offer a better performance compared to the others. Besides, the models considering spatial autocorrelation outperform the ones that do not consider it. Conclusions: Based on the modeling results and motivation for developing the different zonal systems, it is recommended to use CTs for socio-demographic data collection, employ TAZs for transportation demand forecasting, and adopt TADs for transportation safety planning. Practical applications: The findings from this study can help practitioners select appropriate zonal systems for traffic crash modeling, which can lead to more efficient policies to enhance transportation safety.
Article
Cycling is encouraged in countries around the world as an economic, energy efficient, and sustainable mode of transportation. Although there are many studies focusing on analyzing bicycle safety, they have limitations because of the shortage of bicycle exposure data. This study represents a major step forward in estimating safety performance functions for bicycle crashes at intersections by using crowdsourced data from STRAVA. Several adjustments in respect of the population distribution and field observations were made to overcome the disproportionate representation of the STRAVA data. The adjusted STRAVA data which include bicycle exposure information were used as input to develop safety performance functions. The functions are negative binomial models aimed at predicting frequencies of bicycle crashes at intersections. The developed model was compared with three counterparts: the model using the unadjusted STRAVA data, the model using the STRAVA data with field observation data adjustments only, and the model using the STRAVA data with adjusted population. The results revealed that the case of STRAVA data with both population and field observation data adjustments had the best performance in bicycle crash modeling. The results also addressed several key factors (e.g., signal control system, intersection size, bike lanes) which are associated with bicycle safety at intersections. Additionally, the safety-in-numbers effect was acknowledged when bicycle crash rates decreased as bicycle activities increased. The study concluded that crowdsourced data are a reliable source for exploring bicycle safety after the appropriate adjustments.
Article
On freeways, managed lanes (MLs) have emerged as an effective dynamic traffic management strategy. MLs have been successfully implemented as an important facility in improving traffic mobility and generating revenue for transportation agencies. In this study, scenarios were built and tested in microsimulation to specify the safest accessibility level and decide on the safe weaving length near access zones. The findings indicated that the conflict rate on MLs was 48% and 11% lower than that of general purpose lanes (GPLs) in the peak and the off-peak periods, respectively. A log-linear model was developed with estimation of odds multipliers for the conflict frequency analysis. The results suggested that one accessibility level was the safest option in the 14.5-km (9-mi) corridor. The length of 305 m (1,000 ft) per lane change was shown to be the safest weaving length near access zones. Additionally, a weaving length of 183 m (600 ft) per lane change was not recommended. The findings of this study represent a further step toward improving access design of MLs.
Article
Drivers are required to make many rapid decisions at expressway toll plazas, which could result in drivers' confusion, speed variation, and sudden lane change maneuvers. Thus, toll plazas are considered one of the most dangerous segments on expressways. Nevertheless, only a limited number of studies have explored the factors that affect driving behavior and safety at toll plazas. This study assessed driving behavior at a section, including a hybrid toll plaza, on one of the main expressways in Central Florida using a driving simulator. The details of the section and the plaza were accurately replicated in the simulator. Overall, 72 participants were recruited and each driver performed three different scenarios out of a total of 24 scenarios. Subsequently, four behavioral variables were extracted from the experimental data (i.e., average speed, speed variation, standard deviation of lane deviation, and standard deviation of acceleration) to explore risky driving behaviors with various paths, signs, pavement markings, segment lengths, and traffic conditions. In addition, the effects of drivers' individual characteristics on driving behavior were investigated. A series of linear mixed models with random effects to account for multiple observations from the same participant were developed to reveal the contributing factors for driving behavior at toll plazas. The results uncovered that drivers experiencing the open road tolling (ORT) have safer driving behavior than those who use the tollbooth. Also, providing dynamic message sign (DMS) on the ramp before the toll plaza has a significant effect on reducing sudden lane change maneuvers. Adjustment of the locations of overhead signs was also recommended in this study. Moreover, the existence of arrow pavement marking before and after the toll plaza is important for reducing unsafe driving behavior before and after the toll plaza.
Article
This study aims at contributing to the literature on pedestrian and bicyclist safety by building on the conventional count regression models to explore exogenous factors affecting pedestrian and bicyclist crashes at the macroscopic level. In the traditional count models, effects of exogenous factors on non-motorist crashes were investigated directly. However, the vulnerable road users’ crashes are collisions between vehicles and non-motorists. Thus, the exogenous factors can affect the non-motorist crashes through the non-motorists and vehicle drivers. To accommodate for the potentially different impact of exogenous factors we convert the non-motorist crash counts as the product of total crash counts and proportion of non-motorist crashes and formulate a joint model of the negative binomial (NB) model and the logit model to deal with the two parts, respectively. The formulated joint model is estimated using non-motorist crash data based on the Traffic Analysis Districts (TADs) in Florida. Meanwhile, the traditional NB model is also estimated and compared with the joint model. The result indicates that the joint model provides better data fit and can identify more significant variables. Subsequently, a novel joint screening method is suggested based on the proposed model to identify hot zones for non-motorist crashes. The hot zones of non-motorist crashes are identified and divided into three types: hot zones with more dangerous driving environment only, hot zones with more hazardous walking and cycling conditions only, and hot zones with both. It is expected that the joint model and screening method can help decision makers, transportation officials, and community planners to make more efficient treatments to proactively improve pedestrian and bicyclist safety.
Article
To investigate the factors predicting severity of bicycle crashes in Italy, we used an observational study of official statistics. We applied two of the most widely used data mining techniques, CHAID decision tree technique and Bayesian network analysis. We used data provided by the Italian National Institute of Statistics on road crashes that occurred on the Italian road network during the period ranging from 2011 to 2013. We extracted 49,621 road accidents where at least one cyclist was injured or killed from the original database that comprised a total of 575,093 road accidents. CHAID decision tree technique was employed to establish the relationship between severity of bicycle crashes and factors related to crash characteristics (type of collision and opponent vehicle), infrastructure characteristics (type of carriageway, road type, road signage, pavement type, and type of road segment), cyclists (gender and age), and environmental factors (time of the day, day of the week, month, pavement condition, and weather). CHAID analysis revealed that the most important predictors were, in decreasing order of importance, road type (0.30), crash type (0.24), age of cyclist (0.19), road signage (0.08), gender of cyclist (0.07), type of opponent vehicle (0.05), month (0.04), and type of road segment (0.02). These eight most important predictors of the severity of bicycle crashes were included as predictors of the target (i.e., severity of bicycle crashes) in Bayesian network analysis. Bayesian network analysis identified crash type (0.31), road type (0.19), and type of opponent vehicle (0.18) as the most important predictors of severity of bicycle crashes.
Article
This study contributes to the safety literature on active mode transportation safety by using a copula-based model for crash frequency analysis at a macro level. Most studies in the transportation safety area identify a single count variable (such as vehicular, pedestrian, or bicycle crash counts) for a spatial unit at a specific period and study the impact of exogenous variables. Although the traditional count models perform adequately in the presence of a single count variable, these approaches must be modified to examine multiple dependent variables for each study unit. The presented research developed a multivariate model by adopting a copula-based bivariate negative binomial model for pedestrian and bicycle crash frequency analysis. The proposed approach accommodates potential heterogeneity (across zones) in the dependency structure. The formulated models were estimated with pedestrian and bicycle crash count data at the statewide traffic analysis zone level for the state of Florida for 2010 through 2012. The statewide traffic analysis zone level variables considered in the analysis included exposure measures, socioeconomic characteristics, road network characteristics, and land use attributes. A policy analysis was conducted - along with a representation of hot spot identification - to illustrate the applicability of the proposed model for planning purposes. The development of such spatial profiles allows planners to identify high-risk zones for screening and subsequent treatment identification.
Article
This study attempts to explore the viability of dual-state models (i.e., zero-inflated and hurdle models) for traffic analysis zones (TAZs) based pedestrian and bicycle crash frequency analysis. Additionally, spatial spillover effects are explored in the models by employing exogenous variables from neighboring zones. The dual-state models such as zero-inflated negative binomial and hurdle negative binomial models (with and without spatial effects) are compared with the conventional single-state model (i.e., negative binomial). The model comparison for pedestrian and bicycle crashes revealed that the models that considered observed spatial effects perform better than the models that did not consider the observed spatial effects. Across the models with spatial spillover effects, the dual-state models especially zero-inflated negative binomial model offered better performance compared to single-state models. Moreover, the model results clearly highlighted the importance of various traffic, roadway, and sociodemographic characteristics of the TAZ as well as neighboring TAZs on pedestrian and bicycle crash frequency.
Article
Highway accidents are complex events that involve a variety of human responses to external stimuli, as well as complex interactions between the vehicle, roadway features/condition, traffic-related factors, and environmental conditions. In addition, there are complexities involved in energy dissipation (once an accident has occurred) that relate to vehicle design, impact angles, the physiological characteristics of involved humans, and other factors. With such a complex process, it is impossible to have access to all of the data that could potentially determine the likelihood of a highway accident or its resulting injury severity. The absence of such important data can potentially present serious specification problems for traditional statistical analyses that can lead to biased and inconsistent parameter estimates, erroneous inferences and erroneous accident predictions. This paper presents a detailed discussion of this problem (typically referred to as unobserved heterogeneity) in the context of accident data and analysis. Various statistical approaches available to address this unobserved heterogeneity are presented along with their strengths and weaknesses. The paper concludes with a summary of the fundamental issues and directions for future methodological work that addresses unobserved heterogeneity.
Book
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
Article
Typical engineering research on traffic safety focuses on identifying either dangerous locations or contributing factors through a post-crash analysis using aggregated traffic flow data and crash records. A recent development of transportation engineering technologies provides ample opportunities to enhance freeway traffic safety using individual vehicular information. However, little research has been conducted regarding methodologies to utilize and link such technologies to traffic safety analysis. Moreover, traffic safety research has not benefited from the use of hurdle-type models that might treat excessive zeros more properly than zero-inflated models. This study developed a new surrogate measure, unsafe following condition (UFC), to estimate traffic crash likelihood by using individual vehicular information and applied it to basic sections of interstate highways in Virginia. Individual vehicular data and crash data were used in the development of statistical crash prediction models including hurdle models. The results showed that an aggregated UFC measure was effective in predicting traffic crash occurrence, and the hurdle Poisson model outperformed other count data models in a certain case.
Article
An increasing research effort has been made on spatially disaggregated safety analysis models to meet the needs of region-level safety inspection and recently emerging transportation safety planning techniques. However, without explicitly differentiating exposure variables and risk factors, most existing studies alternate the use of crash frequency, crash rate, and crash risk to interpret model coefficients. This procedure may have resulted in the inconsistent findings in relevant studies. This study proposes a Bayesian spatial model to account for county-level variations of crash risk in Florida by explicitly controlling for exposure variables of daily vehicle miles traveled and population. A conditional autoregressive prior is specified to accommodate for the spatial autocorrelations of adjacent counties. The results show no significant difference in safety effects of risk factors on all crashes and severe crashes. Counties with higher traffic intensity and population density and a higher level of urbanization are associated with higher crash risk. Unlike arterials, freeways seem to be safer with respect to crash risk given either vehicle miles traveled or population. Increase in truck traffic volume tends to result in more severe crashes. The average travel time to work is negatively correlated with all types of crash risk. Regarding the population age cohort, the results suggest that young drivers tend to be involved in more crashes, whereas the increase in elderly population leads to fewer casualties. Finally, it is confirmed that the safety status is worse for more deprived areas with lower income and educational level and higher unemployment rate in comparison with relatively affluent areas.
Article
Many studies have shown that intersections are among the most dangerous locations of a roadway network. Therefore, there is a need to understand the factors that contribute to traffic crashes at such locations. One approach is to classify intersections and quantify the effects that configuration, geometric characteristics, and traffic volume have on the number of crashes at signalized intersections. This paper addresses the different factors that affect crashes, by type of collision, at signalized intersections. It also looks into the quality and completeness of the crash data and the effect that incomplete data have on the final results. Data from multiple sources were cross-checked to ensure the completeness of all crashes, including minor crashes that were usually unreported or were not coded into crash databases. The tree-based regression methodology was adopted in this study to cope with multicollinearity between variables, missing observations, and the fact that the true model form was unknown. The results showed a significant discrepancy in the factors that were found to affect the different collision types and their influence in each model. The two most significant differences in comparison with the total crash model as a base case were found to be in the models of head-on and left-turn crashes. The results also showed that the important factors were relatively consistent for rear-end, right-turn, and sideswipe crashes when minor crashes were considered. However, angle and head-on crashes showed significant changes in the model structure when minor crashes were added to the data set because these types of crashes were less stable. Finally, different roadway characteristics were correlated with different types of crashes.
Article
The aim of the study was to analyze powered two-wheeler (PTW) crashes in Italy in order to detect interdependence as well as dissimilarities among crash characteristics and to provide insights for the development of safety improvement strategies focused on PTWs. To this end, data mining techniques were used to analyze data on the 254,575 crashes involving PTWs that occurred in Italy in the period 2006-2008. Classification tree analysis and rules discovery were performed. Tree-based methods are non-linear and non-parametric data mining tools for supervised classification and regression problems. They do not require a priori probabilistic knowledge about the phenomena under study and consider conditional interactions among input data. Rules discovery is the identification of sets of items (i.e., crash patterns) that occur together in a given event (i.e., a crash in our study) more often than they would if they were independent of each other. Thus, the method can detect interdependence among crash characteristics. Due to the large number of patterns considered, both methods suffer from an extreme risk of finding patterns that appear due to chance alone. To overcome this problem, in our study we randomly split the sample data into two data sets and used well-established statistical practices to evaluate the statistical significance of the results. Both the classification trees and the rules discovery were effective in providing meaningful insights about PTW crash characteristics and their interdependencies. Even though in several cases different crash characteristics were highlighted, the results of the two analysis methods were never contradictory. Furthermore, most of the findings of this study were consistent with the results of previous studies which used different analytical techniques, such as probabilistic models of crash injury severity.
Based on the analysis results, engineering countermeasures and policy initiatives to reduce PTW injuries and fatalities were singled out. The simultaneous use of classification trees and association discovery must not, however, be seen as an attempt to supplant other techniques, but as a complementary method which can be integrated into other safety analyses.
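The idea of patterns that "occur together more often than they would if they were independent" can be illustrated with a small sketch. It assumes the commonly used lift measure (joint support divided by the product of the individual supports); the abstract does not say which interestingness measure the authors used, and the crash records and attribute names below are hypothetical.

```python
def lift(records, antecedent, consequent):
    """Lift of the rule antecedent -> consequent over a list of item sets.

    Lift > 1 means the two patterns co-occur more often than expected
    under independence; lift == 1 means no association.
    Assumes both patterns occur at least once (otherwise division by zero).
    """
    a, c = set(antecedent), set(consequent)
    n = len(records)
    support_a = sum(a <= r for r in records) / n          # share of records containing a
    support_c = sum(c <= r for r in records) / n          # share of records containing c
    support_ac = sum((a | c) <= r for r in records) / n   # share containing both
    return support_ac / (support_a * support_c)


# Hypothetical crash records, each described by a set of characteristics.
crashes = [
    {"night", "curve"},
    {"night", "curve"},
    {"day", "straight"},
    {"day", "curve"},
]
night_curve_lift = lift(crashes, {"night"}, {"curve"})
```

With these four made-up records, "night" appears in half of the crashes, "curve" in three quarters, and both together in half, so the lift is above 1.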
Article
This study investigates the effect of spatial correlation using a Bayesian spatial framework to model pedestrian and bicycle crashes in Traffic Analysis Zones (TAZs). Aggregate models for pedestrian and bicycle crashes were estimated as a function of variables related to roadway characteristics, and various demographic and socio-economic factors. It was found that significant differences were present between the predictor sets for pedestrian and bicycle crashes. The Bayesian Poisson-lognormal model accounting for spatial correlation for pedestrian crashes in the TAZs of the study counties retained nine variables significantly different from zero at the 95% Bayesian credible interval. These variables were: total roadway length with 35 mph posted speed limit, total number of intersections per TAZ, median household income, total number of dwelling units, log of population per square mile of a TAZ, percentage of households with non-retired workers but zero auto, percentage of households with non-retired workers and one auto, long term parking cost, and log of total number of employment in a TAZ. A separate, distinct set of predictors was found for the bicycle crash model. In all cases the Bayesian models with spatial correlation performed better than the models that did not account for spatial correlation among TAZs. This finding implies that spatial correlation should be considered while modeling pedestrian and bicycle crashes at the aggregate or macro-level.
Article
This study presents a classification tree based alternative to crash frequency analysis for analyzing crashes on mid-block segments of multilane arterials. The traditional approach of modeling counts of crashes that occur over a period of time works well for intersection crashes where each intersection itself provides a well-defined unit over which to aggregate the crash data. However, in the case of mid-block segments the crash frequency based approach requires segmentation of the arterial corridor into segments of arbitrary lengths. In this study we have used random samples of time, day of week, and location (i.e., milepost) combinations and compared them with the sample of crashes from the same arterial corridor. For crash and non-crash cases, geometric design/roadside and traffic characteristics were derived based on their milepost locations. The variables used in the analysis are non-event specific and therefore more relevant for roadway safety feature improvement programs. The first classification tree model compares all crashes with the non-crash data; four groups of crashes (rear-end, lane-change related, pedestrian, and single-vehicle/off-road crashes) are then separately compared to the non-crash cases. The classification tree models provide a list of significant variables as well as a measure to classify crash from non-crash cases. ADT along with time of day/day of week are significantly related to all crash types with different groups of crashes being more likely to occur at different times. From the classification performance of different models it was apparent that using non-event specific information may not be suitable for single vehicle/off-road crashes. The study provides the safety analysis community an additional tool to assess safety without having to aggregate the corridor crash data over arbitrary segment lengths.
Article
We reanalyzed data from the Pedestrian Injury Causation Study (PICS) for 1035 urban pedestrian injuries to children and youth less than 20 years of age. Analysis of variance with the Injury Severity Score (ISS) as the dependent variable was used to evaluate variables describing the characteristics of the pedestrian, the vehicle, the driver, and the circumstances under which the collision occurred. The mean injury severity score was 5.6. Nearly 80% of pedestrians had a minor injury, 13% moderate, and 7% severe; 4.5% of these pedestrians were killed. Multivariate analysis revealed that vehicle travel speed greater than 30 mph, pedestrian age less than 5 years, time of day either early morning or late afternoon, residential zone, type of road including collectors and major roads, and center travel lanes were associated with greater severity of injury. Attempts by the driver to avoid the collision by braking or other avoidance maneuvers were associated with reduced injury severity. Even on local streets and in residential zones, nearly 20% of children were struck by vehicles exceeding 30 mph, and these children were injured much more severely than children struck by more slowly moving vehicles.
Article
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states—perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data.
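As a rough illustration of the dual-state idea behind the ZIP model, the sketch below evaluates the standard zero-inflated Poisson probability mass function: with probability `pi_zero` an entity is in the always-zero ("perfectly safe") state, and otherwise its count follows a Poisson distribution with mean `lam`. The function name and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """P(Y = k) for a zero-inflated Poisson with Poisson mean lam
    and probability pi_zero of being in the always-zero state."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the always-zero state and the Poisson state.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


# Illustrative values only: 30% of sites always-zero, Poisson mean of 2 crashes.
p_zero = zip_pmf(0, lam=2.0, pi_zero=0.3)
p_zero_plain_poisson = math.exp(-2.0)
```

Here `p_zero` exceeds the plain Poisson probability of a zero count, which is how the model accommodates "excess" zeros.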
Article
This study analyzes vehicle-pedestrian crashes at intersections in Florida over 4 years, 1999-2002. The study identifies the group of drivers and pedestrians, and traffic and environmental characteristics that are correlated with high pedestrian crashes using log-linear models. The study also estimates the likelihood of pedestrian injury severity when pedestrians are involved in crashes using an ordered probit model. To better reflect pedestrian crash risk, a logical measure of exposure is developed using the information on individual walking trips in the household travel survey. Lastly, the impact of average traffic volume on pedestrian crashes is examined. As a result of the analysis, it was found that pedestrian and driver demographic factors, and road geometric, traffic and environment conditions are closely related to the frequency and injury severity of pedestrian crashes. Higher average traffic volume at intersections increases the number of pedestrian crashes; however, the rate of increase is steeper at lower values of average traffic volume. Based on the findings in the analysis, some countermeasures are recommended to improve pedestrian safety.
Article
Statistical models, such as Poisson or negative binomial regression models, have been employed to analyze vehicle accident frequency for many years. However, these models have their own model assumptions and pre-defined underlying relationship between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimation of accident likelihood. Classification and Regression Tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between target (dependent) variable and predictors (independent variables) and has been shown to be a powerful tool, particularly for dealing with prediction and classification problems. This study collected the 2001-2002 accident data of National Freeway 1 in Taiwan. A CART model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The CART findings indicated that the average daily traffic volume and precipitation variables were the key determinants for freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies.
Article
The intent of this note is to succinctly articulate additional points that were not provided in the original paper (Lord et al., 2005) and to help clarify a collective reluctance to adopt zero-inflated (ZI) models for modeling highway safety data. A dialogue on this important issue, just one of many important safety modeling issues, is healthy discourse on the path towards improved safety modeling. This note first provides a summary of prior findings and conclusions of the original paper. It then presents two critical and relevant issues: the maximizing statistical fit fallacy and logic problems with the ZI model in highway safety modeling. Finally, we provide brief conclusions.
Article
There is a growing concern with the safety of school-aged children. This study identifies the locations of pedestrian/bicyclist crashes involving school-aged children and examines the conditions when these crashes are more likely to occur. The 5-year records of crashes in Orange County, Florida where school-aged children were involved were used. The spatial distribution of these crashes was investigated using the Geographic Information Systems (GIS) and the likelihoods of crash occurrence under different conditions were estimated using log-linear models. A majority of school-aged children crashes occurred in the areas near schools. Although elementary school children were generally very involved, middle and high school children were more involved in crashes, particularly on high-speed multi-lane roadways. Driver's age, gender, and alcohol use, pedestrian's/bicyclist's age, number of lanes, median type, speed limits, and speed ratio were also found to be correlated with the frequency of crashes. The result confirms that school-aged children are exposed to high crash risk near schools. High crash involvement of middle and high school children reflects that middle and high schools tend to be located near multi-lane high-speed roads. The pedestrian's/bicyclist's demographic factors and geometric characteristics of the roads adjacent to schools associated with school children's crash involvement are of interest to school districts.
Article
1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions. 2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion. 3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods. 4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. 
We provide code and a tutorial to enable the wider use of BRT by ecologists.
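The forward, stagewise fitting of simple trees described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single predictor, one-split trees (stumps), squared-error loss, and a fixed learning rate, and all function names and data are hypothetical.

```python
from statistics import mean


def fit_stump(x, residuals):
    """Least-squares regression stump on one predictor.
    Assumes x contains at least two distinct values."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # every split leaves both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted_stumps(x, y, n_trees=50, learning_rate=0.1):
    """Forward stagewise fit: each stump is fitted to the current residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return base, learning_rate, stumps


def predict_boosted(model, xi):
    """Additive model: base value plus the shrunken sum of all stumps."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


# Hypothetical step-shaped data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 1, 5, 5, 5, 5]
model = fit_boosted_stumps(x, y)
```

A small learning rate means each tree corrects only part of the remaining error, so many trees are needed; that is the trade-off between shrinkage and the number of trees.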
Saad, M., Abdel-Aty, M., Lee, J., & Wang, L. (2018a). Determining the optimal access design of managed lanes considering dynamic pricing. 18th International Conference Road Safety on Five Continents.
Ukkusuri, S., Hasan, S., & Aziz, H. (2011). Random parameter model used to explain effects doi.org/10.3141/2237-11.
Abdel-Aty, M., Chundi, S. S., & Lee, C. (2007). Geo-spatial and log-linear analysis of pedes-
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1998). Classification and Regression
Cai, Q. (2017). Integrating the macroscopic and microscopic traffic safety analysis using hier-