
Abstract

Human behavior significantly contributes to severe road injuries, underscoring a critical road safety challenge. This study addresses the complex task of predicting dangerous driving behaviors through a comprehensive analysis of over 356,000 trips, enhancing existing knowledge in the field and promoting sustainability and road safety. The research uses advanced machine learning algorithms (e.g., Random Forest, Gradient Boosting, Extreme Gradient Boosting, Multilayer Perceptron, and K-Nearest Neighbors) to categorize driving behaviors into ‘Dangerous’ and ‘Non-Dangerous’. Feature selection techniques are applied to enhance the understanding of influential driving behaviors, while k-means clustering establishes reliable safety thresholds. Findings indicate that Gradient Boosting and Multilayer Perceptron excel, achieving recall rates of approximately 67% to 68% for both harsh acceleration and braking events. This study identifies critical thresholds for harsh events: (a) 48.82 harsh accelerations and (b) 45.40 harsh brakings per 100 km, providing new benchmarks for assessing driving risks. The application of machine learning algorithms, feature selection, and k-means clustering offers a promising approach for improving road safety and reducing socio-economic costs through sustainable practices. By adopting these techniques and the identified thresholds for harsh events, authorities and organizations can develop effective strategies to detect and mitigate dangerous driving behaviors.
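The abstract does not say how the k-means clusters are turned into thresholds. As a minimal sketch, assume each trip is summarized by one number (harsh events per 100 km), that two clusters are fitted, and that the threshold is the midpoint between the two centroids. The function name, the midpoint rule, and the sample rates are illustrative assumptions, not the study's procedure or data.

```python
from statistics import mean


def two_means_threshold(values, max_iter=100):
    """1-D k-means with k=2; returns (low_centroid, high_centroid, threshold)."""
    low, high = min(values), max(values)  # deterministic initial centroids
    if low == high:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iter):
        boundary = (low + high) / 2  # nearest-centroid boundary in 1-D
        new_low = mean(v for v in values if v <= boundary)
        new_high = mean(v for v in values if v > boundary)
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving, so assignments are stable
        low, high = new_low, new_high
    return low, high, (low + high) / 2


# Hypothetical harsh accelerations per 100 km for seven trips (not study data).
rates = [5, 10, 15, 20, 80, 90, 100]
low_c, high_c, threshold = two_means_threshold(rates)
labels = ["Dangerous" if r > threshold else "Non-Dangerous" for r in rates]
```

With a rule like this, a trip would be labeled 'Dangerous' when its rate exceeds the threshold, which is one plausible way to obtain the binary target that the classifiers are then trained on.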
Article
This study investigates the feasibility of a relevance vector machine tuned with improved Manta-Ray foraging optimization (RVM-IMRFO) for predicting monthly pan evaporation from limited climatic input data (e.g., temperature). The accuracy of the RVM-IMRFO was evaluated against RVM tuned by gray wolf optimization, RVM tuned with a whale optimization algorithm, and RVM tuned with Manta Ray foraging optimization, in terms of root mean square error (RMSE), mean absolute error (MAE), determination coefficient (R²), and Nash-Sutcliffe efficiency (NSE), as well as new graphical inspection methods. The models were assessed using data acquired from two stations in China; the data were divided into three equal parts, and the models were tested on each part. The results showed that the proposed algorithm considerably improved the accuracy of a single RVM in monthly pan evaporation prediction, with average improvements in RMSE, MAE, R², and NSE of 27.65%, 27.53%, 8.40%, and 8.63%, respectively. The proposed algorithm also clearly outperformed the other models, with improvements in the overall mean values of the RMSE, MAE, R², and NSE statistics from 34.7–38.2 to 18.2–19.5, 36.2–36.4 to 19.1–18.5, 12.5–13.8 to 3.6–3.7, and 12.4–14.6 to 3.6–3.9%, respectively, for both climatic stations. Adding extraterrestrial radiation and a periodicity component (the month number of the data) to the model inputs improved the prediction accuracy of the implemented models. The RVM-IMRFO outperformed the other methods in predicting monthly pan evaporation using only temperature data, which is essential, especially in developing countries where other climatic data are missing or unavailable. The RVM model was also compared with standard multi-layer perceptron neural networks (MLPNN) and was found to perform better in monthly pan evaporation prediction.
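The four accuracy statistics named above can be written out in a few lines. This is a sketch under stated assumptions: the determination coefficient is taken here as the squared Pearson correlation between observed and predicted values (a common convention in hydrology, but the abstract does not define it), and the sample values are made up.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations (unnormalized)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst


def r2(obs, pred):
    """Determination coefficient, assumed to be the squared Pearson correlation."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# Made-up observed and predicted values with a constant bias of +1.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 5.0, 7.0, 9.0]
scores = {name: fn(observed, predicted)
          for name, fn in [("RMSE", rmse), ("MAE", mae), ("R2", r2), ("NSE", nse)]}
```

The biased example shows why both R² and NSE are usually reported: the predictions are perfectly correlated with the observations, so R² is 1, while NSE drops below 1 because it penalizes the constant offset.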
Article
Dissolved oxygen (DO) concentration is an important water-quality parameter, and its estimation matters for aquatic ecosystems, drinking water resources, and agro-industrial activities. In the presented study, a new support vector machine (SVM) method, improved by a hybrid firefly algorithm–particle swarm optimization (FFAPSO), is proposed for the accurate estimation of DO. Daily pH, temperature (T), electrical conductivity (EC), river discharge (Q), and DO data from Fountain Creek near Fountain, United States, were used for model development. Various combinations of pH, T, EC, and Q were used as inputs to the models to estimate DO. The outcomes of the proposed SVM–FFAPSO model were compared with those of SVM–PSO, SVM–FFA, and standalone SVM with respect to root mean square error (RMSE), mean absolute error (MAE), Nash–Sutcliffe efficiency (NSE), and determination coefficient (R²), and with graphical methods such as scatterplots and Taylor and violin charts. The SVM–FFAPSO outperformed the other methods in the estimation of DO. The best model of each method was also assessed in multistep-ahead (1- to 7-day-ahead) DO prediction, and the comparison again showed the superiority of the proposed method. The general outcomes recommend the use of SVM–FFAPSO in DO modeling, and this method can be useful for decision-makers in urban water planning and management.
Article
Driving risk classification is commonly used for evaluating and reducing traffic accidents. It is of great significance for mitigating urban traffic problems such as traffic jams and road accidents. Most related works apply different representation methods to data from advanced driving assistance systems and then use statistical approaches or machine learning to analyze the driving risk. However, because positive and negative samples are imbalanced, the performance is usually not satisfactory. In this article, we propose a driving risk classification method for imbalanced time series samples. It first employs MeanShift with automatic bandwidth to cluster the samples and expand their volume according to similarity. Then, it adopts a regional division module to simulate the outside environment and extracts state transition features with the Markov feature module to obtain more detailed information. Combining the stacking classification module with three convolutional neural networks further improves the performance. Finally, a city transfer module is designed with the drivers’ inherent attributes and sample weights to enhance the generalization of the model. The experimental results verify the effectiveness and generality of the classification model for driving risk. The method can also be transferred to time series analysis in other fields.
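The abstract gives no detail on the Markov feature module. One simple reading of "state transition features" is a first-order transition probability matrix estimated from a sequence of discrete driving states and flattened into a feature vector. The sketch below follows that reading; the state names, the function name, and the sample sequence are hypothetical and are not taken from the article.

```python
from collections import Counter


def transition_features(states, labels):
    """Flattened first-order transition probabilities P(next=b | current=a).

    Rows and columns follow the order of `labels`; a state that never occurs
    as a 'current' state contributes a row of zeros.
    """
    pair_counts = Counter(zip(states, states[1:]))  # counts of (current, next)
    from_counts = Counter(states[:-1])              # how often each state is left
    features = []
    for a in labels:
        for b in labels:
            total = from_counts[a]
            features.append(pair_counts[(a, b)] / total if total else 0.0)
    return features


# Hypothetical discretized trip: one state label per time step.
state_labels = ["accel", "brake", "cruise"]
trip = ["cruise", "cruise", "brake", "cruise", "accel", "accel"]
features = transition_features(trip, state_labels)
```

A vector like this has a fixed length regardless of trip duration, which is what makes it convenient as classifier input alongside other features.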
Article
Distraction while driving occurs when a driver is engaged in non-driving activities. These activities reduce the driver’s attention and focus on the road, thereby increasing the risk of accidents. As a consequence, the number of accidents increases and infrastructure is damaged. Cars are now equipped with various safety systems intended to keep the driver aware and attentive at all times. The first step for such systems is to determine whether the driver is distracted. Different methods have been proposed to detect such distractions, but they lack efficiency when tested in real-life situations. In this paper, four machine learning classification methods are implemented and compared to identify drivers’ behavior and distraction situations based on real data corresponding to different behaviors such as aggressive, drowsy, and normal. The data were randomized for a better application of the methods. We demonstrate that the gradient boosting method outperforms the other classifiers used.
Article
Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalanced nature of driving data, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in the recognition of risky drivers. The hyperparameters that control the sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle’s longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of the predicted probability.
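To illustrate the kind of input described above, the following sketch computes the first few discrete Fourier transform coefficients of a speed series directly from the DFT definition. Using normalized magnitudes, keeping only the first `n_coeffs` coefficients, and the sample speed values are all assumptions made for illustration; the abstract does not say how the coefficients are encoded or how many are kept.

```python
import cmath


def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients, divided by len(signal)."""
    n = len(signal)
    magnitudes = []
    for k in range(n_coeffs):
        # X[k] = sum_t x[t] * exp(-2*pi*i*k*t / n)
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        magnitudes.append(abs(coeff) / n)
    return magnitudes


# Hypothetical longitudinal speeds in m/s: a steady driver and an oscillating one.
steady = [30.0] * 8
oscillating = [28.0, 32.0] * 4
steady_features = dft_magnitudes(steady, 5)
oscillating_features = dft_magnitudes(oscillating, 5)
```

For the steady series, all energy sits in coefficient 0 (the mean speed). For the oscillating series, the same mean appears in coefficient 0 and the speed fluctuation shows up in coefficient 4, which is the sort of contrast a classifier can exploit.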
Article
Enhancing traffic safety on freeways is the main goal for all transportation agencies. However, to achieve this goal, many analysis protocols of network screening models need to be improved by considering human factors when analyzing traffic data. This paper introduces a new analysis protocol for identifying and discriminating between normal and risky driving in clear and rainy weather. The protocol considers the effect of human factors when updating the network screening process for identifying crash-risk hotspots. This paper employs the Second Strategic Highway Research Program (SHRP2) Naturalistic Driving Study (NDS) data to investigate the behavior of normal and risky driving under both rainy and clear weather conditions. Near-crash events on freeways, which were used as a Surrogate Measure of Safety (SMoS) for crash risk, were identified based on changes in vehicle kinematics, including speed, longitudinal and lateral acceleration and deceleration rates, and yaw rates. A trajectory-level data analysis showed significant differences in driving patterns between rainy and clear weather conditions; factors that affected crash risk mainly included driver reaction and response time, evasive maneuvers such as changes in acceleration rates and yaw rates, and lane-changing maneuvers. A cluster analysis method was employed to classify driving patterns into two clusters: normal and risky driving patterns. Statistical results showed that risky driving patterns started on average one second earlier in rainy weather conditions than in clear weather conditions. Furthermore, risky driving patterns lasted on average three seconds in rainy weather conditions, compared with two seconds in clear weather conditions. The identification of these patterns is considered a primary step towards an automated system that would distinguish between different driving patterns in a Connected Vehicle (CV) environment using Basic Safety Messages (BSM) and enhance the network screening analysis for hotspots of increased crash risk.
Article
Assessing an individual’s driving profile and identifying at-fault behaviors contributes to road safety, riding comfort, and driver assistance systems. This study proposes a framework to identify aggressive driving patterns in longitudinal control using real-time driving profiles of heavy passenger vehicle (HPV) drivers. The main objective is to detect and quantify the instantaneous driving decisions and classify the identified maneuvers (acceleration, braking) using unsupervised machine learning techniques without any prior ground truth. To this end, a total of 8295 acceleration events and 7151 braking events were extracted from 142 driving profiles collected using high-resolution (10 Hz) GPS instrumentation. Principal component analysis was conducted on a multi-dimensional feature set, followed by a two-stage k-means clustering on the reduced feature subspace. The results showed that 86.5% of accelerations and 65.3% of braking maneuvers were characterized as non-aggressive, indicating safe or baseline driving behavior. However, 13.5% of accelerations and 34.7% of braking maneuvers were characterized as aggressive, indicative of actual risky behaviors. Further analysis demonstrated the heterogeneity in drivers’ trip-level frequency of aggressive maneuvers and highlighted the need for continuous driving assessment. The study also revealed that the thresholds derived from the clusters of aggressive acceleration (+0.3 to +0.48 g) and aggressive braking (−0.42 to −0.27 g) maneuvers were beyond the acceptable limits of passenger safety and comfort. The insights from the study aid in developing driver assistance systems for personalized feedback provision and in improving driver behavior.
Article
Crash data analysis is commonly affected by imbalanced data. Depending on facility and control types, some crash types are more frequent than others. However, uncommon crash types are routinely more severe and associated with higher economic and societal costs, and are thus crucial to prevent. It is paramount to develop inferential models that can reliably predict crash types and identify contributing factors, especially for the severe types. The current process of modeling infrequent events generally disregards disparity in data representation, which can lead to biased models. Therefore, mitigating and managing imbalanced data is essential to the development of meaningful and robust models that help reveal effective countermeasures. This study compares the effects of resampling techniques on the performance of both machine learning and classical statistical models for classifying and predicting different crash types on freeways. Specifically, a mixed sampling approach featuring cluster-based under-sampling coupled with three popular over-sampling methods (i.e., random over-sampling, synthetic minority over-sampling, and adaptive synthetic sampling) was investigated with respect to four crash classification models, including three ensemble machine learning models (CatBoost, XGBoost, and Random Forests) and one classic statistical model (Nested Logit). This study concluded that all three resampling methods consistently enhanced the performance of all models. Among the three over-sampling methods, the adaptive synthetic sampling approach performed best and tremendously improved the prediction of minority crash types without impeding the prediction of the majority crash type. This is likely due to the density-based approach of adaptive synthetic sampling in creating synthetic instances that are more congruent with the underlying manifold structure embodied in the high-dimensional feature space.
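Of the three over-sampling methods listed, random over-sampling is the simplest to show: minority-class records are duplicated at random until every class matches the size of the largest one. The sketch below covers only that step and omits the cluster-based under-sampling the study pairs it with; the function name, the crash-type labels, and the feature values are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical crash records: (speed in km/h, wet-road flag) with a crash-type label.
X = [(90, 0), (95, 0), (100, 1), (85, 0), (92, 1), (88, 0), (120, 1), (130, 1)]
y = ["rear_end"] * 6 + ["rollover"] * 2
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

Resampling of this kind is normally applied to the training split only, so that duplicated records cannot leak into the evaluation data. Synthetic methods such as SMOTE and adaptive synthetic sampling differ in that they interpolate new points instead of copying existing ones.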
Article
The aim of this paper is to explore driving behaviour during mobile phone use on the basis of detailed driving analytics collected by smartphone sensors. The data came from a sample of one hundred drivers (18,850 trips) during a naturalistic driving experiment over four months. A specially developed smartphone application was used, through which driving exposure and behaviour metrics are captured by the smartphone sensors and transmitted to a back-end platform. The data are processed by machine learning algorithms, yielding exposure indicators (e.g., distance travelled per road type and time of day) and behaviour indicators (e.g., speeding, speed and acceleration variations, harsh braking, harsh manoeuvring, and mobile phone use). Mixed binary logistic regression models were developed to investigate whether mobile phone use during a trip is correlated with other driving metrics and can be accurately “detected” based on them. A model for all trips was developed, as well as models for trips on different road types (urban, rural, highway). The exposure metrics found to be significantly associated with the probability of mobile phone use are trip length and driving outside the morning rush. Exceeding the speed limits and the number of harsh events (particularly harsh cornering) are all negatively associated with the probability of mobile phone use. A general pattern of less speeding and smoother driving appears indicative of mobile phone use, in line with known assumptions of driver compensatory behaviour. The results suggest that mobile phone use while driving may be accurately predicted by the model in more than 70% of cases.