A method for simplifying the analysis of traffic accidents injury severity on two-lane highways using Bayesian networks

TRYSE Research Group, Department of Civil Engineering, University of Granada, Spain.
Journal of Safety Research (Impact Factor: 1.34). 10/2011; 42(5):317-26. DOI: 10.1016/j.jsr.2011.06.010
Source: PubMed


This study describes a method for reducing the number of variables frequently considered in modeling the severity of traffic accidents. The method's efficiency is assessed by constructing Bayesian networks (BNs).
The method is based on a two-stage selection process. Several variable selection algorithms commonly used in data mining are applied to select subsets of variables. BNs are built using the selected subsets, and their performance is compared with that of the original BN (with all the variables) using five indicators. The BNs that improve the indicators' values are further analyzed to identify the most significant variables (accident type, age, atmospheric factors, gender, lighting, number of injured, and occupant involved). A new BN is built using these variables; in most cases, its indicator values show a statistically significant improvement with respect to the original BN.
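The abstract does not name the variable selection algorithms used in the first stage. As one hedged illustration of a filter-style selection method common in data mining (information gain is an assumption here, not necessarily one of the authors' algorithms), the Python sketch below ranks categorical accident variables by how much they reduce the entropy of the severity class and keeps the top k. The record format (a list of dicts), the function names, and the example values are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy obtained by splitting on one categorical variable."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class within each value of the variable.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_variables(records, target):
    """Rank every non-target variable by information gain, highest first."""
    labels = [r[target] for r in records]
    names = [k for k in records[0] if k != target]
    scores = {name: information_gain([r[name] for r in records], labels) for name in names}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_top_k(records, target, k):
    """Hypothetical selection rule: keep the k highest-ranked variables."""
    return [name for name, _ in rank_variables(records, target)[:k]]
```

For example, with four hypothetical records in which lighting ("dark"/"day") perfectly separates two severity classes while gender does not, `rank_variables(records, "severity")` scores lighting at 1.0 bit and gender at 0.0, so `select_top_k(records, "severity", 1)` keeps only lighting. In the method described above, subsets selected in this way would then be used to build BNs whose indicators are compared against the full-variable BN.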
It is possible to reduce the number of variables used to model traffic accident injury severity through BNs without reducing the model's performance.
The study provides safety analysts with a methodology that could be used to minimize the number of variables needed to determine the injury severity of traffic accidents efficiently, without reducing the performance of the model.



Available from: Juan De Oña
  • "The values of accuracy range from 64.0% in C1 to 55.1% in C4. These values are within the range found in previous studies (Abdelwahab and Abdel-Aty, 2001; De Oña et al., 2011; Mujalli and De Oña, 2011) that used classification techniques for similar objectives. Table 4 shows that only C1 (64.0%) achieved a statistically significant improvement in accuracy compared with the results obtained for the EDB (59.5%)."
    ABSTRACT: One of the principal objectives of traffic accident analyses is to identify key factors that affect the severity of an accident. However, heterogeneity in the raw data makes the analysis of traffic accidents difficult. In this paper, Latent Class Cluster (LCC) analysis is used as a preliminary tool for segmenting 3229 accidents on rural highways in Granada (Spain) between 2005 and 2008. Next, Bayesian Networks (BNs) are used to identify the main factors involved in accident severity for both the entire database (EDB) and the clusters previously obtained by LCC. The results of these cluster-based analyses are compared with the results of a full-data analysis. The results show that the combined use of both techniques is valuable, as it reveals further information that would not have been obtained without prior segmentation of the data. BN inference is used to obtain the variables that best identify accidents with killed or seriously injured victims. Accident type and sight distance were identified in all the cases analysed; other variables, such as time, occupant involved, and age, are identified in the EDB and in only one cluster, whereas the variables vehicles involved, number of injuries, atmospheric factors, pavement markings, and pavement width are identified in only one cluster.
    Accident Analysis and Prevention (Impact Factor: 1.65). 11/2012; 51C:1-10. DOI: 10.1016/j.aap.2012.10.016
  • "Taking into consideration the indicators used to evaluate the goodness of a classification method in de Oña et al. (2011) and Mujalli and de Oña (2011), and that the class variable used has two possible response categories (state A and state B), the parameters that can be defined are described below:
      • Accuracy: the method's precision, defined as the percentage of cases correctly classified by the classifier.
      • Sensitivity: the proportion of cases correctly classified as state A among all cases observed as state A.
      • Specificity: the proportion of cases correctly classified as state B among all cases observed as state B.
      • Receiver Operating Characteristic (ROC) curve area: the area under the curve that plots the proportion of positive cases correctly classified (sensitivity) against the proportion of false positives (1 - specificity); a value of 1 describes a perfect fit."
    ABSTRACT: Given the current number of road accidents, the aim of many road safety analysts is to identify the main factors that contribute to crash severity. To pinpoint those factors, this paper applies some of the methods most commonly used to build decision trees (DTs), which had not previously been applied in the road safety field. An analysis of accidents on rural highways in the province of Granada (Spain) between 2003 and 2009 (both inclusive) showed that the methods used to build DTs serve this purpose and may even be complementary. Applying these methods made it possible to extract potentially useful decision rules that road safety analysts could use. For instance, some of the rules may indicate that women, unlike men, have an increased risk of severe outcomes under poor lighting conditions. The rules could be used in road safety campaigns to mitigate specific problems. This would enable managers to implement priority actions based on a classification of accidents by type (depending on their severity). However, the primary importance of this proposal is that other databases not used here (e.g., other infrastructure, roads, and countries) could be used to identify unconventional problems in a form that is easy for road safety managers to understand: decision rules.
    Accident Analysis and Prevention (Impact Factor: 1.65). 09/2012; 50. DOI: 10.1016/j.aap.2012.09.006
  • ABSTRACT: In this paper, we describe a framework for an expert system that tries to predict the effects of an accident from past data, using supervised learning with artificial neural networks. For this purpose, sensor data events are post-processed to generate a reasonable mapping between input and output parameters when an event is detected automatically or manually. The framework is intended to support actions that reduce the effects of the accident on traffic congestion and to inform the necessary parties so that they can intervene in a timely fashion.
    2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI); 01/2012
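The four indicators defined in the de Oña et al. (2011) excerpt above (accuracy, sensitivity, specificity, and ROC area) can be computed from a classifier's outputs as in the following Python sketch. It assumes the classifier returns both a predicted state and a score (for example, a posterior probability of state A) for each case; the function names are hypothetical. The ROC area is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
def roc_area(y_true, scores, state_a):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a random state-A case scores higher than a random
    state-B case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == state_a]
    neg = [s for t, s in zip(y_true, scores) if t != state_a]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_indicators(y_true, y_pred, scores, state_a):
    """Return (accuracy, sensitivity, specificity, roc_area) for a two-state class.

    y_true: observed states; y_pred: predicted states;
    scores: score for state A per case (e.g. a posterior probability).
    Every state other than state_a is treated as state B.
    """
    if not (len(y_true) == len(y_pred) == len(scores)):
        raise ValueError("y_true, y_pred and scores must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == state_a and p == state_a for t, p in pairs)  # A observed, A predicted
    fn = sum(t == state_a and p != state_a for t, p in pairs)  # A observed, B predicted
    tn = sum(t != state_a and p != state_a for t, p in pairs)  # B observed, B predicted
    fp = sum(t != state_a and p == state_a for t, p in pairs)  # B observed, A predicted
    accuracy = (tp + tn) / len(pairs)  # fraction; multiply by 100 for a percentage
    sensitivity = tp / (tp + fn)       # correct A among all observed A
    specificity = tn / (tn + fp)       # correct B among all observed B
    return accuracy, sensitivity, specificity, roc_area(y_true, scores, state_a)
```

For instance, four hypothetical cases with observed states A, A, B, B, scores 0.9, 0.4, 0.6, 0.2, and a 0.5 decision threshold give an accuracy, sensitivity, and specificity of 0.5 each and an ROC area of 0.75; multiplying accuracy by 100 gives the percentage form used in the excerpts.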