Transportmetrica A: Transport Science
Journal homepage: https://www.tandfonline.com/loi/ttra21
Utilizing black-box visualization tools to interpret
non-parametric real-time risk assessment models
Arash Khoda Bakhshi & Mohamed M. Ahmed
To cite this article: Arash Khoda Bakhshi & Mohamed M. Ahmed (2020): Utilizing black-box
visualization tools to interpret non-parametric real-time risk assessment models, Transportmetrica
A: Transport Science, DOI: 10.1080/23249935.2020.1810169
To link to this article: https://doi.org/10.1080/23249935.2020.1810169
Accepted author version posted online: 13 Aug 2020.
Published online: 25 Aug 2020.
Department of Civil & Architectural Engineering, University of Wyoming, Laramie, WY, USA
ABSTRACT
This study bridges the gap between Real-Time Risk Assessment (RTRA) and its practical implications by following the post-hoc interpretability approach and using black-box graphical tools for safety data visualization. Real-time traffic-related crash contributing factors were identified using a matched case-control design on the 402-mile Wyoming section of Interstate 80. Four black-box visualization tools, namely the Partial Dependence Plot (PDP), Individual Conditional Expectation (ICE), centered ICE, and Accumulated Local Effect (ALE), were scrutinized to interpret the causal effect of these factors on crash probabilities. The results revealed that these techniques have many advantages, disadvantages, and unanswered questions that must be recognized by Active Traffic Management. PDPs must be accompanied by ICEs, which explain the heterogeneity across observations. ALE is the most reliable technique for one-dimensional plots in a space of highly correlated variables. However, there is a substantial discrepancy between PDP and ALE in two-dimensional plots that may make ALE unreliable.
ARTICLE HISTORY
Received 19 March 2020
Accepted 21 July 2020
KEYWORDS
Real-Time Risk Assessment;
Safety Data Visualization;
Partial Dependence Plot;
Individual Conditional Expectation;
Accumulated Local Effect
1. Introduction
Despite abundant studies in the field of Real-Time Risk Assessment (RTRA), the toll of traffic crashes remains a challenging issue for communities. According to the World Health Organization, every year 1.2 million people die and more than 50 million people are injured as a consequence of fatal and non-fatal road crashes (World Health Organization 2015). To reduce crash rates, researchers have studied crash prevention and prediction for more than two decades. To this end, the RTRA literature has laid a good foundation, and numerous methodologies have been proposed, developed, and discussed. However, monitoring crash-prone conditions through Active Traffic Management (ATM) requires visualizing and interpreting the results, a fundamental step in moving from anecdotal knowledge toward practical implications.
Regardless of the models and methodologies used in RTRA, two terms should be clarified: accuracy and interpretability. For a given model, there is a trade-off between the two. In classification problems with a dichotomous outcome, accuracy concerns discerning one class (e.g. crashes) from the other (e.g. non-crashes), whereas interpretability concerns explaining the causal relationship between predictors and outcome probabilities (Hossain et al. 2019). The literature has made notable progress in accuracy when distinguishing crashes from non-crashes: a variety of statistical methods, as well as Artificial Intelligence (AI) and data-mining techniques, have achieved accuracies above 90% (Hossain et al. 2019). However, most of these approaches rely on non-parametric techniques that are difficult to interpret (Lord and Mannering 2010). Therefore, irrespective of the accuracies obtained, the practical implications of these models appear to have been left unaddressed.

CONTACT Arash Khoda Bakhshi akhodaba@uwyo.edu Department of Civil & Architectural Engineering, University of Wyoming, 1000 E University Ave, Dept. 3295, Laramie 82071, WY, USA
© 2020 Hong Kong Society for Transportation Studies Limited
The visualization tools that AI and Machine Learning (ML) techniques offer for elucidating the causal effect of predictors on response variables are well known and widely discussed in the related literature (Vellido, Martín-Guerrero, and Lisboa 2012; Molnar 2019). However, without an understanding of the mathematical and statistical background of these methods, using them to interpret results might mislead ATM. Additionally, given the fundamental diagram of traffic flow, real-time traffic-related crash contributing factors can be correlated, resulting in heterogeneity across observations (Anastasopoulos and Mannering 2009), which might mislead interpretations.
The main objective and contribution of the current study is to delve into the inner workings of widely used black-box visualization tools by explaining their mathematical background, specifically in the RTRA domain. To this end, a matched case-control design under the RTRA framework was followed to predict crash risk over the 402-mile Wyoming section of Interstate 80 (I-80). I-80 is a low-volume but major freight corridor that mostly operates at Level of Service (LOS) A or B, with an Average Annual Daily Traffic (AADT) between 10,000 and 20,000 vehicles per day, of which 30% to 55% are heavy trucks (Wyoming Department of Transportation 2018). Over the last three decades, traffic volume on this corridor has increased by 65%, whereas heavy truck traffic volume has grown by 150% (Wyoming Department of Transportation 2018). This disparity calls for more in-depth safety analyses of the corridor, especially since in 2014 I-80 reached 0.52 large truck crashes per million vehicle miles traveled, the highest rate in the United States (Gaweesh, Ahmed, and Piccorelli 2019; Khoda Bakhshi and Ahmed 2020a, 2020b).
Furthermore, challenging mountainous terrain, geometric features, and adverse weather conditions during snow seasons create non-monotonous traffic patterns along the corridor, which make it difficult to develop a single RTRA model for the entire corridor. Therefore, the most significant crash contributing factors were identified using a combination of Corrected Impurity Importance, as the feature selection algorithm, and a Generalized Additive Model (GAM), as the Crash Prediction Model (CPM). To recognize how these significant factors affect crash probabilities, their causal effects were investigated using four widely used black-box visualization tools offered by Random Forest: the Partial Dependence Plot (PDP), Individual Conditional Expectation (ICE), centered ICE (cICE), and Accumulated Local Effect (ALE) (Friedman 2001; Goldstein et al. 2015; Apley 2016; Greenwell 2017; Molnar 2019).
The conclusions and recommendations of this study can provide valuable insights by clarifying the advantages and disadvantages of black-box visualization tools, especially when they are used in the domain of RTRA and crash frequency analyses. The rest of the article proceeds as follows. Section 2 presents a brief background of RTRA. Section 3 explains the data preparation process and describes the traffic-related variables. Section 4 introduces the methodology used in this study, including the theoretical background of the crash prediction model and the four black-box visualization tools. Key considerations in using black-box visualization tools in the RTRA domain are highlighted through various discussions in Section 5. Section 6 concludes the paper.
2. Background
Oh et al. (2001) pioneered RTRA by showing a possible statistical link between traffic characteristics and crash probability. Afterward, various statistical models and Machine Learning and Deep Learning techniques were used to refine and strengthen this link. Ahmed, Abdel-Aty, and Yu (2012) applied Bayesian Logistic Regression to mountainous freeways and found that the logarithm of the coefficient of variation in speed is the most significant factor in crash prediction during snow seasons. Previous studies that used real-time loop detector data and Automatic Vehicle Identification systems likewise concluded that, within 5–10 min before crashes, the logarithm of the coefficient of variation in speed is the most significant crash contributing factor (Abdel-Aty and Pemmanaboina 2006; Ahmed and Abdel-Aty 2011). A limitation in developing Crash Prediction Models (CPMs) is that they are often built on a limited length of a corridor, which might hinder generalizing the results to the entire corridor (Hossain et al. 2019). Differing traffic patterns along a corridor might be the cause of this issue. For example, a model developed on I-4 was shown to be inappropriate for detecting crash-prone conditions on I-95 because the traffic patterns differ (Pande et al. 2011). Fluctuations in traffic patterns can result from different combinations of vertical and horizontal geometric features and varying weather conditions, leading to nonlinear traffic-related variables in CPMs (Yu et al. 2013; Eftekharzadeh and Khodabakhshi 2014; Mousavi et al. 2019).
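Because the logarithm of the coefficient of variation in speed recurs throughout this literature, a minimal Python sketch of the quantity may help. It assumes the coefficient of variation is the sample standard deviation divided by the mean of the speeds observed in one time window; the cited studies may use a different variance estimator, and the function name `log_cv_speed` is hypothetical.

```python
import math
import statistics


def log_cv_speed(speeds):
    """Natural log of the coefficient of variation (SD / mean) of speeds.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    if len(speeds) < 2:
        raise ValueError("need at least two speed observations")
    mean = statistics.mean(speeds)
    sd = statistics.stdev(speeds)
    if mean <= 0 or sd == 0:
        # The log is undefined when there is no spread or the mean is not positive.
        raise ValueError("log CV undefined for zero spread or non-positive mean")
    return math.log(sd / mean)


# Usage: three hypothetical spot speeds (mph) from one time window.
print(round(log_cv_speed([60, 70, 80]), 4))  # -1.9459
```

Larger (less negative) values indicate more dispersed speeds relative to the mean, which is the condition these studies associate with elevated crash risk.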
Different approaches, including statistical modeling and black-box techniques, have been proposed to deal with nonlinear predictors in RTRA and traffic crash frequency modeling (Hossain et al. 2019). Statistical modeling follows two distinct approaches: parametric and non-parametric modeling. The former applies a parametric transformation to nonlinear predictors to obtain a linear causal effect. The latter follows a data-driven approach that allows the data to reveal their own patterns through smoothing functions such as splines (Jones and Wrigley 1995; Lord and Mannering 2010; Washington, Karlaftis, and Mannering 2010; Azizi and Sheikholeslami 2013; Hastie 2017; Azimi et al. 2020; Rahimi et al. 2020). On the other hand, Artificial Intelligence and data-mining techniques, including tree induction and Neural Networks (NNs), have recently gained researchers' attention because of their potential to capture nonlinear relationships between response and explanatory variables (Qu et al. 2013; Bargegol et al. 2016; Katrakazas, Quddus, and Chen 2016; Liu and Chen 2017; Yang et al. 2018; Mokhtarimousavi et al. 2019; Parsa et al. 2019; Nasr Esfahani and Song 2020; Parsa et al. 2020).
In a study by Zeng et al. (2016a), an advanced NN was proposed to capture the nonlinear relationship between crash frequency by severity and related factors on road segments in Hong Kong. The generalization capacity of the neural network was improved through a structure optimization algorithm and a modified rule extraction algorithm. The performance of the developed NN was benchmarked against a multivariate Poisson-lognormal model, and the results revealed the superiority of the proposed NN in terms of fitting and predictive performance. In a related study, the proposed and optimized NN was compared with the traditional Negative Binomial (NB) model; once again, the NN outperformed the NB base model in fitting and prediction (Zeng et al. 2016b).
The relative performance of Artificial Intelligence, data-mining techniques, and statistical modeling depends on the data. For instance, the statistical literature has shown that when the sample size and the separability of signal from noise decrease (typically, when accuracy is below 85%), traditional Logistic Regression can outperform tree induction methods; otherwise, tree induction models tend to be superior (Perlich, Provost, and Simonoff 2003; Kirasich, Smith, and Sadler 2018). Hence, the modeling technique should be selected according to the characteristics of the dataset to achieve the best possible accuracy and prediction.
In this study, as discussed later, Logistic Regression was selected to classify crash versus non-crash cases because it outperformed random forest and neural networks on the same dataset. Nonlinear predictors in Logistic Regression can be handled with either parametric or non-parametric approaches. Lao et al. (2014) showed that, in terms of accuracy, the Generalized Nonlinear Model (GNM), a parametric technique, outperformed the Generalized Linear Model (GLM). Moreover, previous safety studies have shown the untapped potential of the non-parametric Generalized Additive Model (GAM), compared with GNM, for dealing with nonlinear predictors (Xie and Zhang 2008; Li, Lord, and Zhang 2010). Thus, a non-parametric GAM was used to detect crash-prone conditions in this study.
Lord and Mannering (2010) and Hossain et al. (2019) identified the difficulty of interpreting non-parametric models as an emerging challenge of this approach. The graphical tools of black-box techniques can be used to address this problem (Hossain et al. 2019) by following the post-hoc interpretability approach of Interpretable Machine Learning (IML) (Du, Liu, and Hu 2019; Yang, Du, and Hu 2019). Friedman (2001) proposed the Partial Dependence Plot (PDP), which tracks the average predicted outcome across the distribution of a feature after the effects of the other features have been marginalized. While a PDP presents a single curve of the average outcome, Goldstein et al. (2015) proposed Individual Conditional Expectation (ICE) and centered ICE (cICE), which enable analysts to assess the heterogeneity or homogeneity created by interactions between predictors across observations. Both PDP and ICE, however, can be unreliable in highly correlated multidimensional feature spaces (Molnar 2019). This issue has been addressed by the Accumulated Local Effect (ALE) (Apley 2016), which reduces the influence of highly correlated features by dividing the domain of the predictor of interest into several intervals and accumulating the local effects computed within them (Apley 2016). Given their inner workings and mathematical structures, these methods have advantages and disadvantages that ATM must consider when recognizing the causal effects of crash contributing factors on crash probabilities. The most important aspects of these visualization tools in the context of RTRA are explained later in this study.
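To make the interval idea behind ALE concrete, the following is a minimal from-scratch Python sketch of a first-order ALE in the spirit of Apley (2016), not the implementation used in this study. Assumptions: observations are dicts keyed by feature name, `predict` is any callable returning a prediction (a stand-in for the Random Forest), and interval edges are taken at data quantiles; `ale_1d` and its parameters are hypothetical names.

```python
def ale_1d(predict, data, feature, n_bins=4):
    """First-order Accumulated Local Effect (ALE) of one feature.

    predict: callable mapping one observation (a dict) to a prediction.
    data: list of dicts, one per observation.
    feature: key of the predictor of interest.
    Returns the interval edges and the centered ALE value at each edge.
    """
    values = sorted(row[feature] for row in data)
    n = len(values)
    # Quantile-based interval edges (an assumption; other binning schemes exist).
    edges = sorted({values[round(k * (n - 1) / n_bins)] for k in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature must take at least two distinct values")
    sums = [0.0] * len(edges)  # sums[k] collects interval (edges[k-1], edges[k]]
    counts = [0] * len(edges)
    for row in data:
        x = row[feature]
        # Index of the first edge >= x; the minimum falls into the first interval.
        k = max(1, next(i for i, e in enumerate(edges) if e >= x))
        lower = dict(row, **{feature: edges[k - 1]})
        upper = dict(row, **{feature: edges[k]})
        # Local effect: prediction change across the interval, with the other
        # features held at this observation's own (realistic) values.
        sums[k] += predict(upper) - predict(lower)
        counts[k] += 1
    ale = [0.0]
    for k in range(1, len(edges)):
        ale.append(ale[-1] + (sums[k] / counts[k] if counts[k] else 0.0))
    # Center so that the ALE averages to zero over the observed data.
    const = sum(counts[k] * ale[k] for k in range(1, len(edges))) / n
    return edges, [a - const for a in ale]


# Usage with a hypothetical additive model: the ALE of "x" rises by 2 per unit.
data = [{"x": float(i), "z": float(i % 3)} for i in range(10)]
edges, ale = ale_1d(lambda r: 2 * r["x"] + 5 * r["z"], data, "x")
```

Because each local difference only moves the feature of interest within a narrow interval around an observation's real value, the sketch avoids evaluating the model at unrealistic combinations of correlated features, which is the motivation for ALE described above.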
3. Data preparation and variable description
The matched case-control design was used to compare traffic oscillations between crash and non-crash cases. This approach compares crash precursors with the normal traffic patterns observed before non-crash instances within matched timeframes. The matched case-control design is a sound experimental design that controls for confounding factors such as roadway geometry, driver population, seasonal traffic variation, and, to some extent, weather conditions. Two fundamental principles must be observed for an accurate analysis. First, each non-crash instance should be extracted for the same day of the week, the same time window of the day, and the same location as the corresponding crash. Second, the real-time traffic observations for the crash and non-crash scenarios should be obtained from the same speed sensors, so that the distance between the location of inspection (i.e. the sensor location) and the place of investigation (i.e. the crash location) is the same.
Accordingly, real-time traffic observations within 15-minute time windows before crash and non-crash cases were captured from speed sensors installed along the 402 miles of I-80. This time window was chosen for two reasons: (1) to enable ATM to control traffic oscillations, and (2) to account for the small number of real-time observations collected on the low-volume rural I-80. It has been shown that using two non-crash cases per crash, taken within a week before and after the crash, is sufficient to obtain appropriate accuracy in crash prediction (Ahmed and Abdel-Aty 2011; Hossain et al. 2019). Therefore, the crash and non-crash datasets were merged and, where the data allowed, two non-crash cases were extracted for each crash.
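The matching rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' data-reduction code: each non-crash case is placed exactly one week before and one week after the crash (so day of week, time of day, and location match), each window covers the 15 minutes preceding the reference time, and a hypothetical `has_data` callback drops candidates for which sensor data are missing.

```python
from datetime import datetime, timedelta


def matched_noncrash_windows(crash_time, location, week_offsets=(-1, 1),
                             window_min=15, has_data=lambda start, end, loc: True):
    """Build matched non-crash time windows for one crash.

    Each candidate keeps the crash's location, day of week and time of day,
    shifted by whole weeks. The offsets and the has_data filter are
    illustrative assumptions. Each window covers the `window_min` minutes
    before the reference time, mirroring the window used for the crash itself.
    """
    windows = []
    for weeks in week_offsets:
        ref = crash_time + timedelta(weeks=weeks)
        start = ref - timedelta(minutes=window_min)
        # Keep the candidate only if sensor data exist for it ("data permitting").
        if has_data(start, ref, location):
            windows.append((start, ref, location))
    return windows


# Usage with a hypothetical crash at 08:40 on Wednesday 15 March 2017.
crash = datetime(2017, 3, 15, 8, 40)
for start, end, loc in matched_noncrash_windows(crash, "sensor near MP 310"):
    print(start, end, loc)
```

Shifting by whole weeks is a simple way to satisfy the same-weekday and same-time-of-day principles at once; when a candidate is rejected, fewer than two non-crash cases remain for that crash, consistent with the unequal crash and non-crash counts reported later.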
Real-time traffic observations and crash data for the first seven months of 2017 were gathered. The crash database, provided by the Wyoming Department of Transportation (WYDOT), contains the time, date, location, and first harmful event of each crash, as well as any use of alcohol and illegal drugs by drivers at the scene. Only crashes attributable to traffic characteristics were included (Ahmed and Abdel-Aty 2011). Therefore, crashes involving drug or alcohol use, or attributed to unknown or other causes, which represented 3% of the initial crash dataset, were eliminated.
The real-time traffic observations were provided by WYDOT from Wavetronix speed sensors installed on the corridor. Of the 94 speed sensors on the 402 miles of I-80 in Wyoming, 51 were used because of data limitations. Capturing the real-time traffic observations requires both the longitudinal position (i.e. milepost) and the lateral position (i.e. eastbound or westbound) of each sensor, which were not available for 28 sensors. In addition, an initial investigation showed that 15 sensors observed the traffic stream on only one lane in each direction instead of two. Accordingly, the remaining 51 speed sensors (94 − 28 − 15) were used in this study.
The real-time speed observation dataset comprised the time of observation, vehicle speed, vehicle length, and lane assignment. A vehicle shorter than 30 feet was considered a passenger car; otherwise, it was classified as a truck. In total, 10,735,339 real-time traffic observations were reduced to 70,930 observations corresponding to 203 crashes and 284 non-crash cases to form the final dataset. The dataset was structured by defining real-time traffic-related variables that capture traffic oscillations both laterally and longitudinally.
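As an illustration of how per-vehicle records can be turned into per-lane and whole-stream statistics for one time window, here is a minimal Python sketch. It assumes each record is a `(speed_mph, length_ft, lane)` tuple from one sensor, applies the 30 ft passenger-car threshold stated above, and uses the sample variance; the function names and the `"T"` key for the combined stream are hypothetical.

```python
import statistics

TRUCK_MIN_LENGTH_FT = 30  # per the text: shorter than 30 ft -> passenger car


def _stats(rows):
    """Mean speed, speed variance, volume and truck proportion of (speed, is_truck) rows."""
    speeds = [speed for speed, _ in rows]
    trucks = sum(1 for _, is_truck in rows if is_truck)
    return {
        "mean_speed": statistics.mean(speeds),
        # Sample variance (assumption); zero when only one vehicle was observed.
        "speed_var": statistics.variance(speeds) if len(speeds) > 1 else 0.0,
        "volume": len(rows),
        "truck_prop": trucks / len(rows),
    }


def window_summary(records):
    """Per-lane and total-stream statistics for one 15-minute window.

    records: iterable of (speed_mph, length_ft, lane) tuples from one sensor.
    Returns {lane: stats, ..., "T": stats for all lanes combined}.
    """
    by_lane = {}
    for speed, length, lane in records:
        by_lane.setdefault(lane, []).append((speed, length >= TRUCK_MIN_LENGTH_FT))
    if not by_lane:
        return {}
    summary = {lane: _stats(rows) for lane, rows in by_lane.items()}
    summary["T"] = _stats([row for rows in by_lane.values() for row in rows])
    return summary


# Usage: four hypothetical vehicles, one of them a 65 ft truck in the right lane.
vehicles = [(70, 15, "HSL"), (74, 16, "HSL"), (62, 65, "LSL"), (66, 14, "LSL")]
print(window_summary(vehicles)["T"]["truck_prop"])  # 0.25
```

Computing the same statistics at the upstream and downstream sensors yields the raw inputs from which the spatial-difference predictors in Table 1 are derived.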
Table 1. Description of explanatory variables.

Continuous variables(a) | Description | S.D. | Mean | Min. | Max.
T_SpMean | Spatial Difference in Mean Speed of Total Traffic Volume in both Lanes | 7.09 | −0.10 | −34.88 | 35.52
T_SpVar | Spatial Difference in Speed Variance of Total Traffic Volume in both Lanes | 58.05 | −11.78 | −578.05 | 458.17
T_SpVARoMEAN | Spatial Difference in Speed Variance Divided by Mean Speed for Total Traffic Volume in both Lanes | 1.21 | −0.16 | −17.27 | 9.79
T_SpSlop | Spatial Difference in the Slope of Speed Regression (from Speed Profile) for Total Traffic Volume in both Lanes | 0.02 | 0.00 | −0.03 | 0.26
T_SpVarDiff | Spatial Difference in Subtraction of Speed Variance in HSL(b) from Speed Variance in LSL(c) | 100.79 | 12.63 | −1227.42 | 533.89
T_Volume | Spatial Difference in Total Traffic Volume | 28.45 | 10.90 | −88.00 | 138.00
T_TrP | Spatial Difference in Truck Proportion in Total Traffic Volume in both Lanes | 0.17 | 0.01 | −0.57 | 0.57
T_TrPDiff | Spatial Difference in Subtraction of Truck Proportion in HSL from Truck Proportion in LSL | 0.29 | −0.04 | −1.00 | 0.84
T_VolumeDiff | Spatial Difference in Subtraction of Traffic Volume in HSL from Traffic Volume in LSL | 19.05 | 6.12 | −75.00 | 72.00
HSL_SpMean | Spatial Difference in Mean Speed in HSL | 11.55 | 1.59 | −32.85 | 84.07
HSL_SpVar | Spatial Difference in Speed Variance in HSL | 77.20 | 1.04 | −752.67 | 537.92
HSL_SpVARoMEAN | Spatial Difference in Speed Variance Divided by Speed Mean in HSL | 1.47 | 0.01 | −19.68 | 10.59
HSL_SpSlop | Spatial Difference in Slope of Speed Regression (from Speed Profile) for HSL | 0.18 | −0.01 | −3.80 | 0.27
HSL_Volume | Spatial Difference in Traffic Volume in HSL | 14.22 | −0.23 | −54.00 | 111.00
HSL_TrP | Spatial Difference in Truck Proportion in HSL | 0.30 | −0.01 | −1.00 | 1.00
LSL_SpMean | Spatial Difference in Speed Mean in LSL | 17.7 | 4.17 | −35.20 | 79.50
LSL_SpVar | Spatial Difference in Speed Variance in LSL | 65.67 | −11.59 | −576.08 | 475.12
LSL_SpVARoMEAN | Spatial Difference in Speed Variance Divided by Mean Speed in LSL | 1.30 | −0.18 | −17.07 | 10.52
LSL_SpSlop | Spatial Difference in Slope of Speed Regression (from Speed Profile) for LSL | 0.02 | 0.00 | −0.14 | 0.24
LSL_Volume | Spatial Difference in Traffic Volume in LSL | 23.86 | 11.13 | −70.00 | 99.00
LSL_TrP | Spatial Difference in Truck Proportion in LSL | 0.27 | 0.06 | −0.45 | 1.00

Categorical variables(a) | Description | Number of Positive (Pct.) | Number of Negative (Pct.)
D_T_SpVARoMEAN | Dummy Variable Representing T_SpVARoMEAN (1: Negative (Reference Level), 0: Positive) | 297 (60.9%) | 190 (39.1%)
D_T_SpSlop | Dummy Variable Representing T_SpSlop (1: Negative (Reference Level), 0: Positive) | 251 (51.5%) | 236 (48.5%)
D_T_SpMean | Dummy Variable Representing T_SpMean (1: Negative (Reference Level), 0: Positive) | 240 (49.2%) | 247 (50.8%)
D_T_VolumeDiff | Dummy Variable Representing T_VolumeDiff (1: Negative (Reference Level), 0: Positive) | 205 (42.1%) | 282 (57.9%)
D_HSL_SpSlop | Dummy Variable Representing HSL_SpSlop (1: Negative (Reference Level), 0: Positive) | 244 (50.1%) | 243 (49.9%)
D_HSL_SpMean | Dummy Variable Representing HSL_SpMean (1: Negative (Reference Level), 0: Positive) | 212 (43.5%) | 275 (56.5%)
D_LSL_SpSlop | Dummy Variable Representing LSL_SpSlop (1: Negative (Reference Level), 0: Positive) | 248 (50.9%) | 239 (49.1%)
D_LSL_SpMean | Dummy Variable Representing LSL_SpMean (1: Negative (Reference Level), 0: Positive) | 217 (44.5%) | 270 (55.5%)

(a) Each of the continuous variables (C) was measured upstream (U) and downstream (D). Afterward, for all observations, the corresponding values of the continuous variables were calculated by subtracting U from D (i.e. C = D − U). If a continuous variable was negative, the corresponding dummy variable took the value of one; otherwise, it took the value of zero.
(b) HSL: High-Speed Lane (i.e. Left Lane).
(c) LSL: Low-Speed Lane (i.e. Right Lane).
According to the literature, traffic flow variables are the core of RTRA. The real-time traffic-related variables commonly used in previous studies can be categorized into the overall average flow ratio (Lee, Abdel-Aty, and Hsia 2006; Yuan and Abdel-Aty 2018; Yuan et al. 2019) and the standard deviation or other statistical transformations of speed and occupancy during a specific time window before crashes and non-crashes (Hossain et al. 2019). These variables can be measured upstream and downstream of crash locations and directly applied to crash prediction models. This study followed the literature in characterizing real-time traffic-related predictors. However, an initial investigation showed that many of these variables were not significant in crash prediction on the study corridor. This might stem from the fact that I-80 is a rural freeway corridor with low traffic volume that mostly operates at LOS A and B; hence, more robust feature engineering was required.
Accordingly, only predictors that could capture traffic oscillations both laterally and longitudinally were incorporated in the crash prediction model. To this end, predictors were assigned to each individual lane as well as to the whole traffic stream. Note that I-80 has two lanes in each direction: one High-Speed Lane (HSL, i.e. the left lane) and one Low-Speed Lane (LSL, i.e. the right lane), where the mean speed on the HSL was found to be 8 mph higher than that on the LSL. In addition, relative to crash locations, these predictors were defined to measure the longitudinal spatial difference in traffic characteristics by subtracting the value of each traffic-related variable measured upstream from the value measured downstream (i.e. downstream minus upstream). These differences served as the predictor values for each crash and non-crash case and can take positive or negative values even for non-negative traffic-related variables. In total, 29 variables were used to enable the model to classify crash and non-crash cases. Table 1 describes the predictors used in this study.
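The downstream-minus-upstream construction and the dummy coding from the Table 1 footnote can be sketched in Python as below. Assumptions: each sensor's statistics arrive as a dict keyed by the variable names used in Table 1, and only the names listed in the hypothetical `dummy_for` argument receive a `D_` dummy, mirroring the eight categorical variables in the table.

```python
def spatial_difference(downstream, upstream, dummy_for=()):
    """Downstream-minus-upstream features (C = D - U) for one case.

    downstream, upstream: dicts of traffic variables measured at the two sensors.
    dummy_for: names that also get a dummy D_<name>, equal to 1 when the
    difference is negative (the reference level) and 0 otherwise.
    """
    features = {}
    for name in sorted(downstream.keys() & upstream.keys()):
        diff = downstream[name] - upstream[name]
        features[name] = diff
        if name in dummy_for:
            features["D_" + name] = 1 if diff < 0 else 0
    return features


# Usage: speed drops by 4.5 mph downstream while truck share rises (hypothetical values).
case = spatial_difference({"T_SpMean": 61.0, "T_TrP": 0.2},
                          {"T_SpMean": 65.5, "T_TrP": 0.1},
                          dummy_for={"T_SpMean"})
```

The example shows why a non-negative quantity such as truck proportion can still yield a negative predictor: only the sign of the downstream-upstream change is retained.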
4. Methodology
Interpretable Machine Learning (IML) is an important goal for many researchers, and various approaches have been attempted. These approaches can be categorized into two groups: intrinsic interpretability and post-hoc interpretability (Du, Liu, and Hu 2019). The former relies on self-explanatory models whose structure directly reveals the causal effects of predictors. The latter requires developing a separate model to explain the existing one (Du, Liu, and Hu 2019; Yang, Du, and Hu 2019). Post-hoc interpretability can be considered a refinement process from the prediction model to the interpretation model, in which the interpretation model provides a global view of the prediction model in a post-hoc manner (Yang, Du, and Hu 2019).
This study explained the causal effects of crash contributing factors by following the post-hoc interpretability approach. To this end, two models were developed:
(i) Crash Prediction Model (CPM) to detect statistically significant crash contributing
factors based on the Generalized Additive Model (GAM).
(ii) Crash Interpretation Model (CIM) to visualize the effect of crash contributing factors on
crash risk employing Random Forest.
4.1. Crash prediction model (CPM)
The CPM was selected by comparing the accuracy and predictive performance of nine models developed from three Feature Selection (FS) techniques and three types of Logistic Regression. Each model was built in two main steps. First, the important predictors were selected using Mean Decrease in Accuracy (MDA), Mean Decrease in Impurity (MDI), and Corrected Impurity Importance (CII), the FS techniques offered by Random Forest (RF). Second, for each set of important predictors obtained in the first step, a Generalized Linear Model (GLM), a Generalized Nonlinear Model (GNM), and a Generalized Additive Model (GAM) were fitted. The combination of CII and GAM outperformed the other models by achieving the minimum Akaike Information Criterion (AIC) and the maximum Area Under the Curve (AUC). Hence, the combined CII and GAM was used as the CPM. The reasons for this superiority are as follows:
MDA and MDI are the commonly used FS approaches offered by RF and have been widely applied in the RTRA domain (Hossain et al. 2019). However, the statistical literature has shown that both methods are biased and can lead to erroneous results (White and Liu 1994; Strobl et al. 2008; Louppe et al. 2013; Wright and Ziegler 2015; Gregorutti, Michel, and Saint-Pierre 2017; Wright et al. 2019). Hence, in addition to MDA and MDI, CII was investigated to obtain an unbiased measure of variable importance in RF (Wright and Ziegler 2015; Nembrini, König, and Wright 2018; Wright et al. 2019). Figure 1 depicts the result of CII for FS: features corresponding to green points must be kept in the model, and features corresponding to red points must be removed from it.
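The two criteria used above to rank the nine candidate models can be computed as in the following minimal Python sketch. AUC is obtained through the rank-sum (Mann-Whitney) identity, i.e. the probability that a randomly chosen crash case receives a higher predicted probability than a randomly chosen non-crash case, with ties counted as one half. AIC is 2k − 2 ln L; for a GAM, k would usually be the model's effective degrees of freedom, which this sketch simply takes as an input. The function names are hypothetical.

```python
import math


def auc(y_true, scores):
    """Area Under the ROC Curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both crash (1) and non-crash (0) cases")
    wins = 0.0
    for p in pos:
        for q in neg:
            # A crash case ranked above a non-crash case counts 1, a tie counts 0.5.
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def bernoulli_log_likelihood(y_true, probs):
    """Log-likelihood of 0/1 outcomes under predicted crash probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y, p in zip(y_true, probs))


def aic(log_likelihood, n_params):
    """Akaike Information Criterion, 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Usage with four hypothetical cases.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.4, 0.6]
print(auc(y, p))  # 0.75
```

The quadratic pairwise loop keeps the definition transparent; for large datasets a sort-based rank computation gives the same value more efficiently.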
Furthermore, for dealing with nonlinear predictors, GAM can be a more effective tool than GLM and GNM (Jones and Almond 1992; Jones and Wrigley 1995). GAM follows a data-driven approach that lets the data determine the relationships and the model specification instead of imposing a predetermined transformation curve (Jones and Almond 1992; Jones and Wrigley 1995). The term additive refers to the structure of the model: the linear predictor is a sum of terms, each of which can be a smooth function of a single predictor. Splines are used as these smoothing functions, which, in keeping with the data-driven approach, allow nonlinear predictors to reveal their patterns. Equation (1) presents the CPM developed in this study based on GAM.
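$$
p_i = \frac{e^{\beta_0 + \sum_{j=1}^{J} \beta_j x_{ij} + \sum_{k=1}^{K} f_k(x_{ik})}}{1 + e^{\beta_0 + \sum_{j=1}^{J} \beta_j x_{ij} + \sum_{k=1}^{K} f_k(x_{ik})}} \tag{1}
$$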
where $p_i$ is the probability of crash occurrence for the $i$th observation, $\beta_0$ and $\beta_j$ are coefficients to be estimated, $x_{ij}$ are the linear predictors, $x_{ik}$ are the nonlinear predictors, and $f_k$ are the smoothing functions (splines) to be estimated.
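To show how Equation (1) turns linear terms and smooth terms into a crash probability, here is a minimal Python sketch. It uses the algebraically equivalent form 1 / (1 + e^(−η)), where η is the additive predictor. The truncated piecewise-linear basis in `linear_spline` is a hypothetical stand-in for the spline terms $f_k$: GAM software typically estimates penalized spline coefficients, which this sketch does not attempt, and all coefficients shown are made up.

```python
import math


def linear_spline(knots, coefs):
    """Hypothetical smooth term f(x) = sum_m coefs[m] * max(0, x - knots[m]).

    Illustrative only: a fitted GAM would estimate its own (usually penalized
    cubic or thin-plate) spline basis and coefficients.
    """
    def f(x):
        return sum(c * max(0.0, x - k) for k, c in zip(knots, coefs))
    return f


def crash_probability(beta0, betas, x_linear, smooths, x_nonlinear):
    """Equation (1): logistic link applied to an additive predictor eta."""
    eta = beta0
    eta += sum(b * x for b, x in zip(betas, x_linear))        # linear terms
    eta += sum(f(x) for f, x in zip(smooths, x_nonlinear))     # smooth terms f_k
    # e^eta / (1 + e^eta) == 1 / (1 + e^-eta); very negative eta may overflow here.
    return 1.0 / (1.0 + math.exp(-eta))


# Usage: one linear predictor and one smooth predictor with made-up values.
f = linear_spline([0.0], [1.0])
print(round(crash_probability(-1.0, [0.5], [2.0], [f], [1.0]), 4))  # 0.7311
```

The additivity is visible in the code: each predictor contributes its own term to η, and only the final logistic transform couples them.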
In summary, the CPM (i.e. the combined CII and GAM) was developed according to the following procedure. First, the important features were selected based on the results of CII. Second, highly correlated predictors were removed from the model: a preliminary Logistic Regression model was developed based on the variables selected by CII, and highly correlated variables were sequentially removed through iterations according to their Variance Inflation Factors (VIFs). The iteration ended when all VIFs in the last preliminary Logistic Regression were less than ten (Agresti 2018). Third, nonlinear predictors were identified by inspecting separate scatter plots of the fitted logit from the last preliminary Logistic Regression against each of the remaining variables. Finally, the CPM was fitted according to Equation (1).

Figure 1. Feature selection based on Corrected Impurity Importance (CII).
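The iterative VIF screening in the second step can be sketched in plain Python as follows. This sketch uses the classical definition, in which the VIF of each predictor is the corresponding diagonal element of the inverse of the predictors' correlation matrix, and it drops the predictor with the largest VIF until all VIFs fall below ten. This is an assumption about the procedure's details: VIFs reported for a fitted logistic regression by statistical software may be computed from the model's coefficient covariance and can therefore differ somewhat. All names are hypothetical.

```python
def _pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def _invert(m):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(data):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    names = list(data)
    corr = [[_pearson(data[a], data[b]) for b in names] for a in names]
    inv = _invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def drop_collinear(data, threshold=10.0):
    """Drop the predictor with the largest VIF until all VIFs < threshold."""
    data = dict(data)
    while len(data) > 1:
        v = vifs(data)
        worst = max(v, key=v.get)
        if v[worst] < threshold:
            break
        del data[worst]
    return list(data)
```

Removing one variable at a time, rather than all variables above the threshold at once, matters because dropping a single member of a collinear group usually brings the VIFs of the remaining members back down.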
4.2. Crash interpretation model (CIM)
In practice, after RTRA models are developed, only the statistically significant crash contributing factors are introduced to ATM so that appropriate interventions can be developed to return crash-prone conditions to normal traffic conditions (Hosseinzadeh et al. 2020). Accordingly, to interpret the causal effects of the significant predictors on crash risk, the Partial Dependence Plot (PDP), Individual Conditional Expectation (ICE), centered ICE (cICE), and Accumulated Local Effect (ALE) offered by Random Forest were investigated. Hence, a Random Forest was developed as the Crash Interpretation Model (CIM) based on the statistically significant crash contributing factors identified by the CPM.
Almost all black-box visualization tools rely on dimensionality reduction, which aids data visualization by reducing the complexity of sophisticated models through graphical metaphors (Vellido, Martín-Guerrero, and Lisboa 2012). However, how each graphical tool performs dimensionality reduction varies according to its mathematical structure. The mathematical backgrounds of the four visualization tools (i.e. PDP, ICE, cICE, and ALE) are briefly explained in the following sections.
4.2.1. Partial dependence plot (PDP)
The relationship between the outcome and one or two explanatory variables can be depicted by a PDP (Friedman 2001). A PDP estimates the average marginal effect of one or two predictors on the predicted outcome, which is a probability in classification problems or a predicted value in regression (Friedman 2001; Greenwell 2017; Molnar 2019). To this end, PDP divides the predictors into two sets. The first set contains the feature(s) of interest for which the PDP is drawn ($X_i$), and the second set comprises the other features included in the developed model ($X_o$). Together, $X_i$ and $X_o$ form the multidimensional feature space on which the CIM has been developed.

A PDP is a function of one or two variables in $X_i$ and marginalizes the output over the distribution of the features in $X_o$. In other words, to find the response at a given value $x_i$, PDP generates synthetic observations in which the value of the feature of interest is replaced by $x_i$ in every actual observation, while the other features keep their observed values. The response at $x_i$ is then calculated by averaging the model outputs over all synthetic observations. Equation (2) formalizes this approach.
$$\hat{f}_{X_i}(x_i) = \frac{1}{n}\sum_{j=1}^{n} \hat{f}\left(x_i, x_o^{(j)}\right) \qquad (2)$$
where $\hat{f}_{X_i}$ is the partial dependence function for the predictor of interest, which estimates the average marginal effect at a given value $x_i$; $n$ is the number of observations; $x_o^{(j)}$ is the actual value of the predictors not under investigation for the $j$th observation; and $\hat{f}(x_i, x_o^{(j)})$ is the model output for the $j$th observation evaluated at $x_i$. PDPs can also depict the combined effect of two features on the outcome in a two-dimensional (2D) space, in which case the response values are centered. Intuitive results, ease of interpretation, and simple implementation are the main advantages of PDPs in either one or two dimensions.
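Equation (2) can be sketched in a few lines of Python. This is a minimal, library-free illustration under stated assumptions: the fitted model is any callable that maps one row of predictor values to a predicted crash probability, and `toy_model` and `data` are hypothetical stand-ins, not the CIM or the dataset used in this study.

```python
import math
from typing import Callable, List, Sequence

Row = List[float]


def partial_dependence(model: Callable[[Row], float], data: Sequence[Row],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """Equation (2): for each grid value x_i, set the feature of interest to
    x_i in every observation, keep the other predictors x_o^(j) as observed,
    and average the n model outputs."""
    n = len(data)
    pdp = []
    for x_i in grid:
        total = 0.0
        for row in data:
            synthetic = list(row)        # copy of the j-th observation
            synthetic[feature] = x_i     # replace X_i by the grid value
            total += model(synthetic)    # f_hat(x_i, x_o^(j))
        pdp.append(total / n)
    return pdp


def toy_model(row: Row) -> float:
    """Hypothetical fitted classifier: logistic function of two predictors."""
    return 1.0 / (1.0 + math.exp(-(0.5 * row[0] - 0.2 * row[1])))


data = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]   # hypothetical observations
grid = [0.0, 1.0, 2.0]
pdp = partial_dependence(toy_model, data, feature=0, grid=grid)
```

The loop makes the cost explicit: one model call per observation and grid value. Libraries such as scikit-learn (`sklearn.inspection.partial_dependence`) perform the same averaging for fitted estimators.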
4.2.2. Individual conditional expectation (ICE)
PDP averages the marginal effect of a predictor on the response variable and therefore plots a single curve over the range of the predictor of interest. As a result, PDP cannot reveal the heterogeneity across observations that arises from interactions between predictors. To address this issue, Goldstein et al. (2015) proposed Individual Conditional Expectation (ICE) plots to explore heterogeneity across observations.
As the name suggests, for a given predictor of interest, ICE draws one curve per observation while holding the values of that observation's other predictors unchanged. For each observation, the estimating function $\hat{f}_{X_i}$ then depends only on the value of the predictor of interest; varying this value across the range of the predictor yields one curve for that instance, and ICE repeats this procedure for every observation. The result is a set of curves over the domain of the predictor of interest that provides insight into interactions between predictors and heterogeneity across observations. The point-wise average of all ICE curves is exactly the PDP for the predictor of interest.
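The relationship between ICE and PDP described above can be checked with a short sketch. Assumptions: `toy_model` is a hypothetical classifier in which the effect of X_0 depends on X_1 (an interaction), and the data are invented for illustration only.

```python
import math
from typing import Callable, List, Sequence

Row = List[float]


def ice_curves(model: Callable[[Row], float], data: Sequence[Row],
               feature: int, grid: Sequence[float]) -> List[List[float]]:
    """One curve per observation: vary the feature of interest over the grid
    while that observation's other predictors stay fixed."""
    curves = []
    for row in data:
        curve = []
        for x in grid:
            synthetic = list(row)
            synthetic[feature] = x
            curve.append(model(synthetic))
        curves.append(curve)
    return curves


def average_curve(curves: List[List[float]]) -> List[float]:
    """Point-wise mean of the ICE curves, i.e. the PDP of equation (2)."""
    return [sum(c[g] for c in curves) / len(curves) for g in range(len(curves[0]))]


def toy_model(row: Row) -> float:
    """Hypothetical classifier with an X_0 x X_1 interaction."""
    return 1.0 / (1.0 + math.exp(-row[0] * row[1]))


data = [[0.0, 1.0], [0.5, -1.0], [1.0, 0.5]]
grid = [-1.0, 0.0, 1.0]
curves = ice_curves(toy_model, data, feature=0, grid=grid)
pdp = average_curve(curves)
```

The first two curves slope in opposite directions (X_1 = 1 versus X_1 = −1), which the single averaged PDP curve does not show. In scikit-learn, `PartialDependenceDisplay.from_estimator(..., kind="individual")` draws such curves for a fitted estimator.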
4.2.3. Centered ICE (cICE)
Because the curves in an ICE plot start from different predicted values, it is difficult to judge whether their shapes differ from one instance to another. ICE curves can therefore be anchored at a reference point so that each curve shows only the change in the predicted outcome relative to that point. The resulting plot is called a centered ICE (cICE) plot (Molnar 2019). The reference point is usually chosen at the lower bound of the range of the predictor of interest. Equation (3) illustrates this procedure.
$$\hat{f}^{(i)}_{\text{cent}} = \hat{f}^{(i)} - \mathbf{1}\,\hat{f}\left(x_a, x_o^{(i)}\right) \qquad (3)$$
where $\mathbf{1}$ is a vector of ones, $\hat{f}^{(i)}$ is the ICE curve of the $i$th observation $x^{(i)}$ evaluated over the grid of the predictor of interest, $x_o^{(i)}$ holds the values of that observation's other predictors, and $\hat{f}^{(i)}_{\text{cent}}$ is the same curve centered at $x_a$ (Goldstein et al. 2015). Because cICE values are differences from the prediction at $x_a$, they can be negative even when the outcome is a probability; this does not affect the visualization of the overall trend (Molnar 2019). ICE and cICE can only present one-dimensional relationships, whereas PDPs can also show the combined effect of two features on the outcome in two-dimensional plots.
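Equation (3) amounts to subtracting, from each ICE curve, that curve's own prediction at the anchor $x_a$. A minimal sketch, assuming the ICE curves are already computed on a common grid whose first point is the lower bound of the predictor; the curve values below are hypothetical.

```python
from typing import List


def center_ice(curves: List[List[float]], anchor_index: int = 0) -> List[List[float]]:
    """Equation (3): subtract from each ICE curve its own prediction at the
    anchor x_a (the grid point at anchor_index, by default the lower bound)."""
    return [[value - curve[anchor_index] for value in curve] for curve in curves]


# Hypothetical ICE curves (predicted probabilities on a 3-point grid)
curves = [[0.20, 0.30, 0.50],
          [0.60, 0.55, 0.40]]
centered = center_ice(curves)
```

Every centered curve starts at zero, and the second one becomes negative because its predicted probability falls below its value at the anchor, which is the behavior noted above.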
4.2.4. Accumulated local effects (ALE)
The procedure used in PDP and ICE assumes that the distribution of the other predictors is the same at every level of the predictor of interest, an assumption that can be false for highly correlated predictors (Apley 2016; Molnar 2019). Apley and Zhu introduced Accumulated Local Effects (ALE) to tackle this issue. Instead of averaging predictions over the marginal distribution of the other predictors, ALE averages differences in predictions over their conditional distribution (Apley 2016).
ALE divides the range of the predictor of interest, $X_j$, into $K$ equal intervals. Because each observation is only moved to the boundaries of the interval in which it already lies, this division limits the influence of correlated variables and avoids generating unrealistic synthetic observations. In addition, averaging differences in predictions, rather than the predictions themselves, isolates the pure main effect of $X_j$ on the outcome (Apley 2016). Equation (4) gives the uncentered ALE at $X_j = x$ (interested readers are referred to Apley (2016) for details).
$$\tilde{f}_{j}(x) = \sum_{k=1}^{k_j(x)} \frac{1}{n_j(k)} \sum_{i \in N_j(k)} \left[ f\left(z_{k,j}, x_{i,J}\right) - f\left(z_{k-1,j}, x_{i,J}\right) \right] \qquad (4)$$
where $j$ indexes the predictor of interest; $k_j(x) \in \{1, 2, \ldots, K\}$ is the index of the interval of $X_j$ that contains $x$; $N_j(k)$ is the set of observations whose value of $X_j$ falls in the $k$th interval and $n_j(k)$ is their number; $z_{k,j}$ and $z_{k-1,j}$ are, respectively, the upper and lower bounds of $X_j$ in the $k$th interval; $x_{i,J}$ denotes the values of the other predictors (the set $J$ of all predictors except $X_j$) for the $i$th observation; and $f(\cdot)$ is the fitted model.
The inner sum in equation (4) computes the difference in prediction for every observation in the $k$th interval. This difference is the effect that the variable of interest has on that individual observation: ALE obtains it by replacing the value of $X_j$ for each observation in the $k$th interval with the upper and then the lower bound of that interval and subtracting the two predictions. Averaging these differences over the observations in the interval yields the local effect of the $k$th interval. The outer sum then accumulates the local effects across intervals. For instance, the ALE for a value in the fifth interval is the sum of the local effects of the first through fifth intervals.
To ease interpretation, ALE is centered so that its mean effect is zero. Therefore, if the ALE at a specific level of $X_j$ is 4, the prediction at that level is four units higher than the average prediction. ALE can also plot the effect of two variables on the outcome in 2D space; however, a 2D-ALE requires a different interpretation from a 2D-PDP, as discussed later. Furthermore, individual (ICE-like) curves cannot be drawn for ALE, so heterogeneity among observations may be obscured in ALE results.
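The computation in equation (4), followed by the centering step just described, can be sketched as below. This is an illustrative implementation under explicit assumptions, not the software used in the study: interval bounds are taken as empirical quantiles of $X_j$ (a common implementation choice, whereas the text speaks of equal intervals), the ALE is returned as a step function at each interval's upper bound without interpolation, and `toy_model` and `data` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def ale_1d(model: Callable[[Row], float], data: Sequence[Row], feature: int,
           n_intervals: int) -> Tuple[List[float], List[float], List[float]]:
    """First-order ALE following equation (4), then centered to zero mean.

    Assumption: bounds z_0 < ... < z_K are empirical quantiles of X_j;
    duplicate bounds are merged, so K may be smaller than n_intervals."""
    xs = sorted(row[feature] for row in data)
    n = len(xs)
    bounds = sorted({xs[round(k * (n - 1) / n_intervals)]
                     for k in range(n_intervals + 1)})
    if len(bounds) < 2:
        raise ValueError("X_j needs at least two distinct values")
    K = len(bounds) - 1
    sums, counts = [0.0] * K, [0] * K
    for row in data:
        x = row[feature]
        # interval k with z_{k-1} < x <= z_k; the minimum goes to interval 1
        k = next(idx for idx in range(1, K + 1) if x <= bounds[idx])
        upper, lower = list(row), list(row)
        upper[feature], lower[feature] = bounds[k], bounds[k - 1]
        sums[k - 1] += model(upper) - model(lower)   # inner sum of eq. (4)
        counts[k - 1] += 1
    # Average within each interval (local effect), then accumulate (outer sum)
    uncentered, running = [], 0.0
    for k in range(K):
        if counts[k]:
            running += sums[k] / counts[k]
        uncentered.append(running)
    # Center: subtract the mean ALE over all observations (weighted by counts)
    mean_effect = sum(c * u for c, u in zip(counts, uncentered)) / n
    centered = [u - mean_effect for u in uncentered]
    return bounds[1:], uncentered, centered


def toy_model(row: Row) -> float:
    """Hypothetical model that is linear in X_0 with slope 2."""
    return 2.0 * row[0] + row[1]


data = [[float(i), float(i % 3)] for i in range(11)]
upper_bounds, ale_raw, ale_centered = ale_1d(toy_model, data, feature=0, n_intervals=5)
```

For this linear toy model every local effect equals 2 × (interval width), so the uncentered ALE grows by 4 per interval; the centered version shifts the whole curve so that its observation-weighted mean is zero.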
5. Results and discussions
Table 2 presents the outcomes of the CPM developed with GAM. Note that GAM can accommodate both linear and nonlinear predictors specified at the outset and reports non-parametric results for the nonlinear predictors in terms of the Effective Degrees of Freedom (EDF). GAM estimated EDF = 1 for two of the nonlinear predictors (i.e. HSL_SpMean and HSL_TrP). An EDF of 1 shows that the effects of these predictors on the response variable reduce to simple linear effects, although they were initially specified as nonlinear (39). Notably, fitting a smoothing spline for the remaining nonlinear predictor (i.e. T_TrP) changes the fitted logits, which leads GAM to estimate linear relationships between the response variable and the predictors with EDF = 1. Consequently, had HSL_SpMean and HSL_TrP been introduced as linear variables, the coefficients of the other predictors and the overall performance of the model would have remained the same. Moreover, introducing HSL_SpMean and HSL_TrP as linear variables would yield the same p-values as those reported in Table 2, with estimated coefficients of 0.048 and −1.796, respectively.
Table 2. Results of the crash prediction model based on the generalized additive model.
STEP 1  Number of variables selected based on CII: 15
STEP 2  Highly correlated variables removed: T_SpVar, LSL_SpVar
STEP 3  Nonlinear predictors: HSL_SpMean, HSL_TrP, T_TrP
STEP 4  GAM (AIC^a: 579.64; AUC^b: 71.74%)

Variable          Estimate       z value   χ²        p value   Sig.
(Intercept)       −0.942         −3.149    –         0.002     *
T_SpMean          −0.108         −3.495    –         <0.001    *
HSL_SpSlop        −4.314         −1.023    –         0.306
T_SpSlop          9.424          0.713     –         0.476
D_T_SpMean        0.034          0.108     –         0.914
T_TrPDiff         −0.887         −1.302    –         0.193
D_HSL_SpSlop      0.308          1.197     –         0.231
LSL_SpVARoMEAN    0.872          2.758     –         0.006     *
T_SpVarDiff       0.000          0.015     –         0.988
T_SpVARoMEAN      −0.861         −2.798    –         0.005     *
LSL_SpMean        −0.020         −2.007    –         0.045     *
HSL_SpMean        EDF = 1.000    –         11.403    0.001     *
HSL_TrP           EDF = 1.000    –         6.235     0.013     *
T_TrP             EDF = 5.553    –         22.114    0.002     *
^a AIC: Akaike Information Criterion.
^b AUC: Area Under the Curve.
* Statistically significant at the 95% confidence level (p < 0.05).
Figure 2. Comparison of the ROC curves for Generalized Additive Model, Random Forest, and Neural
Network.
Before interpreting the results, it is worth restating the main contribution of this study. As mentioned earlier, this study followed the post-hoc interpretability approach to ascertain how real-time traffic-related variables contribute to crash risk. To this end, the Crash Prediction Model (CPM) was developed based on GAM, and the Crash Interpretation Model (CIM) was developed using Random Forest (RF). Thus, RF was not used directly to develop the CPM; instead, its graphical tools were used to interpret the results of the statistical CPM (i.e. GAM). The reason for this choice is that GAM outperformed the black-box techniques on the dataset used in this study. Using the same important predictors obtained from CII and the same dataset, the performance of two other approaches widely used in the RTRA domain (i.e. RF and Neural Network (NN)) (Hossain et al. 2019) was compared with that of the developed GAM. Figure 2 compares the performance of RF, NN, and GAM based on their Receiver Operating Characteristic (ROC) curves and the Areas Under the Curves (AUCs). RF and NN achieved AUCs of 65.4% and 59.7%, respectively, both lower than the 71.7% AUC obtained by GAM. Hence, GAM and RF were selected as the CPM and CIM, respectively.
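The AUC values compared above have a simple probabilistic reading: the probability that a randomly chosen crash case receives a higher predicted risk than a randomly chosen non-crash case. The sketch below computes this rank-based (Mann-Whitney) form directly; the labels and scores are hypothetical and do not reproduce the study's ROC curves.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen crash case (label 1)
    receives a higher predicted risk than a randomly chosen non-crash case
    (label 0); ties count as one half. O(n_pos * n_neg), for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one crash and one non-crash case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted crash probabilities for two crash and two non-crash cases
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

Three of the four crash/non-crash pairs are ranked correctly here, giving an AUC of 0.75. For real model comparisons, `sklearn.metrics.roc_auc_score` returns the same quantity efficiently.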
5.1. Interpreting parametric statistics
The main objective of this paper was to utilize black-box visualization tools to interpret the results of non-parametric statistics. However, the results of the current study showed that these tools can also help ATM interpret parametric statistics more precisely. For example, according to Table 2, the estimated coefficient of LSL_SpVARoMEAN, which represents the spatial difference in speed variance relative to the mean speed on the low-speed lane, is 0.872. The traditional interpretation of Logistic Regression leads to the conclusion that the odds of a crash increase by a factor of $e^{0.872} = 2.39$ per one-unit increase in LSL_SpVARoMEAN, holding the other variables constant. This conclusion implies a monotonic positive relationship between LSL_SpVARoMEAN and the probability of crashes throughout the domain of the predictor.
Figure 3. Graphical representation of the causal effect of LSL_SpVARoMEAN (spatial difference in speed variance divided by the mean speed in low-speed lane).
However, the black-box plots presented in Figure 3, including ALE, PDP, and especially cICE, indicate a different pattern that is not consistent with this conclusion. The plots suggest that the relationship is not monotonic; instead, it resembles a step function. Once the value of LSL_SpVARoMEAN exceeds −3, the crash risk increases sharply, by more than 20%. This finding is more practical for ATM, which can apply appropriate interventions in time based on the critical domains of the predictor. Figure 3(A) also reveals an untapped potential of cICE for ATM: cICE curves show how the outcome fluctuates as a crash contributing factor varies, and thus reveal how sensitive crash risk is across the domain of that factor. ATM can use this information to choose the appropriate time to apply specific interventions. For instance, cICE shows that in some ranges of LSL_SpVARoMEAN (i.e. from its lower bound to −3 and from +3 to its upper bound), the crash probabilities across observations change little, so ATM should apply proportional interventions, particularly when LSL_SpVARoMEAN lies between −3 and +3.
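A short calculation makes the limitation of the coefficient-based reading concrete. Under the logit link of the CPM, the coefficient 0.872 fixes the odds ratio at $e^{0.872} \approx 2.39$ everywhere; the implied change in probability varies with the baseline risk but never changes sign, so a single coefficient cannot express the step-like pattern in Figure 3. The baseline probabilities below are hypothetical.

```python
import math

BETA = 0.872                    # estimated coefficient of LSL_SpVARoMEAN (Table 2)
ODDS_RATIO = math.exp(BETA)     # multiplicative change in crash odds per unit increase


def probability_after_unit_increase(p0: float) -> float:
    """Crash probability after a one-unit increase in the predictor,
    starting from a baseline probability p0 (0 < p0 < 1)."""
    odds = p0 / (1.0 - p0) * ODDS_RATIO
    return odds / (1.0 + odds)


# Hypothetical baselines: the odds ratio is constant, but the change in
# probability depends on the baseline and is always positive.
changes = {p0: probability_after_unit_increase(p0) - p0 for p0 in (0.05, 0.25, 0.50)}
```

The probability change is largest near a baseline of 0.5 and small at low baselines, yet always in the same direction, which is exactly the monotonic behavior that the plots contradict.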
An even more useful conclusion can be drawn for T_SpMean, the total mean speed downstream minus upstream for crash and non-crash cases. Given its estimated coefficient (−0.108) from GAM, one would infer that crash risk decreases as T_SpMean increases; for example, the crash risk at T_SpMean = 20 mph would be lower than at T_SpMean = 0 mph. However, Figure 4 supports a different inference. Although a generally negative effect can be seen, the minimum crash risk occurs at T_SpMean = 0 mph. This result is more reasonable: when T_SpMean = 0 mph, the mean speed downstream equals that upstream, so speeds are harmonized, leading to the highest level of safety and the lowest crash probability.
Figure 4. Graphical representation of the causal effect of T_SpMean (spatial difference in mean speed of total traffic volume) on crash risk.
Again, although the overall trend appears negative, the plots show a stepwise relationship. The critical domain falls between −15 and 0 mph, where the probability of crashes decreases considerably, by almost 25%. Finally, within the range of 0 to 17 mph, crash risk increases by about 10% as T_SpMean increases, which substantially contradicts the inference drawn from the traditional statistical interpretation.
5.2. Interpreting non-parametric statistics
According to Table 2, T_TrP, the difference in truck proportion downstream minus upstream, is the only statistically significant non-parametric predictor identified by GAM. Its EDF of 5.55 reflects the widely discussed issue that non-parametric results are incomprehensible, or at least very hard to interpret. However, all four black-box graphical plots in Figure 5 reveal the complex behavior of this predictor with respect to crash risk.
The ICE plot in Figure 5(A) shows that most observations follow the same pattern, indicating more homogeneity than heterogeneity across observations and making PDP a reliable method for interpreting the effect of the predictor on crash probabilities. Notably, homogeneity across observations is critical when using PDP to interpret results (Apley 2016): PDP is useful when there is no interaction between predictors, which leads to homogeneity across observations (Molnar 2019). As mentioned before, PDP marginalizes the effects of the other features and shows the impact of the predictor of interest on the outcome across its range by averaging the predicted outcomes of all synthetic observations into a single curve. This curve does not show whether observations are heterogeneous or homogeneous, which can lead to misinterpretation of PDPs.
Figure 5. Graphical representation of the causal effect of T_TrP (spatial difference of truck proportion in total traffic stream) on crash risk.
For example, Figure 6(A) shows a PDP for a hypothetical predictor. Since the plot is approximately a horizontal line, a superficial evaluation could conclude that the predictor has no effect on the response. However, examining the heterogeneity leads to contradictory inferences. Figure 6(B) presents a situation in which, for all observations, the impact of the predictor is constant without any considerable fluctuation across its range, indicating that this feature indeed does not affect the outcome. On the other hand, Figure 6(C) presents an entirely different scenario in which the predictor positively influences the outcome for some instances and negatively affects it for others.
Additionally, Figure 6(D) depicts completely heterogeneous observations with no specific pattern. In these cases, the effects of the individual observations at a particular level of the predictor cancel each other out. This cancellation ultimately produces an approximately horizontal PDP, which can be misleading regardless of the nature of the observations in the training dataset. Hence, to make a reliable interpretation, PDPs must be accompanied by ICE plots, because crash contributing factors usually interact with each other, creating heterogeneity across observations (Anastasopoulos et al. 2012).
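The cancellation behind Figure 6(C) can be reproduced with a deliberately simple, hypothetical model in which the sign of the predictor's effect depends on a second predictor. The model and data are assumptions chosen for illustration, not crash data.

```python
from typing import Callable, List, Sequence

Row = List[float]


def ice_curves(model: Callable[[Row], float], data: Sequence[Row],
               feature: int, grid: Sequence[float]) -> List[List[float]]:
    """One prediction curve per observation over the grid of the feature."""
    curves = []
    for row in data:
        curve = []
        for x in grid:
            synthetic = list(row)
            synthetic[feature] = x
            curve.append(model(synthetic))
        curves.append(curve)
    return curves


def hypothetical_model(row: Row) -> float:
    """Effect of X_0 is +1 per unit when X_1 = 1 and -1 per unit when X_1 = -1."""
    return row[0] * row[1]


data = [[0.0, 1.0], [0.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
grid = [0.0, 0.5, 1.0]
curves = ice_curves(hypothetical_model, data, feature=0, grid=grid)
pdp = [sum(c[g] for c in curves) / len(curves) for g in range(len(grid))]
```

The PDP is flat at zero even though every ICE curve has a slope of +1 or −1, so only the ICE plot reveals that the predictor matters.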
Turning back to Figure 5(C,D), the ALE plot and PDP reveal the nonlinear effect of T_TrP on the probability of crashes. Both plots suggest that ATM should keep T_TrP around −0.1, which corresponds to the minimum crash risk across the range of the predictor. Although ALE and PDP show a similar trend, there are some small differences between the two plots. For instance, according to the PDP, when T_TrP increases from 0.1 to 0.2, the probability of crashes decreases by roughly 6%, whereas according to the ALE plot it decreases by 15%. Such differences are due to the small degree of multicollinearity among the predictors used in GAM (Molnar 2019). Comparing the PDPs with the ALEs for all predictors (see Figure 7) shows that these differences are not considerable; hence, the overall interpretations of the two methods are similar. However, as the next section clarifies, this inference is limited to this study and should not be overgeneralized to other studies.
5.3. Reliability of ALE versus PDP in highly correlated space of predictors
As mentioned in section 5.2, there are some differences between ALE and PDP that are due to correlated variables in the multi-dimensional space of predictors (Goldstein et al. 2015; Apley 2016; Molnar 2019). The fundamental assumption of PDP and ICE is that the predictor of interest is not highly correlated with the other predictors, an assumption that can be wrong. If it is violated, the graphical interpretation of PDP can be misleading and erroneous (Apley 2016; Molnar 2019), especially in RTRA studies. In RTRA, because of fundamental traffic flow theory, predictors related to speed, volume, and density can be highly correlated. The following oversimplified example illustrates this problem in RTRA.
Figure 6. Misleading interpretation of PDP.
Figure 7. Causality relationship between crash contributing factors and crash probability according to 1D-Plots.
Suppose an RTRA study aims to define the relationship between mean speed and the probability of crashes, where mean speed ranges from 20 to 100 mph. To visualize how mean speed affects crash risk, PDP generates synthetic observations by replacing the mean speed of each individual observation with values across its range while keeping the other variables in the model unchanged. Outcome values are calculated for the n synthetic observations and averaged, marginalizing the effect of the other features regardless of their values. Here, $X_o$ is the space of the other predictors not under investigation, such as traffic volume. Suppose the traffic volume for a specific real observation is 10 vehicles per 15 minutes. During this time slice, the Level of Service (LOS) is A, and the mean speed is most likely at the free-flow speed. The real observations reflect this situation, with the distribution of mean speed centered at 90 mph under low-volume conditions. However, because PDP produces artificial instances without considering the correlation between predictors, many of the synthetic instances may unrealistically be assigned a speed of 50 mph or less, which rarely happens in practice at such low traffic volumes. In this situation, the PDP is highly biased. The mathematical formulation of ALE tackles this issue by preventing the generation of unrealistic synthetic observations, which makes ALE a more reliable technique in a highly correlated multidimensional space of predictors (Apley 2016; Molnar 2019). According to Figure 7, the differences between PDP and ALE were small in many cases for two reasons. First, the most important features were selected based on CII, which, unlike MDA, does not overestimate the importance of highly correlated features (White and Liu 1994; Strobl et al. 2008; Louppe et al. 2013; Wright and Ziegler 2015; Gregorutti, Michel, and Saint-Pierre 2017; Wright et al. 2019). Second, before GAM was fitted, highly correlated variables were removed from the model, resulting in VIF < 10.
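The VIF screening mentioned above follows $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all other predictors. The library-free sketch below illustrates that formula; the three predictors are hypothetical, and in practice `variance_inflation_factor` from `statsmodels.stats.outliers_influence` serves the same purpose.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a @ beta = b by Gauss-Jordan elimination
    with partial pivoting (small, well-conditioned systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y: Sequence[float], columns: Sequence[Sequence[float]]) -> float:
    """R^2 of an OLS regression of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(bk * v for bk, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns: Sequence[Sequence[float]]) -> List[float]:
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on all the others.
    Perfectly collinear predictors make the normal equations singular,
    so this sketch is not meant for that case."""
    result = []
    for j, y in enumerate(columns):
        others = [c for k, c in enumerate(columns) if k != j]
        result.append(1.0 / (1.0 - r_squared(y, others)))
    return result


# Hypothetical predictors: x1 and x2 are nearly collinear, x3 is not
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
x3 = [1.0, -1.0, 0.0, -1.0, 1.0]
vifs = vif([x1, x2, x3])
```

Here x1 and x2 exceed the VIF < 10 threshold used in the study, so one of them would be dropped before fitting the CPM.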
In a nutshell, when features are highly correlated, the results obtained from PDP can be misleading (Molnar 2019). This is a serious issue that must be considered in RTRA studies. ALE is an alternative to PDP for clarifying the causal effect of predictors on the response variable in highly correlated spaces. The statistical literature has shown that PDP results become increasingly misleading as the number of highly correlated variables and the Pearson correlation coefficients among them increase (Apley 2016; Molnar 2019). The results of this study show that using statistical models for RTRA and following their assumptions by removing highly correlated variables can mitigate this problem and increase the reliability of PDPs. However, for black-box crash prediction models, in which multicollinearity is usually not treated as a concern, PDP results can be questionable.
5.4. Performance of ALE versus PDP in two-dimensional (2D) plots
Both PDP and ALE can present the effect of two predictors on the response variable using two-dimensional (2D) plots. However, there is an important distinction between these two plots that requires care in their interpretation. Figure 8 presents the 2D plots for HSL_SpMean and LSL_SpMean based on the PDP and ALE techniques.
Visually, the two plots differ because of the different mathematical backgrounds of PDP and ALE in drawing 2D plots. A 2D-PDP presents the centered combined effect of two predictors on the outcome. The combined effect includes the main effects of both predictors plus their second-order (i.e. interaction) effect. A 2D-ALE, however, presents only the interaction effect of the predictors on the response variable; if two predictors do not interact, the 2D-ALE plot shows nothing (Molnar 2019).
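The distinction between the combined effect and the interaction effect can be illustrated on the PDP side: subtracting both 1D-PDPs from the 2D-PDP leaves only the interaction part, which is the decomposition underlying Friedman and Popescu's H-statistic. This is an analogy under stated assumptions, not the 2D-ALE computation itself (2D-ALE estimates second-order effects from local differences on a grid of intervals); the two models and the data are hypothetical.

```python
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], float]


def pd_1d(model: Model, data: Sequence[Row], f: int, grid: Sequence[float]) -> List[float]:
    """One-dimensional partial dependence (equation (2))."""
    out = []
    for a in grid:
        total = 0.0
        for row in data:
            s = list(row)
            s[f] = a
            total += model(s)
        out.append(total / len(data))
    return out


def pd_2d(model: Model, data: Sequence[Row], f1: int, f2: int,
          grid1: Sequence[float], grid2: Sequence[float]) -> List[List[float]]:
    """Two-dimensional partial dependence: both features are set jointly."""
    out = []
    for a in grid1:
        row_out = []
        for b in grid2:
            total = 0.0
            for row in data:
                s = list(row)
                s[f1], s[f2] = a, b
                total += model(s)
            row_out.append(total / len(data))
        out.append(row_out)
    return out


def interaction_part(model: Model, data: Sequence[Row], f1: int, f2: int,
                     grid1: Sequence[float], grid2: Sequence[float]) -> List[List[float]]:
    """2D-PDP minus both 1D-PDPs, shifted to zero mean over the grid.
    Every entry is (numerically) zero when the model is additive in the two features."""
    pd12 = pd_2d(model, data, f1, f2, grid1, grid2)
    pd1 = pd_1d(model, data, f1, grid1)
    pd2 = pd_1d(model, data, f2, grid2)
    resid = [[pd12[i][j] - pd1[i] - pd2[j] for j in range(len(grid2))]
             for i in range(len(grid1))]
    mean = sum(sum(r) for r in resid) / (len(grid1) * len(grid2))
    return [[v - mean for v in r] for r in resid]


def additive_model(row: Row) -> float:      # hypothetical: main effects only
    return row[0] + row[1] ** 2


def interacting_model(row: Row) -> float:   # hypothetical: pure interaction
    return row[0] * row[1]


data = [[-1.0, 2.0], [0.0, -1.0], [1.0, 0.5]]
grid = [-1.0, 0.0, 1.0]
h_add = interaction_part(additive_model, data, 0, 1, grid, grid)
h_int = interaction_part(interacting_model, data, 0, 1, grid, grid)
```

For the additive model the interaction part vanishes even though its 2D-PDP is far from flat, mirroring the remark that a 2D-ALE "shows nothing" when predictors do not interact.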
Interpreting a 2D-PDP is straightforward since it includes the total effect. For instance, Figure 8(A) suggests that keeping the downstream mean speed on the LSL higher than the upstream mean speed, while maintaining similar downstream and upstream mean speeds on the HSL, can reduce the probability of crash occurrence (the yellow areas correspond to a lower likelihood of crashes). This reduction can reach 70% compared to the worst-case combination of HSL_SpMean and LSL_SpMean. On the other hand, interpreting a 2D-ALE is complicated since it reflects only the interaction effect of the two predictors without their main effects at a given level. Therefore, a 2D-ALE only shows the situations in which the second-order interaction of two variables leads to an unsafe condition. In RTRA, since the main effects and the interaction effect of predictors act simultaneously, separating these two terms is not meaningful. Hence, for interpreting the impact of two crash contributing factors simultaneously, 2D-PDP provides more useful insight than 2D-ALE. However, to obtain a reliable 2D-PDP, highly correlated predictors should be removed from the CPM. Figure 9 presents all possible combined effects of two crash contributing factors on crash risk according to the 2D-PDPs in this study.
Figure 8. 2D-PDP and 2D-ALE plots in presenting the effect of HSL_SpMean and LSL_SpMean on crash risk.
Figure 9. The combined effect of crash contributing factors on crash risk based on 2D-PDP.
5.5. Reliability of 2D-ALE plot
Compared with PDP and ICE, ALE provides a careful approach to minimizing the effect of highly correlated variables by dividing the domain of the predictor of interest into equal intervals. In several software packages, the default number of intervals (K) is 40, which corresponds to ten intervals per quartile. However, there is no universally appropriate value for K, since the most suitable K depends on the distribution of the predictor of interest (Goldstein et al. 2015; Apley 2016). In Figure 10, HSL_SpMean and LSL_SpMean are used to illustrate the effect of the number of intervals on 1D-ALE and 2D-ALE plots.
When K changes, ALE plots can become unstable. This is not a major problem for 1D-ALE plots, since the overall effect of a predictor on the response variable remains similar across a range of K. For 2D-ALE plots, however, different values of K produce noticeable differences, leading to unreliable interpretations. The reason is that 2D-ALE plots show only the interaction effect of the predictors, which is highly sensitive to the number of intervals for each of the two predictors involved (Molnar 2019). Consequently, 2D-ALE plots can vary considerably as the number of intervals changes.
Figure 10. ALE dependency on numbers of intervals (K) in 1D and 2D plots.
5.6. Selecting crash prediction models based on ICE
ICE can be valuable to RTRA not only for interpreting results but also for choosing the type of crash prediction model, because it reveals heterogeneity and homogeneity across observations. As noted in the literature (Washington, Karlaftis, and Mannering 2010), and as Figure 6 illustrates, this is an important consideration for RTRA when heterogeneity across observations exceeds homogeneity. One useful application of ICE plots is to suggest which type of statistical model, fixed-parameter or random-parameters, is most suitable for a specific dataset. If ICE reveals heterogeneity, random-parameters models can outperform fixed-parameter models. Anastasopoulos et al. and Anastasopoulos and Mannering have shown that, unlike fixed-parameter models, the random-parameters approach can capture unobserved heterogeneity, providing a better understanding of the accident mechanism (Anastasopoulos and Mannering 2009; Anastasopoulos et al. 2012). Because ICE can reveal heterogeneity across observations, it can serve not only as an interpretation tool but also as an indicator for model selection.
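One way to turn a cICE plot into a single number for this purpose is sketched below. The summary is a hypothetical illustration, not a measure proposed in this paper or by Goldstein et al. (2015): it takes the spread, across observations, of the centered ICE values at the upper end of the grid. Any threshold for preferring a random-parameters model would need separate justification, and the models and data are invented.

```python
import statistics
from typing import Callable, List, Sequence

Row = List[float]


def cice_endpoint_spread(model: Callable[[Row], float], data: Sequence[Row],
                         feature: int, grid: Sequence[float]) -> float:
    """Hypothetical heterogeneity summary: population standard deviation,
    across observations, of the centered ICE value at the upper end of the
    grid (each curve anchored at grid[0], as in equation (3))."""
    endpoints = []
    for row in data:
        low, high = list(row), list(row)
        low[feature], high[feature] = grid[0], grid[-1]
        endpoints.append(model(high) - model(low))   # cICE value at grid[-1]
    return statistics.pstdev(endpoints)


def additive_model(row: Row) -> float:     # hypothetical: no interaction
    return 0.3 * row[0] + row[1]


def interacting_model(row: Row) -> float:  # hypothetical: X_0 effect depends on X_1
    return row[0] * row[1]


data = [[0.0, 1.0], [0.5, -1.0], [1.0, 2.0]]
grid = [0.0, 0.5, 1.0]
homogeneous = cice_endpoint_spread(additive_model, data, 0, grid)
heterogeneous = cice_endpoint_spread(interacting_model, data, 0, grid)
```

A spread near zero means all centered curves end at the same height (homogeneous effects); a large spread signals the kind of heterogeneity that would motivate a random-parameters specification.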
5.7. Future implications
Safety data visualization is a critical aspect of RTRA that can improve the understanding of crash mechanisms. In this regard, the graphical tools applied to Random Forest in this study proved effective in visualizing the causal effects of crash contributing factors on crash risk. Although some of these techniques have been used in previous studies, they have advantages, disadvantages, and open questions that researchers must consider. The current study illustrates different diagnostic aspects of these widely used techniques, specifically in the RTRA domain. The discussions and recommendations presented in this paper can help future studies that use Random Forest graphical tools to interpret their results avoid erroneous conclusions.
In addition to showing the capability of Random Forest visualization tools for interpreting non-parametric statistics, this study illustrated how these tools can benefit ATM by helping interpret parametric statistics. The visualization tools can reveal the causal effects of parametric predictors on crash risk more accurately. The interpretation of parametric statistics is mainly based on the estimated coefficients associated with the predictors, and a single estimated coefficient always implies a uniformly positive or negative marginal effect of a predictor on the response variable. However, the actual effect of a predictor on the outcome may fluctuate across its range, even though the overall trend is positive or negative.
For instance, the minimum crash risk is expected when the downstream mean speed equals the upstream mean speed, because speeds are then completely harmonized across the traffic stream (Ahmed and Abdel-Aty 2011). The parametric statistics did not meet this expectation: the estimated coefficient implied that the crash risk decreases when the downstream mean speed exceeds the upstream mean speed. This inference assumes a uniform effect, without any fluctuation, of the associated predictor (i.e. T_SpMean, the spatial variation of mean speed from upstream to downstream) on crash probability. Under this inference, an intervention to reduce crash risk would need to increase the downstream mean speed relative to the upstream mean speed, or reduce the upstream mean speed relative to the downstream mean speed. Although such an intervention might reduce the risk of rear-end crashes, it could increase the risk of lane-change related crashes. The reason is that when the downstream and upstream mean speeds differ, traffic flow characteristics vary from upstream to downstream. This variation can produce between-lane speed differences that encourage drivers to change lanes, resulting in a higher risk of lane-change related crashes (Pande and Abdel-Aty 2006).
On the other hand, the Random Forest visualization tools unveiled the effect of T_SpMean on crash risk in more detail by showing the fluctuations in its effect. All the visualization tools (i.e. PDP, ICE, cICE, and ALE) revealed that the minimum crash risk occurred when the downstream and upstream mean speeds were equal, which is consistent with the expectation above. In fact, when the upstream and downstream mean speeds are equal, the risks of both rear-end and lane-change related crashes would be at a minimum. Therefore, the realistic and appropriate intervention is to keep the downstream and upstream mean speeds equal. In this respect, the visualization tools discussed in this study can shed more light on probable fluctuations in the causal effects as well as on the overall relationship between predictors and outcomes.
Furthermore, the study has shown how the results of highly accurate models can be linked with highly interpretable models, with the aim of alleviating the trade-off between accuracy and interpretability in crash prediction models, by following the post-hoc interpretability approach (Du, Liu, and Hu 2019). This relatively new methodology can address some of the concerns in the RTRA domain, where crash prediction models must provide ATM with results that are both accurate and interpretable.
6. Conclusion and recommendation
In most fields of science, there is a gap between scientific research and industrial
application. This shortcoming also exists in the safety analysis literature, where there
is a gap between the results of Real-Time Risk Assessment (RTRA) and their practical
implications for Active Traffic Management (ATM). This paper sought to address some of the
safety data visualization challenges by establishing a systematic interpretation framework
using visualization tools for black-box techniques such as Random Forest.
Using a matched case-control design, a crash prediction model was calibrated for a
402-mile stretch of Interstate 80 (I-80) in Wyoming. The prediction model was
developed in two stages. First, the important predictors were identified using Corrected
Impurity Importance (CII). Second, to deal with nonlinear predictors, a Generalized
Additive Model (GAM), a data-driven non-parametric statistical approach, was developed
based on the results of the first stage. The most significant parametric and non-parametric
crash contributing factors were identified. A Random Forest (RF) model was then fitted to
interpret the causal effect of these significant predictors on crash risk. Four graphical
techniques, namely Partial Dependence Plot (PDP), Individual Conditional Expectation (ICE),
centered ICE (cICE), and Accumulated Local Effect (ALE), were applied to the RF and
investigated for every significant predictor.
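The three curve-based tools share one computation, which may help practitioners reproduce them. The sketch below is a minimal, dependency-free illustration that assumes any fitted model exposed as a function `predict(row) -> probability`; the toy model `toy_crash_prob`, the data, and the grid are hypothetical stand-ins, not the study's Random Forest or I-80 data. ICE fixes one predictor at each grid value for every observation, the PDP is the pointwise average of the ICE curves, and cICE subtracts each curve's value at a reference grid point (here the smallest grid value).

```python
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Sequence[float]], float]


def ice_curves(predict: Model, data: Sequence[Row], j: int,
               grid: Sequence[float]) -> List[List[float]]:
    """One ICE curve per observation: predictions with feature j set to each grid value."""
    curves = []
    for row in data:
        curve = []
        for v in grid:
            modified = list(row)
            modified[j] = v  # only the predictor of interest changes
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def pdp(curves: List[List[float]]) -> List[float]:
    """Partial dependence: the pointwise average of the ICE curves."""
    n = len(curves)
    return [sum(c[g] for c in curves) / n for g in range(len(curves[0]))]


def centered_ice(curves: List[List[float]], anchor: int = 0) -> List[List[float]]:
    """cICE: shift every ICE curve so that it is zero at the anchor grid point."""
    return [[y - c[anchor] for y in c] for c in curves]


# Hypothetical toy model: risk is lowest when feature 0 is 0, and the
# curvature depends on feature 1, so the effect differs across observations.
def toy_crash_prob(x: Sequence[float]) -> float:
    return min(1.0, 0.1 + x[1] * x[0] ** 2)


data = [[0.5, 0.1], [-1.0, 0.2], [2.0, 0.05]]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

curves = ice_curves(toy_crash_prob, data, j=0, grid=grid)
print(pdp(curves))           # one U-shaped average curve, minimum at grid value 0.0
print(centered_ice(curves))  # curves of different depth, all starting at 0
```

In this toy case the PDP alone shows a single U-shape, while the cICE curves show that the depth of the dip differs by observation, which is the kind of heterogeneity ICE and cICE are meant to expose.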
The study yielded several notable findings on using these graphical tools to interpret
RTRA models, which are presented as the following recommendations:
A. Given the fundamental diagrams of traffic flow, many of the variables introduced for
distinguishing crash from non-crash cases can be highly correlated. In this situation, ALE
provides a more reliable interpretation than PDP and ICE. Unless highly correlated variables
are eliminated from the model, PDP and ICE can present misleading causal
relationships between the predictor(s) and the response variable.
B. When the PDP is reliable, supplementing it with ICE is highly recommended, because ICE plots
reveal heterogeneity across observations that the PDP averages out and that might otherwise
lead to erroneous conclusions.
C. By showing whether the effect of a predictor is homogeneous or heterogeneous across
observations, ICE provides initial insight into whether a random-parameter or a
fixed-parameter crash prediction model is more appropriate.
D. cICE can show how sensitive crash probability is across the range of each predictor, which
helps ATM apply appropriate interventions in a timely manner.
E. PDP and ALE can present the effect of two predictors on the probability of crashes using
two-dimensional (2D) plots. However, ATM practitioners must be aware of a substantial
difference between these two methods. 2D-PDP plots depict the combined effect of
two variables, including both their main and interaction effects, whereas 2D-ALE plots show
only the interaction effect of the two variables, without the main effects. In the RTRA
domain, since the main effects and interaction effects of crash contributing factors act
together, PDP is more beneficial to ATM than ALE. However, because PDP is distorted by highly
correlated features, such features must be removed from the model before a 2D-PDP is
examined.
F. ALE outperforms PDP and ICE in visualizing the causal effect of the
predictor of interest on crash probability in a highly correlated multi-dimensional space
of predictors. ALE divides the range of the predictor into K intervals, averages the
differences in predictions within each interval over the observations that fall in it (i.e.
under the conditional distribution), and accumulates these local effects. However, there is
no fixed rule for the number of intervals (K), since the appropriate value depends on the
distribution of the predictor. For normally distributed variables, choosing K as a multiple
of 4 is advisable because it divides every quartile of the predictor equally. In this respect,
1D-ALE plots are not highly sensitive to K, whereas 2D-ALE plots are remarkably sensitive to K.
Therefore, at least for the dataset used in this study, the 2D-ALE plots did not lead to
reliable results, because they showed different forms of interaction effect of the two
predictors of interest on crash risk depending on the value of K.
G. In summary, when the multi-dimensional space of variables is not highly correlated, this study
suggests using PDP accompanied by ICE or cICE for 1D and 2D interpretation.
Otherwise, when the space of predictors is highly correlated, only ALE can demonstrate
solid causal relationships between predictors and crash probability for 1D
interpretation.
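The interval-based construction described in recommendation F can be sketched as follows. This is a minimal first-order ALE under stated assumptions: interval edges are taken at empirical quantiles of the predictor (a common convention, approximated here by a simple rounded-index rule rather than any specific package's quantile definition), `predict` is any fitted model written as a function of a feature row, and `toy_model` and the correlated data are hypothetical. For each interval, only the observations inside it are moved to the interval's lower and upper edges, so correlated features keep realistic values; the mean prediction differences are accumulated and then centered.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Model = Callable[[Sequence[float]], float]


def quantile_edges(values: Sequence[float], k: int) -> List[float]:
    """Up to K+1 interval edges taken at empirical quantiles of the predictor."""
    s = sorted(values)
    n = len(s)
    # Simple rounded-index quantiles; a sketch, not a specific package's rule.
    edges = [s[min(n - 1, round(q * (n - 1) / k))] for q in range(k + 1)]
    return sorted(set(edges))  # tied values can produce duplicate edges


def ale_1d(predict: Model, data: Sequence[Row], j: int,
           k: int) -> Tuple[List[float], List[float]]:
    """First-order ALE of feature j; returns (edges, centered ALE at each edge)."""
    edges = quantile_edges([row[j] for row in data], k)
    if len(edges) < 2:
        raise ValueError("predictor needs at least two distinct values")
    local_effects, counts = [], []
    for i in range(1, len(edges)):
        lo, hi = edges[i - 1], edges[i]
        # Intervals are (lo, hi]; the first also includes its left edge.
        members = [row for row in data
                   if lo < row[j] <= hi or (i == 1 and row[j] == lo)]
        diffs = []
        for row in members:
            upper, lower = list(row), list(row)
            upper[j], lower[j] = hi, lo
            # Only feature j moves, and only within this observation's interval,
            # so the other (possibly correlated) features keep realistic values.
            diffs.append(predict(upper) - predict(lower))
        local_effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
        counts.append(len(members))
    # Accumulate the local effects from the lowest edge upward.
    ale = [0.0]
    for eff in local_effects:
        ale.append(ale[-1] + eff)
    # Center: subtract the count-weighted mean over intervals (midpoint of each).
    mean = sum(c * (ale[i] + ale[i + 1]) / 2
               for i, c in enumerate(counts)) / sum(counts)
    return edges, [a - mean for a in ale]


# Hypothetical usage: feature 1 is strongly correlated with feature 0,
# and the toy model's effect of feature 0 is exactly linear (slope 2).
def toy_model(x: Sequence[float]) -> float:
    return 2.0 * x[0] + x[1] ** 2


xs = [i / 10 for i in range(21)]
data = [[x, x + 0.05 * (-1) ** i] for i, x in enumerate(xs)]
edges, ale = ale_1d(toy_model, data, j=0, k=4)
print(edges)  # → [0.0, 0.5, 1.0, 1.5, 2.0]
print(ale)    # increments of about 1.0 between edges, centered around zero
```

Changing `k` moves the edges and, for non-linear effects, can change the shape of the estimated curve; recommendation F reports that the study's 1D-ALE plots were fairly robust to this choice, whereas its 2D-ALE plots were not.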
Studies in the safety literature have achieved more than 90% accuracy in distinguishing crash
from non-crash cases, with high sensitivity and specificity, which implies that RTRA has
nearly achieved its goal. After more than two decades of RTRA studies, the authors believe
it is time to address the safety data visualization challenges and move from anecdotal
knowledge toward more practical implications. The authors do not claim that this study is
free of limitations; however, it provides a fundamental basis for achieving this ultimate goal
and can help turn the results of other studies into improved ATM capabilities for crash
prevention.
Author contributions
The authors confirm contribution to the article as follows: study conceptualization,
experimental design, data collection, statistical analysis, data visualization, and results’
interpretation: A. Khoda Bakhshi; draft article preparation: A. Khoda Bakhshi, M. M. Ahmed.
All authors reviewed the results and approved the final version of the manuscript.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Funding
This research is sponsored by the U.S. Department of Transportation Connected Vehicle Pilot Deploy-
ment Program (grant number DTFH6116RA00007), and the Wyoming Department of Transportation
(grant number RS04218).
ORCID
Arash Khoda Bakhshi http://orcid.org/0000-0003-2462-2392
Mohamed M. Ahmed http://orcid.org/0000-0002-1921-0724
References
Abdel-Aty, Mohamed A., and Rajashekar Pemmanaboina. 2006. “Calibrating a Real-time Traffic Crash-
prediction Model Using Archived Weather and ITS Traffic Data.” IEEE Transactions on Intelligent
Transportation Systems 7 (2): 167–174. doi:10.1109/TITS.2006.874710.
Agresti, Alan. 2018. An Introduction to Categorical Data Analysis. Gainesville, Florida: Wiley.
Ahmed, Mohamed M., and Mohamed A. Abdel-Aty. 2011. “The Viability of Using Automatic Vehicle
Identification Data for Real-time Crash Prediction.” IEEE Transactions on Intelligent Transportation
Systems 13 (2): 459–468. doi:10.1109/TITS.2011.2171052.
Ahmed, Mohamed M., Mohamed Abdel-Aty, and Rongjie Yu. 2012. “Assessment of Interaction of Crash
Occurrence, Mountainous Freeway Geometry, Real-Time Weather, and Traffic Data.” Transportation
Research Record: Journal of the Transportation Research Board 2280 (1): 51–59. doi:10.3141/2280-06.
Anastasopoulos, Panagiotis Ch, and Fred L. Mannering. 2009. “A Note on Modeling Vehicle Acci-
dent Frequencies with Random-parameters Count Models.” Accident Analysis & Prevention 41 (1):
153–159. doi:10.1016/j.aap.2008.10.005.
Anastasopoulos, Panagiotis Ch, Fred L. Mannering, Venky N. Shankar, and John E. Haddock. 2012. “A
Study of Factors Affecting Highway Accident Rates Using the Random-Parameters Tobit Model.”
Accident Analysis & Prevention 45: 628–633. doi:10.1016/j.aap.2011.09.015.
Apley, Daniel W. 2016. “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning
Models.” ArXiv Preprint ArXiv:1612.08468. https://arxiv.org/abs/1612.08468.
Azimi, Ghazaleh, Alireza Rahimi, Hamidreza Asgari, and Xia Jin. 2020. “Severity Analysis for Large Truck
Rollover Crashes Using a Random Parameter Ordered Logit Model.” Accident Analysis & Prevention
135. doi:10.1016/j.aap.2019.105355.
Azizi, Leila, and Abdolreza Sheikholeslami. 2013. “Safety Effect of U-Turn Conversions in Tehran:
Empirical Bayes Observational before-and-after Study and Crash Prediction Models.” Journal of
Transportation Engineering 139 (1): 101–108. doi:10.1061/(ASCE)TE.1943-5436.0000469.
Bargegol, Iraj, Vahid Najafi Moghaddam Gilani, Meisam Ghasedi, and Mahyar Ghorbanzadeh. 2016.
“Delay Modeling of Un-Signalized Roundabouts Using Neural Network and Regression.” Computa-
tional Research Progress in Applied Science & Engineering 2: 28–34.
Du, Mengnan, Ninghao Liu, and Xia Hu. 2019. “Techniques for Interpretable Machine Learning.”
Communications of the ACM 63 (1): 68–77. doi:10.1145/3359786.
Eftekharzadeh, S. F., and A. Khodabakhshi. 2014. “Safety Evaluation of Highway Geometric Design Cri-
teria in Horizontal Curves at Downgrades.” International Journal of Civil Engineering 12 (3): 326–332.
http://ijce.iust.ac.ir/article-1-833-en.html.
Friedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals
of Statistics. JSTOR, 1189–1232. https://www.jstor.org/stable/2699986.
Gaweesh, Sherif M., Mohamed M. Ahmed, and Annalisa V. Piccorelli. 2019. “Developing Crash Pre-
diction Models Using Parametric and Nonparametric Approaches for Rural Mountainous Free-
ways: A Case Study on Wyoming Interstate 80.” Accident Analysis & Prevention 123: 176–189.
doi:10.1016/j.aap.2018.10.011.
Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box:
Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Compu-
tational and Graphical Statistics 24 (1): 44–65. doi:10.1080/10618600.2014.907095.
Greenwell, Brandon M. 2017. “pdp: An R Package for Constructing Partial Dependence Plots.” The R
Journal 9 (1): 421–436. doi:10.32614/RJ-2017-016.
Gregorutti, Baptiste, Bertrand Michel, and Philippe Saint-Pierre. 2017. “Correlation and Variable
Importance in Random Forests.” Statistics and Computing 27 (3): 659–678. doi:10.1007/s11222-016-9646-1.
Hastie, Trevor J. 2017. “Generalized Additive Models.” In Statistical Models in S, 249–307. Routledge.
Hossain, Moinul, Mohamed Abdel-Aty, Mohammed A. Quddus, Yasunori Muromachi, and Soumik
Nafis Sadeek. 2019. “Real-Time Crash Prediction Models: State-of-the-Art, Design Pathways
and Ubiquitous Requirements.” Accident Analysis & Prevention 124: 66–84. doi:10.1016/j.aap.2018.12.022.
Hosseinzadeh, Aryan, Abolfazl Karimpour, Robert Kluger, and Raymond Orthober. 2020. “A Frame-
work to Link Crashes to Emergency Medical Service Runs and Trauma Admissions: For Improved
Highway Safety Monitoring and Crash Outcome Assessment.” In Transportation Research Board.
99th Annual Meeting Transportation Research Board.
Jones, Kelvyn, and Simon Almond. 1992. “Moving out of the Linear Rut: The Possibilities of Gen-
eralized Additive Models.” Transactions of the Institute of British Geographers. JSTOR, 434–47.
https://www.jstor.org/stable/622709.
Jones, Kelvyn, and Neil Wrigley. 1995. “Generalized Additive Models, Graphical Diagnostics, and
Logistic Regression.” Geographical Analysis 27 (1): 1–18. doi:10.1111/j.1538-4632.1995.tb00333.x.
Katrakazas, Christos, Mohammed A. Quddus, and Wen-Hua Chen. 2016. “Real-Time Classification
of Aggregated Traffic Conditions Using Relevance Vector Machines.” In 95th Annual Meeting of
the Transportation Research Board. Washington, DC.
Khoda Bakhshi, Arash, and Mohamed M. Ahmed. 2020a. “Practical Advantage of Crossed Random
Intercepts under Bayesian Hierarchical Modeling to Tackle Unobserved Heterogeneity in Clustering
Critical versus Non-Critical Crashes.” Accident Analysis and Prevention (In press).
Khoda Bakhshi, Arash, and Mohamed M. Ahmed. 2020b. “Real-Time Crash Prediction for a Long Low-
Traffic Volume Corridor Using Corrected-Impurity Importance and Semi-Parametric Generalized
Additive Model.” Journal of Transportation Safety & Security (In press).
Kirasich, Kaitlin, Trace Smith, and Bivin Sadler. 2018. “Random Forest vs Logistic Regres-
sion: Binary Classification for Heterogeneous Datasets.” SMU Data Science Review 1 (3): 9.
https://scholar.smu.edu/datasciencereview/vol1/iss3/9.
Lao, Yunteng, Guohui Zhang, Yinhai Wang, and John Milton. 2014. “Generalized Nonlinear Models for
Rear-End Crash Risk Analysis.” Accident Analysis & Prevention 62: 9–16. doi:10.1016/j.aap.2013.09.004.
Lee, Chris, Mohamed Abdel-Aty, and Liang Hsia. 2006. “Potential Real-Time Indicators of Sideswipe
Crashes on Freeways.” Transportation Research Record: Journal of the Transportation Research Board
1953 (1): 41–49. doi:10.1177/0361198106195300105.
Li, Xiugang, Dominique Lord, and Yunlong Zhang. 2010. “Development of Accident Modification Fac-
tors for Rural Frontage Road Segments in Texas Using Generalized Additive Models.” Journal of
Transportation Engineering 137 (1): 74–83. doi:10.1061/(ASCE)TE.1943-5436.0000202.
Liu, Miaomiao, and Yongsheng Chen. 2017. “Predicting Real-Time Crash Risk for Urban Expressways
in China.” Mathematical Problems in Engineering. doi:10.1155/2017/6263726.
Lord, Dominique, and Fred Mannering. 2010. “The Statistical Analysis of Crash-Frequency Data: A
Review and Assessment of Methodological Alternatives.” Transportation Research Part A: Policy and
Practice 44 (5): 291–305. doi:10.1016/j.tra.2010.02.001.
Louppe, Gilles, Louis Wehenkel, Antonio Sutera, and Pierre Geurts. 2013. “Understanding Variable
Importances in Forests of Randomized Trees.” In Advances in Neural Information Processing Systems,
431–439. http://papers.nips.cc/paper/4928-understanding-variable-importances-in-forests-of-randomized.
Mokhtarimousavi, Seyedmirsajad, Jason C. Anderson, Atorod Azizinamini, and Mohammed Hadi.
2019. “Improved Support Vector Machine Models for Work Zone Crash Injury Severity Prediction
and Analysis.” Transportation Research Record: Journal of the Transportation Research Board 2673
(11): 680–692. doi:10.1177/0361198119845899.
Molnar, Christoph. 2019. Interpretable Machine Learning: A Guide for Making Black Box Models
Explainable. Leanpub. https://christophm.github.io/interpretable-ml-book/.
Mousavi, Seyedeh Maryam, Hassan Marzoughi, Scott A. Parr, Brian Wolshon, and Anurag Pande.
2019. “A Mixed Crash Frequency Estimation Model for Interrupted Flow Segments.” International
Conference on Transportation and Development 2019: 72–83. doi:10.1061/9780784482575.008.
Nasr Esfahani, Hossein, and Ziqi Song. 2020. “A Deep Neural Network Approach for Pedestrian
Trajectory Prediction Considering Heterogeneity.” Transportation Research Board. 99th Annual
Meeting. doi:10.1016/j.aap.2020.105444.
Nembrini, Stefano, Inke R. König, and Marvin N. Wright. 2018. “The Revival of the Gini Importance?”
Bioinformatics 34 (21): 3711–3718. doi:10.1093/bioinformatics/bty373.
Oh, Cheol, Jun-Seok Oh, Stephen Ritchie, and Myungsoon Chang. 2001. “Real-Time Estimation of Free-
way Accident Likelihood.” In 80th Annual Meeting of the Transportation Research Board. Washington,
DC.
Pande, Anurag, and Mohamed Abdel-Aty. 2006. “Assessment of Freeway Traffic Parameters
Leading to Lane-Change Related Collisions.” Accident Analysis & Prevention 38 (5): 936–948.
doi:10.1016/j.aap.2006.03.004.
Pande, Anurag, Abhishek Das, Mohamed Abdel-Aty, and Hany Hassan. 2011. “Estimation of Real-
Time Crash Risk: Are All Freeways Created Equal?” Transportation Research Record: Journal of the
Transportation Research Board 2237 (1): 60–66. doi:10.3141/2237-07.
Parsa, Amir Bahador, Ali Movahedi, Homa Taghipour, Sybil Derrible, and Abolfazl Kouros Moham-
madian. 2020. “Toward Safer Highways, Application of XGBoost and SHAP for Real-Time Accident
Detection and Feature Analysis.” Accident Analysis & Prevention 136. doi:10.1016/j.aap.2019.105405.
Parsa, Amir Bahador, Homa Taghipour, Sybil Derrible, and Abolfazl Kouros Mohammadian. 2019.
“Real-Time Accident Detection: Coping with Imbalanced Data.” Accident Analysis & Prevention 129:
202–210. doi:10.1016/j.aap.2019.05.014.
Perlich, Claudia, Foster Provost, and Jeffrey S. Simonoff. 2003. “Tree Induction vs. Logistic
Regression: A Learning-Curve Analysis.” Journal of Machine Learning Research 4 (June): 211–255.
http://www.jmlr.org/papers/v4/perlich03a.html.
Qu, Xu, Wei Wang, Wenfu Wang, and Pan Liu. 2013. “Real-Time Freeway Sideswipe Crash
Prediction by Support Vector Machine.” IET Intelligent Transport Systems 7 (4): 445–453.
doi:10.1049/iet-its.2011.0230.
Rahimi, Ehsan, Ali Shamshiripour, Amir Samimi, and Abolfazl Kouros Mohammadian. 2020. “Inves-
tigating the Injury Severity of Single-Vehicle Truck Crashes in a Developing Country.” Accident
Analysis & Prevention 137. doi:10.1016/j.aap.2020.105444.
Strobl, Carolin, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis. 2008.
“Conditional Variable Importance for Random Forests.” BMC Bioinformatics 9 (1). doi:10.1186/1471-2105-9-307.
Vellido, Alfredo, José David Martín-Guerrero, and Paulo J. G. Lisboa. 2012. “Making Machine
Learning Models Interpretable.” ESANN 12: 163–172. http://www.i6doc.com/en/livre/?GCOI=28001100967420.
Washington, Simon P., Matthew G. Karlaftis, and Fred Mannering. 2010. Statistical and Econometric
Methods for Transportation Data Analysis. Boca Raton, Florida: Chapman and Hall/CRC.
White, Allan P., and Wei Zhong Liu. 1994. “Bias in Information-Based Measures in Decision Tree
Induction.” Machine Learning 15 (3): 321–329. doi:10.1023/A:1022694010754.
World Health Organization. 2015. “Global Status Report on Road Safety.” WHO Library Cataloguing-
in-Publication Data Global.
Wright, Marvin N., Stefan Wager, and Philipp Probst. 2019. “Package ‘ranger.’”
https://github.com/imbs-hl/ranger.
Wright, Marvin N., and Andreas Ziegler. 2015. “Ranger: A Fast Implementation of Random Forests for
High Dimensional Data in C++ and R.” ArXiv Preprint ArXiv:1508.04409. doi:10.18637/jss.v077.i01.
Wyoming Department of Transportation. 2018. “Master Plan Implementation Report, I-80 Corridor
Study.”
Xie, Yuanchang, and Yunlong Zhang. 2008. “Crash Frequency Analysis with Generalized Additive Mod-
els.” Transportation Research Record: Journal of the Transportation Research Board 2061 (1): 39–45.
doi:10.3141/2061-05.
Yang, Fan, Mengnan Du, and Xia Hu. 2019. “Evaluating Explanation without Ground Truth in
Interpretable Machine Learning.” ArXiv Preprint ArXiv:1907.06831. https://arxiv.org/abs/1907.06831.
Yang, Kui, Xuesong Wang, Mohammed Quddus, and Rongjie Yu. 2018. “Deep Learning for Real-Time
Crash Prediction on Urban Expressways.” In 97th Annual Meeting of the Transportation Research
Board. Washington, DC.
Yu, Rongjie, Mohamed A. Abdel-Aty, Mohamed M. Ahmed, and Xuesong Wang. 2013. “Utilizing
Microscopic Traffic and Weather Data to Analyze Real-Time Crash Patterns in the Context of
Active Traffic Management.” IEEE Transactions on Intelligent Transportation Systems 15 (1): 205–213.
doi:10.1109/TITS.2013.2276089.
Yuan, Jinghui, and Mohamed Abdel-Aty. 2018. “Approach-Level Real-Time Crash Risk Analysis for Sig-
nalized Intersections.” Accident Analysis & Prevention 119: 274–289. doi:10.1016/j.aap.2018.07.031.
Yuan, Jinghui, Mohamed Abdel-Aty, Yaobang Gong, and Qing Cai. 2019. “Real-time Crash Risk Predic-
tion Using Long Short-term Memory Recurrent Neural Network.” Transportation Research Record:
Journal of the Transportation Research Board 2673 (4): 314–326. doi:10.1177/0361198119840611.
Zeng, Qiang, Helai Huang, Xin Pei, and S. C. Wong. 2016a. “Modeling Nonlinear Relationship between
Crash Frequency by Severity and Contributing Factors by Neural Networks.” Analytic Methods in
Accident Research 10: 12–25. doi:10.1016/j.amar.2016.03.002.
Zeng, Qiang, Helai Huang, Xin Pei, S. C. Wong, and Mingyun Gao. 2016b. “Rule Extraction from an
Optimized Neural Network for Traffic Crash Frequency Modeling.” Accident Analysis & Prevention
97: 87–95. doi:10.1016/j.aap.2016.08.017.