Conference PaperPDF Available

A Tree-Based Ordered Probit Approach to Identify Factors Affecting Work Zone Weather-Related Crashes Severity in North Carolina Using the Highway Safety Information System Dataset

Authors:

Abstract

Work zone crashes are still on the rise due to the aging of US roads and the increase in traffic demand. Investigation of crash characteristics and determining contributing factors in work zones is one of the most important issues in many traffic safety studies. The effect of work zones on traffic safety can be exacerbated by weather conditions. A sudden reduction in visibility may intensify the severity of work zone crashes. Although many studies have investigated work zone crashes, research that investigates the impact of adverse weather conditions on work zone crashes is lacking. In this study, The Highway Safety Information System database for North Carolina was used to identify the characteristics of work zone weather-related crashes. A Tree-based Ordered Probit, a relatively recent and promising combination of nonparametric machine learning (decision tree) and classical statistics (ordered probit) techniques, was utilized to gain a better understanding about the effects of various factors on different work zone crash related injury and crash severity in adverse weather conditions. The results showed that Tree- based Ordered Probit model has a better performance compared to conventional Ordered Probit Model. Lighting conditions, number of vehicles involved in a crash, road characteristics, number of occupants, land use, presence of traffic control devices, and two types of crashes (sideswipe and rear-end crashes) were identified as the most important factors in work zone weather-related crash severity.
Complementary parametric probit regression and nonparametric
classification tree modeling approaches to analyze factors
affecting severity of work zone weather-related crashes
Ali Ghasemzadeh
1
Mohamed M. Ahmed
1
Received: 12 February 2018 / Revised: 26 November 2018 / Accepted: 28 November 2018 / Published online: 21 December 2018
The Author(s) 2018
Abstract Identifying risk factors for road traffic injuries
can be considered one of the main priorities of trans-
portation agencies. More than 12,000 fatal work zone
crashes were reported between 2000 and 2013. Despite
recent efforts to improve work zone safety, the frequency
and severity of work zone crashes are still a big concern for
transportation agencies. Although many studies have been
conducted on different work zone safety-related issues,
there is a lack of studies that investigate the effect of
adverse weather conditions on work zone crash severity.
This paper utilizes probit–classification tree, a relatively
recent and promising combination of machine learning
technique and conventional parametric model, to identify
factors affecting work zone crash severity in adverse
weather conditions using 8 years of work zone weather-
related crashes (2006–2013) in Washington State. The key
strength of this technique lies in its capability to alleviate
the shortcomings of both parametric and nonparametric
models. The results showed that both presence of traffic
control device and lighting conditions are significant
interacting variables in the developed complementary crash
severity model for work zone weather-related crashes.
Therefore, transportation agencies and contractors need to
invest more in lighting equipment and better traffic control
strategies at work zones, specifically during adverse
weather conditions.
Keywords Adverse weather Work zone Safety Crash
characteristics Probit model Decision tree
1 Introduction
The widespread aging of the US roads and the increase in
traffic demand have raised the need for more transportation
maintenance projects; hence, affecting both safety and
operations of roadways. Over 27% of 67,523 work zone
crashes in 2013 involved injuries or fatalities. In fact, over
579 fatalities were reported due to work zone-related cra-
shes in 2013 [1]. A number of studies have been conducted
on work zone crashes. Results from these studies showed
that crash rate and frequency are increased by the presence
of work zones [26]. Weng et al. [7] used association rules
to investigate factors that might have a significant effect on
work zone crash casualties. They found that driving under
the influence, the speed limit over 40 mph and work zones
without traffic control devices have the most significant
effects on work zone casualty risk. Debnath et al. [8]
identified the most frequent hazards at work zones from
road-worker perspective. They found that speeding vehi-
cles are the common work zone hazard. Wang and Qin [9]
investigated the severity of single-vehicle crashes at work
zones. They concluded that speeding is one of the key
factors affecting the severity of work zone crashes. Char-
acteristics of work zone rear-end crashes were analyzed by
Qi et al. [10], and they found that driving under the
influence, lighting condition, the presence of pedestrians,
and roadway defects have the highest effects on severity of
work zone rear-end crashes.
It is worth mentioning that there is no agreement among
different studies about the severity of work zone crashes.
More specifically, some studies reported that work zone
&Ali Ghasemzadeh
aghasemz@uwyo.edu
Mohamed M. Ahmed
mahmed@uwyo.edu
1
Department of Civil and Architectural Engineering,
University of Wyoming, Laramie, WY 82071, USA
123
J. Mod. Transport. (2019) 27(2):129–140
https://doi.org/10.1007/s40534-018-0178-6
crashes were significantly more severe than non-work zone
crashes [11]. On the other hand, some studies claimed that
work zone crashes were less severe than non-work zone
crashes [12,13]. There is another group of studies, which
mentioned that there is no significant difference between
the severity of work zone and non-work zone crashes
[14,15]. While these studies are important and useful to
understand factors affecting work zone crashes, it is largely
unknown whether the harmful effects of adverse weather
conditions can exacerbate the severity of work zone cra-
shes or not.
Prior experience from road safety studies suggested that
adverse weather conditions significantly affect the opera-
tions and safety of roadways [1620]. More than 7,400
people are killed, and over 673,000 people are injured due
to weather-related crashes every year on the US roadways
[21]. Qin et al. [22] found that heavy rain leads to a fewer
number of crashes but more severe crashes. Yu et al. [23]
found that crashes are extremely affected by weather
conditions, especially those that occurred in mountainous
freeways. Huang et al. [24] found that the severity of fog-
and smoke-related crashes is higher compared with crashes
in clear weather condition. Eisenberg et al. [25] analyzed
effects of snowfalls on crashes injury severity. They found
that snowy days have less fatal crashes than dry days;
however, they have more injury and property damage only
(PDO) crashes. Ahmed et al. [26] analyzed the interaction
between roadway geometry and real-time weather and
traffic data on mountainous freeways. They found that the
probability of being involved in a crash could be doubled
during the snowy seasons. Another study depicted that both
injury and non-injury crash rates increase during the winter
season; however, injury crashes are more severe during
snowy winter season in comparison with other non-snow
seasons [27].
Nevertheless, there is still a lack of studies examining
the impact of weather conditions on work zone injury
severity. This paper utilized data from the second Strategic
Highway Research Program (SHRP2) Roadway Informa-
tion Datasets (RID) to shed some lights on the different
factors affecting work zone crash frequency and severity in
different weather conditions.
The contributions provided in this paper are emphasized
as follows. First, considering the fact that weather data
extracted from weather stations are not that accurate, this
study utilized weather information provided in police crash
reports. It is worth mentioning that a previous study by the
authors showed about 60% accuracy in identifying
weather-related crashes using the weather data obtained
from the weather stations [28]. Second, a new methodol-
ogy, which is based on combining the traditional probit
regression analysis and data mining decision tree tech-
nique, was utilized to overcome the limitations of each
standalone modeling technique. Third, factors affecting the
severity of work zone weather-related crashes were iden-
tified and discussed.
2 Data source
This study utilized 8 years (2006–2013) of crashes
including 3,028 weather-related crashes (1,887 PDO and
1,141 fatal ?injury) to determine factors affecting work
zone weather-related crashes in Washington State. The
crash data were extracted from the Strategic Highway
Research Program 2 (SHRP2) Roadway Information
Dataset (RID).
RID consists of roadway data collected from mobile
data collection project, government, public, and private
parties, in addition to supplemental crash history data
which is used in this study. For more information about
RID, see [29].
3 Methodology
In this study, factors affecting work zone weather-related
crashes were identified using a complementary parametric
and nonparametric crash severity model. In fact, both
parametric and nonparametric models have their own
advantages and disadvantages. For example, parametric
models such as probit and logistic regression can provide
the relationship between a response variable and predictors,
and the results obtained from them are easy to interpret and
understand [30]. However, the problem with these models
is that there are many pre-assumptions in parametric
models, which might negatively affect the accuracy of the
results. Risk factors can also exhibit various exposure
effects in different circumstances in parametric models
(hidden effects problem). These shortcomings cannot be
addressed using the common parametric models such as
logistic and probit regression models [31]. One of the
major solutions to address the hidden effects problem is to
split the full sample data into several sub-datasets using the
nonparametric classification tree method [31]. This method
might be an effective way to see the effects of risk factors’
hidden exposure in different circumstances. In addition,
using nonparametric models is beneficial as these models
have the ability to provide high prediction accuracy [31].
Therefore, this study utilized the probit–classification tree
method, which is a complementary method utilizing both
nonparametric (classification tree) and parametric (probit
regression) models, to analyze the effect of different risk
factors on work zone weather-related crashes. The depen-
dent variable in the model is the crash severity levels, and
the explanatory variables are the factors which influence
130 A. Ghasemzadeh, M. M. Ahmed
123 J. Mod. Transport. (2019) 27(2):129–140
crash severity levels. In this study, the crash severity is
defined as a binary variable, which has two levels including
severe crashes (injury and fatal crashes) and non-severe
crashes (PDO crashes).
3.1 Decision tree
A decision tree can be used for both continuous and
nominal target variables. When a decision tree is used to
predict a continuous target variable, it is called a regression
tree, and when it is used to classify a nominal target
variable, it is called a classification tree [32]. Two main
components of decision trees are the ‘‘root node’’ and the
‘leaf node.’’ The root node is the node located at the top of
the tree and contains all the data, and the ‘‘leaf node’’ refers
to the termination node and has the lowest impurity. More
specifically, based on the independent variable (splitter)
that creates the best homogeneity, the root node is divided
into two child nodes. This procedure (partitioning the target
variable recursively) will be continued until all the data in
each node reach their highest homogeneity; then, tree
growing will stop. This node, which does not have any
branches, is called the ‘‘leaf node,’’ and each path from the
top of the tree (the root node) to each leaf (the terminal
node) can be considered as a rule. It is worth mentioning
that data in each child node are purer (more homogenous)
than the data in the upper parent node [33]. In order to find
possible splits among all variables, a splitting criterion
(test) is performed. Splitting criterion is the main design
component of a decision tree [34]. More specifically, in the
decision tree learning algorithm, the splitting criterion’s
role is to measure the quality of each possible split among
all variables. Two common splitting criteria that can be
used to grow a decision tree are Chi-square and Gini
reduction. In this study, Gini splitting criterion is used to
select which variable and split pattern will be used to best
split the node.
Gini impurity shows the level of data impurity. More
specifically, it shows the incorrect classification probability
of a randomly chosen record from the specific node in the
subset. The procedure of selecting variables and split
scheme that can be used to best split the parent node is as
follows [35]:
1. Determine the node impurity: Considering the tas a
parent node, the node impurity i(t) can be calculated
using the Gini index definition, which is provided in
Eq. (1).
iðtÞ¼1X
M
j
nj
N

2
;ð1Þ
where Mrepresents the number of classes, njrepre-
sents the number of class jelements, and Ndepicts all
elements in the node.
If a node is homogeneous, then the value obtained
from Eq. (1) would be minimal. However, the value
would be higher for the less pure nodes.
2. Determine the impurity reduction (Di): For all possible
splits in the values for the variable x, the impurity
reduction on the parent node tcaused by a split sis
calculated as follows:
Diðx;s;tÞ¼iðtÞX
nt
j¼1
FjiðtjÞ;ð2Þ
where ntis the number of child nodes of the parent
node t;F
j
is the proportion of class jelements divided
by all elements in the node (the proportion mentioned
in the parenthesis in Eq. (1)). Di[0 indicates that
elements in the child nodes are purer than elements in
the parent node, and Di0 denotes that elements in
the child nodes are not purer than the parent node.
The best split which can be shown by s* associated
with the variable xcan be identified by comprehen-
sively searching all possible splits related to the vari-
able x. More clearly, s* causes the maximum impurity
reduction.
3. Determine the global maximum impurity reduction:
The previous step would be repeated, and the best
splits for all variables would be determined. Among all
the best splits, the split s* associated with the variable
x
*
determines the global maximum impurity reduction.
4. Determine the leaf node: If Di(x*, s*, t)[0, then
choose the variable x* and split s* to split the parent
node. If Di(x*, s*, t)B0, then the parent node will be
considered as the leaf node.
5. Stop the splitting: By satisfying one of these two
criteria, the splitting would be stopped:
(a) Any nodes cannot be further split (Di(x*, s*,
t)B0), or the number of elements in the node is
less than the preset minimum number.
(b) The current tree depth reaches the preset maxi-
mum tree depth. Otherwise, go to Step 1.
For more information about the decision tree method,
see [32].
3.2 Probit regression
Probit regression is a common statistical method that can
estimate the coefficients of predictors [36]. This method
was chosen to develop a separate probit regression for each
group identified by the decision tree model. The forward
Complementary parametric probit regression and nonparametric classification tree modeling131
123
J. Mod. Transport. (2019) 27(2):129–140
selection method was used to find the best fitting model.
The advantage of using probit regression model is the
ability to estimate marginal effects. Probit regression is
accordingly implemented to each data group to find the
coefficient of each explanatory variable.
Probit regression has the S-shape and can be used for
dealing with binary response variables. Probit model
(probit PðxÞ
½
¼y) is shown as Eq. (3)[36].
y¼aþbx;ð3Þ
where ais the probability of response when explanatory
variables are the reference level (or when x= 0) and b
represents the change in the probability per unit change in
x, a parameter to be estimated.
The cumulative distribution function (CDF) F(x) for a
random variable xcould be defined as FðxÞ¼PðXxÞin a
way that xcould be varied between 1 and 1.By
increasing the x,F(x) increases from 0 to 1 because the
probability of having success would be increased. Let us
consider xas a continuous random variable and plot the
cumulative distribution function as a function of x; the
result looks like an S-shape when b[0[36].
The best definition for the relation of CDF, normal
distribution and probit model provided by Agresti [36]is
‘‘ when F is the CDF of a normal distribution, the model
provided in Eq. (3)is equivalent to the probit model.’’ This
statement is clearly shown in Eqs. (4), (5), and (6) as the
normal density function provided in Eq. (6) and the CDF of
normal distribution depicted in Eq. (5)[36].
PxðÞ¼FxðÞ:ð4Þ
As mentioned, assuming the Ffollows a normal
cumulative distribution:
FðxÞ¼UðxÞ¼ Z
x
1
/ðzÞdz;ð5Þ
where /ðzÞis the normal density function and defined as in
Eq. (6).
/ðzÞ¼
pexp ðzlÞ2
2r2

rffiffiffiffiffi
2p
p;zNð0;1Þ;l¼a
b;
r¼1
b
jj
;
ð6Þ
where zfollows the normal distribution with the mean of 0
and standard deviation equal to 1.
For more information about probit regression, see [36].
3.3 Variables description
Table 1shows the selected variables for developing the
work zone crash severity model in adverse weather
conditions. The dependent variable is crash severity with
two levels: severe crashes (fatal and injury crashes) and
non-severe crashes (PDO crashes). Explanatory variables
can be considered as driver behavior, roadway conditions,
environmental conditions, and crash characteristics.
4 Results and discussion
The dataset was divided into three sub-datasets (training,
validation, and testing). Among them, 60% of the data
were assigned to the training sub-dataset, 30% to the val-
idation sub-dataset, and 10% to the testing sub-dataset.
More clearly, 3023 observations were used and divided
among mentioned sub-datasets (i.e., 1813 observations
were assigned to the training sub-dataset, 906 observations
were assigned to the validation sub-dataset, and 302
observations were assigned to the testing sub-dataset). In
addition, the maximum tree depth of 10, which is suit-
able considering the number of variables, was selected. The
minimum number of data per node was selected as 40 so
that all the crash severity classes have enough data for the
subsequent probit analysis in each leaf node.
The Gini reduction test found any node at level four at
the significance level of 0.05 to be insignificant based on
the obtained results. Hence, the tree stops growing at this
level, and the tree comprising four leaf nodes is selected.
Optimal decision tree size ensures mitigating the overfit-
ting issue. Two criteria were considered: First, the mis-
classification rate for the training and validation datasets
should be similar to some extent; and second, the mis-
classification rate and average squared error should be as
low as possible. Misclassification rates for the training and
validation were obtained as 0.32 and 0.35, respectively.
The average square errors obtained were 0.21 and 0.22 for
the training and validation datasets, respectively. There-
fore, both criteria were satisfied based on the obtained
results.
Figure 1shows the selected decision tree structure for
the probit–classification tree model. As can be seen, col-
lision type, the presence of a traffic control device, and
lighting conditions were found to be significant interacting
variables. Based on the classification shown in Fig. 1, the
training dataset was divided into four groups representing
four leaf nodes, including
Group (1): collision type = 3 and 4 (sideswipe and other
crashes);
Group (2): collision type = 1 and 2 (rear-end and angle
crashes) and traffic control = 2 (none);
Group (3): collision type = 1 and 2 (rear-end and angle
crashes) and traffic control = 1 (stop/signal/yield sign/
other) and lighting condition = 2 (other);
132 A. Ghasemzadeh, M. M. Ahmed
123 J. Mod. Transport. (2019) 27(2):129–140
Table 1 Variables descriptions
Variable Description Type Definition Assigned
code
Crash severity level Severity of a crash Binary PDO 1
Fatal ?injury 2
Speed Maximum posted speed Binary Below 50 mph
Above 50 mph
1
2
Gender Driver’s gender Categorical Male 1
Female 2
Age Driver’s age Categorical Young \25 1
Middle (25–64) 2
Old [64 3
Land use If the crash occurred in an urban area Binary Urban 1
Rural 2
Vehicle involved Number of motor vehicles involved Categorical Greater than or equal to three 1
Less than three 2
Collision type Type of crash Categorical Rear-end 1
Angle 2
Sideswipe 3
Other 4
Driving under the
influence
If the driver was under the influence
of alcohol or drugs
Binary Sober 1
Drunk 2
Roadway characteristic If there is a curve at the crash location Binary No 1
Yes 2
Lighting condition If the crash occurred during the daylight Binary Yes 1
No 2
Weather Type of severe weather condition Categorical Rain 1
Snow 2
Fog 3
Other 4
Vehicle action before
crash
Vehicle last action before the crash Categorical Stopped 1
Making left or right turn 2
Changing lane 3
Slowing 4
Other (going straight, etc.) 5
Roadway type Roadway type in crash location Categorical One way 1
Two-way divided 2
Two-way undivided 3
Other 4
Traffic control devices Presence of traffic control devices at
crash location
Binary Stop/signal/yield sign/other 1
None 2
Vehicle age Vehicle age Categorical Less than 5 years 1
Between 5 and 10 years 2
Greater than 10 years 3
Vehicle type Whether the vehicle is considered as a big
vehicle, a small vehicle or a motorcycle
Categorical Big vehicles (bus, truck, pickup,
etc.)
1
Small vehicles (sedan, coupe, etc.) 2
Motorcycle 3
Complementary parametric probit regression and nonparametric classification tree modeling133
123
J. Mod. Transport. (2019) 27(2):129–140
Table 1 continued
Variable Description Type Definition Assigned
code
Airbag Whether the vehicle is equipped with airbag or
not
Categorical Equipped 1
Not airbag equipped 2
Construction type Different construction types at crash location Categorical Construction 1
Maintenance 2
Utility 3
Other 4
Crash location Whether the crash is within work zone or in
external traffic backup caused by work zone
Binary Within work zone 1
In external traffic backup 2
Surface condition Surface condition at crash location Binary Dry 1
Other (wet, snow, etc.) 2
Fig. 1 Decision tree structure for the logistic-decision tree model
134 A. Ghasemzadeh, M. M. Ahmed
123 J. Mod. Transport. (2019) 27(2):129–140
Group (4): collision type = 1 and 2 (rear-end and angle
crashes) and traffic control = 1 (stop/signal/yield sign/
other) and lighting condition = 1 (daylight).
For each above-mentioned group, a separate probit
regression model was developed, and the marginal effect of
risk factors for work zone weather-related crashes was
calculated. For instance, in case of Group (2), 19 variables
are available for developing a probit model, since the
lighting condition is equal to 2, which means that this
variable is fixed in this model. It is worth mentioning that
for Group 2, collision type still has two categories (rear-end
and angle crashes), and therefore, this variable is not fixed
and will be used for further analysis in the probit model.
Table 2shows the results obtained from developing a
separate probit model for each classified group using the
estimated decision tree algorithm. In fact, four models were
estimated for work zone weather-related crashes using the
probit–classification tree procedure. To confirm the
suitability and fitness of the models, three statistics
including Akaike information criterion (AIC) statistic, the
Schwarz criterion (SC) statistic, and the -2 log-likelihood
statistic were used, and the results are given in Table 3.
As can be seen in model 1, the marginal effects of risk
factors on the crash severity of a driver who is involved in
collision types 3 and 4 (sideswipe and other) in work zone
weather-related crashes can be identified. Speed limit is
one of the factors found to be significant in model 1. In
fact, a higher speed limit dramatically increases the
severity of work zone weather-related crashes. More
specifically, drivers in work zone weather-related crashes
occurred in a segment with less than 55 mph posted speed
limit were 13% less likely to be involved in severe crashes
(the marginal effect is -0.13 in Table 2) in comparison
with the reference level (drivers who were driving in a
work zone segment with a posted speed limit greater than
55 mph). In fact, driving over the speed limit could be
risky because it provides insufficient time for
Table 2 Estimation of probit classification tree regression for work zone weather-related crash severity
Model Variable Coefficient SE Wald Chi-
square
Pvalue Marginal effect
dy/dxSE
Model 1 Intercept 0.5876 0.2132 7.5971 0.0058
Speed limit 1 -0.3769 0.0972 15.0471 0.0001 -0.1353255 0.0313671
Vehicle age 1 -0.3709 0.1120 10.9751 0.0009 -0.1333592 0.0309113
Vehicle age 2 -0.0637 0.1046 0.3715 0.5422 -0.0262309 0.0060800
Sobriety 1 -0.5067 0.1833 7.6390 0.0057 -0.1323963 0.0306881
Gender 1 -0.1876 0.0928 4.0818 0.0433 -0.0548812 0.0127209
Roadway characteristic 1 0.2864 0.0987 8.4145 0.0037 -0.0665939 0.0154358
Collision type 3 -0.2600 0.0948 7.5265 0.0061 -0.0746425 0.0173014
Model 2 Intercept -1.0630 0.5961 3.1806 0.0745 –
Roadway type 2 1.1801 0.2017 34.2453 \0.0001 0.3710220 0.0917906
Speed limit 1 -0.2757 0.1092 6.3771 0.0116 -3.6055224 0.0182795
Vehicle type 1 -0.1790 0.0776 5.3202 0.0211 -2.5847597 0.6394681
Crash location 1 -0.2710 0.0982 7.6225 0.0058 -0.1153495 0.0285374
Roadway characteristic 1 -0.2302 0.1041 4.8938 0.0270 -0.0930068 0.0230098
Surface condition 2 0.8450 0.2979 8.0437 0.0046 0.2948777 0.0729526
Collision type 1 -0.4907 0.2117 5.3725 0.0205 -0.1845932 0.0456683
Model 3 Intercept 6.7347 334.7 0.0004 0.9839
Sobriety 1 -1.3562 0.3125 18.8339 \0.0001 -0.4300327 0.1684546
Airbag 1 -0.4780 0.2352 4.1319 0.0421 -0.1320891 0.0517426
Vehicle involved 2 -0.6947 0.1659 17.5375 \0.0001 -0.2196478 0.0860416
Model 4 Intercept -0.7614 0.3911 3.7887 0.0516 –
Vehicle action before crash 1 0.2731 0.1345 4.1238 0.0423 0.0793240 0.0205579
Vehicle action before crash 2 -1.0430 0.5113 4.1618 0.0413 -0.3807359 0.0986730
Speed limit 1 -0.3002 0.1405 4.5622 0.0327 -0.1242374 0.0321978
Weather condition 1 0.9319 0.3227 8.3385 0.0039 0.1333412 0.0345572
Land use 1 0.3460 0.1459 5.6216 0.0177 0.1054947 0.0273404
SE standard error
Complementary parametric probit regression and nonparametric classification tree modeling135
123
J. Mod. Transport. (2019) 27(2):129–140
suitable response to control and handle unexpected situa-
tions, and this might be exacerbated during adverse
weather conditions such as heavy rain or fog.
The findings indicated that among those drivers who had
sideswipe and other types of crashes (all crashes except
rear-end and angle crashes), female drivers had a higher
casualty risk in comparison with male drivers. Two factors
have been identified for the higher risk of being injured
associated with the female drivers at work zone crashes in
the literature, i.e., risk-taking behavior, which is higher in
younger female drivers, and increased number of female
drivers (exposure effect) [31,37]. Based on the identified
marginal effects for model 1, male drivers were 5% less
likely to be involved in severe crashes in comparison with
female drivers.
Vehicle age is another factor that turned out to be a
significant predictor in model 1. Results showed that older
vehicles were severely impacted in work zone weather-
related crashes. This result confirms previous studies,
which depicted the impact of vehicle age on driver injury
severity [38,39]. More specifically, vehicles less than
5 years old were 13% less likely to be involved in severe
work zone weather-related crashes in comparison with
those over 10 years old. It was also found that 5–10-year-
old vehicles were 2% less involved in severe work zone
weather-related crashes in comparison with older vehicles.
Driving under the influence is another factor that came
out to be significant in model 1. Particularly, sober drivers
were 13% less likely to be involved in severe work zone
weather-related crashes in comparison with drivers under
the influence, which supports previous studies [40,41].
Roadway characteristics (presence of a curve) were also
found to have a significant effect in this model. More
specifically, drivers who were involved in a work zone
weather-related crash at non-curved sections were 6% less
likely to be involved in severe crashes in comparison with
drivers who were involved in a work zone weather-related
crash at a curve. The presence of curves at work zone
segments could adversely affect the sight distance and
handling of the vehicle in critical situations. The adverse
weather might exacerbate this negative effect. Risky
overtaking might be another reason for severe crashes at
work zones on road curves.
The model also estimated the effect of collision type
(sideswipe crashes) on the severity of crashes. More
clearly, drivers who were involved in sideswipe crashes
were 7% less likely to be involved in severe crashes in
comparison with other collision types. It is worth men-
tioning that most of the sideswipe crashes were not severe
in comparison with head-on crashes at work zones, which
will be discussed in model 2 interpretation.
Model 2 indicates the results of the probit model for
those drivers who were involved in rear-end and angle
crashes, in locations with no traffic control devices. Par-
ticularly, model 2 showed that route type is a contributing
factor in crash severity model. Drivers who were involved
in work zone weather-related crashes in rural or secondary
routes were 37% more likely to be involved in severe
Table 3 Pure probit regression
Parameter Estimate SE Wald Chi-
square
Pvalue Marginal effect
dx/dySE
Intercept -0.4316 0.1834 5.5401 0.0186 –
Vehicle action before crash 1 0.1439 0.0650 4.9053 0.0268 0.0613226 0.0088342
Vehicle action before crash 2 -0.5228 0.1567 11.1332 0.0008 -0.1802264 0.0259636
Roadway type 1 -0.4321 0.1938 4.9698 0.0258 -0.1553926 0.0223860
Roadway type 2 0.2769 0.0990 7.8209 0.0052 0.0938238 0.0135164
Roadway type 3 0.3120 0.1003 9.6716 0.0019 0.1096382 0.0157946
Speed limit 1 -0.1901 0.0627 9.1990 0.0024 -0.0718147 0.0103457
Traffic control 2 -0.1597 0.0534 8.9583 0.0028 0.0551023 0.0079381
Sobriety 2 0.6802 0.1291 27.7685 \0.0001 -0.2363750 0.0340524
Gender 1 -0.1469 0.0487 9.1061 0.0025 -0.0533711 0.0076887
Roadway characteristic 2 0.2216 0.0645 11.7957 0.0006 -0.0803298 0.0115724
Surface condition 2 0.2705 0.1374 3.8766 0.0490 0.0793566 0.0114322
Land use 1 -0.1587 0.0666 5.6741 0.0172 0.0563574 0.0081189
Vehicle involved 1 -0.1547 0.0490 9.9761 0.0016 -0.0548034 0.0078950
Collision type 1 0.2220 0.0673 10.8795 0.0010 0.0742384 0.0106949
Collision type 2 0.5739 0.1374 17.4388 \0.0001 0.1945923 0.0132321
Collision type 3 -0.2636 0.0969 7.4078 0.0065 -0.0918510 0.0132321
136 A. Ghasemzadeh, M. M. Ahmed
123 J. Mod. Transport. (2019) 27(2):129–140
crashes in comparison with drivers who were driving on
interstate highways. This could be because most of the
rural roads are narrow two-way two-lane undivided roads
which do not have enough shoulder width. Even secondary
roads do not have the same quality as interstate roadways,
which might intensify the severity of crashes in rural and
secondary roads.
Vehicle type was found to be a significant factor in
model 2. Drivers who were driving passenger cars and
involved in work zone weather-related crashes were 2.5
times less likely to be involved in severe crashes in com-
parison with those driving other vehicles.
Crash location was found to be another significant factor
in this model. The results showed that drivers who were
involved in crashes within a work zone area were 11% less
likely to be involved in severe crashes in comparison with
drivers who were involved in crashes in external traffic
backup caused by work zone. This finding is supported by
other studies which noted lane closure might increase the
traffic conflicts, as a road work mostly requires closing a
part of the roadway [42]. This situation might be worse in
adverse weather conditions as the low visibility for drivers
could prevent them to perform appropriate maneuver in
critical circumstances.
The model also showed the significant effect of surface
conditions on the severity of crashes. Drivers who were
involved in work zone crashes were 29% more likely to be
involved in severe crashes if the surface is wet in com-
parison with a dry surface condition. The marginal effect of
rear-end crashes (collision type = 1) is given in Table 2,
which indicated that drivers who were involved in rear-end
crashes (most common crashes in work zones) were 18%
less likely to be involved in severe crashes in comparison
with angle crashes during adverse weather conditions.
Model 3 specifies the results of the probit model for
those drivers who were involved in rear-end and angle
crashes, with the presence of traffic control devices and in
the absence of daytime lighting conditions. Similar to
model 1, sobriety came out to be significant, which means
that the crash severity is affected by this factor. The mar-
ginal effect of a risk factor on crash severity showed that
drivers equipped with airbags were 13% less likely to be
involved in severe crashes, which is consistent with a
previous study [31].
In addition, results showed that crashes with fewer
vehicles involved were 21% less likely to be severe. This
could be because of the severity of chain crashes. Asking
drivers to keep the safe following distance and avoid tail-
gating might be helpful. Installing dynamic message signs
at work zones could be beneficial to provide advisory
messages for drivers.
Finally, model 4 reveals the results of the probit model
for those drivers who were involved in rear-end and angle
crashes, with the presence of traffic control devices and in
daytime lighting conditions. Vehicle action before crash,
speed limit, weather conditions, and land use came out to
be significant factors affecting crash severity model.
Indeed, vehicles making a left or right turn before a crash
were 38% less likely to be involved in severe crashes in
comparison with vehicles going straight (category 5 in
Table 1).
It was also found that drivers who were driving on roads
with posted speed limits less than 55 mph were 12% less
likely to be involved in severe crashes. In addition, drivers
who were involved in rainy weather condition crashes were
13% more likely to be involved in severe crashes. Finally,
drivers who were driving in a rural area were 10% more
likely to be involved in severe work zone weather-related
crashes in comparison with those driving on urban road-
ways. This might be due to the fact that risky behaviors
such as not wearing seatbelts are more common among
rural drivers [43].
As mentioned earlier, another conventional probit model
with all variables was developed to compare the results
with complementary probit–classification tree models. All
the 20 variables used for developing probit–classification
tree were considered in developing the conventional probit
model as well. Forward stepwise selection method was
used to find the best model fit. The results of the probit
regression model are given in Table 3.
As can be seen, 11 variables came out to be significant
in the conventional probit model. However, some impor-
tant contributing factors including vehicle age, vehicle
type, crash location, airbag, number of vehicles involved,
lighting, and weather conditions are not represented in this
model. The conventional probit model fit statistics are
provided in Table 4.
Receiver operating characteristic (ROC) curve was used
to visualize models’ performance. The predictive capabil-
ity of the probit–classification tree models and conven-
tional probit (base model) is shown in Fig. 2using ROC
curves. ROC curve shows the models’ performance by
plotting sensitivity versus 1-specificity. More clearly, if
there are no relevant information and bias results, the ROC
curve would be closer to the diagonal line. On the other
hand, the best performed model would be close to the
upper left corner.
Figure 2shows that for training dataset which contains
most of the data, all developed probit–classification tree
models performed better than the conventional probit
model. It is also worth mentioning that even though small
portions of dataset were assigned to validation and testing
datasets (in comparison with training dataset), still using
probit–classification tree method showed a better perfor-
mance in comparison with a conventional probit model.
Complementary parametric probit regression and nonparametric classification tree modeling137
123
J. Mod. Transport. (2019) 27(2):129–140
5 Conclusions
This study explored the effects of vehicle, driver behavior,
and environmental factors on work zone weather-related
crashes severity using a method that combines both data
mining technique and conventional parametric modeling.
In addition, the obtained results from both conventional
probit model and proposed probit–classification tree model
were compared to better understand the advantages of the
proposed model using the 8 years of work zone weather-
related crashes.
The complementary methodology utilized in this study
has the advantage of being used in safety analysis, as this
model can compensate for the weak points of parametric
and nonparametric models.
In comparison with the probit–classification tree model,
some important contributing factors, such as vehicle age,
vehicle type, crash location, airbag, number of the vehicles
involved, lighting, and weather conditions, were excluded
in the conventional probit model, which shows the capa-
bility of the developed model in identifying risk factors
affecting work zone weather-related crashes.
The most interesting finding of this study is that both the
presence of traffic control device and lighting conditions
were found to be significant interacting variables in the
developed complementary crash severity model for work
zone weather-related crashes. This finding shows that more
attention should be paid to the effect of weather conditions
in different stages of work zone projects by transportation
agencies and contractors. It is highly recommended to
transportation agencies and contractors to invest in
Fig. 2 Comparision among all models using ROC chart
Table 4 Model fit statistics for four models obtained from probit–classification tree
Model 1 Model 2 Model 3 Model 4 Conventional probit
Intercept
only
Intercept
and
covariates
Intercept
only
Intercept
and
covariates
Intercept
only
Intercept
and
covariates
Intercept
only
Intercept
and
covariates
Intercept
only
Intercept
and
covariates
Model fit statistics
AIC 1119.4 1081.9 1714.4 1604.7 446.5 371.1 665.1 631.5 4003.28 3844.92
SC 1124.3 1120.5 1719.6 1682.0 450.3 408.8 669.3 682.1 4009.30 3965.19
-2 Log-likelihood 1117.4 1065.9 1712.4 1574.7 444.5 351.1 663.1 607.5 4001.28 3804.92
138 A. Ghasemzadeh, M. M. Ahmed
123 J. Mod. Transport. (2019) 27(2):129–140
adequate lighting equipment for nighttime work, which is
the only option in most urban areas. In 2013, the American
Traffic Safety Services Association developed the first and
the most comprehensive set of nighttime lighting guideli-
nes for work zones called ‘‘Nighttime Lighting Guidelines
for Work Zones: A guide for developing a lighting plan for
nighttime work zones’’ [44]. The manual provides a simple
procedure for designing a nighttime lighting system for
work zones that can be easily adopted by engineers,
designers, and contractors without prior experience in
illumination. However, more studies are needed to clarify
and characterize driver behavior in different lighting, vis-
ibility, and weather conditions.
This study found that the presence of a traffic control
device is a significant contributing factor to crash injury
severity. Improving the temporary traffic control (TTC),
including but not limited to the portable variable speed
limit (VSL), to adjust speed limits during adverse weather
conditions can be beneficial in this regard.
Portable changeable message signs (PCMS) are also highly
recommended to warn drivers about work zones to reduce
the severity of work zone crashes.
Acknowledgements This work was conducted under the second
Strategic Highway Research Program (SHRP2), which is adminis-
trated by the Transportation Research Board (TRB) of the National
Academies of Sciences, Engineering, and Medicine, and it was
sponsored by the Federal Highway Administration (FHWA) in
cooperation with the American Association of State Highway and
Transportation Officials (AASHTO).
Open Access This article is distributed under the terms of the
Creative Commons Attribution 4.0 International License (http://
creativecommons.org/licenses/by/4.0/), which permits unrestricted
use, distribution, and reproduction in any medium, provided you give
appropriate credit to the original author(s) and the source, provide a
link to the Creative Commons license, and indicate if changes were
made.
References
1. Ullman GL, Pratt M, Fontaine MD et al (2018) Estimating the
safety effects of work zone characteristics and countermeasures: a
guidebook. Natl Coop Highw Res Progr Rep. https://doi.org/10.
17226/25007
2. Ghasemzadeh A, Ahmed MM (2017) A tree-based ordered probit
approach to identify factors affecting work zone weather-related
crashes severity in North Carolina using the highway safety
information system dataset. In: 96th Transportation research
board annual meeting, Washington, D.C.
3. Ghasemzadeh A, Ahmed MM (2016) Crash characteristics and
injury severity at work zones considering adverse weather con-
ditions in Washington using SHRP 2 roadway information data-
base. In: 95th Transportation research board annual meeting,
Washington, D.C.
4. Khattak AJ, Khattak AJ, Council FM (2002) Effects of work zone
presence on injury and non-injury crashes. Accid Anal Prev
34:19–29. https://doi.org/10.1016/S0001-4575(00)00099-3
5. Schrock SA, Ullman GL, Cothron AS, Cothron AS (2004) An
analysis of fatal work zone crashes in Texas. Texas Transporta-
tion Institute, Texas
6. Akepati SR, Dissanayake S (2011) Characteristics and contribu-
tory factors of work zone crashes. In: Transportation Research
Board 90th annual meeting
7. Weng J, Zhu J-Z, Yan X, Liu Z (2016) Investigation of work zone
crash casualty patterns using association rules. Accid Anal Prev
92:43–52
8. Debnath AK, Blackman R, Haworth N (2015) A comparison of
self-nominated and actual speeds in work zones. Transp Res Part
F Traffic Psychol Behav 35:213–222
9. Wang K, Qin X (2014) Use of structural equation modeling to
measure severity of single-vehicle crashes. Transp Res Rec J
Transp Res Board 2432:17–25
10. Qi Y, Srinivasan R, Teng H, Baker R (2013) Analysis of the
frequency and severity of rear-end crashes in work zones. Traffic
Inj Prev 14:61–72
11. Garber N, Zhao M (2002) Distribution and characteristics of
crashes at different work zone locations in Virginia. Transp Res
Rec 1794:19–25
12. Garber NJ, Woo T-SH (1990) Accident characteristics at con-
struction and maintenance zones in urban areas. No. VTRC
90-R12. Virginia Transportation Research Council
13. Hargroves BT (1981) Vehicle crashes in highway work zones.
J Transp Eng 107:525–539
14. Chambless J, Chadiali AM, Lindly JK, McFadden J (2002)
Multistate work zone crash characteristics. ITE J 72(5):46–50
15. Hall JW, Lorenz VM (1989) Characteristics of construction zone
crashes. In: Transportation research record: journal of the trans-
portation research board. TRB, National Research Council,
Washington, D.C, pp 20–27
16. Ghasemzadeh A, Ahmed MM (2018) Utilizing naturalistic driv-
ing data for in-depth analysis of driver lane-keeping behavior in
rain: non-parametric MARS and parametric logistic regression
modeling approaches. Transp Res Part C Emerg Technol
90:379–392. https://doi.org/10.1016/j.trc.2018.03.018
17. Hammit BE, Ghasemzadeh A, James RM et al (2018) Evaluation
of weather-related freeway car-following behavior using the
SHRP2 naturalistic driving study database. Transp Res Part F
Traffic Psychol Behav 59:244–259
18. Ghasemzadeh A, Ahmed MM (2017) Drivers’ lane-keeping
ability in heavy rain: preliminary investigation using SHRP 2
naturalistic driving study data. Transp Res Rec J Transp Res
Board. https://doi.org/10.3141/2663-13
19. Ghasemzadeh A, Hammit BE, Ahmed MM, Young RK (2018)
Parametric ordinal logistic regression and non-parametric deci-
sion tree approaches for assessing the impact of weather condi-
tions on driver speed selection using naturalistic driving data.
Transp Res Rec. https://doi.org/10.1177/0361198118758035
20. Ahmed MM, Ghasemzadeh A (2018) The impacts of heavy rain
on speed and headway behaviors: an investigation using the
SHRP2 naturalistic driving study data. Transp Res Part C Emerg
Technol 91:371–384. https://doi.org/10.1016/j.trc.2018.04.012
21. Colyar J, Zhang L, Halkias J (2003) Identifying and assessing key
weather-related parameters and their impact on traffic operations
using simulation. In: Institute of Transportation Engineers 2003
Annual Meeting and Exhibit (held in conjunction with ITE Dis-
trict 6 Annual Meeting) Institute of Transportation Engineers
22. Qin X, Noyce D, Lee C, Kinar J (2006) Snowstorm event-based
crash analysis. Transp Res Rec J Transp Res Board
1948:135–141
Complementary parametric probit regression and nonparametric classification tree modeling139
123
J. Mod. Transport. (2019) 27(2):129–140
23. Yu R, Xiong Y, Abdel-Aty M (2015) A correlated random
parameter approach to investigate the effects of weather condi-
tions on crash risk for a mountainous freeway. Transp Res part C
Emerg Technol 50:68–77
24. Huang H, Abdel-Aty M (2010) Multilevel data and Bayesian
analysis in traffic safety. Accid Anal Prev 42:1556–1565
25. Eisenberg D, Warner KE (2005) Effects of snowfalls on motor
vehicle collisions, injuries, and fatalities. Am J Public Health
95:120–124
26. Ahmed M, Abdel-Aty M, Yu R (2012) Assessment of interaction
of crash occurrence, mountainous freeway geometry, real-time
weather, and traffic data. Transp Res Rec J Transp Res Board
2280:51–59
27. Khattak A, Knapp K (2001) Interstate highway crash injuries
during winter snow and nonsnow events. Transp Res Rec J
Transp Res Board 1746:30–36
28. Ahmed M, Abdel-Aty M, Lee J, Yu R (2014) Real-time assess-
ment of fog-related crashes using airport weather data: a feasi-
bility analysis. Accid Anal Prev 72:309–317. https://doi.org/10.
1016/j.aap.2014.07.004
29. Smadi O, Hawkins N, Hans Z, Bektas BA, Knickerbocker S,
Nlenanya I, Souleyrette R, Hallmark S (2015) Naturalistic driving
study: development of the roadway information database. Federal
Highway Administration. SHRP 2 Report No. S2-S04A-RW-1
30. Abdel-Aty M, Abdelwahab H (2004) Modeling rear-end colli-
sions including the role of driver’s visibility and light truck
vehicles using a nested logit structure. Accid Anal Prev
36:447–456
31. Weng J, Meng Q, Wang DZW (2013) Tree-based logistic
regression approach for work Zone Casualty Risk Assessment.
Risk Anal 33:493–504
32. Osei-Bryson K-M, Ngwenyama O (2014) Advances in research
methods for information systems research. Springer, Berlin
33. Weng J, Meng Q (2011) Decision tree-based model for estima-
tion of work zone capacity. Transp Res Rec J Transp Res Board
2257:40–50
34. Kovalerchuk B, Vityaev E (2000) Data mining in finance:
advances in relational and hybrid methods. Springer, Berlin
35. Weng J, Meng Q (2012) Effects of environment, vehicle and
driver characteristics on risky driving behavior at work zones. Saf
Sci 50:1034–1042
36. Agresti A (2007) An introduction to categorical data analysis,
2nd edn. Wiley, New York
37. Kostyniuk LP, Molnar LJ, Eby DW (2000) Are women taking
more risks while driving? A look at Michigan drivers. In:
Women’s travel issues second national conference
38. National Highway Traffic Safety Administration (2013) How
vehicle age and model year relate to driver injury severity in fatal
crashes. US Dep Transp
39. Glassbrenner D (2012) An analysis of recent improvements to
vehicle safety. NHTSA Tech Rep (Report No DOT HS 811 572)
40. DeJong W, Hingson R (1998) Strategies to reduce driving under
the influence of alcohol. Annu Rev Public Health 19:359–378
41. Keall MD, Frith WJ, Patterson TL (2004) The influence of
alcohol, age and number of passengers on the night-time risk of
driver fatal injury in New Zealand. Accid Anal Prev 36:49–61
42. Weng J, Meng Q, Yan X (2014) Analysis of work zone rear-end
crash risk for different vehicle-following patterns. Accid Anal
Prev 72:449–457. https://doi.org/10.1016/j.aap.2014.08.003
43. Hao W, Daniel J (2016) Driver injury severity related to incle-
ment weather at highway–rail grade crossings in the United
States. Traffic Inj Prev 17:31–38
44. American Traffic Safety Services Association (2013) Nighttime
lighting guidelines for work zones. https://www.workzonesafety.
org/files/documents/training/fhwa_wz_grant/night_lighting_
guide.pdf
140 A. Ghasemzadeh, M. M. Ahmed
123 J. Mod. Transport. (2019) 27(2):129–140
... In the United States, traffic congestion in work zone areas accounts for 10% of the total mileage driven, while in Germany, it accounts for 31% [2]. Congestion in work zone areas is mainly caused when drivers fail to change lanes in advance, which means that they must wait at merging points [3]. Due to the limitations of human driving capabilities (a long response time and collaborative autonomous driving among multiple vehicles, also known as multivehicle cooperative driving [8,9]. ...
Article
Full-text available
Work zone areas are frequent congested sections considered as the freeway bottleneck. Connected and autonomous vehicle (CAV) trajectory optimization can improve the operating efficiency in bottleneck areas by harmonizing vehicles’ manipulations. This study presents a joint trajectory optimization of cooperative lane changing, merging, and car-following actions for CAV control at a local merging point together with upstream points. The multiagent reinforcement learning (MARL) method is applied in this system, with one agent providing a merging advisory service at the merging point and controlling the inner-lane vehicles’ headway for smooth outer-lane vehicle merging, while other agents provide lane-changing advisory services at advance lane-changing points to control how vehicles make lane changes in advance and perform corresponding headway adjustment, similar to and jointly with the merging advisory service. Uniting all agents, the coordination graph (CG) method is applied to seek the global optimum, overcoming the exponential growth problem in MARL. Using MATLAB and the VISSIM COM interface, an online simulation platform is established. The simulation results show that MARL is effective for online computation with in-timing response. More importantly, comparisons of the results obtained in various scenarios demonstrate that the proposed system obtained smoother vehicle trajectories in all controlled sections, rather than only in the merging area, indicating that it can achieve better traffic conditions in freeway work zone areas.
... Traditional regression analysis such as logistic and probit regression analysis has been used in previous studies to quantify the effect of various contributing factors on crashes or driver behaviors (Ghasemzadeh and Ahmed, 2017a, 2017b. However, if the data (crash data, driver behavior data, etc.) can be structured in a hierarchical format, then analysis should consider the nested sources of variability. ...
... Since the outcome of crash severity studies can be categorized by discrete severity levels, discrete choice models, such as binary logit and binary probit models (4)(5)(6)(7), multinomial logit models (8)(9)(10)(11), ordered response models (10)(11)(12)(13)(14)(15), and mixed variants of these models (10,16,17) have been applied to develop crash severity prediction models as functions of contributing factors. These models operate under certain distributions; namely, normal (probit) and Gumbel (logit) distributions. ...
Article
Work zones are a high priority issue in the field of road transportation because of their impacts on traffic safety. A better understanding of work zone crashes can help to identify the contributing factors and countermeasures to enhance roadway safety. This study investigates the prediction of work zone crash severity and the contributing factors by employing a parametric approach using the mixed logit modeling framework and a non-parametric machine learning approach using the support vector machine (SVM). The mixed logit model belongs to the class of random parameter models in which the effects of flexible variables across different observations are identified, that is, data heterogeneity is taken into account. The performance of the SVM model is enhanced by applying three metaheuristic algorithms: particle swarm optimization, harmony search, and the whale optimization algorithms. Empirical findings indicate that SVM provides higher prediction accuracy and outperforms the mixed logit model. Estimation results reveal key factors that increase the likelihood of severe work zone crashes. Furthermore, the analysis illustrates the ability of the three metaheuristics to enhance the SVM and the superiority of the harmony search algorithm in improving the performance of the SVM model.
... Data constraining is a common approach that focuses on a very specific segment of the crash dataset. For examples, Kitali (24,25); other specific subjects like crashes in rural or urban arterial (26,27) and crashes involving volatile or older adult drivers (28,29) have also been investigated. ...
Article
Full-text available
Heterogeneity of crash data masks the underlying crash patterns and perplexes crash analysis. This paper aims to explore an advanced high-dimensional clustering approach to investigate heterogeneity in large datasets. Detailed records of crashes involving large trucks occurring in the state of Florida between 2007 and 2016 were examined to identify truck crash patterns and significant conditions contributing to the patterns. The block clustering method was applied to more than 220,000 crash records with nearly 200 attributes. The analysis showed promising results in segmenting a large heterogeneous dataset into meaningful subgroups (with 95.72% average degree of homogeneity for selected blocks). The goodness of fit for clustering methods is evaluated and both integrated completed likelihood (ICL) and pseudo-likelihood values improved significantly (20.8% and 21.1% respectively). Attribute clustering showed distinct characteristics for each cluster. Crash clustering revealed significant differences among the clusters and suggested that this crash dataset could be portioned as same-direction, opposing-direction, and single-vehicle crashes. Individual blocks defined by both row and column clustering were further investigated to better understand the contribution set of conditions that lead to large truck crashes. Major features for each of the three major types of crashes were analyzed, which may provide additional insights to develop potential countermeasures and strategies that target specific segments. The clustering approach could be used as a preanalysis method to identify homogeneous subgroups for further analysis, which will help enhance the effectiveness of safety programs.
... A number of studies have been conducted on work zone crashes. Results from these studies showed that crash rate and frequency are increased by the presence of work zones [2][3][4][5][6]. Weng et al. [7] used association rules to investigate factors that might have a significant effect on work zone crash casualties. ...
Article
Full-text available
Identifying risk factors for road traffic injuries can be considered one of the main priorities of transportation agencies. More than 12,000 fatal work zone crashes were reported between 2000 and 2013. Despite recent efforts to improve work zone safety, the frequency and severity of work zone crashes are still a big concern for transportation agencies. Although many studies have been conducted on different work zone safety-related issues, there is a lack of studies that investigate the effect of adverse weather conditions on work zone crash severity. This paper utilizes probit–classification tree, a relatively recent and promising combination of machine learning technique and conventional parametric model, to identify factors affecting work zone crash severity in adverse weather conditions using 8 years of work zone weather-related crashes (2006–2013) in Washington State. The key strength of this technique lies in its capability to alleviate the shortcomings of both parametric and nonparametric models. The results showed that both presence of traffic control device and lighting conditions are significant interacting variables in the developed complementary crash severity model for work zone weather-related crashes. Therefore, transportation agencies and contractors need to invest more in lighting equipment and better traffic control strategies at work zones, specifically during adverse weather conditions.
... school zones), they found high variability in driver behavioral factors contributed to different types and designs of the roadway [9]. In fact, there are many studies that investigated different environmental factors on driver behavior including weather conditions [10], work zones [11], and roadway characteristics [12], [13] as well as many other factors. However, trajectory-level analysis of driver behavior, which can consider driver behavior across time and different spatial locations with various geometric characteristics have not been well investigated in previous studies. ...
Article
Full-text available
Identifying factors that are associated with the probability of roadside work zone collisions enables decision makers to better assess and control the risk of scheduling a particular maintenance or construction activity by modifying the characteristics of the operation. This can be achieved by studying the effect of work zone properties on the risk of roadside work zone collisions. Much of the existing work in this area is based on data in the police traffic collision reports, which do not include data on the characteristics of the work zone itself. This paper develops a comprehensive data set of 42 features describing time, location, work zone characteristics, traffic volume, and road properties. Using recent machine learning techniques such as extreme gradient boosting classifiers on this extensive set of features allows for more accurate analysis to identify factors that affect the risk of work zone collisions or indicate higher than baseline chances of a roadside crash. Our statistical analysis reveals 10 important features and shows that four of these features are significantly associated with higher probabilities of roadside work zone collisions.
Article
Introduction: Reducing the severity of crashes is a top priority for safety researchers due to its impact on saving human lives. Because of safety concerns posed by large trucks and the high rate of fatal large truck-involved crashes, an exploration into large truck-involved crashes could help determine factors that are influential in crash severity. The current study focuses on large truck-involved crashes to predict influencing factors on crash injury severity. Method: Two techniques have been utilized: Random Parameter Binary Logit (RPBL) and Support Vector Machine (SVM). Models have been developed to estimate: (1) multivehicle (MV) truck-involved crashes, in which large truck drivers are at fault, (2) MV track-involved crashes, in which large truck drivers are not at fault and (3) and single-vehicle (SV) large truck crashes. Results: Fatigue and deviation to the left were found as the most important contributing factors that lead to fatal crashes when the large truck-driver is at fault. Outcomes show that there are differences among significant factors between RPBL and SVM. For instance, unsafe lane-changing was significant in all three categories in RPBL, but only SV large truck crashes in SVM. Conclusions: The outcomes showed the importance of the complementary approaches to incorporate both parametric RPBL and non-parametric SVM to identify the main contributing factors affecting the severity of large truck-involved crashes. Also, the results highlighted the importance of categorization based on the at-fault party. Practical Applications: Unrealistic schedules and expectations of trucking companies can cause excessive stress for the large truck drivers, which could leads to further neglect of their fatigue. Enacting and enforcing comprehensive regulations regarding large truck drivers' working schedules and direct and constant surveillance by authorities would significantly decrease large truck-involved crashes.
Article
Adverse weather conditions play a considerable role in the safety and efficiency of the transportation network. Many studies have aimed to quantify the impact that different weather conditions have on transportation safety and mobility; however, most studies have evaluated the network capacity, average speed, and other macroscopic measures without capturing specific driving characteristics. In order to understand specific driving behavior and performance characteristics that exist during different environmental conditions, high resolution vehicle data and video footage are required. The SHRP2 sponsored the generation of a large Naturalistic Driving Study (NDS) database – which provides vehicle time series data, front and rear video, driver video, external sensor readings, and driver surveys – and the Roadway Information Database (RID) – which is a complementary database with geospatial data for commonly driven roads in the NDS and other ancillary data sources, including annual traffic, roadway geometry, accident reports, weather conditions, and 511 alerts. The purpose of this study is to leverage these SHRP2 databases and weather data from the National Climatic Data Center (NCDC) to extract trips that occur during adverse weather conditions. The extraction of weather-related trips from a NDS is unprecedented, and this study presents three complementary methodologies used in parallel to acquire relevant trips from the SHRP2 NDS database. A semi-automated data reduction procedure was developed to process the raw trip files into a format that further analysis and modeling could be completed. This novel approach to NDS trip acquisition and reduction could be extended to other naturalistic driving studies worldwide.
Thesis
Full-text available
Inclement weather conditions such as fog, snow, ground blizzards, slush, rain, and strong winds negatively affect pavement condition, vehicle performance, visibility, and driver behavior and performance. Driver behavior exhibits high variability and is difficult to quantify, particularly in inclement weather conditions. Driver behavior and performance are imperative to understand when describing the influence of adverse weather conditions on roadways’ safety and mobility. Adverse weather inhibits driver’s ability to perceive their environment, and visibility reductions – caused by adverse weather events – is known to increase the likelihood of crashes. The effects of adverse weather on safe and efficient operations of transportation networks have been extensively researched; however, specific considerations of driver behavior and performance are noticeably absent from these studies. The Second Strategic Highway Research Program (SHRP2) Naturalistic Driving Study (NDS) and Roadway Information Database (RID) provide an unprecedented opportunity for researchers to better understand driving behavior. Having identified the potential benefits and value of this unique dataset, this dissertation aimed to evaluate methods by which the data could be used to study driver behavior and performance characteristics during adverse weather conditions. This dissertation consists of five published or presented papers in which addresses different gaps in the knowledge by presenting innovative methods to identify and analyze weather-related naturalistic driving data to better understand driver behavior and performance in adverse weather 2 conditions. An innovative methodology introduced in Chapter 4 helped to effectively identify weather-related trips in real-time using vehicle wiper status and other complementary methodologies introduced in chapter 5 helped to identify naturalistic driving weather-related trips using external weather data sources. In addition, a semi-automated data reduction procedure was developed and introduced in chapter 5 to process raw trip data files into a format that further analyses and modeling techniques could be easily applied. The novel approaches developed in this dissertation for NDS trip acquisition and reduction could be extended to other naturalistic driving studies worldwide. In addition to the contributions in data extraction and reduction, preliminary analysis as well as advanced modeling techniques were utilized in this study. These analyses were used to explain the relationship between different levels of speed selection/lane keeping behaviors and a set of contributing factors including roadway characteristics, environmental and traffic conditions and driver demographics on a trajectory level. These modeling techniques ranged from common parametric approaches such as binary logistic regression and ordinal logistic/probit regression models to a more advanced non-parametric/data mining modeling techniques such as Classification and Regression Trees (CART) and Multivariate Adaptive Regression Splines (MARS). The results from this study suggest that both parametric and non-parametric modeling approaches are important to analyze driver behavior and performance. In fact, this study attempted to maximize the benefits out of the advantages of parametric models, such as the ability of interpreting the marginal effects of various risk factors, as well as the advantages of using nonparametric models, including but not limited to the ability of providing high prediction accuracy, handling of missing values automatically, and their capability of handling large number of 3 explanatory variables in a timely manner, which might be extremely beneficial specifically for assessing traffic operations and safety in real-time considering weather and traffic data to be directly fed into the model. The specific data mining modeling techniques used in this study have an additional advantage compared to most of the data mining/ machine learning techniques. Unlike many other nonparametric models, CART and MARS functions are interpretable and transparent; therefore, these models do not have the “black box” problem known for other machine learning techniques. The results of the developed speed selection models revealed that among various adverse weather conditions, drivers were more likely to reduce their speed in snowy weather conditions compared to other adverse weather conditions. Specifically, the odds of drivers reducing their speed were 9.29 times higher in snowy weather conditions, followed by rain and fog with 1.55 and 1.29 times, respectively (compared to clear conditions). In addition, variable importance analysis using CART method revealed that weather conditions, traffic conditions, and posted speed limit are the three most important variables affecting driver speed selection behavior. In addition, the results of the developed lane-keeping models revealed that drivers in heavy rain conditions were 3.95 times more likely to have a worse lane-keeping performance compared to clear weather conditions. The developed speed selection model is a key example of a derived mechanism by which the SHRP2 database can be leveraged to improve Weather Responsive Traffic Management (WRTM) strategies directly. Moreover, the results may shed some light on driver lane keeping behavior at a trajectory level. Moreover, a better understanding of driver lane-keeping behavior might help in developing better Lane Departure Warning (LDW) systems. Evaluating driver behavior and performance under the influence of reduced visibility due to adverse weather conditions is extremely important to develop safe driving strategies, including Variable Speed Limits (VSL). Many roadways across the U.S. currently have weather-based VSL systems to ensure safe driving environments during adverse weather. Current VSL systems mainly collect traffic information from external sources, including inductive loop detector, overhead radars and Closed Circuit Television (CCTV). However, human factors especially driver behavior and performance such as selection of speed and acceleration/ deceleration behaviors during adverse weather are neglected due to the lack of appropriate driver data. The findings from this study indicated that the SHRP2NDS data could be effectively utilized to identify trips in adverse weather conditions and to assess the impacts of adverse weather on driver behavior and performance. With the evolution of connected vehicles, Machine Vision and other real-time weather social crowd sources such as WeatherCloud®, more accurate real-time data similar to the NDS data will be available in the near future. This study provided early insights into using similar data collected from NDS.
Conference Paper
Adverse weather can have a dramatic impact on traffic operations. Motorists drive more cautiously, vehicles cannot start and stop as quickly, and pavement markings and traffic signs can be blocked during a weather event. Such factors can all add up to significantly degraded traffic operations during adverse weather when compared to normal conditions. The purpose of this paper is to identify which traffic parameters in the CORSIM microscopic simulation model are likely affected by adverse weather and to assess which of these parameters most affect the quality of traffic flow. Past research was used to help identify which traffic parameters are affected by adverse weather. Then, a sensitivity test of the numerous freeway car following and lane changing parameters on various congestion levels and roadway configurations was performed to demonstrate which parameters most affect various measures of effectiveness. Overall, this paper presents a planning-level guide to help traffic analysts determine which parameters should likely be changed when modeling weather events. Also, the results from the sensitivity tests show that the Car Following Sensitivity Multiplier and Time to Complete Lane Change parameters are the most sensitive car following and lane changing parameters, respectively, and the ones that produced the most consistent and expected impacts on traffic operations. In addition, the sensitivity study confirms past research that the Mean Free Flow Speed parameter is a crucial parameter to alter when modeling weather events on freeway networks
Research
Work zone safety is currently a major concern to transportation and highway engineers because of the relatively higher rates of accidents in these areas. There is a strong indication that during the next decade, emphasis will be placed on maintenance and rehabilitation of the nation's highways rather than on the construction of new highways. This will result in many more work zones. Unless effective measures are taken to increase safety in these work zones, a significant increase in accident rates will occur. This national phenomenon is also currently being observed in Virginia, where data have indicated that total and fatal accidents are over-represented in urban work zones. A clear understanding of work zone accident characteristics and traffic control devices is however needed to facilitate the development of effective guidelines that will significantly improve safety at urban work zones. Therefore, the Virginia Transportation Research Council undertook a study to determine accident characteristics at urban work zones and to evaluate the effectiveness of traffic control devices in reducing accident rates. All of the sites considered were within urban areas and have no access control; average speeds were between 25 and 48 mph. The traffic control devices evaluated do not include warning signs placed some distance ahead to inform motorists of the approaching work zones. The results indicate that the major influencing factor on accident rates during the construction period on urban multilane highways is the accident rate just prior to the construction period. Also, the use of appropriate traffic control devices has a positive effect on safety in urban work zones, but the effectiveness depends on the type of traffic control used. The results also show that the most effective combination on multilane highways consists of flashing arrows, a flagger, and cones. On two-lane highways, flagmen and either cones, barricades, or static signs are most effective. Any combination not including flagmen, is less effective on two-lane highways than one including flagmen.
Article
This study investigated the impact of snowstorms on a roadway system and assessed winter maintenance efforts in improving highway safety from a macroscopic perspective. The Wisconsin State Trunk Highway System was used as the data source. An inverse relationship between deicing material consumption and crash counts during snowstorms is reflected in the analysis; it also implies multiple causes of snowstorm crashes. Primary factors considered in this research were the effects of various weather conditions and winter maintenance. Results showed a mixed influence of both the snowstorm severity and winter maintenance investment on road safety. That is, severity of the snowstorm in regard to duration, intensity, and wind speed increases traffic crashes and casualties with simultaneous consumption of more deicing materials and labors. The research also explicitly proves that a proactive winter maintenance effort significantly improves traffic safety. Temporal distribution of crash occurrence during a snowstorm shows that a large percentage of crashes occurred during initial stages of the snowstorms, probably because snow removal activities had not yet begun. The pattern is quite similar for state-maintained highways and local roads, except that in the second half of a snowstorm, a higher percentage of crashes occur on local roads than on state highways, possibly suggesting that the different level of maintenance and use of deicing materials may play an important role. Additional research is required to quantify the effects of deicing materials specifically, particularly in regard to application rate and frequency, on highway safety.
Article
The impact of adverse weather conditions on transportation operation and safety is the focus of many studies; however, comprehensive research detailing the differences in driving behavior and performance during adverse conditions is limited. Many previous studies utilized aggregate traffic and weather data (e.g., average speed, headway, and global weather information) to formulate conclusions about the impact of weather on network operation and safety; however, research into specific factors associated with driver performance and behavior are notably absent. A novel approach, presented in this paper, fills this gap by considering disaggregate trajectory-level data available through the SHRP2 Naturalistic Driving Study and Roadway Information Database. Parametric ordinal logistic regression and non-parametric classification tree modeling were utilized to better understand speed selection behavior in adverse weather conditions. The results indicate that the most important factors impacting driver speed selection are weather conditions, traffic conditions, and the posted speed limit. Moreover, it was found that drivers are more likely to significantly reduce their speed in snowy weather conditions, as compared with other adverse weather conditions (such as rain and fog). The purpose of this study was to gather insights into driver speed preferences in different weather conditions, such that efficient logic can be introduced for a realistic variable speed limit system—aimed at maximizing speed compliance and reducing speed variations. This study provides valuable information related to drivers’ interaction with real-time changes in roadway and weather conditions, leading to a better understanding of the effectiveness of operational countermeasures.
Article
It is known that adverse weather conditions can affect driver performance due to reduction in visibility and slippery surface conditions. Lane keeping is one of the main factors that might be affected by weather conditions. Most of the previous studies on lane keeping have investigated driver lane-keeping performance from driver inattention perspective. In addition, the majority of previous lane-keeping studies have been conducted in controlled environments such as driving simulators. Therefore, there is a lack of studies that investigate driver lane-keeping ability considering adverse weather conditions in naturalistic settings. In this study, the relationship between weather conditions and driver lane-keeping performance was investigated using the SHRP2 naturalistic driving data for 141 drivers between 19 and 89 years of age. Moreover, a threshold was introduced to differentiate lane keeping and lane changing in naturalistic driving data. Two lane-keeping models were developed using the logistic regression and multivariate adaptive regression splines (MARS) to better understand factors affecting driver lane-keeping ability considering adverse weather conditions. The results revealed that heavy rain can significantly increase the standard deviation of lane position (SDLP), which is a very widely used method for analyzing lane-keeping ability. It was also found that traffic conditions, driver age and experience, and posted speed limits have significant effects on driver lane-keeping ability. An interesting finding of this study is that drivers have a better lane-keeping ability in roadways with higher posted speed limits. The results from this study might provide better insights into understanding the complex effect of adverse weather conditions on driver behavior.
Article
There is a lack of studies that have examined the impact of weather conditions on drivers’ lane-keeping performance. Many driver behavior studies have been conducted in simulated environments. However, no studies have examined the impact of heavy rain on lane-keeping ability in naturalistic settings. A study used data from the SHRP 2 Naturalistic Driving Study to provide better insights into driver behavior and performance in clear and rainy weather conditions. In particular, a lane-keeping model was developed using logistic regression to better understand factors affecting drivers’ lane-keeping ability in different weather conditions. One interesting finding of this research is that heavy rain can significantly increase the standard deviation of lane position, which is a widely used method for analyzing lane-keeping ability. More specifically, drivers in heavy rain are 3.8 times more likely to show a higher standard deviation of lane position than in clear weather condition. An additional interesting finding is that drivers have better lane-keeping abilities in roadways with higher posted speed. Results from this study could provide a better understanding of the complex effects of weather conditions on drivers’ lane-keeping ability and how drivers perceive and react in different weather conditions. Results from this study may also provide insights into automating the activation and deactivation of lane departure warning systems.