Risk Analysis DOI: 10.1111/risa.12772
Construction Safety Risk Modeling and Simulation
Antoine J.-P. Tixier,1 Matthew R. Hallowell,2 and Balaji Rajagopalan2
By building on a genetic-inspired attribute-based conceptual framework for safety risk analysis, we propose a novel approach to define, model, and simulate univariate and bivariate construction safety risk at the situational level. Our fully data-driven techniques provide construction practitioners and academicians with an easy and automated way of getting valuable empirical insights from attribute-based data extracted from unstructured textual injury reports. By applying our methodology on a data set of 814 injury reports, we first show the frequency-magnitude distribution of construction safety risk to be very similar to that of many natural phenomena such as precipitation or earthquakes. Motivated by this observation, and drawing on state-of-the-art techniques in hydroclimatology and insurance, we then introduce univariate and bivariate nonparametric stochastic safety risk generators based on kernel density estimators and copulas. These generators enable the user to produce large numbers of synthetic safety risk values faithful to the original data, allowing safety-related decision making under uncertainty to be grounded on extensive empirical evidence. One of the implications of our study is that like natural phenomena, construction safety may benefit from being studied quantitatively by leveraging empirical data rather than strictly being approached through a managerial perspective using subjective data, which is the current industry standard. Finally, a side but interesting finding is that in our data set, attributes related to high energy levels (e.g., machinery, hazardous substance) and to human error (e.g., improper security of tools) emerge as strong risk shapers.
KEY WORDS: Construction safety; risk modeling; stochastic simulation
Despite the significant improvements that have followed the inception of the Occupational Safety and Health Act of 1970, safety performance has reached a plateau in recent years and the construction industry still suffers from a disproportionate accident rate. Fatalities in construction amounted to 885 in 2014, the highest count since 2008.(1) In addition to dreadful human costs, construction injuries are also associated with huge direct and indirect economic impacts.

1 Computer Science Laboratory, École Polytechnique, Palaiseau, France.
2 Department of Civil, Environmental, and Architectural Engineering, CU Boulder, USA.

Address correspondence to Antoine J.-P. Tixier, Postdoctoral Researcher, Computer Science Laboratory, École Polytechnique, Palaiseau, France;
A very large portion of construction work, up-
stream or downstream of groundbreaking, involves
making safety-related decisions under uncertainty.
Partly due to their limited personal history with acci-
dents, even the most experienced workers and safety
managers may miss hazards and underestimate the
risk of a given construction situation.(2,3) Design-
ers face an even greater risk of failing to recognize
hazards and misestimating risk.(2) In addition, when
uncertainty is involved, humans often resort to
personal opinion and intuition to apprehend their
environment. This process is fraught with numer-
ous biases and misconceptions inherent to human
cognition(4) and compounds the likelihood of misdi-
agnosing the riskiness of a situation.
0272-4332/17/0100-0001$22.00/1 © 2017 Society for Risk Analysis
Fig. 1. Overarching research process: from raw injury reports to safety risk analysis. (Pipeline: raw injury reports → data set of R = 814 reports, P = 77 binary attributes (present = 1, absent = 0) with real and worst severity outcomes → univariate and bivariate risk modeling, simulation, and estimation.)
Therefore, it is of paramount importance to
provide construction practitioners with tools to
mitigate the adverse consequences of uncertainty
on their safety-related decisions. In this study, we
focus on leveraging situational data extracted from
raw textual injury reports to guide and improve
construction situation risk assessment. Our method-
ology facilitates the augmentation of construction
personnel’s experience and grounds risk assessment
on potentially unlimited amounts of empirical and
objective data. In other words, our approach com-
bats construction risk misdiagnosis on two fronts, by
jointly addressing both the limited personal history
and the judgment bias problems previously evoked.
We used fundamental construction attribute
data extracted by a highly accurate natural language
processing (NLP) system(5) from a database of 921
injury reports provided by a partner company en-
gaged in industrial construction projects worldwide.
Attributes are context-free universal descriptors of
the work environment that are observable prior to
injury occurrence. They relate to environmental con-
ditions, construction means and methods, and human
factors, and provide a unified, standardized way of
describing any construction situation. To illustrate,
one can extract four attributes from the following
text: “worker is unloading a ladder from pickup
truck with bad posture”: ladder, manual handling,
light vehicle, and improper body positioning. Be-
cause attributes can be used as leading indicators
of construction safety performance,(6) they are also
called injury precursors. In what follows, we will
use the two terms interchangeably. Drawing from
national databases, Esmaeili and Hallowell(7,8)
initially identified 14 and 34 fundamental attributes
from 105 fall and 300 struck-by high-severity injury
cases, respectively. In this study, we used a refined
and broadened list of 80 attributes carefully engi-
neered and validated by Prades(9) and Desvignes(10)
from analyzing a large database of 2,201 reports
featuring all injury types and severity levels.
A total of 107 of 921 reports were discarded be-
cause they either were not associated with any at-
tribute or because the real outcome was unknown.
Additionally, 3 attributes out of 80 (pontoon, soffit,
and poor housekeeping) were removed because they
did not appear in any report. This gave us a final matrix of R = 814 reports by P = 77 attributes. Although other related studies concerned themselves
with predictive modeling,(6) here we focus on defin-
ing, modeling, and simulating attribute-based con-
struction safety risk. The overall study pipeline is
summarized in Fig. 1.
The contributions of this study are fourfold: (1)
we formulate an empirically-grounded definition of
construction safety risk at the attribute level, and ex-
tend it to the situational level, both in the univariate
and the bivariate case; (2) we show how to model
risk using kernel density estimators (KDE); (3) we
observe that the frequency-magnitude distribution
of risk is heavy-tailed, and resembles that of many
natural phenomena; and finally (4) we introduce
univariate and bivariate nonparametric stochastic
generators based on kernels and copulas to draw
conclusions from much larger samples and better
estimate construction safety risk.
Table I. Counts of Injury Severity Levels Accounted for by Each Precursor

Precursor   s1 = Pain   s2 = 1st Aid   s3 = Medical Case/Lost Work Time   s4 = Permanent Disablement   s5 = Fatality
X_1         n_11        n_12           n_13                               n_14                         n_15
X_2         n_21        n_22           n_23                               n_24                         n_25
...         ...         ...            ...                                ...                          ...
X_(P-1)     n_(P-1)1    n_(P-1)2       n_(P-1)3                           n_(P-1)4                     n_(P-1)5
X_P         n_P1        n_P2           n_P3                               n_P4                         n_P5
Table II. Severity Level Impact Scores Adapted from Hallowell and Gambatese(16)

Severity Level (s)              Severity Score (S_s)
Pain                            S_1 = 12
1st aid                         S_2 = 48
Medical case/lost work time     S_3 = 192
Permanent disablement           S_4 = 1,024
Fatality                        S_5 = 26,214
The vast majority of construction safety risk
analysis studies use opinion-based data,(9) and thus
rely on the ability of experts to rate the relative
magnitude of risk based on their professional experience. This approach suffers from two main limitations.
First, prior ranges are very often provided by re-
searchers to bound risk values. Second, and more im-
portantly, even the most experienced experts have
limited personal history with hazardous situations,
and their judgment under uncertainty suffers the
same cognitive limitations as that of any other hu-
man being,(11) such as overconfidence, anchoring,
availability, representativeness, unrecognized limits,
motivation, and conservatism.(11–13) It was also sug-
gested that gender(14) and even emotional state(15)
impact risk perception. Even if it is possible to some-
what alleviate the negative impact of adverse psy-
chological factors,(16) the reliability of data obtained
from expert opinion is questionable. Conversely,
truly objective empirical data, like the injury reports
used in this study, seem superior.
Due to the technological and organizational
complexity of construction work, most safety risk
studies assume for simplicity that construction pro-
cesses can be decomposed into smaller parts.(17)
Such decomposition allows researchers to model risk
for a variety of units of analysis, like specific tasks
and activities.(18–20) Most commonly, trade-level risk
analysis has been adopted.(21–23) The major limita-
tion of these segmented approaches is that because
each one considers a trade, task, or activity in isola-
tion, it is impossible for the end user to comprehen-
sively characterize onsite risk in a standard, robust,
and consistent way.
Some studies attempted to overcome these
limitations. For instance, Shapira and Lyachin(24)
quantified risks for generic factors related to tower
cranes such as type of load or visibility, thereby
allowing safety risk modeling for any crane situation.
Esmaeili and Hallowell(7,8) went a step further by
introducing a novel conceptual framework allowing
any construction situation to be fully and objectively
described by a unique combination of fundamental
context-free attributes of the work environment.
This attribute-based approach is powerful in that it makes possible the extraction of structured, standardized information from naturally occurring, unstructured textual injury reports. Additionally, the universality of attributes allows the multifactorial nature of safety risk to be captured in the same unified way for any task, trade, or activity, which is a significant improvement over traditional segmented studies. However, manual content analysis
of injury reports is expensive and fraught with data
consistency issues. For this reason, Tixier et al.(5)
introduced an NLP system capable of automatically
detecting the attributes presented in Table III
and various safety outcomes in injury reports with
more than 95% accuracy (comparable to human
performance), enabling the large-scale use of Es-
maeili and Hallowell’s attribute-based framework.
The data we used in this study were extracted by the
aforementioned NLP tool.
Table III. Relative Risks and Counts of the P = 77 Injury Precursors

(Two precursors per row. The columns for each precursor are: precursor name, count n, exposure e (%), relative risk based on real outcomes, and relative risk based on worst possible outcomes.)
Concrete 29 41 7 96 Unstable support/surface 3 32 1 2
Confined workspace 21 2 115 336 Wind 29 37 6 16
Crane 16 12 22 76 Improper body position 7 25 3 6
Door 17 21 11 174 Imp. procedure/inattention 13 16 10 44
Sharp edge 8 38 2 5 Imp. security of materials 78 12 77 1007
Formwork 22 5 63 135 Insect 19 18 8 21
Grinding 16 16 11 34 No/improper PPE 3 67 0* 1
Heat source 11 20 4 13 Object on the floor 41 43 9 22
Heavy material/tool 29 30 11 247 Lifting/pulling/handling 141 31 49 439
Heavy vehicle 12 12 12 307 Cable tray 9 27 4 11
Ladder 23 14 15 52 Cable 8 33 1 3
Light vehicle 31 59 7 123 Chipping 4 16 1 4
Lumber 69 14 53 158 Concrete liquid 8 41 2 4
Machinery 40 8 67 3159 Conduit 11 31 4 14
Manlift 8 8 16 50 Congested workspace 2 32 0* 1
Object at height 14 50 4 136 Dunnage 2 16 1 3
Piping 74 38 19 141 Grout 3 41 1 1
Scaffold 91 33 28 74 Guardrail handrail 16 40 4 8
Stairs 28 41 8 25 Job trailer 2 59 0* 1
Steel/steel sections 112 35 33 281 Stud 4 41 1 5
Rebar 33 4 76 251 Spool 9 33 2 9
Unpowered transporter 13 9 23 401 Stripping 12 22 7 18
Valve 24 27 9 22 Tank 16 31 5 115
Welding 25 22 10 34 Drill 16 43 5 88
Wire 30 43 5 19 Bolt 36 41 7 27
Working at height 73 40 18 46 Cleaning 22 56 5 12
Wkg below elev. wksp/mat. 7 17 3 21 Hammer 33 50 5 18
Forklift 11 9 9 380 Hose 11 41 3 8
Hand size pieces 38 47 7 95 Nail 15 50 4 10
Hazardous substance 33 1 590 6648 Screw 7 50 1 2
Adverse low temperatures 33 3 101 292 Slag 10 10 8 32
Mud 6 6 9 20 Spark 1 12 2 11
Poor visibility 3 23 2 3 Wrench 23 39 5 23
Powered tool 32 27 12 54 Exiting/transitioning 25 49 6 17
Slippery surface 32 25 13 40 Splinter/sliver 9 44 1 2
Small particle 96 31 28 105 Working overhead 5 40 1 3
Unpowered tool 102 44 24 352 Repetitive motion 2 51 0* 1
Electricity 1 33 0* 1 Imp. security of tools 24 22 12 314
Uneven surface 33 32 11 129
*Values are rounded up to the nearest integer.
3.1. Attribute-Level Safety Risk
Following Baradan and Usmen(21) we defined
construction safety risk as the product of frequency
and severity as shown in Equation (1). More precisely, in our approach, the safety risk R_p accounted for by precursor p (or X_p in Table I) was computed as the product of the number n_ps of injuries attributed to precursor p for the severity level s (given by Table I) and the impact rating S_s of this severity level (given by Table II, and based on Hallowell and Gambatese(16)). We considered five severity levels,
s1=Pain, s2=First Aid, s3=Medical Case/Lost Work
Time, s4=Permanent Disablement, and s5=Fatality.
Medical Case and Lost Work Time were merged
because differentiating between these two severity
levels was not possible based only on the information
available in the narratives and associated databases.
Equation (1) shows construction safety risk:

risk = frequency × severity.   (1)
The total amount of risk that can be attributed
to precursorpwas then obtained by summing the risk
values attributed to this precursor across all severity
levels, as shown in Equation (2):

R_p = Σ_{s=1}^{5} n_ps S_s,   (2)

where n_ps is the number of injuries of severity level s attributed to precursor p, and S_s is the impact score of severity level s.
Finally, as noted by Sacks et al.,(25) risk analy-
sis is inadequate if the likelihood of worker expo-
sure to specific hazards is not considered. Hence, the
risk R_p of precursor p was weighted by its probability of occurrence e_p (see Equation (3)), which gave the relative risk RR_p of precursor p:

RR_p = R_p / e_p,   (3)

where R_p is the total amount of risk associated with precursor p, and e_p is the probability of occurrence of precursor p. The probabilities e_p, or exposure values, were provided by the same company that donated the injury reports. These data are constantly being recorded by means of observation as part of the firm's project control and work characterization policy and therefore were already available.
To illustrate the notion of relative risk, as-
sume that the precursor lumber has caused 15
first aid injuries, 10 medical cases and lost work
time injuries, and has once caused a permanent
disablement. By following the steps outlined
above, the total amount of risk R_lumber accounted for by the attribute lumber can be computed as 15 × 48 + 10 × 192 + 1 × 1,024 = 3,664. Moreover, if lumber is encountered frequently onsite, e.g., with an exposure value e_lumber = 0.65, the relative risk of lumber will be RR_lumber = 3,664/0.65 ≈ 5,637. However, if workers are very seldom exposed to lumber (e.g., e_lumber = 0.07), RR_lumber will be equal to 3,664/0.07 ≈ 52,343. It is clear from this
example that if two attributes have the same total
risk value, the attribute having the lowest exposure
value will be associated with the greatest relative
risk. The assumption is that if a rare attribute causes
as much damage as a more common one, the rare
attribute should be considered riskier by proportion.
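The attribute-level computation of Equations (1)-(3) can be sketched in a few lines of Python, using the Table II severity scores and the lumber example above (the exposure values 0.65 and 0.07 are illustrative, as in the text):

```python
# Severity impact scores from Table II (adapted from Hallowell and Gambatese).
SEVERITY_SCORES = {"pain": 12, "first_aid": 48, "medical_case_lwt": 192,
                   "permanent_disablement": 1024, "fatality": 26214}

def attribute_risk(severity_counts):
    """Equation (2): R_p = sum over severity levels s of n_ps * S_s."""
    return sum(n * SEVERITY_SCORES[s] for s, n in severity_counts.items())

def relative_risk(severity_counts, exposure):
    """Equation (3): RR_p = R_p / e_p."""
    return attribute_risk(severity_counts) / exposure

# Worked lumber example: 15 first aids, 10 medical case/lost work time
# injuries, and 1 permanent disablement.
lumber = {"first_aid": 15, "medical_case_lwt": 10, "permanent_disablement": 1}

print(attribute_risk(lumber))              # 3664
print(round(relative_risk(lumber, 0.65)))  # 5637
print(round(relative_risk(lumber, 0.07)))  # 52343
```

As in the text, the rarer exposure (0.07) yields a much larger relative risk for the same total risk.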
Note that relative risk values allow comparison
but do not have an absolute physical meaning. As
presented later, what matters more than the precise
risk value itself is the range in which a value falls.
Also, note that since Tixier et al.(5)’s NLP
tool’s functionality did not include injury severity
extraction at the time of writing, we used the real
and worst possible outcomes manually assessed for
each report by Prades.(9) Specifically, in Prades,(9) a
team of seven researchers analyzed a large database
of injury reports over the course of several months.
High output quality was ensured by using a strict 95% intercoder agreement threshold, peer reviews,
calibration meetings, and random verifications by
an external reviewer. Regarding worst possible
injury severity, human coders were asked to use
their judgment of what would have happened in
the worst-case scenario should a small translation in time and/or space have occurred. This method and the resulting judgments were later validated by Alexander et al.,(26) who showed that the human assessment of maximum possible severity was congruent with the quantity of energy in the situation, which, ultimately, is a reliable predictor of the worst possible outcome.
For instance, in the following excerpt of an in-
jury report: “worker was welding below scaffold and
a hammer fell from two levels above and scratched
his arm,” the real severity is a first aid. However, by
making only a small translation in space, the ham-
mer could have struck the worker in the head, which
could have yielded a permanent disablement or even
a fatality. Coders in Prades(9) were asked to favor
the most conservative choice. Thus, in this case, per-
manent disablement was selected. Whenever mental
projection was impossible or required some degree of
speculation, coders were required to leave the field
blank and the reports were subsequently discarded.
As indicated, these subjective assessments were em-
pirically validated.(26)
By considering severity counts for both real out-
comes and worst possible outcomes, we could com-
pute two relative risk values for each of the 77 pre-
cursors. These values are listed in Table III, and were
stored in two vectors of length P=77.
For each attribute, we computed the difference
between the relative risk based on worst possible
outcomes and the relative risk based on actual
outcomes. The top 10% attributes for this metric are
hazardous substance (6,059), machinery (3,092), improper security of materials (930), lifting/pulling/manual handling (390), unpowered transporter (378), forklift (371), unpowered tool (328), improper security of tools (302), and heavy vehicle (295).
These attributes can be considered as the ones
giving a construction situation the greatest po-
tential for severity escalation in the worst-case
scenario. Except lifting/pulling/manual handling
and unpowered tool, all these precursors are di-
rectly associated with human error or high energy
levels, which corroborates recent findings.(26) Fur-
thermore, one could argue that the attributes
lifting/pulling/manual handling and unpowered tool
are still related to human error and high en-
ergy levels, as the former is often associated
with improper body positioning (human factor)
whereas the latter usually designates small and
hand-held objects (hammer, wrench, screwdriver,
etc.) that are prone to falling from height (high
energy). Many attributes in Table III, such as
sharp edge, manlift, unstable support/surface, or
improper body position, have low risk values be-
cause of their rarity in the rather small data set
that we used to provide a proof of concept for our
methodology, but this does not incur any loss of generality.
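As a sketch, the severity-escalation ranking can be reproduced directly from Table III; the snippet below uses a handful of precursors (since the table's risk values are rounded, differences may deviate slightly from the unrounded figures quoted in the text):

```python
# (Real, worst) relative risks for a few precursors, as rounded in Table III.
risks = {
    "machinery": (67, 3159),
    "improper security of materials": (77, 1007),
    "forklift": (9, 380),
    "heavy vehicle": (12, 307),
}

# Severity-escalation potential: worst-outcome risk minus real-outcome risk.
deltas = {p: worst - real for p, (real, worst) in risks.items()}
ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0])  # ('machinery', 3092)
```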
3.2. Report-Level Safety Risk
As shown in Equation (4), we define safety risk at the situational level as the sum of the risk values of all the attributes that were identified as present in the corresponding injury report:

risk_r = Σ_{p=1}^{P} δ_rp RR_p,   (4)

where RR_p is the relative risk associated with precursor p, and δ_rp = 1 if precursor p is present in report r (δ_rp = 0 else).

In practice, computing real (or worst) safety risk at the report level comes down to multiplying the (R, P) binary attribute matrix (attribute matrix of Fig. 1) by the (P, 1) vector of relative real (or worst) risk values, as shown in Equation (5). In the end, two risk values (real and worst) were obtained for each of the R = 814 incident reports.
For instance, in the following description of a
construction situation: “worker is unloading a ladder
from pickup truck with bad posture,” four attributes
are present: namely (1) ladder, (2) manual handling,
(3) light vehicle, and (4) improper body positioning.
The risk based on real outcomes for this construc-
tion situation is the sum of the relative risk values of
the four attributes present (given by Table III), that
is, 15 + 49 + 7 + 3 = 74, and similarly, the risk based on worst potential outcomes is 52 + 439 + 123 + 6 = 620. As already stressed, these relative values are
not meaningful in absolute terms, they only enable
comparison between situations and their categoriza-
tion into broad ranges of riskiness (e.g., low, medium,
high). Estimating these ranges on a small, finite sam-
ple such as the one we used in this study would
have resulted in biased estimates. To alleviate this,
we used stochastic simulation techniques to generate
hundreds of thousands of new scenarios honoring the
historical data, enabling us to make inferences from
a much richer, yet faithful sample.
Multiplying the (R, P) attribute matrix by the (P, 1) vector of relative risk values gives the (R, 1) vector of risk values associated with each injury report (Equation (5)).
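A minimal sketch of this report-level computation (Equations (4) and (5)), using the four attributes of the ladder example and their Table III relative risks:

```python
# Relative risks (Table III) for the four attributes of the example report
# "worker is unloading a ladder from pickup truck with bad posture":
# ladder, lifting/pulling/handling, light vehicle, improper body position.
rr_real = [15, 49, 7, 3]
rr_worst = [52, 439, 123, 6]

# One row of the (R, P) binary attribute matrix: all four attributes present.
report_row = [1, 1, 1, 1]

def report_risk(delta_row, rr):
    """Equation (4): risk_r = sum over p of delta_rp * RR_p."""
    return sum(d * r for d, r in zip(delta_row, rr))

print(report_risk(report_row, rr_real))   # 74
print(report_risk(report_row, rr_worst))  # 620
```

Stacking all R report rows and applying the same dot product reproduces the matrix formulation of Equation (5).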
3.3. The Probability Distribution of Construction
Safety Risk Resembles That of Many
Natural Phenomena
For a given injury report, the risk based on real
outcomes and the risk based on worst potential out-
comes can each take on a quasi-infinite number of
values (2^P − 1) with some associated probabilities.
Therefore, they can be considered quasi-continuous
random variables, and have legitimate probability
distribution functions (PDFs). Furthermore, since a
risk value cannot be negative by definition, these
PDFs have [0,+∞[ support.
The empirical PDF of the risk based on real out-
comes for the 814 injury reports is shown as a his-
togram in Fig. 2. The histogram divides the sample
Fig. 2. Histogram of original observations (n = 814) with boundary-corrected KDE of the simulated observations (n = 10^5); x-axis: risk based on real outcomes, y-axis: density.
space into a number of intervals and simply counts
how many observations fall into each range. We can
clearly see that the empirical safety risk is right-skewed and exhibits a heavy tail. In other
words, the bulk of construction situations present
risk values in the small-medium range, whereas only
a few situations are associated with high and extreme
risk. This makes intuitive sense and is in accordance
with what is observed onsite, i.e., frequent benign in-
juries, and low-frequency high-impact accidents.
Such heavy-tailed distributions are referred to
as “power laws” in the literature, after Pareto,(27)
who proposed that the relative number of individuals
with an annual income larger than a certain threshold
was proportional to a power of this threshold. Power
laws are ubiquitous in nature.(28,29) Some examples of
natural phenomena whose magnitude follow power
laws include earthquakes, ocean waves, volcanic
eruptions, asteroid impacts, tornadoes, forest fires,
floods, solar flares, landslides, and rainfall.(28,30–32)
Other human-related examples include insurance
losses and health-care expenditures,(33) hurricane
damage,(34,35) and the size of human settlements and
of files transferred on the web.(36,37)
To highlight the resemblance between construc-
tion safety risk and some of the aforementioned
natural phenomena, we selected four data sets that
are standard in the field of extreme value analy-
sis, and freely available from the “extRemes” R
package.(38) We overlaid the corresponding PDFs
with that of construction safety risk. For the sake of
comparison, variables were first rescaled as shown
in Equation (6). The output can be seen in Fig. 3. In
what follows, each data set is briefly presented.
Z = (X − min(X)) / (max(X) − min(X)),   (6)

where X is the variable in the original space and Z is the variable in the rescaled space.
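Equation (6) amounts to standard min-max rescaling; a minimal sketch:

```python
def rescale(x):
    """Equation (6): min-max rescaling of a sample onto [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

print(rescale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```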
The first data set reported summer maximum
temperatures in Phoenix, AZ, from 1948 to 1990,
measured at Sky Harbor Airport. The observations
were multiplied by −1 (flipped horizontally) before
rescaling. The distribution is named “max temper-
ature” in Fig. 3. The second data set (“hurricane
damage” in Fig. 3) consisted of total economic dam-
age caused by every hurricane making landfall in the
United States between 1925 and 1995, expressed in
1995 U.S. $ billion. All individual storms costing less
than $0.01 billion were removed to minimize poten-
tial biases in the recording process. The final number
Fig. 3. Safety risk compared to natural phenomena (rescaled units; overlaid PDFs of safety risk, precipitation, hurricane damage, max temperature, and peak flow).
of hurricanes taken into account was 86. The third
data set included in our comparison was observations
of Potomac River peak stream flow measured in cu-
bic feet per second at Point Rocks, MD, from 1895
to 2000. The observations were divided by 10^5 before
rescaling. The curve is labeled “peak flow” in Fig. 3.
The fourth and last data set contained 36,524 daily
precipitation amounts (in inches) from a single rain
gauge in Fort Collins, CO. Only values greater than
1 inch were taken into account, giving a final number
of 213 observations. The distribution is named “pre-
cipitation” in Fig. 3.
We estimated the PDFs shown in Fig. 3 via KDE
because overlaying histograms would have resulted
in an incomprehensible figure. The KDE is a non-
parametric way to estimate a PDF. It can be viewed
as a smoothed version of the histogram, where a con-
tinuous function, called the kernel, is used rather
than a box as the fundamental constituent.(39) The
kernel has zero mean, is symmetric, positive, and in-
tegrates to one. The last two properties ensure that
the kernel, and as a result the KDE, is a probability distribution. More precisely, as shown in Equation (7), the KDE at each point x is the sum of the weighted contributions from all the observations to the point x, the weights being assigned by the kernel:

f̂_X(x) = (1/(nh)) Σ_{i=1}^{n} K((x − x_i)/h),   (7)

where {x_1, ..., x_n} are the observations, K is the kernel, and h is a parameter called the bandwidth. Note that f̂_X is an estimator of the true unknown PDF.

Put differently, the KDE at x is a local average of functions assigning weights to the neighboring observations x_i that decrease as |x_i − x| increases.(41,42) This "local" estimation is the key feature of the method, enabling it to capture the features present in the data. KDEs converge faster to the underlying density than the histogram, and are robust to the choice of the origin of the intervals.(42)
The bandwidth h controls the degree of smoothing and therefore affects the final shape of the estimate.(40) In this study, we used a standard and widespread way of estimating h called Silverman's rule of thumb,(39) shown in Equation (8). We invite the reader to reference Rajagopalan et al.(43) for a good review of objective bandwidth selection methods.

h = 0.9 min(σ̂_X, (Q_3 − Q_1)/1.34) n^(−1/5),   (8)

where Q_3 and Q_1 are the third and first quartiles (respectively), σ̂_X is the standard deviation of the sample, and n is the size of the sample. Here, n = R = 814.
Further, for our kernel K, we selected the standard normal distribution N(0,1), that is, the normal distribution centered on zero with unit variance. Because the PDF of N(0,1) is (1/√(2π)) e^(−x²/2), the associated KDE can be written using Equation (7) as shown in Equation (9). Other popular kernels include the triangular, biweight, or Epanechnikov, but the consensus in the statistics literature is that the choice of the kernel is secondary to the estimation of the bandwidth.

f̂_X(x) = (1/(nh√(2π))) Σ_{i=1}^{n} exp(−(x − x_i)² / (2h²)),   (9)

where {x_1, ..., x_n} are the observations, and h is the bandwidth. Here, n = R = 814.
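Equations (7)-(9) can be sketched as follows (the toy sample below stands in for the 814 report-level risk values, which are not reproduced here; the sanity check verifies that the estimated density integrates to approximately one):

```python
import math
import statistics

def silverman_bandwidth(x):
    """Equation (8): h = 0.9 * min(sigma_hat, IQR / 1.34) * n^(-1/5)."""
    n = len(x)
    sigma = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4)  # first, second, third quartiles
    return 0.9 * min(sigma, (q3 - q1) / 1.34) * n ** (-1 / 5)

def gaussian_kde(x, obs, h):
    """Equation (9): Gaussian KDE of the sample `obs` evaluated at point x."""
    n = len(obs)
    return sum(math.exp(-((x - xi) ** 2) / (2 * h ** 2)) for xi in obs) / (
        n * h * math.sqrt(2 * math.pi))

# Toy right-skewed sample standing in for the report-level risk values.
obs = [10, 12, 15, 20, 22, 30, 45, 80, 150, 400]
h = silverman_bandwidth(obs)

# Sanity check: the KDE should integrate to ~1 (Riemann sum, step 1).
area = sum(gaussian_kde(g, obs, h) for g in range(-2000, 3000))
print(round(area, 2))  # ~1.0
```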
It is well known that the KDE suffers a bias at
the edges on bounded supports. Indeed, because the
kernel functions are symmetric, weights are assigned
to values outside the support, which causes the den-
sity near the edges to be significantly underestimated,
and creates a faulty visual representation. In our
case, safety risk takes on values in [0,+∞[, so issues
arise when approaching zero. We used the correction
for the boundary bias via local linear regression(44)
using the “evmix” package(45) of the R programming
language.(46) Log transformation and boundary re-
flection are other popular approaches for controlling
boundary bias.(39,43)
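The study itself corrects the boundary bias via local linear regression (the "evmix" approach); boundary reflection, mentioned above as a popular alternative, can be sketched as follows (toy data; for a [0, +∞[ support, the mass that symmetric kernels place below zero is folded back onto the support):

```python
import math

def gaussian_kde(x, obs, h):
    """Plain Gaussian KDE (Equation (9))."""
    n = len(obs)
    return sum(math.exp(-((x - xi) ** 2) / (2 * h ** 2)) for xi in obs) / (
        n * h * math.sqrt(2 * math.pi))

def reflected_kde(x, obs, h):
    """Boundary-reflection KDE for a [0, +inf) support:
    f_c(x) = f_hat(x) + f_hat(-x) for x >= 0, else 0."""
    return gaussian_kde(x, obs, h) + gaussian_kde(-x, obs, h) if x >= 0 else 0.0

# Toy positive-valued sample; h fixed for illustration.
obs = [0.5, 1.0, 1.5, 2.0, 4.0]
h = 0.8

# The reflected estimate recovers the mass a plain KDE would leak below zero:
# its integral over [0, +inf) is ~1 (Riemann sum, step 0.01, truncated at 30).
step = 0.01
area = sum(reflected_kde(i * step, obs, h) * step for i in range(3000))
print(round(area, 2))  # ~1.0
```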
3.4. Univariate Safety Risk Stochastic Generator
In this section, we present a computational
method that can be used to generate synthetic safety
risk values that honor the historical data. We focus
on the risk based on real outcomes, but the same
methodology can be used to simulate from any dis-
tribution. Note that although many techniques and
concepts in risk modeling and management deal with
extreme values only, here we seek to capture and
simulate the entire risk spectrum (not only the ex-
tremes) in order to accurately and comprehensively
assess the safety risk of any construction situation.
The quantile function (or simply quantile, for short) of a continuous random variable X is defined as the inverse of its cumulative distribution function (CDF), as shown in Equation (10). The CDF is obtained by integrating or summing the PDF, respectively, in the continuous and discrete case.

Q(p) = F_X^(−1)(p), p ∈ [0, 1], (10)

where F_X is the CDF of X, defined as F_X(x) = P(X ≤ x).
The quantile is closely linked to the concept of exceedances. In finance and insurance, for instance, the value-at-risk for a given horizon is the loss that cannot be exceeded, with a certain level of confidence, within the time period considered; this loss is given by the quantile. For instance, the 99.5% value-at-risk Q(0.995) at 10 days represents the amount of money that the loss can only exceed with 0.5% probability in the next 10 days. In other words, the corresponding fund reserve would cover 199 losses out of 200 (199/200 = 0.995).
The quantile is also associated with the notion
of return period Tin hydroclimatology. For exam-
ple, the magnitude of the 100-year flood (T=100)
corresponds to the streamflow value that is only ex-
ceeded on average by 1% of the observations, assum-
ing one observation per year. This value is given by
Q(1 − 1/T) = Q(0.99), which is the 99th percentile,
or the 99th 100-quantile. Similarly, the magnitude of
the 500-year flood, Q(0.998), is only exceeded on av-
erage by 0.2% of the observations. For construction
safety, this quantity would correspond to the min-
imum risk value that is only observed on average
in one construction situation over five hundred. The
median value, given by Q(0.5), would correspond to
the safety risk observed on average in one construc-
tion situation over two.
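In code, the return-period risk level is just the empirical quantile Q(1 − 1/T). A short Python illustration, with a synthetic heavy-tailed sample standing in for the 814 risk values:

```python
import numpy as np

rng = np.random.default_rng(3)
risk = rng.lognormal(mean=3.0, sigma=1.0, size=814)  # synthetic heavy-tailed sample

T = 100                              # "one situation in a hundred"
q = np.quantile(risk, 1 - 1 / T)     # Q(1 - 1/T) = Q(0.99)
frac_exceed = (risk > q).mean()      # about 1% of observations exceed q
```

The fraction of observations above q should sit near 1/T, which is exactly the exceedance interpretation of the return period.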
3.4.1. Limitations of Traditional Parametric Approaches
Traditional approaches to quantile estimation are based on parametric models of the PDF, especially from the extreme value theory (EVT) framework.(47) These models possess fat tails, unlike traditional PDFs, and thus are suitable for robust estimation of extremes. The candidate distributions from EVT include the Fréchet, Weibull, Gumbel, generalized extreme value (GEV), and generalized Pareto distributions, or mixtures of these distributions.(48)
These parametric models are powerful in that
they allow complex phenomena to be entirely de-
scribed by a single mathematical equation and a few
10 Tixier, Hallowell, and Rajagopalan
parameters. However, being parametric, these mod-
els tend to be suboptimal when little knowledge is
available about the phenomenon studied(48,49) and
though they are heavy-tailed, they still are prone to
underestimating the extreme quantiles.(50) A popu-
lar remediation strategy consists in fitting a paramet-
ric model to the tail only, such as the generalized
Pareto, but selecting a threshold that defines the tail
is a highly subjective task,(51) and medium and small
values, which represent the bulk of the data, are of-
ten overlooked.(50) A better strategy, however, especially when the final goal is simulation, is to capture the entire distribution. As a solution, hydrocli-
matologists have proposed dynamic mixtures of dis-
tributions, based on weighting the contributions of
two overlapping models, one targeting the bulk of the
observations, and the other orientated toward cap-
turing extremes.(52,53) Unfortunately, threshold selec-
tion implicitly carries over through the estimation of
the parameters of the mixing function, and with most
mixing functions, conflicts arise between the two dis-
tributions around the boundary.(45) For all these rea-
sons, we decided to adopt a fully data-driven, non-
parametric approach that we describe below.
3.4.2. Proposed Approach
Our methodology consists in generating inde-
pendent realizations from the nonparametric PDF
estimated via the KDE described above. We base
our generator on the smoothed bootstrap with vari-
ance correction proposed by Silverman.(39) Unlike
the traditional nonparametric bootstrap(54) that sim-
ply consists in resampling with replacement, the
smoothed bootstrap can generate values outside of
the historical limited range, and does not repro-
duce spurious features of the original data such as
noise.(55) The smoothed bootstrap approach has been
successfully used in modeling daily precipitation,(56)
streamflow,(57) and daily weather.(55) Kernel quantile
function estimators(58) and local polynomial-based
estimators(59) are other attractive options. Here, we
propose simulation from smoothed bootstrap, which
is easier to implement and competitive to other
The algorithm we used to generate our synthetic
values according to the smoothed bootstrap scheme
can be broken down into the following steps:
For j in 1 to the desired number of simulated values:
(1) choose i uniformly with replacement from {1, ..., R};
(2) sample ε from the normal distribution centered on zero with variance h_X²;
(3) record X_sim,j = X̄ + (X_i − X̄ + ε) / √(1 + h_X² / σ̂_X²),
where R = 814 is the sample size (the number of injury reports), X̄ and σ̂_X² are the sample mean and variance, and h_X is the bandwidth of the KDE (so h_X² is the variance of the normal kernel). Note that we deleted the negative simulated values to be consistent with the definition of risk.
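The steps above can be sketched as follows. This is an illustrative Python translation (the paper works in R), and the gamma sample is a synthetic stand-in for the real risk data:

```python
import numpy as np

def smoothed_bootstrap(x, n_sim, h, rng):
    """Smoothed bootstrap with variance correction (after Silverman, 1986):
    x_sim = xbar + (x_i - xbar + eps) / sqrt(1 + h^2 / var),  eps ~ N(0, h^2),
    so the simulated values keep the sample mean and variance."""
    x = np.asarray(x, dtype=float)
    xbar, var = x.mean(), x.var(ddof=1)
    i = rng.integers(0, len(x), size=n_sim)       # step 1: resample indices
    eps = rng.normal(0.0, h, size=n_sim)          # step 2: kernel noise, sd = h
    sim = xbar + (x[i] - xbar + eps) / np.sqrt(1 + h**2 / var)   # step 3
    return sim[sim >= 0]                          # drop negatives (risk >= 0)

rng = np.random.default_rng(4)
x = rng.gamma(shape=2.0, scale=50.0, size=814)    # synthetic stand-in sample
sim = smoothed_bootstrap(x, 10**5, h=10.0, rng=rng)
```

The variance correction is what makes the simulated sample honor both the mean and the spread of the data while still producing values outside the historical range.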
Fig. 2 shows the KDE of 10⁵ simulated values overlaid with the histogram of the original sample. It can be clearly seen that the synthetic val-
ues are faithful to the original sample, as the PDF
from the simulated values fits the observations very
well. Also, while honoring the historical data, the
smoothed bootstrap generated values outside the
original range, as desired. The maximum risk value
in our sample was 709, whereas the maximum of the
simulated values was 740 (rounded to the nearest in-
teger). Table IV compares the quantiles of the original and simulated observations, estimated via the quantile() R function.
The quantile estimates of Table IV are roughly
equivalent before reaching the tails. This is because
the bulk of the original observations were in the low
to medium range, enabling quite accurate quantile
estimates for this range in the first place. The prob-
lem stemmed from the sparsity of the high to ex-
treme values in the historical sample, which made
estimation of the extreme quantiles biased. Our use
of the smoothed bootstrap populated the tail space
with new observations, yielding a slightly higher es-
timate of the extreme quantiles, as can be seen in
Table IV. It makes sense that the extremes are
higher than what could have been inferred based
simply on the original sample, as the original sam-
ple can be seen as a finite window in time whereas
our simulated values correspond to observations that
would have been recorded over a much longer pe-
riod. The chance of observing extreme events is of
course greater over a longer period of time. Based
on estimating the quantiles on the extended time
frame represented by the synthetic values, we pro-
pose the risk ranges shown in Table V. As already
explained, these ranges are more robust and un-
biased than the ones that would have been built
from our historical observations only. Thanks to
this empirical way of assessing safety risk, construc-
tion practitioners will be able to adopt an optimal
proactive approach by taking coherent preventive
Table IV. Quantile Estimates Based on Original and Simulated Values for the Risk Based on Real Outcomes

Safety Risk Observed in One Situation Over: 2 | 5 | 10 | 100 | 500 | 1,000 | 10,000
Original observations (n = R = 814): 57 | 110 | 152 | 649 | 703 | 706 | 709
Simulated observations (n = 10⁵): 61 | 116 | 154 | 647 | 700 | 708 | 728
Table V. Proposed Ranges for the Risk Based on Real Outcomes
actions and provisioning the right amounts of resources.
In what follows, we study the relationship between the risk based on real outcomes (X, for brevity) and the risk based on worst potential outcomes (Y). Rather than considering these two random variables in isolation, we acknowledge their dependence and aim at capturing it and faithfully reproducing it in our simulated observations. This
serves the final goal of being able to accurately as-
sess the potential of an observed construction situa-
tion for safety risk escalation should the worst-case
scenario occur. Fig. 4 shows a plot of Yversus X,
whereas a bivariate histogram can be seen in Fig. 5.
We can distinguish three regimes in Fig. 4. The first regime, corresponding roughly to
0<X<70, is that of benign situations that stay be-
nign in the worst case. Under this regime, there is lim-
ited potential for risk escalation. The second regime
(70 < X < 300) shows that beyond a certain threshold, moderately risky situations can give rise to hazardous situations in the worst case. The attribute re-
sponsible for the switch into this second regime is
machinery (e.g., welding machine, generator, pump).
The last regime (X>300) is that of the extremes,
and features clear and strong upper tail dependence.
The situations belonging to this regime are inherently hazardous and create severe outcomes in worst-case scenarios. In other words, those situations are dangerous in the first place and un-
forgiving. The attribute responsible for this extreme
regime is hazardous substance (e.g., corrosives, ad-
hesives, flammables, asphyxiants). Again, note that
these examples are provided as a result of applying
our methodology on a data set of 814 injury reports
for illustration purposes but do not incur any loss of
generality. Using other, larger data sets would allow
risk regimes to be characterized by different and pos-
sibly more complex attribute patterns.
4.1. Copula Theory
Many natural and human-related phenomena
are multifactorial by nature and as such their study
requires the joint modeling of several random vari-
ables. Traditional approaches consist in modeling de-
pendence with the classical family of multivariate dis-
tributions, which is clearly limiting, as it requires all
variables to be separately characterized by the same
univariate distributions (called the margins). Copula
theory addresses this limitation by creating a joint
probability distribution for two or more variables
while preserving their original margins.(60) In addi-
tion to the extra flexibility they offer, the many ex-
isting parametric copula models are also attractive in
that they can model the dependence among a poten-
tially very large set of random variables in a parsimo-
nious manner. For an overview of copulas, one may
refer to Cherubini et al.(61)
Although the introduction of copulas can be traced back as early as 1959 with the work of Sklar,
they did not gain popularity until the end of the
1990s when they became widely used in finance. Cop-
ulas are now indispensable to modeling multivariate
dependence,(62) and are used in various fields from
cosmology to medicine. Because many hydroclima-
tological phenomena are multidimensional, copulas
are also increasingly used in hydrology, weather, and
climate research, for instance, for precipitation infill-
ing and extreme storm tide modeling.(63–65)
Formally, a d-dimensional copula is a joint
CDF with [0,1]dsupport and standard uniform
margins.(66) Another equivalent definition is given by
Fig. 4. Bivariate construction safety risk (risk based on worst possible outcomes versus risk based on real outcomes).
Sklar's theorem,(67) which states in the bivariate case that the joint CDF F(x, y) of any pair (X, Y) of continuous random variables can be written in terms of a copula as shown in Equation (11):

F(x, y) = C(F_X(x), F_Y(y)), (11)

where F_X and F_Y are the respective margins of X and Y, and C: [0,1]² → [0,1] is a copula.
Note that Sklar's theorem is consistent with the first definition given, because for any continuous random variable X with CDF F_X, F_X(X) follows a uniform distribution (a result known as the probability integral transform).
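The probability integral transform is easy to verify numerically. In the Python sketch below (a normal margin is chosen purely for illustration), applying the known CDF to the sample yields approximately uniform values:

```python
import math
import numpy as np

def norm_cdf(z, mu, sigma):
    """CDF of N(mu, sigma^2), expressed via the error function."""
    return np.array([0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
                     for v in z])

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)   # X with known CDF F_X

u = norm_cdf(x, 2.0, 3.0)   # F_X(X): approximately Uniform(0, 1)
```

The transformed sample should have mean near 1/2 and variance near 1/12, the moments of the standard uniform distribution.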
Parametric copulas suffer from all the limitations
inherent to univariate parametric models evoked
previously. Therefore, like in the univariate case, we
decided to adopt an empirical, nonparametric ap-
proach to copula density estimation. We used the bi-
variate KDE to estimate the empirical copula, which
is defined as the joint CDF of the rank-transformed
(or pseudo) observations. The pseudo-observations
are obtained as shown in Equation (12).
U_X(x) = rank(x) / (length(X) + 1), (12)

where U_X is the transformed sample of the pseudo-observations, and X is the original sample.
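Equation (12) amounts to a rank transform. A minimal Python sketch (pseudo_obs is an illustrative helper name, not from the paper):

```python
import numpy as np

def pseudo_obs(x):
    """Eq. (12): U_X = rank(x) / (length(X) + 1), mapping a sample into (0, 1)."""
    x = np.asarray(x)
    ranks = np.argsort(np.argsort(x)) + 1   # ranks 1..n (continuous data: no ties)
    return ranks / (len(x) + 1)

u = pseudo_obs(np.array([10.0, 50.0, 20.0]))   # ranks are 1, 3, 2
```

Dividing by n + 1 rather than n keeps the pseudo-observations strictly inside the open unit interval, which matters for the transformation-based density estimation that follows.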
Because the copula support is the unit square
[0,1]2, the KDE boundary issue arises twice this
time, near zero and one, yielding multiplicative
bias.(68) Therefore, the density is even more severely
underestimated than in the univariate case, and it is even more crucial to make the KDE robust at the corners to ensure proper visualization. We used
the transformation-based approach described by Charpentier et al.(68) as our boundary correction technique, using the inverse CDF of the normal distribution, F_N(0,1)^(−1), as the transformation T. The resulting empirical copula density estimate of the original sample is shown in Fig. 6, and can be seen to capture the data very well.
Fig. 5. Bivariate histogram with 25 × 25 grid (empirical bivariate PDF of the risk based on real outcomes and the risk based on worst possible outcomes).
4.2. Bivariate Safety Risk Stochastic Generator
Like in the univariate case, we used a nonpara-
metric, fully data-driven approach, the smoothed
bootstrap with variance correction, as our simula-
tion scheme. Minor adaptations were needed due to
the two-dimensional nature of the task. The steps of
our algorithm are outlined below, and the resulting 10⁵ simulated values are shown in Fig. 7. Note that
the procedure is equivalent to simulating from the
nonparametric copula density estimate introduced
above. Like in the univariate case, we deleted the
negative simulated values to ensure consistency with
the definition of risk.
For j in 1 to the desired number of simulated values:
(1) choose i uniformly with replacement from {1, ..., R};
(2) sample ε_X from the normal distribution centered on zero with variance h_X², and ε_Y from the normal distribution centered on zero with variance h_Y²;
(3) take:
X_sim,j = X̄ + (X_i − X̄ + ε_X) / √(1 + h_X² / σ̂_X²),
Y_sim,j = Ȳ + (Y_i − Ȳ + ε_Y) / √(1 + h_Y² / σ̂_Y²);
(4) record U_sim,j = F_N(0,1)(X_sim,j) and V_sim,j = F_N(0,1)(Y_sim,j),
where R = 814 is the number of injury reports; X̄ and σ̂_X² are the mean and variance of X; Ȳ and σ̂_Y² are the mean and variance of Y; h_X² is the bandwidth of the KDE of X; h_Y² is the bandwidth of the KDE of Y; and F_N(0,1) is the CDF of the standard normal distribution, that is, the inverse of our transformation T.
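The resampling core of these steps can be sketched as follows. This is an illustrative Python translation with synthetic, positively dependent data; step 4's mapping to the pseudo space is omitted, and the variable names are assumptions:

```python
import numpy as np

def bivariate_smoothed_bootstrap(x, y, n_sim, hx, hy, rng):
    """Bivariate smoothed bootstrap with variance correction. Step 1 draws ONE
    index per iteration, so (x_i, y_i) pairs are resampled jointly and the
    dependence structure is preserved; each coordinate then gets its own
    kernel perturbation and variance correction."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    i = rng.integers(0, len(x), size=n_sim)      # one index -> one (x_i, y_i) pair
    ex = rng.normal(0.0, hx, size=n_sim)
    ey = rng.normal(0.0, hy, size=n_sim)
    xs = xbar + (x[i] - xbar + ex) / np.sqrt(1 + hx**2 / vx)
    ys = ybar + (y[i] - ybar + ey) / np.sqrt(1 + hy**2 / vy)
    keep = (xs >= 0) & (ys >= 0)                 # consistency with risk >= 0
    return xs[keep], ys[keep]

rng = np.random.default_rng(6)
x = rng.gamma(2.0, 50.0, size=814)               # synthetic risk-like sample
y = 10.0 * x + rng.gamma(2.0, 100.0, size=814)   # positively dependent partner
xs, ys = bivariate_smoothed_bootstrap(x, y, 10**5, hx=10.0, hy=50.0, rng=rng)
```

Because pairs are drawn jointly, the correlation of the simulated sample stays close to that of the original data.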
Note that step 1 selects a pair (x, y) of values from the original sample (X, Y), not two values
independently. This is crucial in ensuring that the
dependence structure is preserved. Step 4 sends
the simulated pair to the pseudo space to enable
visual comparison with the empirical copula density
estimate, which is defined in the rank space (i.e., the
unit square). We can clearly observe in Fig. 7 that our
sampling scheme was successful in generating values
that reproduce the structure present in the original
data, validating our nonparametric approach. For
the sake of completeness, we also compared (see
Fig. 6. Nonparametric copula density estimate with original pseudo-observations (pseudo risk based on worst potential outcomes versus pseudo risk based on real outcomes).
Table VI. Quantile Estimates Based on Original and Simulated Values for the Risk Based on Worst Potential Outcomes

Safety Risk Observed in One Situation Over: 2 | 5 | 10 | 100 | 500 | 1,000 | 10,000
Original observations (n = R = 814): 343 | 950 | 1,719 | 7,000 | 9,808 | 9,808 | 9,808
Simulated observations (n = 10⁵): 395 | 1,061 | 1,953 | 7,092 | 9,765 | 9,586 | 10,045
Figs. 8 and 9) the simulated pairs in the original space
with the original values. Once again, it is easy to see
that the synthetic values honor the historical data.
To enable comparison with the univariate case (see
Table IV), Table VI summarizes the empirical quan-
tiles for the historical and simulated observations
of risk based on worst potential outcomes (i.e., Y).
Like in the univariate case, we can observe that the differences between the estimates increase with the quantiles. Notably, simulation allows us to obtain richer estimates of the extreme quantiles, Q(1 − 1/1,000) = Q(0.999) and Q(1 − 1/10,000) = Q(0.9999), whereas with the initial limited sample, the quantile function plateaus after Q(1 − 1/500) = Q(0.998) due to data sparsity in the tail. Similarly to Table V, we also propose in Table VII ranges for the risk based on worst potential outcomes.

Table VII. Proposed Ranges for the Risk Based on Worst Potential Outcomes
Fig. 7. Simulated values (n = 10⁵) in the pseudo space (pseudo risk based on worst possible outcomes versus pseudo risk based on real outcomes).
4.3. Computing Risk Escalation Potential Based
on Simulated Values
Using the synthetic safety risk pairs obtained via
our bivariate stochastic safety risk generator, and evi-
dence provided by the user (i.e., an observation made
onsite in terms of attributes), it is possible to estimate the upper limit of risk, i.e., the safety
risk presented by the observed construction situation
based on worst-case scenarios. This estimate is based
on large numbers of values simulated in a data-driven
approach that features the same dependence struc-
ture as the original, empirical data. The technique we
propose, based on conditional quantile estimation,
consists in the steps detailed in what follows.
First, the attributes observed in a particular
construction situation give the risk based on real
outcomes for the construction situation, say x0. By fixing the value of X to x0, this first step extracts
a slice from the empirical bivariate distribution of
the simulated values. This slice corresponds to the
empirical probability distribution of Yconditional
on the value of X, also noted P[Y | X = x0]. Because only a few values of Y may exactly be associated with x0, we consider all the values of Y associated with the values of X in a small neighboring range around x0, that is, P[Y | x0 − Δx⁻ < X < x0 + Δx⁺]. In our experiments, we used Δx⁻ = Δx⁺ = 5, that is, a range of [−5, +5] around x0, because it gave good results, but there is no absolute and definitive best range. The second step simply consists in evaluating the quantile function of P[Y | x0 − Δx⁻ < X < x0 + Δx⁺] at some threshold. The reader can refer to Equation (10) for the definition of the quantile function. In our experiments, we used a threshold of 80% (i.e., we
computed Q(0.8) with the quantile() R function),
but the choice of the threshold should be made at
the discretion of the user, depending on the desired
final interpretation. In plain English, the threshold
we selected returns the risk based on worst possible
outcomes that is only exceeded in 20% of cases for
Fig. 8. Bivariate construction safety risk (risk based on worst possible outcomes versus risk based on real outcomes).
Fig. 9. Simulated values (n = 10⁵) in the original space.
Table VIII. Illustration of the Proposed Risk Escalation Estimation Technique

Attributes | Step 1: Prior Evidence — Risk Based on Real Outcomes (x0) and Associated Range(a) | Step 2: Conditional Quantile Estimate — Q(0.8) of Risk Based on Worst Potential Outcomes and Associated Range(b)
Hazardous substance, confined workspace | 590 + 115 = 705, Extreme | 7,266, Extreme
Hammer, lumber | 5 + 53 = 58, Medium | 676, High
Hand size pieces | 7, Low | 145, Low

(a) Based on the ranges proposed in Table V.
(b) Based on the ranges proposed in Table VII.
the particular value of risk based on real outcomes
computed at the first step. Finally, by categorizing this value into the ranges of risk based on worst possible outcomes provided in Table VII, we are able to provide understandable and actionable insight with respect to the most likely risk escalation potential.
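The two steps can be sketched as follows (Python, with synthetic simulated pairs standing in for the generator's output; the ±5 window and 80% threshold follow the values used above, and escalation_estimate is an illustrative helper name):

```python
import numpy as np

def escalation_estimate(xs, ys, x0, window=5.0, level=0.8):
    """Step 1: keep simulated pairs with X in [x0 - window, x0 + window];
    step 2: return the level-quantile of the conditional sample of Y."""
    mask = (xs > x0 - window) & (xs < x0 + window)
    return np.quantile(ys[mask], level)

# Synthetic stand-in for the generator's output: Y roughly 10 * X plus noise
rng = np.random.default_rng(7)
xs = rng.uniform(0.0, 700.0, size=10**5)
ys = 10.0 * xs + rng.normal(0.0, 50.0, size=10**5)

q80 = escalation_estimate(xs, ys, x0=100.0)   # Q(0.8) of Y given X near 100
```

With these synthetic pairs the conditional sample near x0 = 100 is centered around 1,000, so the 80% conditional quantile lands slightly above that value.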
These steps are illustrated for three simple construction situations in Table VIII. For comparison purposes, we also show the range of risk based on real outcomes (provided in Table V) in which x0 falls.
Given the data-driven nature of our approach,
attribute risk values are expected to change from one
injury report database to another, and from one set
of exposure values to another, even though the distri-
butions of safety risk based on real and worst poten-
tial outcomes are expected to remain the same (i.e.,
heavy-tailed). Also, in this study, we used a rather
small data set (final size of 814 injury reports) to pro-
vide a proof of concept for our methodology. With
larger data sets, more attributes would play a role in
characterizing the different regimes presented in Fig.
4, and their respective signature would therefore en-
joy a higher resolution.
We defined construction safety risk at the
attribute and situational level, and showed its empir-
ical probability distribution to be strikingly similar
to that of many natural phenomena. We then pro-
posed univariate and bivariate safety risk stochastic
generators based on nonparametric density estima-
tion techniques. The combination of kernels and
copulas and the introduction of these methods for
modeling construction safety risk make a unique
and pioneering contribution. It provides a powerful
methodology to model and visualize bivariate safety
risks, which are ubiquitous in construction and whose
understanding is of paramount importance to safety
performance improvement. Our approach can be
used as a way to ground risk-based safety-related de-
cisions under uncertainty on objective empirical data
far exceeding the personal history of even the most
experienced safety or project manager. Additionally,
the combined use of the attribute-based framework
and raw injury reports as the foundation of our work
allows the user to escape the limitations of traditional
construction safety risk analysis techniques that are
segmented and rely on subjective data. Finally, the
attribute-based nature of our methodology enables
easy integration with building information modeling
(BIM) and work packaging. We believe this study
gives promising evidence that transitioning from
an opinion-based and qualitative discipline to an
objective, empirically-grounded quantitative science
would be highly beneficial to construction safety.
We would like to thank the National Science
Foundation for supporting this research through an
Early Career Award (CAREER) Program. This ma-
terial is based upon work supported by the National
Science Foundation under Grant No. 1253179. Any
opinions, findings, and conclusions or recommenda-
tions expressed in this material are those of the au-
thors and do not necessarily reflect the views of the
National Science Foundation. Sincere thanks also go
to Prof. Arthur Charpentier for his kind help on
nonparametric copula density estimation and on the
bivariate smoothed bootstrap, and to Prof. Carl Scar-
rott for his insight on dynamic mixture modeling.
1. Bureau of Labor Statistics. Occupational injuries/illnesses
and fatal injuries profiles, 2015. Available at http://www.
2. Albert A, Hallowell MR, Kleiner B, Chen A, Golparvar-
Fard M. Enhancing construction hazard recognition with high-
fidelity augmented virtuality. Journal of Construction Engi-
neering and Management, 2014; 140(7):04014024.
3. Carter G, Smith SD. Safety hazard identification on construc-
tion projects. Journal of Construction Engineering and Man-
agement, 2006; 132(2):197–205.
4. Kahneman D, Tversky A. On the study of statistical intuitions.
Cognition, 1982; 11(2):123–141.
5. Tixier AJP, Hallowell MR, Rajagopalan B, Bowman D. Auto-
mated content analysis for construction safety: A natural lan-
guage processing system to extract precursors and outcomes
from unstructured injury reports. Automation in Construc-
tion, 2016; 62:45–56.
6. Tixier AJP, Hallowell MR, Rajagopalan B, Bowman D. Ap-
plication of machine learning to construction injury predic-
tion. Automation in Construction, 2016; 69:102–114.
7. Esmaeili B, Hallowell M. Attribute-based risk model for mea-
suring safety risk of struck-by accidents. Construction Re-
search Congress, 2012:289–298.
8. Esmaeili B, Hallowell MR. Using network analysis to model
fall hazards on construction projects. Saf Health Construct
CIB W, 2011; 99:24–26.
9. Prades Villanova M. Attribute-based risk model for assessing
risk to industrial construction tasks (master’s thesis). Univer-
sity of Colorado at Boulder, 2014.
10. Desvignes M. Requisite empirical risk data for integration
of safety with advanced technologies and intelligent systems
(master’s thesis). University of Colorado at Boulder, 2014.
11. Capen EC. The difficulty of assessing uncertainty. SPE-5579-
PA. JPT, 1976; 28(8):843–50.
12. Rose PR. Dealing with risk and uncertainty in exploration:
How can we improve? AAPG Bulletin, 1987; 71(1):1–16.
13. Tversky A, Kahneman D. The framing of decisions and
the psychology of choice. Science, 1981; 211(4481):453–458.
14. Gustafson PE. Gender differences in risk perception: The-
oretical and methodological perspectives. Risk Analysis, 1998;
18(6):805–811. doi:10.1111/j.1539-6924.1998.tb01123.x.
15. Tixier AJP, Hallowell MR, Albert A, van Boven L, Kleiner
BM. Psychological antecedents of risk-taking behavior in con-
struction. Journal of Construction Engineering and Manage-
ment, 2014:140(11).
16. Hallowell MR, Gambatese JA. Qualitative research: Applica-
tion of the Delphi method to CEM research. Journal of Con-
struction Engineering and Management, 2009; 136(1):99–107.
17. Lingard H. Occupational health and safety in the construction
industry. Construct Manage Econ, 2013; 31(6):505–514.
18. Hallowell MR, Gambatese JA. Activity-based safety risk
quantification for concrete formwork construction. Jour-
nal of Construction Engineering and Management, 2009;
19. Navon R, Kolton O. Model for automated monitoring of fall
hazards in building construction. Journal of Construction En-
gineering and Management, 2006; 132(7):733–740.
20. Huang X, Hinze J. Analysis of construction worker fall acci-
dents. Journal of Construction Engineering and Management,
2003; 129(3):262–271.
21. Baradan S, Usmen M. Comparative injury and fatality risk
analysis of building trades. Journal of Construction Engineer-
ing and Management, 2006; 132(5):533–539.
22. Jannadi O, Almishari S. Risk assessment in construction.
Journal of Construction Engineering and Management, 2003;
23. Everett JG. Overexertion injuries in construction. Jour-
nal of Construction Engineering and Management, 1999;
24. Shapira A, Lyachin B. Identification and analysis of fac-
tors affecting safety on construction sites with tower cranes.
Journal of Construction Engineering and Management, 2009;
25. Sacks R, Rozenfeld O, Rosenfeld Y. Spatial and temporal
exposure to safety hazards in construction. Journal of Con-
struction Engineering and Management, 2009; 135(8):726–
26. Alexander D, Hallowell M, Gambatese J. Energy-based
safety risk management: Using hazard energy to pre-
dict injury severity. In ICSC15: The Canadian Society for
Civil Engineering 5th International/11th Construction Spe-
cialty Conference, University of British Columbia, Vancou-
ver, Canada. June 7–10, 2015. Available at: https://open.
27. Pareto V. Cours d’economie politique. Geneva, Switzerland:
Droz, 1896.
28. Pinto C, Mendes Lopes A, Machado JA. A review of power
laws in real life phenomena. Communications in Nonlinear
Science and Numerical Simulation, 2012; 17(9):3558–3578.
29. Malamud Bruce D. Tails of natural hazards. Physics World,
2004; 17(8):31–35.
30. Papalexiou SM, Koutsoyiannis D, Makropoulos C. How ex-
treme is extreme? An assessment of daily rainfall distribution
tails. Hydrology and Earth System Sciences, 2013; 17(2):851–
31. Menendez M, Mendez FJ, Losada IJ, Graham NE. Variability
of extreme wave heights in the northeast Pacific Ocean based
on buoy measurements. Geophysical Research Letters, 2008;
32. Malamud BD, Turcotte DL. The applicability of power-law
frequency statistics to floods. J Hydrol, 2006; 322(1):168–
33. Ahn S, Kim JH, Ramaswami V. A new class of models for
heavy tailed distributions in finance and insurance risk. Insur
Math Econ, 2012; 51(1):43–52.
34. Jagger TH, Elsner JB, Saunders MA. 2008: Forecasting U.S.
insured hurricane losses. Pp. 189–208 in Diaz HF, Murnane RJ
(eds). Climate Extremes and Society. Cambridge University
Press, 2008.
35. Katz RW. Stochastic modeling of hurricane damage. Journal
of Applied Meteorology, 2002; 41(7):754–762.
36. Reed WJ. The Pareto, Zipf and other power laws. Economics
Letters, 2001; 74(1):15–19.
37. Crovella ME, Bestavros A. Explaining World Wide Web
traffic self-similarity. Boston University Computer Sci-
ence Department, 1995. Available at: http://dcommon.
38. Gilleland E, Katz RW. New software to analyze how extremes
change over time. Eos Transactions, American Geophysical
Union, 2011; 92(2):13–14.
39. Silverman BW. Density Estimation for Statistics and Data
Analysis, Vol. 26. CRC Press, 1986.
40. Hastie T, Tibshirani R, Friedman J. The Elements of Statisti-
cal Learning, Vol. 2, No. 1. New York: Springer, 2009.
41. Saporta G. Probabilites, analyse des donnees et statistique.
Paris, France: Editions Technip, 2011.
42. Moon YI, Rajagopalan B, Lall U. Estimation of mutual in-
formation using kernel density estimators. Physical Review E,
1995; 52(3):2318–2321.
43. Rajagopalan B, Lall U, Tarboton DG. Evaluation of kernel
density estimation methods for daily precipitation resampling.
Stochastic Hydrology and Hydraulics, 1997; 11(6):523–547.
44. Jones MC. Simple boundary correction for kernel density es-
timation. Statistics and Computing, 1993; 3(3):135–146.