Risk Analysis DOI: 10.1111/risa.12772
Construction Safety Risk Modeling and Simulation
Antoine J.-P. Tixier,1,∗ Matthew R. Hallowell,2 and Balaji Rajagopalan2
By building on a genetic-inspired attribute-based conceptual framework for safety risk anal-
ysis, we propose a novel approach to define, model, and simulate univariate and bivariate
construction safety risk at the situational level. Our fully data-driven techniques provide con-
struction practitioners and academicians with an easy and automated way of getting valuable
empirical insights from attribute-based data extracted from unstructured textual injury re-
ports. By applying our methodology on a data set of 814 injury reports, we first show the
frequency-magnitude distribution of construction safety risk to be very similar to that of
many natural phenomena such as precipitation or earthquakes. Motivated by this observa-
tion, and drawing on state-of-the-art techniques in hydroclimatology and insurance, we then
introduce univariate and bivariate nonparametric stochastic safety risk generators based on
kernel density estimators and copulas. These generators enable the user to produce large
numbers of synthetic safety risk values faithful to the original data, allowing safety-related
decision making under uncertainty to be grounded on extensive empirical evidence. One of
the implications of our study is that like natural phenomena, construction safety may bene-
fit from being studied quantitatively by leveraging empirical data rather than strictly being
approached through a managerial perspective using subjective data, which is the current in-
dustry standard. Finally, a side but interesting finding is that in our data set, attributes related
to high energy levels (e.g., machinery, hazardous substance) and to human error (e.g., im-
proper security of tools) emerge as strong risk shapers.
KEY WORDS: Construction safety; risk modeling; stochastic simulation
1. INTRODUCTION AND MOTIVATION
Despite the significant improvements that have
followed the inception of the Occupational Safety
and Health Act of 1970, safety performance has
reached a plateau in recent years and the construc-
tion industry still suffers from a disproportionate ac-
cident rate. Fatalities in construction amounted to
885 in 2014, the highest count since 2008.(1) In addition
to dreadful human costs, construction injuries are also
associated with huge direct and indirect economic impacts.
1 Computer Science Laboratory, École Polytechnique, Palaiseau, France.
2 Department of Civil, Environmental, and Architectural Engineering, CU Boulder, USA.
∗ Address correspondence to Antoine J.-P. Tixier, Postdoctoral Researcher, Computer Science Laboratory, École Polytechnique, Palaiseau, France; antoine.tixier-1@colorado.edu.
A very large portion of construction work, up-
stream or downstream of groundbreaking, involves
making safety-related decisions under uncertainty.
Partly due to their limited personal history with acci-
dents, even the most experienced workers and safety
managers may miss hazards and underestimate the
risk of a given construction situation.(2,3) Design-
ers face an even greater risk of failing to recognize
hazards and misestimating risk.(2) In addition, when
uncertainty is involved, humans often resort to
personal opinion and intuition to apprehend their
environment. This process is fraught with numer-
ous biases and misconceptions inherent to human
cognition(4) and compounds the likelihood of misdi-
agnosing the riskiness of a situation.
0272-4332/17/0100-0001$22.00/1 © 2017 Society for Risk Analysis
[Figure 1: schematic of the research process. Natural language processing converts the raw injury reports (raw empirical data) into a binary attribute matrix with one row per report (report 1 to report R) and one column per attribute (X1 to XP), together with the real and worst severity of each report. Data set: R = 814 injury reports, P = 77 attributes (present = 1, absent = 0). The matrix feeds univariate and bivariate risk modeling, simulation, and estimation.]
Fig. 1. Overarching research process: from raw injury reports to safety risk analysis.
Therefore, it is of paramount importance to
provide construction practitioners with tools to
mitigate the adverse consequences of uncertainty
on their safety-related decisions. In this study, we
focus on leveraging situational data extracted from
raw textual injury reports to guide and improve
construction situation risk assessment. Our method-
ology facilitates the augmentation of construction
personnel’s experience and grounds risk assessment
on potentially unlimited amounts of empirical and
objective data. In other words, our approach com-
bats construction risk misdiagnosis on two fronts, by
jointly addressing both the limited personal history
and the judgment bias problems previously evoked.
We used fundamental construction attribute
data extracted by a highly accurate natural language
processing (NLP) system(5) from a database of 921
injury reports provided by a partner company en-
gaged in industrial construction projects worldwide.
Attributes are context-free universal descriptors of
the work environment that are observable prior to
injury occurrence. They relate to environmental con-
ditions, construction means and methods, and human
factors, and provide a unified, standardized way of
describing any construction situation. To illustrate,
one can extract four attributes from the following
text: “worker is unloading a ladder from pickup
truck with bad posture”: ladder, manual handling,
light vehicle, and improper body positioning. Be-
cause attributes can be used as leading indicators
of construction safety performance,(6) they are also
called injury precursors. In what follows, we will
use the two terms interchangeably. Drawing from
national databases, Esmaeili and Hallowell(7,8)
initially identified 14 and 34 fundamental attributes
from 105 fall and 300 struck-by high-severity injury
cases, respectively. In this study, we used a refined
and broadened list of 80 attributes carefully engi-
neered and validated by Prades(9) and Desvignes(10)
from analyzing a large database of 2,201 reports
featuring all injury types and severity levels.
A total of 107 of the 921 reports were discarded, either
because they were not associated with any attribute or
because the real outcome was unknown. Additionally, 3 of
the 80 attributes (pontoon, soffit, and poor housekeeping)
were removed because they did not appear in any report.
This gave us a final matrix of R = 814 reports by P = 77
attributes. Although other related studies concerned themselves
with predictive modeling,(6) here we focus on defin-
ing, modeling, and simulating attribute-based con-
struction safety risk. The overall study pipeline is
summarized in Fig. 1.
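The filtering steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the toy vocabulary, and the toy reports are hypothetical, and the sketch omits the study's second discard criterion (reports whose real outcome is unknown).

```python
def build_attribute_matrix(reports, vocabulary):
    """Return (kept_attributes, matrix) with one 0/1 row per retained report.

    `reports` is a list of sets of attribute names (hypothetical input format).
    """
    # Discard reports that are not associated with any attribute.
    kept_reports = [r for r in reports if r]
    # Discard attributes that do not appear in any retained report.
    kept_attributes = [a for a in vocabulary if any(a in r for r in kept_reports)]
    # Binary matrix: 1 if the attribute is present in the report, 0 otherwise.
    matrix = [[1 if a in r else 0 for a in kept_attributes] for r in kept_reports]
    return kept_attributes, matrix


# Toy data, not taken from the study's database.
toy_vocabulary = ["ladder", "manual handling", "light vehicle",
                  "improper body positioning", "hammer", "scaffold", "pontoon"]
toy_reports = [
    {"ladder", "manual handling", "light vehicle", "improper body positioning"},
    {"hammer", "scaffold"},
    set(),  # no attribute detected, so this report is dropped
]

attributes, X = build_attribute_matrix(toy_reports, toy_vocabulary)
print(attributes)  # "pontoon" is dropped because it never appears
print(X)
```

In this toy case, three reports and seven attributes reduce to a 2 by 6 matrix, mirroring how 921 reports and 80 attributes reduce to 814 by 77 in the study.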
The contributions of this study are fourfold: (1)
we formulate an empirically-grounded definition of
construction safety risk at the attribute level, and ex-
tend it to the situational level, both in the univariate
and the bivariate case; (2) we show how to model
risk using kernel density estimators (KDE); (3) we
observe that the frequency-magnitude distribution
of risk is heavy-tailed, and resembles that of many
natural phenomena; and finally (4) we introduce
univariate and bivariate nonparametric stochastic
generators based on kernels and copulas to draw
conclusions from much larger samples and better
estimate construction safety risk.
Table I. Counts of Injury Severity Levels Accounted for by Each Precursor
Severity levels: s1 = Pain; s2 = 1st Aid; s3 = Medical Case/Lost Work Time; s4 = Permanent Disablement; s5 = Fatality
Precursor   s1         s2         s3         s4         s5
X1          n11        n12        n13        n14        n15
X2          n21        n22        n23        n24        n25
...         ...        ...        ...        ...        ...
X(P−1)      n(P−1)1    n(P−1)2    n(P−1)3    n(P−1)4    n(P−1)5
XP          nP1        nP2        nP3        nP4        nP5
Table II. Severity Level Impact Scores Adapted from Hallowell and Gambatese(16)
Severity Level (s)             Severity Score (Ss)
Pain                           S1 = 12
1st aid                        S2 = 48
Medical case/lost work time    S3 = 192
Permanent disablement          S4 = 1,024
Fatality                       S5 = 26,214
2. BACKGROUND AND POINT OF DEPARTURE
The vast majority of construction safety risk
analysis studies use opinion-based data,(9) and thus
rely on the ability of experts to rate the relative
magnitude of risk based on their professional experience.
This approach suffers from two main limitations.
First, prior ranges are very often provided by re-
searchers to bound risk values. Second, and more im-
portantly, even the most experienced experts have
limited personal history with hazardous situations,
and their judgment under uncertainty suffers from the
same cognitive limitations as that of any other hu-
man being,(11) such as overconfidence, anchoring,
availability, representativeness, unrecognized limits,
motivation, and conservatism.(11–13) It was also sug-
gested that gender(14) and even emotional state(15)
impact risk perception. Even if it is possible to some-
what alleviate the negative impact of adverse psy-
chological factors,(16) the reliability of data obtained
from expert opinion is questionable. Conversely,
truly objective empirical data, like the injury reports
used in this study, seem superior.
Due to the technological and organizational
complexity of construction work, most safety risk
studies assume for simplicity that construction pro-
cesses can be decomposed into smaller parts.(17)
Such decomposition allows researchers to model risk
for a variety of units of analysis, like specific tasks
and activities.(18–20) Most commonly, trade-level risk
analysis has been adopted.(21–23) The major limita-
tion of these segmented approaches is that because
each one considers a trade, task, or activity in isola-
tion, it is impossible for the end user to comprehen-
sively characterize onsite risk in a standard, robust,
and consistent way.
Some studies attempted to overcome these
limitations. For instance, Shapira and Lyachin(24)
quantified risks for generic factors related to tower
cranes such as type of load or visibility, thereby
allowing safety risk modeling for any crane situation.
Esmaeili and Hallowell(7,8) went a step further by
introducing a novel conceptual framework allowing
any construction situation to be fully and objectively
described by a unique combination of fundamental
context-free attributes of the work environment.
This attribute-based approach is powerful in that
it makes it possible to extract structured, standard
information from naturally occurring, unstructured
textual injury reports. Additionally, the universality
of attributes makes it possible to capture the
multifactorial nature of safety risk in the same
unified way for any task, trade, or activity, which
is a significant improvement over traditional segmented
studies. However, manual content analysis
of injury reports is expensive and fraught with data
consistency issues. For this reason, Tixier et al.(5)
introduced an NLP system capable of automatically
detecting the attributes presented in Table III
and various safety outcomes in injury reports with
more than 95% accuracy (comparable to human
performance), enabling the large-scale use of Es-
maeili and Hallowell’s attribute-based framework.
The data we used in this study were extracted by the
aforementioned NLP tool.
Table III. Relative Risks and Counts of the P = 77 Injury Precursors
(Two panels side by side. "Real" and "Worst" are the risk based on real outcomes and on worst possible outcomes, respectively.)
Precursor  n  e (%)  Real  Worst  Precursor  n  e (%)  Real  Worst
Concrete 29 41 7 96 Unstable support/surface 3 32 1 2
Confined workspace 21 2 115 336 Wind 29 37 6 16
Crane 16 12 22 76 Improper body position 7 25 3 6
Door 17 21 11 174 Imp. procedure/inattention 13 16 10 44
Sharp edge 8 38 2 5 Imp. security of materials 78 12 77 1007
Formwork 22 5 63 135 Insect 19 18 8 21
Grinding 16 16 11 34 No/improper PPE 3 67 0* 1
Heat source 11 20 4 13 Object on the floor 41 43 9 22
Heavy material/tool 29 30 11 247 Lifting/pulling/handling 141 31 49 439
Heavy vehicle 12 12 12 307 Cable tray 9 27 4 11
Ladder 23 14 15 52 Cable 8 33 1 3
Light vehicle 31 59 7 123 Chipping 4 16 1 4
Lumber 69 14 53 158 Concrete liquid 8 41 2 4
Machinery 40 8 67 3159 Conduit 11 31 4 14
Manlift 8 8 16 50 Congested workspace 2 32 0* 1
Object at height 14 50 4 136 Dunnage 2 16 1 3
Piping 74 38 19 141 Grout 3 41 1 1
Scaffold 91 33 28 74 Guardrail handrail 16 40 4 8
Stairs 28 41 8 25 Job trailer 2 59 0* 1
Steel/steel sections 112 35 33 281 Stud 4 41 1 5
Rebar 33 4 76 251 Spool 9 33 2 9
Unpowered transporter 13 9 23 401 Stripping 12 22 7 18
Valve 24 27 9 22 Tank 16 31 5 115
Welding 25 22 10 34 Drill 16 43 5 88
Wire 30 43 5 19 Bolt 36 41 7 27
Working at height 73 40 18 46 Cleaning 22 56 5 12
Wkg below elev. wksp/mat. 7 17 3 21 Hammer 33 50 5 18
Forklift 11 9 9 380 Hose 11 41 3 8
Hand size pieces 38 47 7 95 Nail 15 50 4 10
Hazardous substance 33 1 590 6648 Screw 7 50 1 2
Adverse low temperatures 33 3 101 292 Slag 10 10 8 32
Mud 6 6 9 20 Spark 1 12 2 11
Poor visibility 3 23 2 3 Wrench 23 39 5 23
Powered tool 32 27 12 54 Exiting/transitioning 25 49 6 17
Slippery surface 32 25 13 40 Splinter/sliver 9 44 1 2
Small particle 96 31 28 105 Working overhead 5 40 1 3
Unpowered tool 102 44 24 352 Repetitive motion 2 51 0* 1
Electricity 1 33 0* 1 Imp. security of tools 24 22 12 314
Uneven surface 33 32 11 129
*Values are rounded up to the nearest integer.
3. UNIVARIATE ANALYSIS
3.1. Attribute-Level Safety Risk
Following Baradan and Usmen,(21) we defined
construction safety risk as the product of frequency
and severity, as shown in Equation (1). More precisely,
in our approach, the safety risk accounted for by
precursor $p$ (or $X_p$ in Table I) at severity level $s$
was computed as the product of the number $n_{ps}$ of
injuries attributed to precursor $p$ at that severity level
(given by Table I) and the impact rating $S_s$ of this
severity level (given by Table II, and based on Hallowell
and Gambatese(16)). We considered five severity levels:
$s_1$ = Pain, $s_2$ = First Aid, $s_3$ = Medical Case/Lost Work
Time, $s_4$ = Permanent Disablement, and $s_5$ = Fatality.
Medical Case and Lost Work Time were merged
because differentiating between these two severity
levels was not possible based only on the information
available in the narratives and associated databases.
Equation (1) shows construction safety risk.
$$\text{risk} = \text{frequency} \times \text{severity}. \tag{1}$$
The total amount of risk that can be attributed
to precursor $p$ was then obtained by summing the risk
values attributed to this precursor across all severity
levels, as shown in Equation (2):
$$R_p = \sum_{s=1}^{5} n_{ps} S_s, \tag{2}$$
where $n_{ps}$ is the number of injuries of severity level $s$
attributed to precursor $p$, and $S_s$ is the impact score of
severity level $s$.
Finally, as noted by Sacks et al.,(25) risk analysis
is inadequate if the likelihood of worker exposure
to specific hazards is not considered. Hence, the
risk $R_p$ of precursor $p$ was weighted by the inverse of
its probability of occurrence $e_p$ (see Equation (3)),
which gave the relative risk $RR_p$ of precursor $p$. The
probabilities $e_p$, or exposure values, were provided by
the same company that donated the injury reports. These
data are constantly being recorded by means of observation
as part of the firm’s project control and work
characterization policy and therefore were already available.
$$RR_p = \frac{1}{e_p} \times R_p = \frac{1}{e_p} \sum_{s=1}^{5} n_{ps} S_s, \tag{3}$$
where $R_p$ is the total amount of risk associated with
precursor $p$, and $e_p$ is the probability of occurrence of
precursor $p$ onsite.
To illustrate the notion of relative risk, assume
that the precursor lumber has caused 15
first aid injuries, 10 medical cases and lost work
time injuries, and has once caused a permanent
disablement. By following the steps outlined
above, the total amount of risk $R_{\text{lumber}}$ accounted
for by the attribute lumber can be computed as
15 × 48 + 10 × 192 + 1 × 1,024 = 3,664. Moreover,
if lumber is encountered frequently onsite, e.g.,
with an exposure value $e_{\text{lumber}}$ = 0.65, the relative
risk of lumber will be $RR_{\text{lumber}}$ = 3,664/0.65 =
5,637. However, if workers are very seldom exposed
to lumber (e.g., $e_{\text{lumber}}$ = 0.07), $RR_{\text{lumber}}$ will be
equal to 3,664/0.07 = 52,343. It is clear from this
example that if two attributes have the same total
risk value, the attribute having the lowest exposure
value will be associated with the greatest relative
risk. The assumption is that if a rare attribute causes
as much damage as a more common one, the rare
attribute should be considered riskier by proportion.
Note that relative risk values allow comparison
but do not have an absolute physical meaning. As
presented later, what matters more than the precise
risk value itself is the range in which a value falls.
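The lumber example can be reproduced with a few lines of Python. This is a minimal sketch of Equations (2) and (3), not the authors' implementation: the function names and dictionary keys are illustrative assumptions, while the severity scores come from Table II and the counts and exposure values from the example above.

```python
# Severity impact scores S_s from Table II.
SEVERITY_SCORES = {
    "pain": 12,
    "first aid": 48,
    "medical case/lost work time": 192,
    "permanent disablement": 1024,
    "fatality": 26214,
}


def total_risk(counts):
    """Equation (2): sum over severity levels of n_ps * S_s."""
    return sum(n * SEVERITY_SCORES[level] for level, n in counts.items())


def relative_risk(counts, exposure):
    """Equation (3): total risk divided by the exposure value e_p."""
    if not 0 < exposure <= 1:
        raise ValueError("exposure must be a probability in (0, 1]")
    return total_risk(counts) / exposure


# Hypothetical severity counts for the precursor "lumber" (from the example).
lumber_counts = {
    "first aid": 15,
    "medical case/lost work time": 10,
    "permanent disablement": 1,
}

print(total_risk(lumber_counts))                  # 3664
print(round(relative_risk(lumber_counts, 0.65)))  # 5637
print(round(relative_risk(lumber_counts, 0.07)))  # 52343
```

Exposure is passed here as a probability, as in the example, whereas Table III reports it as a percentage.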
Also, note that since the NLP tool of Tixier et al.(5)
did not include injury severity
extraction at the time of writing, we used the real
and worst possible outcomes manually assessed for
each report by Prades.(9) Specifically, in Prades,(9) a
team of seven researchers analyzed a large database
of injury reports over the course of several months.
High output quality was ensured by using a harsh
95% intercoder agreement threshold, peer reviews,
calibration meetings, and random verifications by
an external reviewer. Regarding worst possible
injury severity, human coders were asked to use
their judgment of what would have happened in
the worst-case scenario had a small translation in
time and/or space occurred. This method and the
resulting judgments were later validated by Alexan-
der et al.,(26) who showed that the human assessment
of maximum possible severity was congruent with
the quantity of energy in the situation, which, ulti-
mately, is a reliable predictor of the worst possible
outcome.
For instance, in the following excerpt of an in-
jury report: “worker was welding below scaffold and
a hammer fell from two levels above and scratched
his arm,” the real severity is a first aid. However, by
making only a small translation in space, the ham-
mer could have struck the worker in the head, which
could have yielded a permanent disablement or even
a fatality. Coders in Prades(9) were asked to favor
the most conservative choice. Thus, in this case, per-
manent disablement was selected. Whenever mental
projection was impossible or required some degree of
speculation, coders were required to leave the field
blank and the reports were subsequently discarded.
As indicated, these subjective assessments were em-
pirically validated.(26)
By considering severity counts for both real out-
comes and worst possible outcomes, we could com-
pute two relative risk values for each of the 77 pre-
cursors. These values are listed in Table III, and were
stored in two vectors of length P=77.
For each attribute, we computed the difference
between the relative risk based on worst possible
outcomes and the relative risk based on actual
outcomes. The top 10% of attributes for this metric are
hazardous substance (Δ = 6,059), machinery (3,092),
improper security of materials (930), lifting/pulling/
manual handling (390), unpowered transporter
(378), forklift (371), unpowered tool (328), improper
security of tools (302), and heavy vehicle (295).
These attributes can be considered as the ones
giving a construction situation the greatest potential
for severity escalation in the worst-case
scenario. Except for lifting/pulling/manual handling
and unpowered tool, all these precursors are directly
associated with human error or high energy
levels, which corroborates recent findings.(26) Furthermore,
one could argue that the attributes
lifting/pulling/manual handling and unpowered tool
are still related to human error and high energy
levels, as the former is often associated
with improper body positioning (human factor)
whereas the latter usually designates small and
hand-held objects (hammer, wrench, screwdriver,
etc.) that are prone to falling from height (high
energy). Many attributes in Table III, such as
sharp edge, manlift, unstable support/surface, or
improper body position, have low risk values because
of their rarity in the rather small data set
that we used to provide a proof of concept for our
methodology, but this does not entail any loss of
generality.
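The escalation metric discussed above is simply the worst-case relative risk minus the real relative risk, sorted in decreasing order. The sketch below illustrates it on a handful of rows copied from Table III; the function and variable names are assumptions. Because the table entries are rounded, a difference computed this way can be off by one from the value quoted in the text (6,058 here versus 6,059 for hazardous substance).

```python
# (real, worst) relative risk values copied from Table III for a few precursors.
TABLE_III_SUBSET = {
    "hazardous substance": (590, 6648),
    "machinery": (67, 3159),
    "improper security of materials": (77, 1007),
    "forklift": (9, 380),
    "ladder": (15, 52),
    "stairs": (8, 25),
}


def escalation_ranking(risks):
    """Rank precursors by (worst - real) relative risk, largest first."""
    deltas = {name: worst - real for name, (real, worst) in risks.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


ranking = escalation_ranking(TABLE_III_SUBSET)
for name, delta in ranking:
    print(f"{name}: {delta}")
```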
3.2. Report-Level Safety Risk
As shown in Equation (4), we define safety risk
at the situational level as the sum of the risk values
of all the attributes that were identified as present in
the corresponding injury report.
$$R_{\text{report}_r} = \sum_{p=1}^{P} RR_p \cdot \delta_{rp}, \tag{4}$$
where $RR_p$ is the relative risk associated with
precursor $p$, and $\delta_{rp} = 1$ if precursor $p$ is present in
report $r$ ($\delta_{rp} = 0$ otherwise).
In practice, computing real (or worst) safety risk
at the report level comes down to multiplying the
(R,P) attribute binary matrix (attribute matrix of
Fig. 1) by the (P,1) relative real (or worst) risk vector
as shown in Equation (5). In the end, two risk val-
ues (real and worst) were obtained for each of the
R=814 incident reports.
For instance, in the following description of a
construction situation: “worker is unloading a ladder
from pickup truck with bad posture,” four attributes
are present: namely (1) ladder, (2) manual handling,
(3) light vehicle, and (4) improper body positioning.
The risk based on real outcomes for this construc-
tion situation is the sum of the relative risk values of
the four attributes present (given by Table III), that
is, 15 + 49 + 7 + 3 = 74, and similarly, the risk based
on worst potential outcomes is 52 + 439 + 123 + 6 =
620. As already stressed, these relative values are
not meaningful in absolute terms; they only enable
comparison between situations and their categoriza-
tion into broad ranges of riskiness (e.g., low, medium,
high). Estimating these ranges on a small, finite sam-
ple such as the one we used in this study would
have resulted in biased estimates. To alleviate this,
we used stochastic simulation techniques to generate
hundreds of thousands of new scenarios honoring the
historical data, enabling us to make inferences from
a much richer, yet faithful sample.
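$$
\begin{bmatrix}
0 & 1 & 0 & \cdots & 1 & 0 & 1 \\
0 & 1 & 0 & \cdots & 1 & 0 & 0 \\
1 & 0 & 0 & \cdots & 0 & 0 & 1 \\
0 & 1 & 1 & \cdots & 0 & 1 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 1 & 0 & \cdots & 0 & 1 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 1 \\
0 & 1 & 0 & \cdots & 0 & 1 & 0
\end{bmatrix}
\cdot
\begin{bmatrix}
RR_1 \\ RR_2 \\ RR_3 \\ \vdots \\ RR_{P-2} \\ RR_{P-1} \\ RR_P
\end{bmatrix}
=
\begin{bmatrix}
R_{\text{report}_1} \\ R_{\text{report}_2} \\ R_{\text{report}_3} \\ R_{\text{report}_4} \\ \vdots \\ R_{\text{report}_{R-3}} \\ R_{\text{report}_{R-2}} \\ R_{\text{report}_{R-1}} \\ R_{\text{report}_R}
\end{bmatrix}
\tag{5}
$$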
Multiplying the (R,P) attribute matrix by the
(P,1) vector of relative risk values for each attribute
gives the (R,1) vector of risk values associated with
each injury report.
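The matrix-vector product of Equation (5) can be illustrated on the ladder example. The sketch below is plain Python under stated assumptions: the function name, the five-attribute subset, and the second toy report are hypothetical, while the relative risk values are those listed in Table III.

```python
def report_risks(attribute_matrix, relative_risks):
    """Multiply an (R, P) 0/1 matrix by a length-P vector of relative risks.

    Returns a list of R report-level risk values, as in Equations (4) and (5).
    """
    n_attributes = len(relative_risks)
    for row in attribute_matrix:
        if len(row) != n_attributes:
            raise ValueError("each row must have one entry per attribute")
    return [sum(present * rr for present, rr in zip(row, relative_risks))
            for row in attribute_matrix]


# Column order: ladder, lifting/pulling/handling, light vehicle,
# improper body position, hammer (values copied from Table III).
rr_real = [15, 49, 7, 3, 5]
rr_worst = [52, 439, 123, 6, 18]

# Row 1: "worker is unloading a ladder from pickup truck with bad posture".
# Row 2: a toy report in which only the attribute hammer is present.
X = [
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1],
]

print(report_risks(X, rr_real))   # [74, 5]
print(report_risks(X, rr_worst))  # [620, 18]
```

With the full data set, the same operation would be applied to the 814 by 77 attribute matrix, once with the real and once with the worst relative risk vector.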
3.3. The Probability Distribution of Construction
Safety Risk Resembles That of Many
Natural Phenomena
For a given injury report, the risk based on real
outcomes and the risk based on worst potential outcomes
can each take on a quasi-infinite number of
values ($2^P - 1$) with some associated probabilities.
Therefore, they can be considered quasi-continuous
random variables, and have legitimate probability
distribution functions (PDFs). Furthermore, since a
risk value cannot be negative by definition, these
PDFs have $[0, +\infty)$ support.
The empirical PDF of the risk based on real out-
comes for the 814 injury reports is shown as a his-
togram in Fig. 2. The histogram divides the sample
[Figure 2 plot: histogram with horizontal axis "risk based on real outcomes" (0 to 600) and vertical axis "probability" (0 to 0.4); individual observations are shown as points.]