Probability bounds analysis in
environmental risk assessments
W. Troy Tucker and Scott Ferson
Applied Biomathematics
100 North Country Road
Setauket, New York 11733
troy@ramas.com, scott@ramas.com
Copyright 2003 Applied Biomathematics,
All rights reserved worldwide
Table of contents

1. Introduction
2. Background
3. Deriving and computing probability bounds
   3.1 Parametric models
   3.2 Nonparametric models
   3.3 Empirical models
   3.4 Assumed distributions
   3.5 Computing with probability bounds
   3.6 A numerical example
   3.7 Another numerical example
4. Probability bounds analysis compared to Monte Carlo simulation
   4.1 Uncertainty and variability
   4.2 Construction of input variables
       4.2.1 Point estimates and their uncertainty
       4.2.2 Probability distributions
       4.2.3 Probability distributions with uncertain parameters
       4.2.4 Uncertainty regarding the choice of probability distribution
       4.2.5 Uncertainty about variability
   4.3 Dependencies
       4.3.1 Correlation and dependence among variables
       4.3.2 Monte Carlo simulation strategies to account for dependencies
       4.3.3 Probability bounds analysis approaches to dependencies
   4.4 Model uncertainty
   4.5 Truncation
   4.6 Discretization error
   4.7 Microexposure event modeling
       4.7.1 Dependency issues
   4.8 Sensitivity analysis
       4.8.1 Generalizing sensitivity analyses for use with probability boxes
       4.8.2 Probability bounds analysis is a sensitivity analysis
       4.8.3 Sensitivity analysis on top of probability bounds analysis
   4.9 Probability bounds analysis within the tiered approach
   4.10 Synopsis
5. References
1. INTRODUCTION
Probability bounds analysis combines probability theory and interval arithmetic to
produce probability boxes (p-boxes), structures that allow the comprehensive
propagation of both variability and uncertainty through calculations in a rigorous
way. A bounding approach to risk analysis extends and complements traditional
probabilistic analyses under circumstances when analysts cannot specify (1)
precise parameter values for input distributions or point estimates in the risk
model (e.g., minimum, maximum, moments, etc.); (2) precise (marginal)
probability distributions for some or all of the variables in the risk model (which
may be thought of as uncertainty regarding the nature of the variability); (3) the
precise nature of dependencies between variables in the risk model; and (4) the
exact structure of the risk model (model uncertainty). These circumstances
correspond to the types of uncertainty highlighted in EPA’s guidance for
probabilistic risk assessment at Superfund sites (EPA, 2001). In addition, a
bounding approach is useful when an analyst needs to backcalculate cleanup
targets, and as a reliability assessment for Monte Carlo simulation results.
EPA guidance (EPA, 2001, sections 1.2.4 and 3.4.1) argues that one should avoid
developing probability distributions that intermingle or try to represent both
variability and uncertainty. The reason for this is that a single probability
distribution must be interpretable either as an expression of variability under a
frequentist interpretation of probability, or as an expression of uncertainty under a
subjectivist interpretation of probability (Morgan and Henrion 1990). An
intermingling of these two interpretations would be meaningless. Although p-
boxes model them both, uncertainty and variability are never confounded with a
p-box and are clearly distinguishable in the results of the analysis and recoverable
through sensitivity analysis to the maximum extent possible given the state of
knowledge regarding each variable. This is reasonable and in full conformance
with guidance because a p-box is not a single distribution. It represents a class of
distributions. Any particular individual from the class can be thought of as
representing the variability under a pure frequentist interpretation. The class, as a
whole, represents the associated uncertainty. There is no need to invoke a
subjectivist interpretation for probability.
This document provides a detailed overview of probability bounds analysis. In
the sections that follow, the conceptual background of the approach is briefly
presented, followed by the mathematical derivation of probability bounds around
parametric, non-parametric, empirical, and assumed or stipulated models.
Computation with p-boxes is then described, and numerical examples of
computations are provided. In the next section, probability bounds analysis is
compared and contrasted with Monte Carlo simulation techniques. Methods used
by Monte Carlo analysts for treating input variables, dependencies between input
variables, and model uncertainty are compared to methods used in probability
bounds analysis. Techniques for implementing microexposure event analysis
models are also compared, along with methods for conducting sensitivity analysis.
Finally, the use of probability bounds analysis within the tiered framework for
conducting probabilistic risk assessments recommended by EPA is discussed.
2. BACKGROUND
Probability bounds analysis is a combination of the methods of standard interval
analysis (Moore, 1966; Neumaier, 1990) and classical probability theory (see,
inter alia, Feller, 1968; 1971). The idea of bounding probability has a very long
tradition throughout the history of probability theory. Indeed, George Boole
(1854; Hailperin, 1986) used the notion of interval bounds on probability.
Chebyshev (1874) described bounds on a distribution when only the mean and
variance of the variable are known, and Markov (1886) found bounds on a
positive variable when only the mean is known. Fréchet (1935) discovered how
to make calculations with uncertain estimates of total probabilities without
making independence assumptions. Bounding probabilities has continued to the
present day (e.g., Walley and Fine, 1982; Loui, 1986; Madan and Owings, 1988;
Walley, 1991; Tessem, 1992; Smith, 1995). Kyburg (1998) reviewed the history
of interval probabilities and traced the development of the critical ideas from the
middle of the previous century. Probabilistic analyses using bounding arguments
of one kind or another are common today in environmental risk assessments (e.g.,
Goldwasser et al., 2000; Donald and Ferson, 1997; Spencer et al., 1999, 2001;
Regan et al., 2002a, 2002b). The methods are also being used in the Calcasieu
Estuary Project in Louisiana (for instance, see the website at
http://www.epa.gov/earth1r6/6sf/sfsites/calcinit.htm).
The methods of probability bounds analysis that could be routinely used in
environmental risk assessments were developed in the 1980s. Yager (1986)
described the elementary procedures by which bounds on convolutions¹ can be
computed under an assumption of independence. At about the same time, Frank
et al. (1987) solved a question originally posed by A.N. Kolmogorov about how
to find bounds on distributions of sums of random variables when no information
about their interdependency was available. Extending the approach of Frank et al.
(1987), Williamson and Downs (1990) developed a semi-analytical approach that
computes rigorous bounds on the cumulative distribution functions of
convolutions without necessarily assuming independence between the operands.
They demonstrated how the method can be used to estimate rigorous and optimal
bounds on the distributions of sums (or products, differences, or quotients) of
random variables specified only by their marginal distributions, without any
information about the dependence among the variables. Williamson and Downs
also described a method, similar to that of Yager (1986), to compute bounds on
distributions under independence assumptions. Because their approach uses
interval bounds to represent discretization and dependency errors, it can also
account for uncertainty about the shape of input distributions themselves.

¹ Convolution is the operation of finding the probability distribution of a sum of random variables specified by probability distributions. The term is also applied to finding distributions of products, differences, quotients, etc.
Probability bounds analysis is closely allied in spirit with robust Bayes techniques
(Berger, 1985; Insua and Ruggeri, 2000), also called Bayesian sensitivity analysis.
In this approach, an analyst’s uncertainty about which prior distribution should be
used is expressed by replacing a single precise prior distribution by an entire class
of prior distributions. The analysis proceeds by studying the variety of outcomes
as each possible prior distribution is considered. In this approach, uncertainty
about the likelihood function or even the utility function can likewise be
expressed with classes of likelihood functions and classes of utility functions.
3. DERIVING AND COMPUTING PROBABILITY BOUNDS
A real-valued random variable is characterized by its distribution function, which
is a monotonically increasing function from the real numbers onto the interval [0,
1] such that the value of the function at negative infinity is zero and the value of
the function at positive infinity is one. A probability box or “p-box” consists of a
pair of such functions that are used to circumscribe an imprecisely known
distribution function F. The p-box, consisting of the pair of bounding functions,
is identified with the class of probability distributions that lie entirely within these
bounds. In computations, it is often convenient to express a p-box in terms of its
inverse functions d and u defined on the interval of probability levels [0,1]. The
function u is the inverse function of the upper bound on the distribution function
and d is the inverse function of the lower bound. These monotonic functions are
bounds on the inverse of the unknown distribution function F:

d(p) ≥ F⁻¹(p) ≥ u(p)

where p is the probability level.
Suppose, for example, that all that is known about the distribution function F is
that the random variable X it describes has a minimum of min and a maximum of
max. Figure 1 plots d(p) and u(p) for this case, in cumulative distribution function
(CDF) format. In the figure, the u function’s plot follows the left side and top of
the box. The d function’s plot follows the bottom and right side of the box. The
remainder of this section shows how these bounds, and refinements of them, can
be used in risk analyses in a way that properly acknowledges what is known and
what is unknown about distribution functions.
Figure 1. Probability bounds around the unknown distribution
function F of the random variable X, where the only information
available is the minimum (min) and maximum (max) values of X.
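As a concrete illustration of this representation, here is a minimal Python sketch that stores a p-box as its two inverse functions evaluated on a grid of probability levels and constructs the {min, max} box of Figure 1. The class name PBox and the helper minmax_box are our own illustrative choices, not part of any software cited in this document.

```python
import numpy as np

class PBox:
    """A p-box stored as its inverse bounds on m+1 probability levels:
    u[i] <= F^-1(i/m) <= d[i] for i = 0, ..., m."""
    def __init__(self, u, d):
        self.u = np.asarray(u, dtype=float)  # inverse of the upper CDF bound (left curve)
        self.d = np.asarray(d, dtype=float)  # inverse of the lower CDF bound (right curve)

def minmax_box(lo, hi, m=100):
    """P-box for a variable known only to lie between lo and hi."""
    return PBox(u=np.full(m + 1, lo), d=np.full(m + 1, hi))

box = minmax_box(1.0, 8.0)
print(box.u[50], box.d[50])  # at p = 0.5 the quantile is only known to lie in [1, 8]
```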
3.1 PARAMETRIC MODELS
It is simple to compute probability bounds for many cases in which the
distribution family is specified but only interval estimates can be given for the
parameters. For instance, suppose that, from previous knowledge, it is assumed
that a distribution is lognormal, but the precise values of the parameters that
would define this distribution are uncertain. If there exist bounds on µ and σ
(the mean and standard deviation), bounds on the distribution can be obtained by
computing the envelope of all lognormal distributions that have parameters within
the specified intervals. These bounds are
d(p) = max over (µ, σ) ∈ Θ of L⁻¹(p; µ, σ)

u(p) = min over (µ, σ) ∈ Θ of L⁻¹(p; µ, σ)

where

Θ = {(µ, σ) | µ₁ ≤ µ ≤ µ₂, σ₁ ≤ σ ≤ σ₂}

and L(·; µ, σ) is the CDF of a lognormal distribution with such parameters. In principle,
making these calculations might be a difficult task since Θ indexes an infinite set
of distributions. However, in practice, finding the bounds requires computing the
envelope over only four distributions: those corresponding to the parameter sets
(µ₁, σ₁), (µ₁, σ₂), (µ₂, σ₁), and (µ₂, σ₂). This simplicity is the result of how the
family of distributions happens to be parameterized by µ and σ. Nevertheless, it
is just as easy to find probability bounds for cases with other commonly used
distribution families such as normal, uniform, exponential, Cauchy, and many
others.
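A sketch of this four-corner envelope computation in Python with SciPy follows. The document does not spell out whether µ and σ denote the mean and standard deviation of the lognormal variable itself or of its logarithm; this sketch assumes the former (as the values in Figure 2 suggest) and converts to SciPy's parameterization. The function names are illustrative.

```python
import numpy as np
from scipy.stats import lognorm

def lognorm_from_mean_sd(m, s):
    """Frozen lognormal with mean m and standard deviation s of the variable itself."""
    sigma2 = np.log(1.0 + (s / m) ** 2)  # variance of the underlying normal
    mu = np.log(m) - 0.5 * sigma2        # mean of the underlying normal
    return lognorm(s=np.sqrt(sigma2), scale=np.exp(mu))

def lognormal_pbox(m_lo, m_hi, s_lo, s_hi, probs):
    """Envelope of the four 'corner' lognormal quantile functions."""
    corners = np.array([lognorm_from_mean_sd(m, s).ppf(probs)
                        for m in (m_lo, m_hi) for s in (s_lo, s_hi)])
    return corners.min(axis=0), corners.max(axis=0)  # u(p) and d(p)

probs = np.linspace(0.005, 0.995, 199)   # truncate the infinite tails
u, d = lognormal_pbox(0.5, 0.6, 0.05, 0.1, probs)
print(u[99], d[99])  # bounds on the median of the imprecise lognormal of Figure 2
```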
This approach can also be applied in cases where empirical information is
available. Grosof (1986) suggested that standard confidence interval procedures
can be used to deduce probability bounds. For instance, instead of selecting the
lognormal distribution whose parameters are the best estimates from a limited
empirical study, one can incorporate some of the sampling uncertainty from the
study by using bounds computed from confidence intervals around the
parameters. As an example, suppose that strong arguments or convincing
evidence implies that a distribution is lognormal in form, with its µ and σ known
only within interval ranges. Figure 2 illustrates probability bounds for the case
{shape=lognormal, µ=[0.5, 0.6], σ=[0.05, 0.1]}, for which the true distribution is
known to be lognormal with µ somewhere in the interval [0.5, 0.6] and σ in [0.05,
0.1]. The gray line is the distribution that corresponds to assuming that the
midpoints of the intervals are precisely the true µ and σ for the distribution.
Figure 2. Bounds on the CDF of a lognormal distribution having µ
somewhere in the interval [0.5, 0.6] and σ somewhere in [0.05, 0.1].
The gray line is the CDF for the lognormal distribution with µ=0.55
and σ=0.075.
3.2 NONPARAMETRIC MODELS
Probability bounds can also be derived for a variety of circumstances in which
distribution shape cannot be reliably determined and empirical information is very
limited. The case in which the only available information is {min, max} has
already been discussed. Further information, such as knowing that the mean is a
particular value, can be used to constrain this box to a smaller region. Figure 3
depicts bounds when only {min, max, mean} are known. The derivation of these
bounds is straightforward. First, consider the range of x-values between min and
mean. The upper bound over this range can be found by determining the largest
possible values attained by a CDF under the specified constraints. Consider an
arbitrary value x ∈ [min, mean]. The value p of a CDF at x represents probability
mass at and to the left of x. However much mass there is, it must be balanced by
mass on the right of the mean. The greatest possible mass would be balanced by
assuming that the rest of the probability (1-p) is concentrated at max. Likewise,
the arrangement of mass on the left side requires the least balance when it is all
concentrated at x. These considerations lead to the expression
px + (1 – p)max = mean
which can be solved to yield

p = (max − mean) / (max − x)

which specifies the largest value of the CDF for the value x. If there were any
more probability mass at values ≤ x, the constraint of the mean could not be
satisfied by any arrangement of mass at values ≤ max. Clearly then, the spike
distributions defined by this expression describe the bounding CDF over the range
[min, mean], subject to the fundamental constraint 0 ≤ p ≤ 1. The position of the
lower bound is determined by the degenerate distribution which has all its mass at
the mean. The CDF for this distribution follows the x-axis from min to mean.
Lower and upper bounds for values larger than the mean can be derived by similar
(but upside-down) arguments. The resulting bounds are optimal in the sense that
they could not be any tighter without excluding at least some portion of a CDF
from a distribution satisfying the specified constraints. It is important to
understand, however, that this does not mean that any distribution whose CDF is
inscribed within this bounded probability region would satisfy the constraints.
Figure 3. Bounds on the CDF of a distribution with specified
minimum (min), maximum (max), and mean. Plot A shows the case
where mean is closer to min than max. Plot B shows the case where
mean is exactly between min and max. Plot C shows the case where
mean is closer to max than min.
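The {min, max, mean} bounds derived above are easy to evaluate directly. The following sketch (our own illustrative code, following the balancing argument in the text) returns pointwise bounds on the CDF.

```python
import numpy as np

def minmaxmean_cdf_bounds(x, lo, hi, mean):
    """Pointwise optimal bounds on F(x) given only {min, max, mean}."""
    x = np.asarray(x, dtype=float)
    # Upper bound: mass p at x balanced by the remaining mass 1 - p at hi,
    # so p*x + (1 - p)*hi = mean, giving p = (hi - mean)/(hi - x) for x < mean.
    upper = np.where(x < mean, (hi - mean) / (hi - x), 1.0)
    # Lower bound: the degenerate distribution at the mean gives 0 below the
    # mean; above it, p*lo + (1 - p)*x = mean gives p = (x - mean)/(x - lo).
    lower = np.where(x >= hi, 1.0,
                     np.where(x >= mean, (x - mean) / (x - lo), 0.0))
    return np.clip(lower, 0.0, 1.0), np.clip(upper, 0.0, 1.0)

lo_cdf, hi_cdf = minmaxmean_cdf_bounds(np.linspace(0, 8, 9), 0.0, 8.0, 3.0)
print(np.round(lo_cdf, 3))
print(np.round(hi_cdf, 3))
```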
Knowledge of percentiles such as the median is especially useful because it
pinches the uncertainty to a definite point on the CDF. When information is limited to {min, max,
median}, bounds on probability such as are depicted in Figure 4 can be obtained.
Having reliable knowledge of other percentiles would correspond to similar points
at other probability levels through which the true distribution must pass (e.g.
Figure 4, plot D).
Figure 4. Bounds on the CDF of a distribution with known minimum
(min), maximum (max) and median. Plots A, B, and C show the
cases where median is closer to min than max, median is exactly
between min and max, and median is closer to max than min,
respectively. Plot D shows bounds on the CDF when min, max,
median, and two other percentiles (25th and 75th) are all known.
If it can be assumed that the underlying distribution is unimodal and that reliable
estimates are available for {min, max, mode}, then the probability bounds are
depicted in Figure 5. Again, not every curve contained in this region satisfies the
given constraints. However, the bounds are optimal in the sense that they could
not be tighter without excluding some distribution that does satisfy the specified
constraints.
Figure 5. Bounds on the CDF of a distribution with known minimum
(min), maximum (max), and mode. Plots A, B, and C show cases
where the mode is closer to min than to max, exactly between min
and max, and closer to max than to min, respectively.
As with parametric models, it will usually be the case that the mean, percentiles,
and other parameters are only imperfectly known. This suggests that confidence
intervals rather than point estimates should be used for them. Likewise, when min
and max are determined by empirical observation rather than by mathematical
argument or theoretical consideration, they are better estimated as confidence
intervals whenever point estimates would imply a false certainty. The envelope
of all possible underlying distributions can be calculated as the union of bounded
probability regions in a manner analogous to that used above to compute
probability bounds for parametric models. Again, in many cases, the calculation
of the bounds is a fairly simple matter even though it formally implies
consideration of an infinite family of curves. All of the cases derived so far, for
instance, are easy to generalize so that any of the numeric parameter values may
be given either as a fixed number or an interval.
Further cases may also be derived for simultaneous multiple constraints by
intersecting the bounded probability regions obtained for each constraint
separately. The bounds would be
d(p) = min over i of dᵢ(p)

u(p) = max over i of uᵢ(p)
where i indexes the constraints. For instance, if the min and max values are
known and reliable estimates of both the mean and the mode are available, it is
justifiable to construct probability bounds for this case by intersecting the regions
described for {min, max, mode} and {min, max, mean}. Although this approach
will often yield optimal bounds, it is not guaranteed to do so because it cannot
take into account constraints that are simultaneous and interacting. For instance,
if only {min, max, mean=median} or {min, max, median=mode} is known,
intersecting the bounded regions will not in general yield optimal bounds, which
must be derived separately for these cases.
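In the inverse representation used here, intersecting constraint regions is a pointwise envelope operation, as the formulas above indicate. A minimal sketch (illustrative names; each box is a (u, d) pair of quantile arrays on a common probability grid):

```python
import numpy as np

def intersect_pboxes(boxes):
    """Intersect p-boxes given as (u, d) quantile-array pairs: the left
    bounds move right (max) and the right bounds move left (min)."""
    u = np.max([u_i for u_i, _ in boxes], axis=0)
    d = np.min([d_i for _, d_i in boxes], axis=0)
    if np.any(u > d):
        raise ValueError("the constraints are mutually inconsistent")
    return u, d

n = 101
u, d = intersect_pboxes([(np.full(n, 0.0), np.full(n, 8.0)),    # {min=0, max=8}
                         (np.full(n, 1.0), np.full(n, 6.0))])   # {min=1, max=6}
print(u[0], d[0])  # the intersection is the tighter box [1, 6]
```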
3.3 EMPIRICAL MODELS
Kolmogorov-Smirnov confidence limits (Sokal and Rohlf, 1994) on an empirical
histogram can be used to construct input distributions. This method requires the
analyst to specify both the range (outside of which the distribution is truncated)
and the confidence level to be used. Figure 6 illustrates an empirical histogram
composed from a sample of six points (0.2, 0.5, 0.6, 0.7, 0.75, and 0.8) for a
variable constrained between zero and one. The empirical distribution function
histogram is shown in gray and the 95% Kolmogorov-Smirnov limits are shown
in black². The bounds would be tighter if there were more samples or if a lower
confidence level were used.

² A maximum entropy (Jaynes, 1957) solution for a given set of empirical data is discussed by Solana and Lind (1990).
Figure 6. Kolmogorov-Smirnov 95% confidence bounds on an
empirical distribution made up of six values (0.2, 0.5, 0.6, 0.7, 0.75,
and 0.8), assuming the distribution ranges from 0 to 1.
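A sketch of the Kolmogorov-Smirnov band for the six sample values follows, using SciPy's kstwo distribution for the two-sided critical value. The function name ks_pbox is illustrative, and the band is evaluated at the sorted data points; the assumed range [lo, hi] pins the bounds to 0 and 1 outside it.

```python
import numpy as np
from scipy.stats import kstwo

def ks_pbox(data, lo, hi, alpha=0.05):
    """Kolmogorov-Smirnov 100(1 - alpha)% band around an empirical CDF,
    for a variable assumed to be confined to [lo, hi]."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    dcrit = kstwo.ppf(1.0 - alpha, n)        # two-sided KS critical value
    ecdf = np.arange(1, n + 1) / n           # empirical CDF at the data points
    lower = np.clip(ecdf - dcrit, 0.0, 1.0)  # the CDF is 0 below lo and 1 above hi
    upper = np.clip(ecdf + dcrit, 0.0, 1.0)
    return x, lower, upper

x, lower, upper = ks_pbox([0.2, 0.5, 0.6, 0.7, 0.75, 0.8], 0.0, 1.0)
print(np.round(lower, 3))
print(np.round(upper, 3))
```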
3.4 ASSUMED DISTRIBUTIONS
In some cases, an analyst may be legitimately confident about the shapes and
parameters of some input distributions based on good empirical evidence. This
confidence can be represented by using osculating bounds
d(p) = F⁻¹(p) = u(p)
describing the distribution function. Software is available which implements
most of the commonly used distributions, including Bernoulli, Cauchy, Dirac’s
delta, discrete uniform, exponential, geometric, Gumbel, Laplace, logistic,
lognormal, logtriangular, loguniform, normal, Pareto, power function, Rayleigh,
triangular, uniform, and Weibull. For distributions with theoretically infinite
supports, the software permits automatic truncation at the 100α% and 100(1−α)%
percentiles, where α can be selected by the analyst.
3.5 COMPUTING WITH PROBABILITY BOUNDS
Williamson and Downs (1990) provided explicit numerical methods for
computing bounds on the result of addition, subtraction, multiplication and
division of random variables when only bounds on the input distributions are
given. These algorithms have been implemented in software (Ferson and Long,
1995; Ferson et al., 1997; Ferson, 2002) and have been extended to
transformations such as logarithms and square roots, other convolutions such as
minimum, maximum and powers, and other dependence assumptions. Using the
techniques of mathematical programming, Berleant and his colleagues (1993;
1996; Berleant and Cheng, 1998; Berleant and Goodman-Strauss, 1998)
independently derived—and also implemented in software—algorithms to
compute convolutions of bounded probability distributions, both with and without
independence assumptions.
Because all of the necessary mathematical operations can be performed using p-
boxes, the input distributions used in a probabilistic risk assessment need not be
particular, well-defined statistical distributions. Suppose that variables A and B
have bounds (dA, uA) and (dB, uB) respectively, and that each of these four
functions is evenly discretized into m+1 elements. Assuming A and B are
independent, the bounds on the sum A+B have a discretization

d(i/m), u(i/m), i ∈ {0, 1, 2, …, m}

where d(i/m) is approximated by the (i+im+m)th element of a numerical sorting of
the (m+1)² values

dA(j/m) + dB(k/m), j,k ∈ {0, 1, 2, …, m}

and u(i/m) is approximated by the (i+im)th element of a numerical sorting of the
values

uA(j/m) + uB(k/m), j,k ∈ {0, 1, 2, …, m}.
The algorithm for subtraction is virtually the same except that the pluses between
the d’s and the u’s are replaced by minuses. Multiplication and division use their
respective operators too, so long as both variables are strictly positive. A more
elaborate algorithm is required in the general case, although division is undefined
whenever the divisor includes zero. Aside from the four basic arithmetic
operations, it is also possible to handle other functions as well, including
minimum, maximum, logarithms, exponentiation and integral powers.
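A sketch of the sorting procedure just described, in the array-based quantile representation used in the earlier sketches; the "th element" indexing in the text is interpreted here as zero-based, and the function name add_independent is illustrative.

```python
import numpy as np

def add_independent(uA, dA, uB, dB):
    """Bounds on A + B under independence, following the sorting recipe
    above; inputs are arrays of m+1 quantiles at levels 0, 1/m, ..., 1."""
    m = len(uA) - 1
    u_sums = np.sort(np.add.outer(uA, uB).ravel())  # the (m+1)^2 values
    d_sums = np.sort(np.add.outer(dA, dB).ravel())
    i = np.arange(m + 1)
    u = u_sums[i * (m + 1)]            # the (i + i*m)-th sorted element
    d = d_sums[(i + 1) * (m + 1) - 1]  # the (i + i*m + m)-th sorted element
    return u, d

p = np.linspace(0.0, 1.0, 101)
u, d = add_independent(p, p, p, p)  # two independent uniform(0, 1) variables
print(u[50], d[50])  # both near 1.0, the median of the triangular-shaped sum
```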
The three graphs in Figure 7 depict the modeling of a sum using these algorithms.
The quantity A depicted in the graph on the left is modeled as a lognormal
distribution whose mean is in the interval [20, 23] and whose standard deviation is
in the interval [3.5, 4.5]. The distribution is truncated at the 0.5th and 99.5th
percentiles. The quantity B, depicted in the middle graph, has a symmetric
triangular distribution, with minimum value 10, mode 20, and maximum 30. The
upper and lower bounds for this distribution touch each other because the
distribution is precisely known; the bounds are just wide enough to contain the
representation error introduced by discretization of the continuous distribution.
The sum A+B of these two quantities, computed under the assumption that they
are mutually independent, is depicted in the right graph of Figure 7.
Figure 7. Convolution A+B (right) of A (left) and B (middle)
Figure 8 is a matrix containing a few of the calculations showing how this
addition is computed. Each addend is decomposed into a collection of intervals
called focal elements. Each focal element is paired with a probability mass that
depends on the discretization scheme employed. In this case, 100 discretization
levels are used, so the focal elements are [u(i/100), d((i+1)/100)], where i {0, 1,
2,…, 99} and every probability mass is 1/100. The first line in each cell is an
interval focal element and the second line is the probability mass associated with
that focal element. The elements of A are arrayed along the top row of the matrix.
The elements of B are in the first column. The cells inside the matrix form the
Cartesian product, crossing each element from A with every element from B. The
first line of a cell inside the matrix is determined by interval arithmetic on the
corresponding focal elements from A and B. Because the model asserts that the
quantity is the sum of A and B, each of these interval operations is addition. The
second line in each cell is the probability mass associated with the interval on the
first line. The probability masses in the top row and first column are each 0.01;
these are the masses that arose from the discretization of the continuous
distributions. The masses inside the matrix are all 0.0001, which is the product
(under independence) of 0.01 and 0.01. Because there are 100 focal elements in
both A and B, there will be 10,000 focal elements in their sum. Williamson
(1989) describes a condensation strategy that can reduce this number back to 100
in a way that conservatively captures uncertainty.
addition (independent)  | [11.0, 15.0] (0.01)   | [11.6, 16.7] (0.01)   | [12.4, 17.1] (0.01)   | … | [27.9, 35.7] (0.01)   | [29.2, 37.5] (0.01)
[10.0, 11.4] (0.01)     | [21.0, 26.4] (0.0001) | [21.6, 28.1] (0.0001) | [22.4, 28.5] (0.0001) | … | [37.9, 47.1] (0.0001) | [39.2, 48.9] (0.0001)
[11.4, 12.0] (0.01)     | [22.4, 27.0] (0.0001) | [23.0, 28.7] (0.0001) | [23.8, 29.1] (0.0001) | … | [39.3, 47.7] (0.0001) | [40.6, 49.5] (0.0001)
[12.0, 12.4] (0.01)     | [23.0, 27.4] (0.0001) | [23.6, 29.1] (0.0001) | [24.4, 29.5] (0.0001) | … | [39.9, 48.1] (0.0001) | [41.2, 49.9] (0.0001)
…                       | …                     | …                     | …                     | … | …                     | …
[28.0, 28.6] (0.01)     | [39.0, 43.6] (0.0001) | [39.6, 45.3] (0.0001) | [40.4, 45.7] (0.0001) | … | [55.9, 64.3] (0.0001) | [57.2, 66.1] (0.0001)
[28.6, 30.0] (0.01)     | [39.6, 45.0] (0.0001) | [40.2, 46.7] (0.0001) | [41.0, 47.1] (0.0001) | … | [56.5, 65.7] (0.0001) | [57.8, 67.5] (0.0001)
Figure 8. Matrix of interval focal elements (each shown with its
associated probability mass in parentheses) used to compute the sum
of a p-box and a probability distribution. Focal elements of A are
arrayed across the top row. Focal elements of B are arrayed down the
first column. Other cells in the matrix contain the results of the
Cartesian product of the first row and first column.
Williamson and Downs (1990) also describe numerical methods for computing
bounds without using an assumption of independence between the variables.
Bounds on the sum of A and B, for example, are
d(i/m) = min over j ∈ {i, …, m} of [ dA(j/m) + dB((m + i − j)/m) ]

u(i/m) = max over j ∈ {0, …, i} of [ uA(j/m) + uB((i − j)/m) ]
where i varies between 0 and m. These bounds are guaranteed to enclose the true
answer no matter what correlation or statistical dependency exists between A and
B. Similar expressions can be used for subtraction, multiplication, division and
other mathematical operations. These methods constitute a comprehensive
dependency bounds analysis (Ferson and Long, 1994).
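A sketch of these dependency-free bounds for addition, in the same quantile representation as the earlier sketches (names are illustrative):

```python
import numpy as np

def add_frechet(uA, dA, uB, dB):
    """Bounds on A + B with no assumption about the dependence between
    A and B, following the min/max formulas above."""
    m = len(uA) - 1
    u = np.empty(m + 1)
    d = np.empty(m + 1)
    for i in range(m + 1):
        j = np.arange(0, i + 1)
        u[i] = np.max(uA[j] + uB[i - j])      # max over j = 0, ..., i
        k = np.arange(i, m + 1)
        d[i] = np.min(dA[k] + dB[m + i - k])  # min over j = i, ..., m
    return u, d

p = np.linspace(0.0, 1.0, 101)
u, d = add_frechet(p, p, p, p)  # two uniform(0, 1) variables, unknown dependence
print(u[50], d[50])  # the median of the sum is only known to lie in [0.5, 1.5]
```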
These algorithms are very general and can be used for practically any distribution
that has a finite support (which means that, as for Monte Carlo methods, infinite
distributions must be truncated to a finite range). The resulting bounds can also
be used in subsequent calculations. For instance, the bounds for A+B may be
combined with C where C is a particular CDF or bounds on a CDF. This
numerical method can therefore be used in risk analysis to compute bounds on
probability distributions (Ferson and Long, 1994; Ferson and Burgman, 1995;
Ferson, 1995). Under certain conditions, in particular when the mathematical expression
contains only one instance of each variable, the resulting bounds are,
furthermore, the best bounds possible (Williamson, 1989). In any case, the
bounds on the probabilities are guaranteed to be conservative. This is to say that,
if the model is correct and the input bounds enclose the true distributions of the
inputs, then the bounds obtained by this approach are guaranteed to enclose the
true answer.
Since the bounds computed from these arithmetic operations are guaranteed by
construction to enclose the true answer, they can be analyzed to discover facts
about the distribution of the true result, even without computing it directly. These
facts are usually also in the form of constraints and inequalities. For instance,
from bounds on the distribution function, bounds on the distribution’s mean can
be computed. Bounds on the variance of the sum, as well as higher-order
functions, can also be computed, although these may often be too wide to be of
practical interest. Often the most interesting inequalities pertain to the bounds on
the tails of the distribution.
Distribution tails, which are often the focus of regulatory concern (Lambert et al.,
1994), can be and often are very sensitive to the shapes of input distributions
(Haas, 1997; Bukowski et al., 1995; Ferson, 1994). Probability bounds analysis
gives analysts a computational way to avoid underestimating tails even when they
cannot supply much empirical information about the input variables. Although
the results are conservative with respect to the professed ignorance about input
distributions, they do not appear to be hyperconservative (Burmaster and Harris,
1993) to the degree observed in worst-case analyses. Probability bounds analysis
yields final results whose supports are identical in breadth to those that would be
obtained by a Monte Carlo analysis. While they can strongly disagree about the
magnitude of the tail probabilities, truly impossible outcomes are excluded by
both methods. If, as many analysts suggest, there are commonly only a handful of
variables whose uncertainties are of sufficient breadth to strongly influence the
uncertainty of the answer (e.g., Cullen and Frey, 1999; Helton 1994), then it is
perhaps even less likely that this approach could compound conservativisms into
hyperconservativism.
3.6 A NUMERICAL EXAMPLE
To illustrate the use of probability bounds in risk assessments, the bounds on an
arithmetic expression involving random variables whose distributions are
incompletely specified can be computed. For instance, consider the sum of four
random variables
A + B + C + D
where the random variables are characterized by the following limited empirical
information:
A = {shape=lognormal, µ=[0.5, 0.6], σ=[0.05, 0.1]},
B = {min=0, max=0.4, mode=0.3},
C = {min=0, max=1, data=(0.2, 0.5, 0.6, 0.7, 0.75, 0.8)}, and
D = uniform(0,1).
The distribution of A is known to be lognormal in shape, but its parameters are
only given within intervals. The distribution shape for the random variable B is
unknown, and only a few its parameters can be specified. The quantity C is
characterized by a small collection of sample values. The random variable D, on
the other hand, is known to be well specified by a uniform distribution over the
unit interval. The p-boxes derived from these respective bodies of empirical
information are displayed in black in Figure 9. The corresponding precise
distribution functions that would be used in a traditional Monte Carlo simulation
are shown in gray on the same graphs. The p-box for the random variable A is the
envelope of all lognormal distribution functions (truncated to a convenient finite
range) having a mean in the interval [0.5, 0.6] and a standard deviation in the
interval [0.05, 0.1]. The model for this quantity that would be used in a Monte
Carlo analysis is the lognormal distribution with mean 0.55 and standard
deviation 0.075. The p-box for B is the envelope of all unimodal distributions over
the range [0, 0.4] having a mode at 0.3. The Monte Carlo model of this random
variable is the triangular distribution with minimum 0, mode 0.3, and maximum
0.4. The random variable C is modeled with a precise empirical distribution
function of the six values or with the Kolmogorov-Smirnov confidence limits
around this distribution function. The p-box and the precise distribution are
indistinguishable for the random number D. These four p-boxes constructed in
disparate ways are to be ‘added’ together. In this calculation, the true underlying
distributions for these four variables are assumed independent (although this
assumption could be omitted).
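For comparison, the precise-inputs Monte Carlo baseline (the gray curves of Figures 9 and 10) can be reproduced with a few lines of Python; the probability bounds calculation would instead combine the four p-boxes using the algorithms sketched in Section 3.5. The sample size and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

sigma2 = np.log(1.0 + (0.075 / 0.55) ** 2)  # lognormal with mean 0.55, sd 0.075
A = rng.lognormal(np.log(0.55) - 0.5 * sigma2, np.sqrt(sigma2), n)
B = rng.triangular(0.0, 0.3, 0.4, n)        # triangular(min, mode, max)
C = rng.choice([0.2, 0.5, 0.6, 0.7, 0.75, 0.8], n)  # resample the six observations
D = rng.uniform(0.0, 1.0, n)

S = A + B + C + D
print(np.percentile(S, [5, 50, 95]))  # the p-box of Figure 10 brackets this curve
```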
Figure 9. P-boxes (black) and precise distributions (gray) for four
inputs
The result of convolving these four inputs together is displayed in Figure 10. The
distribution of the sum A+B+C+D that results from Monte Carlo simulation
combining the precise input distributions is shown in gray as a cumulative
distribution function. The p-box which bounds the uncertainty about this
distribution is shown in black. The mean of the true distribution of the sum must
lie somewhere in the interval [1.35, 2.34]. The most important information from
a risk analyst’s perspective is likely to be the bounds on the tail probabilities.
Whatever the true inputs are, the distribution of the result has a minimum that is
no smaller than about 0.5, and a maximum that is no larger than about 3. This
small example shows that fully accounting for the uncertainty in input
distributions yields a very different picture with respect to the tails of the
distribution. In particular, the probability of extreme values is seen to be
potentially much greater than would be predicted by a one-dimensional Monte
Carlo analysis, although it is not possible to conclude that they necessarily will
be. The uncertainty in these probabilities is a direct consequence of the
uncertainty in the input distributions.
Figure 10. Bounds on the distribution of the sum of four random
numbers whose distributions are described by probability bounds.
The gray curve is the distribution that would be obtained had the
uncertainty about the input probability distributions been ignored.
3.7 ANOTHER NUMERICAL EXAMPLE
The previous example assumed the four input variables, A, B, C, and D, to be
independent of one another. This is a strong assumption, which analysts would
sometimes like to relax. In the following hypothetical case, no assumption about
the dependencies among the variables is made. Figure 11 depicts four p-boxes of
various shapes for a hypothetical exposure equation of the form Exposure =
Concentration * Intake Rate / Body Weight. The graphs in this figure are
expressed in terms of complementary cumulative probability (one minus
cumulative probability) which emphasizes the exceedance risk of a variable being
larger than a given value. The leftmost p-box is for the concentration variable.
Modeling the variable with the p-box assumes that the complementary cumulative
distribution function for the variable Concentration, whatever it is, lies within the
region circumscribed by the p-box. The p-box for the variable Body Weight is
equivalent to a precise probability distribution. The p-box for the variable Intake
rate asserts that the random variable is entirely within the interval [2, 2.5], but no
other claims about the distribution of this variable are made.
It seems likely that the variables Body Weight and Intake Rate are statistically
dependent on one another because they are biologically related. It may be the
case, however, that a lack of specific paired data about these values prevents a
complete scientific understanding of the exact nature and magnitude of that
dependence. If receptors can detect concentrations by smell, and olfactory
maturation varies with receptor age and therefore body weight, then the variables
Concentration and Body Weight could be stochastically dependent too. In such a
situation, the analyst may wish to relax independence assumptions, or perhaps not
make any assumption at all about dependence. Using probability bounds analysis,
one can still make reliable quantitative conclusions about the distribution of
exposures. The p-boxes are propagated through the risk expression without
making any dependence assumptions by accounting for all possible dependencies
that could exist among the variables. The calculation produces bounds on the
resulting exposure distribution, which are depicted in the graph on the far right of
Figure 11. The estimate of exposure is also a p-box, and it reflects the overall
uncertainty about the quantity. If some of the variables were thought to be
independent of one another, the p-box for Exposure would be somewhat tighter.
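The propagation can be sketched by generalizing the dependency-free addition formulas of Section 3.5 to multiplication, which is legitimate here because all three quantities are strictly positive; division by body weight is handled as multiplication by its reciprocal, which reverses the order of the quantiles and swaps the bounds. The input shapes below are hypothetical stand-ins, not the ones plotted in Figure 11, and the function name frechet_op is our own.

```python
import numpy as np

def frechet_op(uA, dA, uB, dB, op):
    """Dependency-free bounds for a binary operation on two p-boxes;
    valid for + generally and for * on strictly positive variables."""
    m = len(uA) - 1
    u = np.array([np.max(op(uA[:i + 1], uB[i::-1])) for i in range(m + 1)])
    d = np.array([np.min(op(dA[i:], dB[i:][::-1])) for i in range(m + 1)])
    return u, d

p = np.linspace(0.0, 1.0, 201)
conc_u = conc_d = 1.0 + 9.0 * p                          # a precise distribution
ir_u, ir_d = np.full_like(p, 2.0), np.full_like(p, 2.5)  # the interval [2, 2.5]
bw_u = bw_d = 60.0 + 30.0 * p                            # uniform body weight

num_u, num_d = frechet_op(conc_u, conc_d, ir_u, ir_d, np.multiply)
inv_u, inv_d = 1.0 / bw_d[::-1], 1.0 / bw_u[::-1]        # reciprocal of body weight
exp_u, exp_d = frechet_op(num_u, num_d, inv_u, inv_d, np.multiply)
print(exp_u[100], exp_d[100])  # bounds on the median exposure
```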
Figure 11. Three input variables for a hypothetical exposure
equation, and the probability bounds around the output.
The p-box for exposure is known to be rigorous in the sense that it contains all
distributions of exposure that could possibly result from combining distributions
for concentration, intake rate and body weight, so long as they lie within their
respective p-boxes (Frank et al., 1987; Williamson and Downs, 1990). The p-box
for exposure is also known to be best-possible or optimal in the sense that the
bounds could not be any tighter and still contain all such resulting distributions
(ibid.). Like any calculation, the guarantees of the answer are contingent on the
assumptions. Here, the p-box is sure to contain the complementary cumulative
distribution function of exposure so long as the distribution for concentration is
within the displayed p-box on the left, body weight has the displayed distribution,
the value or values of intake rate are surely within the depicted interval, and the
exposure is actually related to the other variables in the way described by the
equation.
Because probability bounds analysis does not require the analyst to assume
independence when it is not warranted or to specify the precise shapes of input
distributions when they are difficult to estimate, probability bounds analysis can
be complementary to analogous Monte Carlo methods. Used in conjunction with
Monte Carlo simulation, it provides a separate analysis of the importance of
uncertainty in the results, and assesses the reliability of the Monte Carlo results.
4. PROBABILITY BOUNDS ANALYSIS COMPARED TO
MONTE CARLO SIMULATION
This section is intended to make probability bounds analysis easily accessible to
the user of Monte Carlo simulation by comparing the two methods side by side. It
begins with a brief discussion of the concepts of uncertainty and variability.
Then, the construction of input variables for Monte Carlo simulation and
probability bounds analysis of risk models is compared, highlighting the
treatment of parameter uncertainty and probability distribution uncertainty. The
treatment of dependencies between variables in a model by Monte Carlo and
probability bounds analysts is compared next, followed by a discussion of model
uncertainty, problems associated with assigning minima and maxima to
distributions with infinite tails, and discretization error. The implementation of
microexposure event modeling in a probability bounds analysis is then discussed
and compared to the Monte Carlo simulation approach. Finally, Monte Carlo
analysis and probability bounds analysis strategies for sensitivity analysis are
compared and the employment of probability bounds analysis within the
framework of the tiered approach to incorporating probabilistic risk assessments
into site risk assessments is discussed.
4.1 UNCERTAINTY AND VARIABILITY
EPA guidance (EPA, 2001) distinguishes between variability and uncertainty.
Variability (also called randomness, aleatory uncertainty, objective uncertainty, or
irreducible uncertainty) arises from natural stochasticity, environmental variation
across space or through time, genetic heterogeneity among individuals, and other
sources of randomness. Variability in a parameter can exist between individuals
within a population, across populations, and within an individual over time. Body
weight, for example, varies between individuals within a population, across
populations, and within a single individual over its lifetime.
Uncertainty (also called incertitude, epistemic uncertainty, subjective uncertainty,
or reducible uncertainty) arises from incomplete knowledge about the world.
Sources of uncertainty include measurement uncertainty (called measurement
error in older texts), small sample sizes, detection limits and data censoring,
ignorance about the details of the mechanisms and processes involved, and other
imperfections in scientific understanding.
Variability and uncertainty are fundamentally different. Uncertainty can in
principle be reduced by focused empirical effort. Although variability can often
be better characterized by further specific study, it is not generally reducible by
empirical effort. Variability can be translated into risk (i.e., probability of some
adverse consequence) by the application of an appropriate probabilistic model.
The result of applying the model is a characterization of risk, usually as the
relationship between the magnitude of some adverse effect and its probability or
frequency of occurrence. Uncertainty cannot be translated into probability in the
same way, at least without appeal to a subjectivist interpretation of probability,
which is usually considered inappropriate for regulatory purposes. However,
uncertainty can be used to generate bounds on the results of a risk assessment. These bounds illustrate
the range within which the risk distribution might vary given the uncertainty
present in the risk model.
4.2 CONSTRUCTION OF INPUT VARIABLES
4.2.1 Point estimates and their uncertainty
In a Monte Carlo simulation, an analyst might use either a precise point (perhaps
a constant, a measurement of an input variable, or the mean of a set of
observations), or a probability distribution. Representing a parameter as a precise
point implies either that the underlying variable exhibits no variability, or that the
analyst has reason to believe that any variability that exists is unimportant to the
analysis. For a known constant, such as the number of days in a year or the
number of milligrams per gram, a precise point is appropriate, and a probability
bounds analyst uses a precise point to represent the parameter as well. When the
precise point must be determined using an imprecise measurement or a set of
observations, however, the exact value to be used for the quantity is uncertain.
A Monte Carlo analyst might employ a two-dimensional (second-order) Monte
Carlo simulation that includes an uncertainty loop for an uncertain point estimate.
A two-dimensional Monte Carlo simulation is a nesting of two ordinary Monte
Carlo simulations. By nesting one Monte Carlo simulation within another,
analysts can discover how variability and uncertainty interact in the calculation of
the risk distribution. Typically, the inner simulation represents natural variability
of the underlying physical and biological processes, while the outer simulation
represents the analyst’s uncertainty about the particular parameters that should be
used to specify inputs to the inner simulation. This structure means that each
iterate in the outer simulation entails an entire Monte Carlo simulation, which can
lead to a very large computational burden. The approach is similar to a sensitivity
study, except that the combinations to be studied are determined not directly by
the analyst, but as realizations from statistical distributions on the values of the
parameters. Two-dimensional Monte Carlo methods for use in risk analysis were
championed by Hoffman and Hammonds (1994) and Helton (1994), among many
others. Cullen and Frey (1999) give a good introduction to the techniques of the
method, which has roots in Fisher (1957) and Good (1965).
In a second-order Monte Carlo simulation, one might represent an uncertain
number that is known to vary within some range with a uniform distribution in the
outer loop that draws a possible value for use as a fixed constant in each
simulation in the inner loop. In a probability bounds analysis, the same interval of
possible values used in the Monte Carlo analyst’s uncertainty loop replaces the
point estimate, however the semi-analytic nature of the probability bounds
analysis results in an exact representation of the stated uncertainty. As the
number of times the inner loop is called in the Monte Carlo simulation approaches
infinity, the result of the Monte Carlo analysis converges on the probability
bounds result.
For example, consider the simple additive operation C = A + B, where C
represents the results of a risk calculation, and A and B represent inputs. Suppose
that B is a triangular probability distribution specified by minimum = 1, maximum
= 3, and mode = 2, and A = 1.5 is the parameter for which the analyst wishes to
include uncertainty. Suppose further that the analyst believes that the actual value
of A must lie in the interval a = [1, 2], where values inside the square brackets
represent the interval’s smallest and largest possible values. The Monte Carlo
analyst samples at random from interval a using an uncertainty loop, and
calculates C for each instantiation of A = a. Fluctuation in C due to using
multiple instantiations of A represents the effect on the risk calculation of
uncertainty regarding the quantity A. The probability bounds analyst replaces A =
1.5 with A = [1,2] and analytically calculates the bounds on C for all possible
values of A. Figure 12 compares these approaches. The gray lines represent
distributions for C from ten randomly drawn instantiations of A using the Monte
Carlo analyst’s approach. The black lines show the p-box that bounds C for all
values of A in the interval [1,2]. As the number of times the Monte Carlo
simulation randomly samples from a and calculates C increases, the results of the
Monte Carlo simulation converge on the result from the probability bounds
analysis by filling up the p-box. Given a large number of Monte Carlo iterations,
the two approaches produce the same answer.
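The example is easy to reproduce. In this sketch (illustrative code), the p-box for C is obtained exactly by shifting the triangular quantile function by the two endpoints of the interval a = [1, 2], and a small uncertainty loop confirms that every Monte Carlo instantiation falls inside it.

```python
import numpy as np

def triangular_quantiles(p, lo, mode, hi):
    """Quantile function of a triangular(lo, mode, hi) distribution."""
    p = np.asarray(p, dtype=float)
    fc = (mode - lo) / (hi - lo)
    return np.where(p < fc,
                    lo + np.sqrt(p * (hi - lo) * (mode - lo)),
                    hi - np.sqrt((1.0 - p) * (hi - lo) * (hi - mode)))

p = np.linspace(0.0, 1.0, 101)
B = triangular_quantiles(p, 1.0, 2.0, 3.0)
C_u, C_d = B + 1.0, B + 2.0           # p-box for C = A + B with A in [1, 2]

rng = np.random.default_rng(7)
for a in rng.uniform(1.0, 2.0, 10):   # the Monte Carlo uncertainty loop
    assert np.all((C_u <= B + a) & (B + a <= C_d))
print(C_u[50], C_d[50])  # the median of C is only known to lie in [3, 4]
```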
Figure 12. Comparison of results of Monte Carlo simulation and
probability bounds analysis for uncertainty regarding the exact value
of a point estimate.
4.2.2 Probability distributions
Variability in the value of a random variable can be represented in a Monte Carlo
simulation using a probability distribution, in cases where strong assumptions
regarding distributional shape are warranted. These distributions may be
parametric, such as normal, lognormal, triangular, etc., or they may be empirical
distributions based on data. For example, distributions of human body weight
come from very large samples and are sufficiently well known that a precisely
specified distribution may reasonably be used to represent that variable in a
model. The simulation samples at random from this distribution to propagate
variability through the model and retain its effects in the result. The probability
bounds analyst may also use a probability distribution for an input variable. As
the number of iterations in the Monte Carlo simulation increases, and as the
discretization levels (see Section 3.5) of the probability bounds increase, the
results of the Monte Carlo simulation and the probability bounds analysis
converge. Although Monte Carlo techniques for the numerical calculation of
probability distributions allow the control of error in the sense that it approaches
zero as the number of points used to represent the distribution function
approaches infinity, they do not allow the calculation of rigorous bounds on the
error in any given calculation. In contrast, probability bounds analysis provides
lower and upper bounds between which the probability distributions must surely
lie, no matter what discretization is employed.
4.2.3 Probability distributions with uncertain parameters
The Monte Carlo analyst often may posit a known probability distribution, but be
unsure about the precise values to be used as parameters (e.g. minimum,
maximum, mean, variance, etc.). Uncertainty regarding the parameters of a
known distribution represents uncertainty regarding constraints on variability. A
two-dimensional Monte Carlo analysis would propagate parameter uncertainty
through the model with an outer uncertainty loop, sampling at random from an
interval of possible values for each uncertain parameter. (Some Monte Carlo
analysts might substitute a probability distribution representing uncertainty in a
parameter’s value for the interval of possible values; however, this both adds
significantly to the complexity of the model, and mixes a frequentist interpretation
of probability for the underlying variable with a subjectivist interpretation for the
parameters of the distribution.) A probability bounds analyst would substitute the
same intervals of possible variable values for the parameter values. As the
number of iterations in the Monte Carlo simulation increases, the result of the
Monte Carlo simulation using an uncertainty loop to vary the parameters used in
the precise probability distribution converges on the result of the probability
bounds analysis using intervals to represent the uncertain parameters.
As an example, consider a case where C = A × B, where A and B are independent,
A is distributed normally with a mean of 20 and a variance of 1, and B is
distributed triangularly with uncertain parameters b1 = [0.5, 1.5], b2 = [2.5, 3.5],
and b3 = [4.5, 5.5]. The two-dimensional Monte Carlo simulation samples at
random from each of the intervals, b1, b2, and b3, to instantiate the variable B in
each iteration of the outer loop. The probability bounds analysis finds the
analytical solution equivalent to a Monte Carlo uncertainty loop with infinite
iterations. Figure 13 compares the results of the two methods, with 10 Monte
Carlo realizations of variable B.
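The p-box for B in this example can be sketched as the envelope of the triangular quantile functions over the eight corner combinations of the parameter intervals (the triangular quantile function is monotone in each of its three parameters, so the corners trace the envelope); the two-dimensional Monte Carlo simulation would instead draw (b1, b2, b3) at random in its outer loop. Illustrative code:

```python
import numpy as np
from itertools import product

def triangular_quantiles(p, lo, mode, hi):
    """Quantile function of a triangular(lo, mode, hi) distribution."""
    fc = (mode - lo) / (hi - lo)
    return np.where(p < fc,
                    lo + np.sqrt(p * (hi - lo) * (mode - lo)),
                    hi - np.sqrt((1.0 - p) * (hi - lo) * (hi - mode)))

p = np.linspace(0.0, 1.0, 101)
corners = [triangular_quantiles(p, b1, b2, b3)
           for b1, b2, b3 in product([0.5, 1.5], [2.5, 3.5], [4.5, 5.5])]
B_u = np.min(corners, axis=0)  # left bound of the p-box for B
B_d = np.max(corners, axis=0)  # right bound of the p-box for B
print(B_u[50], B_d[50])        # bounds on the median of B
```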
Figure 13. Comparison of results of Monte Carlo simulation and
probability bounds analysis for uncertainty regarding the exact value
of the parameters of a probability distribution.
4.2.4 Uncertainty regarding the choice of probability distribution
Often, a Monte Carlo analyst is aware of important variability in an input
variable, but is unable to specify a precise distribution (either parametric or
empirical). Some constraints on the distribution (e.g., minimum, maximum,
mean, variance, etc.) may be available to partially specify the unknown true
distribution of the variable, but no general method is available to the Monte Carlo
modeler to comprehensively³ represent uncertainty regarding the distribution of
variability. A common strategy is to try a few parametric distributions (e.g.,
normal, lognormal, etc.), parameterized and constrained with the available
information. The results of modeling runs using these different parametric
distributions are then compared and the sensitivity of the model results to the
choice of distributional shape roughly assayed. Because the number of
distributions meeting the known constraints is huge compared to the few that are
arbitrarily chosen, this approach is not comprehensive. When precise
distributions are unknown but information that constrains the potential
distribution of a variable is available, this information can be used to generate a
p-box that comprehensively encloses all possible probability distributions that
satisfy the known constraints. This p-box represents uncertainty regarding the
shape of variability and can be used to propagate this uncertainty through to the
model result and thereby determine the importance of the lack of information
regarding the distribution of stochastic variability in a particular variable.
Probability bounds methods for constructing p-boxes that bound all possible
distributions satisfying a set of precisely known constraints are discussed in
Section 3.1.
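As a concrete illustration, the following sketch computes the classical distribution-free bounds on a CDF for a variable known only to lie in [a, b] with mean m (one of the constructions referred to in Section 3.1); the bounds follow from Markov's inequality, and the example values are hypothetical.

    import numpy as np

    def minmaxmean_pbox(a, b, m, n=200):
        """Distribution-free CDF bounds for X in [a, b] with mean m.

        Returns (xs, lower_cdf, upper_cdf), which enclose every
        distribution satisfying the constraints.
        """
        xs = np.linspace(a, b, n)
        upper = np.ones(n)
        lower = np.zeros(n)
        left = xs < b
        # P(X <= x) <= E[b - X]/(b - x), by Markov's inequality
        upper[left] = np.minimum(1.0, (b - m) / (b - xs[left]))
        right = xs > a
        # P(X <= x) >= 1 - E[X - a]/(x - a), likewise
        lower[right] = np.maximum(0.0, 1.0 - (m - a) / (xs[right] - a))
        return xs, lower, upper

    # Hypothetical example: body weight in [40, 120] kg with mean 70 kg.
    xs, lo, hi = minmaxmean_pbox(40.0, 120.0, 70.0)

If the mean itself is only known to within an interval [m1, m2], the same function can be reused: the upper bound is loosest for the smallest mean and the lower bound for the largest, so enveloping the bounds computed at m1 and m2 yields a p-box of the kind discussed in Section 4.2.5 below.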
4.2.5 Uncertainty about variability
Uncertainty regarding parameter values of a precisely specified probability
distribution is referred to as uncertainty about constraints on variability;
uncertainty regarding the distributional form given precisely known constraints,
including parameters, is referred to as uncertainty regarding the shape of
variability. These two cases generalize to a third, where the analyst is uncertain
both about the shape of the probability distribution, and about the exact values of
the known constraints, which is the general case of uncertainty about variability.
The Monte Carlo analyst could approach this situation by specifying a few
candidate distributions and parameterizing them using an uncertainty loop;
however, the result would not be comprehensive and the level of effort necessary
to perform the computations and interpret the results would be high. A
probability bounds analyst uses intervals for the known constraints and constructs
a p-box comprehensively bounding all probability distributions that satisfy all
possible ranges of all constraints. Although it appears that this would confound
the different types of uncertainty with one another and with variability, the
sensitivity of the overall model of risk to each type of uncertainty can be isolated,
as discussed in Section 4.8 below. Probability bounds methods for constructing
p-boxes that bound all possible distributions satisfying a set of constraints, which
are themselves uncertain, are discussed in Section 3.2.

³ The maximum entropy criterion (Jaynes, 1957; Lee and Wright, 1994) can be used to select a
particular distribution from the class of distributions meeting given constraints, but the
maximum entropy solution cannot be said to fully represent the uncertainty.
4.3 DEPENDENCIES
This section compares the Monte Carlo analyst’s approach to accounting for
dependencies between variables in a model to approaches used in probability
bounds analysis. To model correlations or, more generally, dependencies among
random variables using Monte Carlo simulation, the analyst must specify the
correlation coefficient and the functional form of the interdependence among the
variables. Often, few or no data regarding the dependency structure between
variables are available. Monte Carlo analyses thus often assume strict
independence between all variables, not because this is likely, but because of a
lack of relevant data required to parameterize a more realistic model. In
Section 4.3.1, the importance of accounting for dependencies is discussed. Next,
Monte Carlo simulation strategies for dealing with dependencies are outlined in
Section 4.3.2, followed in Section 4.3.3 by a description of how a probability
bounds analysis accounts for dependencies.
4.3.1 Correlation and dependence among variables
Although almost all software packages for Monte Carlo simulation will treat input
variables as statistically independent of one another unless specifically instructed
otherwise, this default behavior is generally not appropriate for environmental or
ecological risk assessments. Nevertheless, when analysts do not have empirical
information about the statistical dependencies among the input variables,
simulations usually assume independence as a matter of convenience. This
approach is not recommended and one cannot easily predict the effect it has on
the results (Ferson, 1994). There are essentially two obstacles that complicate the
handling of correlations and dependencies in Monte Carlo simulation. The first is
the potential complexity of dependencies and the second is that empirical
information is usually lacking.
Linear correlations and monotonic relationships are not the only possible kinds of
dependencies among variables in a Monte Carlo analysis (Hart et al., 2003; Haas,
1999). Numerous biological phenomena can produce complex dependencies that,
although perhaps empirically subtle, can still have a substantial impact on the
risks being estimated (Ferson, 1994). For instance, there may be differences
between males and females of a species in terms of their reactions to an
environmental contaminant. In laboratory toxicity studies, different regression
lines are common for males and females. When data are pooled without regard to
gender, a positive relation can emerge from what, in each sex, is a negative
relation. In other cases, similar complexity can arise when the mechanism of
toxicity changes as a function of environmental conditions.
Although independence implies zero correlation, the converse is not true.
Observing a correlation to be zero does not imply independence between the
variables, and pairwise independence does not imply multivariate independence.
In fact, there may be dependencies that are important to the analysis that are
easily overlooked by the analyst. In general, assuring independence requires a
compelling mechanistic argument or considerable empirical evidence. In most
risk assessments, empirical information is extremely sparse, and typically cannot
by itself justify an independence assumption or, for that matter, a particular
precise correlation coefficient.
Smith et al. (1994) found that under certain conditions it may be reasonable to
neglect correlations if the focus of the study is the mean of the risk distribution.
However, in environmental or ecological risk assessments the focus also includes
the dispersion (variance) and the tails of the risk distribution. The estimated
dispersion can strongly depend on correlations (Langewisch and Choobineh,
2003; Ferson and Burgman, 1995; Ferson, 1994). In some cases, the dispersion
will be larger than it would be under independence; in other cases, it will be
smaller. Especially sensitive to correlations and dependencies are the tails of
distributions which represent the extreme elements in the underlying population
with, for instance, large contaminant exposures. It is the tails that are often of
prime concern in risk analyses because they correspond to the individuals most
likely to experience adverse effects. Peer review of ecological risk assessment
guidance from EPA (Risk Assessment Forum, 1994, page 36) identified the
evaluation of the effect of the independence assumption on the propagation of
errors as one of three primary research needs under the topic of uncertainty in
ecological risk assessment. While there is certainly a need for future research in
this area, it is already clear that neglecting a correlation or more complex
statistical dependency can invalidate a Monte Carlo simulation. Failing to
account for these patterns can result in substantial under- or overestimates of
variance and tail probabilities.
4.3.2 Monte Carlo simulation strategies to account for
dependencies
There are several strategies a Monte Carlo analyst can use in an environmental or
ecological risk analysis to account for knowledge and uncertainty about
correlations and dependencies. These include assuming independence, functional
modeling, simulating observed correlations, assuming perfect covariance, and
assuming linear dependency. Apart from assuming independence, which was
discussed in Section 4.3.1, each of these is discussed in this section in order to
contrast the Monte Carlo analyst’s choice of approaches to accounting for
dependencies with that of the probability bounds analyst.
Functional modeling. Morgan and Henrion (1990) argue that the best way to
account for dependencies between variables is to build a mechanistic or at least
statistical model of the functional relationship between the variables that gives
rise to the dependence naturally. For instance, if both body mass and skin surface
area are involved in some assessment, then the analyst should develop a model
that explains how the anatomical constraints arising from morphological
development determine the relationship between surface area and mass. This
allows the risk assessment to be expressed in such terms that all fundamentally
stochastic driver variables are independent of one another. This is, of course, a
purist’s approach that may not always be entirely reasonable in practical analyses
if it requires modeling beyond the immediate scope of the assessment.
Simulating observed correlations. When the correlations among variables can be
estimated empirically, standard techniques can be used to simulate the
dependence relationships in a Monte Carlo analysis (Cullen and Frey, 1999).
Scheuer and Stoller (1962) describe a numerical method to generate normal
deviates with a specified matrix of (Pearson product-moment) correlations in the
general multivariate case (i.e., more than two variables). Iman and Conover
(1982) give an ad hoc but perhaps robust technique for simulating deviates from
distributions of more general shapes and (Spearman) rank correlation structure.
Nelsen (1986, 1987) gives methods to simulate bivariate deviates from
distributions having arbitrary marginal shapes and arbitrary rank correlation
measured with either Spearman’s rho or Kendall’s tau. Nelsen’s method is based
on the mathematical notion of copula (the function that joins together marginal
distributions to form a joint distribution function) and therefore should have a
straightforward multivariate generalization. Clemen and Reilly (1999) give
another copula-based approach that uses the normal family of copulas. When the
marginal distribution shapes are normal, the correlation can be specified with the
Pearson coefficient, but because copulas can be freely wedded to arbitrary
marginals, the approach immediately applies to all other distribution shapes too.
The resulting correlations are no longer the specified Pearson’s coefficients, but
the transformation leaves rank correlations unchanged. Lurie and Goldberg
(1997) describe an iterative approach for obtaining a desired pattern of
correlations matching specified marginal distributions. It is essentially a trial-
and-error approach. Cario and Nelson (1997) describe another very general
analytical approach to the problem and spell out how it can be applied in the
multivariate case. Whatever simulation technique is employed, the Monte Carlo
analyst must ensure that the matrix of planned correlation coefficients is feasible
(positive semi-definite). The analyst should also check that the realized
dependency patterns are consistent with the input specifications. Although these
methods for simulating correlations are often very useful, they are not flexible
enough to account for all types of dependence. In general, other strategies may be
needed, either because the relationship is intrinsically nonlinear or because the
available empirical information is insufficient to justify a particular structure of
correlation coefficients.
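A minimal sketch of the copula-style construction is given below, in the spirit of Clemen and Reilly's normal copula (this is an illustration, not their published code): correlated standard normal deviates are mapped to uniforms and then through arbitrary inverse marginal CDFs, which preserves the rank correlation structure.

    import numpy as np
    from scipy import stats

    def normal_copula_sample(corr, marginals, n, seed=0):
        """Draw n deviates whose dependence is a normal (Gaussian) copula.

        corr      : feasible (positive semi-definite) correlation matrix
        marginals : list of frozen scipy.stats distributions (any shapes)
        """
        rng = np.random.default_rng(seed)
        z = rng.multivariate_normal(np.zeros(len(marginals)), corr, size=n)
        u = stats.norm.cdf(z)   # uniforms carrying the copula's dependence
        return np.column_stack([m.ppf(u[:, j])
                                for j, m in enumerate(marginals)])

    # Hypothetical example: correlated lognormal and gamma inputs.
    corr = np.array([[1.0, 0.6],
                     [0.6, 1.0]])
    xy = normal_copula_sample(corr, [stats.lognorm(0.25, scale=70.0),
                                     stats.gamma(2.0, scale=0.5)], 10_000)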
Assuming perfect covariance. In some cases, assuming variables are perfectly
correlated may be a better default strategy than assuming they are independent.
For instance, suppose the variables are body mass and body surface area and the
population under study includes a wide variety of individuals. Presuming the
variables covary perfectly is probably somewhat better than assuming they are
statistically independent, which is manifestly false. In other cases, correlation may
be perfect but opposite in sign. This often happens, for example, with reciprocal
losses and gains, as well as with quantities that are constrained to add to a fixed
sum. When perfect correlation is negative, knowing that one variable is large tells
us that the other variable is surely small. It is generally easy to simulate perfect
correlation in a Monte Carlo analysis (Bratley et al., 1983; Whitt, 1976; Fréchet,
1951, 1935; Hoeffding, 1940). Saying that variables perfectly covary in this way
means that knowing that one variable is large with respect to its statistical
distribution implies the other variable is surely large to the same degree with
respect to its own statistical distribution. This suggests the relationship
$Y = G^{-1}(F(X))$, where F and G are the cumulative distribution functions for the random
variables X and Y respectively. If the correlation is negative, the relationship is
just $Y = G^{-1}(1 - F(X))$. Perfect covariance implies the quantities have extremal
correlations, either the maximum or minimum for the particular marginal
distribution shapes. Note that perfect covariance is different from merely
assuming that Y is completely dependent on X, which is a more general situation.
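In simulation, perfect covariance amounts to feeding the same uniform deviate (or its complement, for the negative case) through both inverse CDFs; a minimal sketch with hypothetical marginals:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    u = rng.uniform(size=10_000)

    F = stats.lognorm(0.3, scale=70.0)   # X: e.g., body mass (hypothetical)
    G = stats.norm(1.8, 0.1)             # Y: e.g., surface area (hypothetical)

    x = F.ppf(u)
    y_pos = G.ppf(u)        # perfect positive covariance: Y = G^-1(F(X))
    y_neg = G.ppf(1.0 - u)  # perfect negative covariance: Y = G^-1(1 - F(X))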
Linear dependency. In some situations, it may be reasonable to assume that some
or all of the statistical relationships among the variables are linear and do not
harbor cryptic nonlinearities. Even with such an assumption of linearity, the
strengths of the correlations may be unknown, or the correlation coefficients may
be known only to within an interval. In these cases, dispersive Monte Carlo
sampling (Bukowski et al., 1995; Ferson, 1994; Bratley et al., 1983; Whitt, 1976)
can be used to maximize the variance of the risk output. There is no mathematical
guarantee that the result of this strategy will be conservative given the uncertainty
about the correlations. However, in many situations the maximally dispersed
result is likely to be an appropriately conservative estimate.
4.3.3 Probability bounds analysis approaches to dependencies
The probability bounds analyst can account for dependencies between variables in
the same manner as the Monte Carlo analyst. Probability bounds analysis allows
for the assumption of independence between variables, and it gives results
identical to those achieved with Monte Carlo simulation, as the number of Monte
Carlo iterations approaches infinity. Probability bounds can be computed for
known correlation coefficients. Perfect positive and perfect negative
dependencies can be modeled as well. As in the case of independence, each of
these can give the same results as the analogous Monte Carlo simulation method
for accounting for dependencies, as the number of Monte Carlo iterations grows
large. Additionally, however, probability bounds may be calculated that allow for
(1) any possible dependency structure between variables, (2) any possible positive
(or negative) relationship, and (3) precisely specified copulas that fully
characterize the statistical dependence.
In cases where very little information is available regarding dependencies
between variables, Monte Carlo analysts often assume independence, which is a
very strict and often misleading assumption (discussed in Section 4.3.1). When
empirical information is lacking so that the analyst cannot be confident about the
nature of the correlations or dependencies among the variables, it may be useful to
compute bounds on the risk result without making any assumption at all about
one, some, or all of the dependencies. This approach is more comprehensive than
the assumption of strict independence because the uncertainty regarding the
relationships between variables is retained and propagated through the analysis to
the estimate of risk. These bounds are rigorous, and are sure to enclose the result no
matter what correlation or nonlinear dependency may exist between or among any
of the variables; they are often also mathematically best-possible in the sense of
being as tight as possible given the stated information (Frank et al., 1987;
Williamson and Downs, 1990; Ferson and Long, 1995). Note that the resulting
bounds on the answer are not equivalent to those obtained by merely computing
the result under all possible correlation coefficients from −1 to +1; they are
slightly wider because they include nonlinear dependencies. The strategy is
flexible enough to model independence among some variables while making no
assumption about the dependencies among others.
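For a sum of two variables, these no-assumptions bounds can be computed from the marginal CDFs alone. The sketch below is a rough discretized version of the constructions in the references cited above, with hypothetical normal marginals:

    import numpy as np
    from scipy import stats

    def frechet_sum_bounds(F, G, zs, grid):
        """Pointwise bounds on the CDF of X + Y given only the marginals
        F and G, with NO assumption about dependence (after Frank et al.,
        1987; Williamson and Downs, 1990)."""
        fx = F.cdf(grid)
        lower = np.empty_like(zs)
        upper = np.empty_like(zs)
        for k, z in enumerate(zs):
            gy = G.cdf(z - grid)
            lower[k] = np.max(np.maximum(fx + gy - 1.0, 0.0))
            upper[k] = np.min(np.minimum(fx + gy, 1.0))
        return lower, upper

    X, Y = stats.norm(5.0, 1.0), stats.norm(3.0, 0.5)
    zs = np.linspace(0.0, 16.0, 100)
    grid = np.linspace(0.0, 10.0, 400)   # search grid for the inner sup/inf
    lo, hi = frechet_sum_bounds(X, Y, zs, grid)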
An analyst may include uncertainty regarding dependencies in a probability
bounds analysis that also includes uncertainty from other sources, or the analyst
may conduct a separate assessment of the effect of uncertainty regarding
dependencies on the calculated risk distribution. Such a “dependency bounds
analysis” may be used to relax the assumptions of independence made in a
parallel Monte Carlo analysis, and explore risks under other dependency
assumptions. This is a sensitivity analysis that considers any and all possible
dependencies that may exist between the variables and propagates them through
the calculations. The results are rigorous bounds encompassing the set of risk
distributions that could result from exposure via ingestion without making any
assumption at all about the dependence among the variables. When all variables
are assumed to be independent of one another, dependency bounds analysis gives
exactly the same risk distribution as a Monte Carlo simulation (within
discretization error). This confirms consistency between the two computational
approaches. However, the results of a dependency bounds analysis are generally
bounds on a cumulative distribution function, rather than an approximation of one
such as might be given by Monte Carlo simulation. Thus, this approach will
typically be most useful as a final uncertainty analysis step that explores how the
Monte Carlo simulation result might vary as a result of the unknown
dependencies.
4.4 MODEL UNCERTAINTY
“Model uncertainty” refers to incertitude about the correct form that the model
should take, and may be thought of as a series of questions, e.g.:
Are the mathematical expressions right?
Are the dependencies and interactions among physical components
reasonably and accurately represented?
Are the submodels appropriate for the situation and do they mesh
together coherently?
The model in a risk assessment includes all the structural decisions made by the
analyst that govern how the parameters interact. Each of these decisions is, in
principle, subject to some measure of doubt. Model uncertainty is about whether
or not those parameters are combined together in the right way. In most cases, it
is a form of incertitude or epistemic uncertainty because the analyst is unsure
whether the constructions are reasonable and complete.
Model uncertainty is distinguished from parametric uncertainty, which is the
uncertainty about the value(s) of a particular constant or variable. Risk analysts
have many computational tools available to them to assess the consequences of
parametric uncertainty. But analyses consist of statements about both parameter
values and the relationships that tie the parameters together. These relationships
are expressed in models. This means that model uncertainty could be just as
important, or sometimes even more important, than parametric uncertainty.
Despite this, almost all risk analyses and, indeed, statistical analyses in general
neglect this source of uncertainty entirely. Ignoring model uncertainty could lead
to over-confident inferences and decisions that are more risky than one thinks
they are. A risk analyst who constructs a single model for use in an assessment
and then uses it to make forecasts is behaving as though the chosen model is
actually correct. Draper (1995) argued that model uncertainty should be taken
very seriously in computing forecasts and calculating parameter estimates.
There are essentially three ways to treat model uncertainty in a risk assessment.
The first is to re-express it as parametric uncertainty and apply Monte Carlo or
equivalent methods of probability theory as though the uncertainty were just
about a parameter. Morgan and Henrion (1990, page 67f; see also Cullen and Frey,
1999, page 37ff) asserted that it is inappropriate to assign probabilities to models, but
then proceeded to do an equivalent thing by defining a meta-parameter whose
value represents one of the possible models, which they treat like any other
parameter and assign a probability distribution. This approach clearly requires a
subjective interpretation of probability. Finkel (1995), however, argued
convincingly that averaging together incompatible theories does not generally
yield sensible results. Insofar as summary statistics are abstracted from the final
distributions resulting from an analysis, this approach inescapably averages
incompatible theories about the model structure and therefore itself has
questionable validity.
The second way to handle model uncertainty is to conduct a sensitivity analysis
that explores the effect of each combination of modeling decisions that might be
in contention. Of course, there may potentially be many such decisions, and the
sensitivity analysis can consequently become combinatorially complex and
cumbersome. In practice, such sensitivity analyses rarely study more than a
handful of factors. The third way to treat model uncertainty is to bound the
results that arise from an entire space of models representing the possible models
given the recognized uncertainty. This space may be infinite-dimensional,
however, and how to actually identify the bounding cases is not always obvious.
Although this approach may not always be achievable in practice, when it is, it
provides a useful way to characterize model uncertainty that is usually
computationally tractable.
A traditional Monte Carlo analysis might handle model uncertainty by creating a
new parameter, say m, to represent which model to use (e.g., Apostolakis, 1995;
Morgan and Henrion, 1990; cf. Cullen and Frey, 1999). If there are two possible
models, this parameter would be represented as a Bernoulli random variable
taking on both possible values with probability determined by the relative
likelihoods that either model is the right one. If this probability is unknown, the
traditional approach is to assume both models are equiprobable. If there are
several possible models, then the parameter m would be a more general discrete
variable, whose values would again be equiprobable unless the relative
probabilities of the different models were known. Finally, if there are infinitely
many models possible, but they can be parameterized in a single-dimensional
family, then a continuous version of the parameter m can be used. In all cases,
values for this variable are randomly generated in a Monte Carlo simulation.
Which model is used is determined by the random value. Typically, the model
selection would happen in the outer loop of a two-dimensional simulation (in
which the inner loop simulated variability), but this is not essential. The result of
the Monte Carlo simulation depends then on a randomly varying model structure.
This approach requires that the analyst know, and be able to enumerate or at least
continuously parameterize, all the possible models.
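A schematic of this meta-parameter construction is sketched below; the two dose-response models and the model probabilities are hypothetical, chosen only to make the mixture concrete.

    import numpy as np

    rng = np.random.default_rng(0)

    def model_1(dose):                 # hypothetical linear dose-response
        return 0.02 * dose

    def model_2(dose):                 # hypothetical power-law alternative
        return 0.005 * dose**1.5

    n = 10_000
    dose = rng.lognormal(mean=1.0, sigma=0.5, size=n)
    m = rng.random(n) < 0.5            # Bernoulli meta-parameter, equiprobable
    risk = np.where(m, model_1(dose), model_2(dose))
    # `risk` is a stochastic mixture of the two models -- exactly the
    # averaging of incompatible theories criticized by Finkel (1995).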
The Bayesian approach to handling model uncertainty, which is called Bayesian
model averaging (Raftery et al., 1997; Hoeting et al., 1999), has essential
similarities to the Monte Carlo approach, and it will typically produce similar if
not identical results. Until very recently, analysts chose a single model and then
acted as though it had generated the data. Bayesian model averaging recognizes
that conditioning on a single selected model ignores model uncertainty, and
therefore can lead to underestimation of uncertainty in forecasts. The Bayesian
strategy to overcome the problem involves averaging over all possible models
when making inferences about quantities of interest. Draper (1995) suggested
employing standard techniques of data analysis, but when a good model is found,
embedding it in a richer family of models. By assigning prior probabilities for the
parameters of this family of models and treating model selection like other
Bayesian parameter estimation problems, this approach produces a weighted
average of the predictive distributions from each model, where the weights are
given by the posterior probabilities for each model. By averaging over many
different competing models, this approach incorporates model uncertainty into
conclusions about parameters and predictions. In practice, however, this
approach is often not computationally feasible because it can be difficult to
enumerate all possible models for problems with a large number of variables.
However, a variety of methods for implementing the approach for specific kinds
of statistical models have been developed. The approach has been applied to
many classes of statistical models including several kinds of regression models
(Hoeting et al., 1999).
The Monte Carlo strategy to account for model uncertainty and Bayesian model
averaging are similar in that they both use what is essentially a mixture of the
competing models. Aside from the technical burden of parameterizing the space
of possible models and assigning a probability to each, there is a far greater
problem with the approach that these strategies use. In representing model
uncertainty as a stochastic mixture of the possible models, this approach
effectively averages together incompatible theories (Finkel, 1995). It is
equivalent in this respect to the approach to modeling what is fundamentally
incertitude as an equiprobable stochastic mixture (the uniform distribution). This
approach is due originally to Laplace, but when it is applied in risk analysis to the
study of distributions (rather than estimating point values), it can underestimate⁴
the true tail risks in an assessment. The potential results are distributions that no
theories for any of the models would consider reasonable.

⁴ Some probabilists maintain that one can use stochastic mixtures to represent model uncertainty
and that this does not average alternative models so long as the results are presented properly. It
is hard to see how this is a tenable position if the probabilities in the output of a quantitative risk
analysis are interpreted as frequencies.

The probability bounds analyst might instead use an envelope of the models rather
than an average or mixture of models. Because model uncertainty typically has
the form of doubt about which of a series of possible models is actually the right
one, such an approach would propagate precisely this doubt through subsequent
calculations. An enveloping approach would clearly be more comprehensive than
the traditional approach based on model averaging. Note that it would even be
able to handle non-stationarity of distributions, which is another important source
of uncertainty that is usually ignored in traditional assessments for lack of a
reasonable strategy to address it. Unlike the Monte Carlo and Bayesian model
averaging strategies, an enveloping approach will work even if the list of possible
models cannot be enumerated or parameterized. So long as the regions in any
output or intermediate variables that depend on the choice of the model can be
bounded, the uncertainty about the model can be represented and propagated in a
comprehensive way.
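By contrast, an envelope is a pointwise bound over the candidate models' output distributions rather than a weighted blend of them; a minimal sketch, reusing the hypothetical dose, model_1, model_2, and n from the fragment above:

    # Envelope instead of mixture (reuses dose, model_1, model_2, n above).
    risk1 = np.sort(model_1(dose))
    risk2 = np.sort(model_2(dose))
    xs = np.linspace(0.0, max(risk1[-1], risk2[-1]), 200)
    cdf1 = np.searchsorted(risk1, xs) / n      # empirical CDF of model 1
    cdf2 = np.searchsorted(risk2, xs) / n      # empirical CDF of model 2
    upper_cdf = np.maximum(cdf1, cdf2)         # left bound of the envelope
    lower_cdf = np.minimum(cdf1, cdf2)         # right bound of the envelope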
There are several examples of how a probability bounds analyst can bound a class
of models without enumerating or parameterizing them. The manifestations of
model uncertainty are extremely numerous, but there are some particular forms
for which useful bounding approaches have been developed. These include
uncertainty about distribution family, dependence, choice among specific
competing theories, and choice among unknown theories when consequences are
bounded. The relevant strategies for handling the first two of these are discussed
above (Sections 4.2.4, 4.2.5, and 4.3). Each of the remaining situations is
discussed below in turn.
An accounting for model uncertainty can be done by enumeration if it can be
narrowed to a controversy among a finite list of specific competing theories. In a
variety of cases, there are two or a few models that have appeared in the scientific
literature as descriptions of an incompletely understood phenomenon. In some
cases these models are extreme cases of a continuum that captures the possible
ways that the phenomenon could work. In other cases, the models are the
idiosyncratic products of individual scientists, and the resulting collection of
models cannot be claimed to be comprehensive in any sense. In either case, so
long as the model uncertainty in question is about only these specific models, then
surveying each model and enveloping the results they produce will suffice to
construct the bounds needed for analysis. If the number of competing models is
small, a null aggregation strategy that studies each possible model in turn might
even be workable.
Even in situations where the possible models cannot be listed because there are
infinitely many of them, it may still be possible to bound the consequences of
model choice. Trivial cases include models that influence a probability value. It
is known, for example, that the probability is constrained to the interval [0,1], no
matter what the model is. Nontrivial cases depend on specific knowledge of the
biology or chemistry of the system under study. Note that it is often possible to
bound the results in such situations, even though a mixture distribution could not
possibly be formed.
Although there are several important forms of model uncertainty that are
amenable to a complete assessment by enumeration or bounding, there are, of
course, other forms of model uncertainty that remain difficult to address in a
comprehensive way, such as the choice of what parameters to use and the choice about
the level of abstraction and depth of detail to incorporate into the model. For such
uncertainties, the family of possible models may be infinite-dimensional and a
probability bounds analyst may lack any natural way to bound the parameters that
depend on the model selection.
4.5 TRUNCATION
There are two truncation issues relevant to probabilistic analyses. Firstly, input
variables such as body weight, exposure duration, concentration, and so on are all
finite. There is no such thing as an infinite body size, and all random variables
relevant to real-world risk analyses come from bounded distributions. Monte
Carlo analysts often use distributions with infinite tails to represent some of these
finite distributions, however. Although their calculations are truncated by
computer hardware, the random number generator, and the algorithm chosen to
sample from the distribution, such truncation is often operationally negligible, and
the limits of the generated distributions are more a function of the number of
iterations chosen for the simulation than computational constraints. Because there
are no physical variables with infinite ranges, clearly some truncation is in order,
particularly given that the choice of truncation can affect the results over a wide
range of percentiles besides the most extreme (Ferson et al., 2003). The
probability bounds analyst chooses the truncation of each variable with explicit
real-world limitations in mind.
The second truncation issue relevant to probabilistic analysis occurs when
empirical information regarding the limits of a distribution is lacking, or when
modeling constraints require a variable to be truncated. For example, the modeled
number of years individuals are exposed may be a regulated requirement, even
though observed exposure durations may sometimes exceed that figure. Or, as
another example, a decision may be made to use the largest observed
concentration in a medium as the largest possible concentration, when
concentrations are known to be declining and risk is being projected into the
future. The Monte Carlo modeler truncates the distribution at the required value,
often using methods that preserve the observed central tendency and variance.
The probability bounds analyst does the same. Sensitivity analysis may be
employed to examine the effect of the truncation choices.
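In code, such a truncation is a one-liner with a truncated distribution family; the sketch below uses scipy's truncated normal as an example (note that this particular recipe renormalizes the clipped tails rather than exactly preserving the original mean and variance, so it is only an approximation to the moment-preserving methods mentioned above):

    from scipy import stats

    mu, sigma = 30.0, 8.0      # hypothetical exposure-duration model
    lo, hi = 0.0, 50.0         # e.g., a regulated maximum of 50 years

    # truncnorm takes its bounds in standardized (z-score) units.
    trunc = stats.truncnorm((lo - mu) / sigma, (hi - mu) / sigma,
                            loc=mu, scale=sigma)
    samples = trunc.rvs(size=10_000, random_state=1)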
4.6 DISCRETIZATION ERROR
Discretization error in probability bounds analysis is analogous to the
representation error in a Monte Carlo simulation arising from the use of a finite
number of replications. Although Monte Carlo analysis neglects this error,
probability bounds analysis strictly accounts for discretization error, which is the
uncertainty due to using a finite number of probability levels in representing a
continuous distribution. The way it does this is depicted in Figure 14. In the
example on the left, there are six discretization levels used to characterize a
cumulative probability distribution (gray curve). The upper bound is a stair-step
function that exactly touches the distribution at each of five points. Because
distribution functions are monotonic, this upper bound is certain to lie above
any distribution that agrees with the underlying distribution at these five points. The lower
bound is another stair-step function that also exactly matches the underlying
distribution at the five points, but is sure to lie below any distribution through the
same five points. Given that it is impossible to represent an arbitrary continuous
function exactly on a computer that has only finite memory resources, some loss
of detail is unavoidable. It is useful to manage this loss of detail in a way that
permits rigorous inference about the calculations that are made with computer
representations of distribution functions. This outward-directed discretization
consisting of upper and lower bounds is the most conservative representation of
the given distribution, given only the range and five internal points. The computer
representation of the distribution between the six points is not represented as a
straight line or any other interpolation scheme, but is instead conservatively
bounded. This allows probability bounds analysis to scrupulously account for its
uncertainty in representing the distribution no matter how many discretization
levels are used.
The area of the region between the upper and lower stair-step functions represents
an upper bound on the discretization error. As the number of discretization levels
is increased, the discretization error decreases. The example on the right side of
Figure 14 depicts the discretization of a similar distribution using 12
discretization levels. The discretization error can be made arbitrarily small by
using more levels. The current version of RAMAS Risk Calc uses 100
discretization levels. As discussed in Section 3.5, using 100 discretization
levels to represent a variable results in 10,000 levels in the Cartesian cross-
product formed to compute a convolution.
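The outward-directed discretization of Figure 14 can be reproduced in a few lines; a sketch for a precise distribution, assuming the tails are truncated at small probability levels:

    import numpy as np
    from scipy import stats

    def discretize_outward(dist, levels=100, tail=0.001):
        """Stair-step bounds on `dist` from `levels` probability slices.

        For the slice [p[k], p[k+1]], the upper (left) bound uses the
        p[k] quantile and the lower (right) bound the p[k+1] quantile,
        so any distribution through those points is enclosed.
        """
        p = np.linspace(tail, 1.0 - tail, levels + 1)
        q = dist.ppf(p)
        return p, q[:-1], q[1:]   # probability levels, left and right edges

    # Twelve levels, as in the right panel of Figure 14 (hypothetical normal).
    p, left_edges, right_edges = discretize_outward(stats.norm(13.0, 1.5),
                                                    levels=12)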
Figure 14. Accounting for discretization error. P-boxes (in black) are
discretizations (at 6 levels on the left, and 12 levels on the right) of
underlying probability distributions (shown as gray curves). The
area within each p-box represents an upper bound on discretization
error.
4.7 MICROEXPOSURE EVENT MODELING
Microexposure event modeling (Price et al. 1996) avoids the problem of assigning
a single “lifetime average” value to variables for which different values are
realized many times over an individual’s lifetime. In a traditional (one-
dimensional) Monte Carlo simulation of exposure, for example, an individual
would be assigned a randomly selected value from the probability distribution
representing each variable. These random values are combined and scaled to
compute a long-term exposure as though each value were fixed over the whole
life of the individual. This is equivalent to assuming that an individual receives
exactly the same exposure each day for an entire lifetime. It is sometimes more
realistic to assume that the size, duration, or intensity of the exposures varies
across exposure events such as meals of contaminated food, visits to a
contaminated site, or days of residential exposure, and that the number of such
events per year varies over a lifetime. This is simulated in a microexposure
model by nesting computational loops representing each time scale: exposure
event, year, and lifetime. The increased realism of microexposure event modeling
removes the possibility that some individual will be simulated who receives the
maximum exposure of the most contaminated medium at every exposure event for
an entire lifetime. On the other hand, in a microexposure model, simulating each
exposure event for each year of an individual’s life may overemphasize the
central tendency of the input distributions, eliminating the possibility in the
simulation that some individuals may in fact, because of personal idiosyncrasies,
experience larger-than-average exposures consistently over a year or even
throughout an entire lifetime.
The summations needed for a microexposure model may be directly computable
with p-boxes as an additive convolution of the inputs. This approach might be
used in calculating exposure during childhood, for example, where the exposure
duration is constrained to be a few years. The analyst would need to decide
whether these additions should be computed under the assumption of
independence, under perfect dependence, or without any assumption about
dependence (Fréchet case). Assuming independence would be saying, for
instance, that one meal is not statistically related in any way to any other meal,
that every meal is a distinct and random draw from the underlying statistical
distribution(s). This issue of dependence is considered further in Section 4.7.1
below.
Once the choice about dependence has been made, it is straightforward using
probability bounds analysis to compute the sum of a fixed number of random
values represented by distributions or p-boxes. When the number of additions
becomes large, this straightforward approach can suffer from unnecessary
inflation of the discretization error (see Section 4.6), which
accumulates with each addition. Fortunately, there is an
alternative strategy that avoids this problem. As the number of loops increases
within a nested microexposure simulation, the effective contribution of the input
variable approaches the mean of its distribution. (This is the reason that
microexposure modeling discounts the probability of extreme events.) When the
number of exposure events is very large, simply using the mean of each nested
input variable directly provides the same answer as calculating the microexposure
simulation. Thus, an analyst can replace summations in the risk expression using
the simple rule
\[ \sum_{i=1}^{X} Y_i \approx X \cdot E(Y) \]
where E denotes the expectation operator (which gives the average of a
distribution) so long as X is large. This is just a consequence of the weak law of
large numbers, and it flows from the assumption that the values $Y_i$ sampled from
their distribution are independent of each other. Without this independence
assumption, a different rule would be implied. In practice, “large” usually means
more than 20 or 30, unless the distribution for Y is strongly skewed in which case
more iterations would be needed for convergence. Notice that the use of a central
estimate (UCL) for the concentration term is an example of this approach of using
the mean as the integration of many samples from the concentration distribution
(EPA, 2001, Appendix C).
What happens when X, the number of terms to be added, is itself a random
variable? The direct computation of the integration in this case would require a
complex calculation which involves forming convolutions of Y of various
numbers of terms, and forming a stochastic mixture of the resulting distributions
with weights corresponding to the probability masses for each integral value of X
for that number of terms. This “convolution-mixture” can be well approximated
by
\[ \sum_{i=1}^{X} Y_i \approx \frac{X}{\max(X)} \sum_{i=1}^{\max(X)} Y_i \]
where max(X) denotes the largest possible value of the random variable X. If that
value is large, then we can again employ the asymptotic formula
\[ \sum_{i=1}^{\max(X)} Y_i \approx \max(X) \cdot E(Y) \]
and cancel terms to obtain
\[ \sum_{i=1}^{X} Y_i \approx X \cdot E(Y) \]
where the multiplicative convolution is performed under independence. Thus we
see that the formula for an integer-valued X generalizes for the case of
distributional X’s. This result was confirmed in simulation studies, which show
that the range and mean from the convolution-mixture and the X · E(Y)
formulations match exactly. The variance of the simpler formulation was
sometimes slightly smaller than it should be, but the p-box for the resulting
integration was an excellent, tight enclosure of both the midrange and upper tail
of the distribution obtained from the convolution-mixture. The left tail of the
distribution (corresponding to small doses) is sometimes underestimated,
especially when the distribution of ED is strongly positively skewed.
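The agreement reported above is easy to probe by simulation; the following rough sketch (hypothetical distributions for the number of events X and the per-event dose Y) compares the direct summation with the X · E(Y) shortcut.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5_000

    X = rng.integers(20, 41, size=n)       # events per scenario: 20..40
    mean_y, sd_y = 2.0, 0.8                # per-event dose Y ~ normal

    direct = np.array([rng.normal(mean_y, sd_y, x).sum() for x in X])
    shortcut = X * mean_y                  # replacement rule: X * E(Y)

    print(direct.mean(), shortcut.mean())  # means agree closely
    print(direct.std(), shortcut.std())    # shortcut is slightly narrower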
Consider the microexposure cancer risk expression for adults
\[ \frac{1}{BW \cdot AT} \sum_{j=1}^{ED} \sum_{i=1}^{EF} C_{ij} \cdot IR_{ij} \cdot LOSS_{ij} \]
where AT is averaging time, BW is body weight, C is concentration of the
contaminant in food, IR is intake rate during a meal, LOSS is cooking loss during
meal preparation, i indexes meals, j indexes years, and ED and EF are exposure
duration and frequency respectively. When the number of exposure events
(implied by ED and EF) is large, the replacement rule suggests that the quantity
can also be computed as
\[ \frac{1}{BW \cdot AT} \, ED \cdot E(EF) \cdot E(C \cdot IR \cdot LOSS). \]
This approximation is reasonable for exposure scenarios involving numerous
exposure events over many years. Because the risk assessment is primarily
concerned with large exposures that result from such scenarios, the analyst can be
confident of obtaining an appropriate result even when some or many of the
values of the ED and EF distributions are comparatively small.
When the inputs are probability distributions, the E( ) operator yields a point
value for the average of the distribution. Because the algorithms used in
probability bounds analysis are analytical, applying this method to precisely
specified input distributions would be expected to yield answers that are
indistinguishable from those of a microexposure event analysis using Monte Carlo
simulation with a large number of exposure events. Simulation studies confirm
this expectation. In the context of probability bounds analysis, these input
variables are p-boxes and the E( ) operator gives an interval estimate for the mean,
which reflects the acknowledged uncertainty about the underlying distribution.
A correction can be made to the replacement rule to account for cases when the
largest value of the ED or EF distribution does not warrant the asymptotic
formula. For instance, the microexposure cancer risk expression for children
might be formulated as
\[ \frac{1}{BW \cdot AT} \cdot \frac{ED}{\max(ED)} \sum_{i=1}^{\max(ED)} EF \cdot E(C \cdot IR \cdot LOSS) \]
This uses the asymptotic formula for the EF loop, but not for the ED loop.
4.7.1 Dependency issues
The replacement rule $\sum_{i=1}^{X} Y_i \approx X \cdot E(Y)$ assumes independence among
successive exposure events $Y_i$. In some situations it may be reasonable to make
some other assumption about the dependence among these effects. For instance,
they could have autocorrelation of one. That would be the case if high values for
one meal (or year) are invariably associated with high values for subsequent
meals (or years) and this association were perfect. If events are perfectly
autocorrelated, then the integration is simply the scaled sum
$\sum_{i=1}^{X} Y_i = X \cdot Y$
obtained in the one-dimensional Monte Carlo simulation, and microexposure
modeling degenerates into a traditional Monte Carlo simulation. There might,
however, be a more complicated dependence that arises because of temporal
changes. For instance, an individual might be disposed to one pattern of behavior
early in life, but abandon it as an adult, and perhaps return to it in old age. The
possibility of such complexity could make a risk assessment very complex unless
a great deal of empirical information were available to develop an appropriate
model of this changing behavior. It is possible with probability bounds analysis
to compute bounds on the risks without such empirical information and without
special modeling. By making no assumption about the temporal dependence
between events, one can compute bounds on exposure and risk that are sure to
contain the true distributions. Williamson (1991) showed that, in the limit as X
becomes large, the replacement rule becomes
\[ \sum_{i=1}^{X} Y_i \approx X \cdot \mathrm{range}(Y) \]
where range( ) is the range of the support of the random variable.
4.8 SENSITIVITY ANALYSIS
Sensitivity analysis is the general term for quantitative study of how the inputs to
a model influence the results of the model. Sensitivity analysis has many
manifestations in probabilistic risk analysis and there are many disparate
approaches based on various measures of influence and response. Sensitivity
analyses are conducted for fundamentally two reasons: to understand the
reliability of the conclusions and inferences drawn from an assessment, and to
focus future empirical studies so that effort might be expended to improve
estimates of inputs that would lead to the most improvement in the estimates of
the outputs.
The relationship between sensitivity analysis and probability bounds analysis is
complex. The following sections emphasize three major theses:
1. Some existing methods for sensitivity analysis generalize for use in
probability bounds analysis.
2. Probability bounds analysis is itself a kind of sensitivity analysis with
considerable comprehensiveness.
3. Meta-level sensitivity analyses can be applied to probability bounds
analysis.
4.8.1 Generalizing sensitivity analyses for use with probability
boxes
EPA guidance discusses several methods that can be used for sensitivity analysis
in a probabilistic assessment (EPA, 2001, Section A). While some of these
methods do not have analogs in probability bounds analysis (e.g., correlation
analysis), many can be immediately generalized to work with p-boxes. For
instance, one of the most basic ways to evaluate sensitivity of an input variable is
with calculus or automatic differentiation (Fischer, 1993). For example, from the
expression
\[ x = \frac{abc + de}{fg} \]
(which is very similar in form to many of the risk expressions commonly
encountered), one can deduce the partial derivative of x with respect to c as
\[ \frac{\partial x}{\partial c} = \frac{ab}{fg}. \]
If the nominal values for these variables are the point estimates a = 3, b = 4, c = 5,
d = 8, e = 10, f = 2, and g = 1, one would estimate the sensitivity of x to changes
in c to be ∂x/∂c = (3 × 4)/(2 × 1) = 6. Using analogous formulas for the other
variables and the same point values, one would compute the following
sensitivities:
Variable    Sensitivity
a           10
b           7.5
c           6
d           5
e           4
f           −35
g           −70
which suggest that the quantity x is most sensitive to changes in the variable g,
followed by variable f and variable a. When the parameters have units, it is often
desirable to express the sensitivities in a way that is insensitive to the units so that,
for instance, changing from kilograms to grams doesn’t increase sensitivity 1000-
fold. Guidance (EPA, 2001, Section A.3.2) recommends using the normalization
(∂x/∂y)(y/x), where y is one of the seven input parameters. There is one important
caveat about using partial derivatives as a measure of sensitivity: the results are
local in nature. They represent the slope of how x changes in response to
infinitesimal changes in an input parameter. If the relevant range for a parameter
is broad, the sensitivity computed this way may begin to lose its meaning.
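The point sensitivities above are easy to reproduce symbolically; a minimal sketch using sympy, which also computes the unit-free normalization (∂x/∂y)(y/x) recommended in the guidance:

    import sympy as sp

    a, b, c, d, e, f, g = sp.symbols('a b c d e f g', positive=True)
    x = (a*b*c + d*e) / (f*g)

    point = {a: 3, b: 4, c: 5, d: 8, e: 10, f: 2, g: 1}
    for var in (a, b, c, d, e, f, g):
        sens = sp.diff(x, var)                 # partial derivative
        elast = sp.simplify(sens * var / x)    # unit-free elasticity
        print(var, float(sens.subs(point)), float(elast.subs(point)))
    # Prints the sensitivities 10, 7.5, 6, 5, 4, -35, -70 from the table
    # above, together with the corresponding normalized values.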
These calculations can be extended directly to the case in which the inputs are
uncertain numbers. The methods of Monte Carlo simulation and probability
bounds analysis can be applied directly to the formulas for the partial derivatives
to obtain distributions or p-boxes estimating the sensitivities. Suppose that the
nominal values for the variables are
a = [2.5, 3.5],
b = normal(mean=4,stdev=0.02),
c = p-box(min=4.3, max=5.2, mean=5),
d = 8,
e = [9.95, 10.05],
f = uniform(min=1.9, max=2.2), and
g = p-box(mean=1, variance=0.001),
which include two intervals, two precise probability distributions, two
distribution-free p-boxes and one point value. If these inputs are propagated
through the seven formulas for partial derivatives of the input variables, one
obtains p-boxes for each variable representing an estimate of the sensitivity of x
to changes in that variable. The ranges, means, medians and variances for these
p-boxes are given in the following table.
Sensitivity
Variable   Range            Mean             Median           Variance
a          [5.34, 20.0]     [9.76, 9.81]     [8.77, 10.6]     [0.11, 1.58]
b          [3.38, 17.3]     [6.09, 8.59]     [5.49, 9.25]     [0.04, 2.74]
c          [3.10, 13.5]     [4.87, 6.87]     [4.66, 7.15]     [0.02, 1.48]
d          [3.13, 9.55]     [4.85, 4.93]     [4.64, 5.13]     [0.027, 0.261]
e          [2.51, 7.60]     [3.90, 3.92]     [3.73, 4.08]     [0.017, 0.165]
f          [−201, −6.68]    [−41.1, −28.1]   [−41.8, −26.3]   [2, 204]
g          [−306, −22.9]    [−84.4, −57.7]   [−81.9, −56.6]   [12, 864]
The means, medians and variances are given as intervals because the sensitivity
estimates are p-boxes. The calculations assumed mutual independence among all
the inputs, but this assumption is not essential and could be relaxed to reflect
empirical evidence. These results also reveal the relative importance of g and f
compared to the other variables. Note, however, that it will no longer generally
be possible to strictly rank the sensitivities. Because they can overlap one
another, it is not possible to define an ordering⁵ for p-boxes. For instance, in the
point sensitivity analysis, the quantity seemed to be more sensitive to c than to d.
When uncertainty and variability are taken into account, this difference is not so
clear, as the sensitivities overlap quite broadly. The caveat about computing
sensitivities as partial derivatives also applies when they are used with p-boxes.
The results reflect sensitivities to local changes. If the range of an uncertain
number is broad, the estimates may be hard to interpret. One benefit of explicitly
accounting for uncertainty and variability is that the true imprecision of the
resulting sensitivities and their ranks becomes self-evident.

⁵ An ordering could, however, be defined with reference to some scalar characteristic of the
p-boxes such as the midpoint, upper bound on the 95th percentile, largest possible value, etc.
Another sensitivity analysis method that can easily be generalized to handle
p-boxes is the computation of the percentage contribution from pathways to the
total exposure or risk (EPA, 2001, Section A.2.1.1). This method estimates the
sensitivity of an assessment result to a particular contaminant exposure pathway.
For instance, an estimate of a percent contribution for a drinking pathway might
be
\[ 100 \times \left( HI_{drinking} \, / \, HI_{total} \right) \% \]

where $HI_{drinking}$ is the hazard index for some contaminant associated with the
exposure through the imbibition pathway and $HI_{total}$ is the hazard index cumulated
over all exposure pathways. Analogous expressions for each of the exposure
pathways in turn can be computed to obtain sensitivity measures for each
pathway. These measures can also be computed in a probability bounds analysis.
The hazard indices in the numerator and denominator are computed from p-boxes
using the methods described in the earlier sections of this attachment. The final
division between the resulting p-boxes should be done under the assumption of
positive dependence (see Section 4.3). (Assuming perfect dependence between
them could underestimate the true dispersion if they are not perfectly dependent.
Assuming independence would be counterfactual since the numerator is part of
the denominator, but it would produce a result with an upper bound on the true
dispersion.) As was the case for partial derivatives, because the quotient is
estimated by p-box, it will sometimes not be possible to strictly rank the results if
the associated uncertainty implies that the sensitivities overlap.
4.8.2 Probability bounds analysis is a sensitivity analysis
Monte Carlo analysis can be viewed as a kind of sensitivity analysis itself (Helton
and Davis, 2002; Morgan and Henrion, 1990; Iman and Helton, 1985) in that it
yields a distribution describing the variability about a point estimate. This section
explains that, likewise, probability bounds analysis is also a kind of sensitivity
analysis, though at a higher level.
Many Monte Carlo simulations employ what-if sensitivity studies to explore the
possible impact on the assessment results of varying the inputs. For instance, the
effect of the truncation of some variable might be explored by re-running the
model with various truncation settings, and observing the effect on the risk
estimate. The effect of particular parameter and probability distribution choices,
and assumptions regarding dependencies between variables can also be examined
in this way. Model uncertainty can be probed by running simulations using
different models. However, such studies are often very difficult to conduct
because of the large number of calculations that are required. While informative,
this approach is rarely comprehensive because when there are multiple
uncertainties at issue (as there usually are), the sheer factorial problem of
computing all of the possible combinations becomes prohibitive. Usually, in
practice, only a relatively tiny number of such analyses can be performed.
Probability bounds analysis can be used to automate such what-if sensitivity
studies and vastly increase their comprehensiveness (Ferson, 1994; 1996; 2001).
It can produce rigorous bounds around the risk distribution from an assessment
that enclose all the possible distributions that could actually arise given what is
known and what is not known about the model and its inputs. For this reason, it
can be used as a complementary quality assurance check on Monte Carlo
simulation (Ferson, 1995; 1997). Because it is based on the idea of bounding
rather than approximation, it provides an estimate of its own reliability (cf. Adams
and Kulisch, 1993). As outlined and illustrated in previous sections of this
attachment, probability bounds analysis can comprehensively account for possible
deviations in assessment results arising from uncertainty about
Distribution parameters (Sections 3.1 and 4.2.3),
Distribution shape or family (Sections 3.2 and 4.2.4),
Intervariable dependence (Section 4.3.3), and even
Model structure (Section 4.4).
Moreover, it can handle all of these kinds of uncertainties in a single calculation
that gives a simple and rigorous characterization of how different the result could
be given all of the professed uncertainty.
4.8.3 Sensitivity analysis on top of probability bounds analysis
As discussed in Section 4.8.1, several methods for sensitivity analysis commonly
used by risk analysts can be extended to handle probability boxes. Besides the
direct methods of estimating sensitivity such as computing partial derivatives or
correlations, there are also various inferential techniques that estimate sensitivities
by comparing the results of assessments performed under test conditions to those
from a base case. Regan et al. (2002b) describe a straightforward approach for
making such comparisons in the context of a probability bounds analysis.
One of the fundamental purposes of sensitivity studies is to learn where focusing
future empirical efforts would be most productive. This purpose requires
estimating the value of additional empirical information. Of course, the value of
information not yet observed cannot be measured, but it can perhaps be predicted.
One strategy to this end is to assess how much less uncertainty the calculations
would have if extra knowledge about an input were available. This might be done
by comparing the uncertainty before and after “pinching” an input, i.e., replacing
it with a value without uncertainty. Of course, one does not generally know the
correct value without uncertainty, so this replacement must be conjectural in
nature. To pinch a parameter means to hypothetically reduce its uncertainty for
the purpose of the thought-experiment. The experiment asks what would happen
if there were less uncertainty about this number. Quantifying this effect amounts
to measuring the contribution by the input to the overall uncertainty in a
calculation.
The estimate of the value of information for a parameter will depend on how
much uncertainty is present in the parameter, and how it affects the uncertainty in
this final result. The sensitivity could be computed with an expression like
\[ 100 \left( 1 - \frac{\mathrm{unc}(T)}{\mathrm{unc}(B)} \right) \% \]
where B is the base value of the risk expression, T is the value of the risk
expression computed with an input pinched, and unc( ) is a measure of the
uncertainty of a p-box. The result is an estimate of the value of additional
empirical information about the input in terms of the percent reduction in
uncertainty that might be achieved in the expression when the input parameter is
replaced by a better estimate obtained from future empirical study. The pinching
can be applied to each input quantity in turn and the results used to rank the inputs
in terms of their sensitivities. (Note that these reductions will not generally add
up to 100% for all the input variables.) In principle, one could also pinch multiple
inputs simultaneously to study interactions.
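As a minimal sketch, the comparison and ranking might be coded as follows
(Python; the numeric values are borrowed from the Figure 15 example discussed
below, and all names are illustrative):

    def pinching_sensitivity(unc_base, unc_pinched):
        # Percent reduction in uncertainty from pinching one input:
        # 100 * (1 - unc(T)/unc(B)), where B is the base result and T is
        # the result recomputed with that input pinched.
        return 100.0 * (1.0 - unc_pinched / unc_base)

    # Illustrative areas from the Figure 15 example below:
    unc_B = 2.12                     # area of the base-case p-box for a+b
    unc_T = {"a": 1.14, "b": 1.12}   # approximate areas after pinching a or b
    ranking = sorted(unc_T, reverse=True,
                     key=lambda k: pinching_sensitivity(unc_B, unc_T[k]))
    # ranking == ['b', 'a']  (about 47% and 46% reductions, respectively)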
There are multiple possible ways to define unc( ) to measure uncertainty. One
obvious measure is the area between the upper and lower bounds of the p-box. As
the p-box approaches a precise probability distribution where all uncertainty has
evaporated and only the natural variability remains, this area approaches zero. An
analyst might also elect to define unc( ) as some measure of dispersion or perhaps
the heaviness of the tails (Hettmansperger and Keenan, 1980) of the p-box. Using
different measures allows the analyst to address different questions
in a sensitivity analysis. If the measure of uncertainty is a scalar quantity (i.e., a
real number), then the sensitivities that come from the analysis will also be scalars
and can be strictly ordered.
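For the area measure in particular, a discretized version is straightforward.
This is a sketch only, assuming both bounds have been evaluated on a common
grid of abscissa values:

    import numpy as np

    def pbox_area(xs, lower_cdf, upper_cdf):
        # Area between the bounding CDFs of a p-box on the grid xs.
        # upper_cdf is the left (upper) bound and lower_cdf the right
        # (lower) bound, so upper_cdf >= lower_cdf pointwise.
        gap = upper_cdf - lower_cdf
        # trapezoidal rule
        return float(np.sum(0.5 * (gap[:-1] + gap[1:]) * np.diff(xs)))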
There are also multiple possible ways to pinch uncertainty. Pinching in different
ways can result in strongly different estimates of the overall value of information.
Several strategies are possible in estimating sensitivities from comparative
assessments:
• Replace an input with a point value,
• Replace an input with a precise distribution function, or
• Replace an input with a zero-variance interval.
Replacing a p-box with a precise probability distribution would be pinching away
the uncertainty about the distribution. Replacing a p-box or a distribution
function with a point value would be pinching away both the uncertainty and the
variability about the quantity. For inputs that are known to be variable (variance
greater than zero), such a pinching is counterfactual, but it may nevertheless be
informative. In particular, it may be especially useful in planning remediation or
cleanup goals. In some situations, it may be reasonable to replace a p-box with a
p-box shaped like an interval but prescribed to have a variance of zero. The effect
of this would be to pinch away the variability but leave uncertainty. Such a
replacement might be reasonable for p-boxes having a core (a region along the
abscissa for which the upper bound of the p-box is one and lower bound is zero).
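The three pinching strategies can be made concrete as p-boxes on a common
grid. The sketch below is purely illustrative; the grid and helper names are
not drawn from any particular software:

    import numpy as np

    xs = np.linspace(0.0, 20.0, 201)   # hypothetical evaluation grid

    def pinch_to_point(x0):
        # Degenerate CDF stepping from 0 to 1 at x0: neither
        # variability nor uncertainty remains.
        F = (xs >= x0).astype(float)
        return F, F                    # lower and upper bounds coincide

    def pinch_to_distribution(cdf_values):
        # Both bounds equal one precise CDF evaluated on xs:
        # variability retained, uncertainty removed.
        return cdf_values, cdf_values

    def pinch_to_interval(lo, hi):
        # Zero-variance interval [lo, hi]: the upper bound jumps to 1
        # at lo and the lower bound at hi, so uncertainty remains but
        # variability is pinched away.
        upper = (xs >= lo).astype(float)
        lower = (xs >= hi).astype(float)
        return lower, upper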
This approach of pinching inputs and recalculating the assessment is not
unfamiliar to Monte Carlo analysts. Many routinely conduct sensitivity studies of
the proportional contribution of variability and/or uncertainty in each variable to
the overall variability and/or uncertainty in the calculated risk distribution. To
determine the effects of variability in a Monte Carlo simulation using this method,
each variable containing variability (i.e., expressed as a probability distribution) is
reduced in turn to its mean or other appropriate point value, and the simulation is
repeated. The measure of sensitivity is often the proportional effect of variability
in each variable on the model, which is computed as the variance in the risk
distribution from each of the simulations divided by the variance in the risk
distribution from the base model result. Although the general idea of pinching is
known to Monte Carlo analysts, the notions of pinching to a precise distribution
and pinching to a zero-variance interval have no analog in Monte Carlo sensitivity
analyses.
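A sketch of that Monte Carlo practice, with a deliberately trivial stand-in
for the risk model and hypothetical input distributions:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    def risk(a, b):
        # Stand-in for the real risk expression.
        return a + b

    a = rng.lognormal(mean=1.0, sigma=0.5, size=n)   # hypothetical inputs
    b = rng.normal(loc=8.0, scale=2.0, size=n)

    base_variance = np.var(risk(a, b))
    # Pinch each variable in turn to its mean, repeat the simulation,
    # and report the ratio of the pinched variance to the base variance.
    for name, pinched in [("a", risk(np.full(n, a.mean()), b)),
                          ("b", risk(a, np.full(n, b.mean())))]:
        print(name, np.var(pinched) / base_variance)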
Figure 15 shows a numerical example of pinching to a precise distribution. The
top panel of the figure depicts the addition of two p-boxes a and b (assuming
independence). This is the “base case” against which the pinchings will be
compared. The area between the upper and lower bounds for the sum a+b is 2.12.
The middle panel of the figure shows the first pinching. The p-box a is replaced
with a precise probability distribution that lies entirely within the p-box. When a
distribution replaces the p-box in the addition with b (which is still the same p-
box), the result is the p-box shown at the far right on the middle panel. This p-
box has an area of about 1.14. The percentage reduction in this area compared to
that of the p-box for the sum shown on the top panel is 46.24%. This percent,
which labels the sum on the middle panel, represents the sensitivity measure for
pinching the variable a to a precise probability distribution. The bottom panel of
Figure 15 shows the reduction of uncertainty (area) for the sum a+b from
pinching the p-box for b to a precise distribution. Compared to the base case in
the top panel, the area is reduced by 47.17%. In this case, the potential reductions
in uncertainty from additional information about a and b are roughly the same.
[Figure: three panels, each plotting cumulative probability (0 to 1) against values from 0 to 20. Top: the p-boxes a and b and their sum a+b under independence (base case). Middle: the sum after pinching a to a precise distribution, a 46.24% reduction in area. Bottom: the sum after pinching b, a 47.17% reduction.]
Figure 15 Sensitivity analysis by pinching a p-box to a precise
distribution.
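For readers who want to reproduce this kind of base case, the addition of two
p-boxes under independence can be approximated by the discrete convolution of
their quantile functions (cf. Section 3.5; Williamson and Downs, 1990). The
sketch below condenses the n² pairwise sums back to n levels, rounding the
bounds outward, although its initial midpoint discretization of the inputs is
not itself fully conservative:

    import numpy as np

    n = 200
    p = (np.arange(n) + 0.5) / n   # midpoint probability levels

    def add_independent(qa_lo, qa_hi, qb_lo, qb_hi):
        # Each argument is an n-vector of quantiles of one bound of a
        # p-box at the levels p (the *_lo vectors come from the left,
        # upper CDF bound; the *_hi vectors from the right, lower bound).
        lo = np.sort(np.add.outer(qa_lo, qb_lo), axis=None)
        hi = np.sort(np.add.outer(qa_hi, qb_hi), axis=None)
        # Condense the n*n sorted sums to n levels, rounding the lower
        # quantile bound down and the upper quantile bound up.
        return lo[::n], hi[n - 1::n]

Applying pbox_area to the base sum and to the sum recomputed with a pinched,
and feeding the two areas to pinching_sensitivity, yields reductions like the
46.24% annotated in Figure 15.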
Figure 16 shows a similar set of sensitivity analyses based on pinching p-boxes to
precise distribution functions. The calculation for the base case in this figure
(shown in the top panel) was made without making any assumption about the
dependence between the variables a and b. For this reason, even though the p-
boxes for the variables a and b are just the same as were used in Figure 15, the
area of the sum grows to about 3.05. The second panel of Figure 16 depicts
pinching the p-box for the variable a to a precise distribution and its consequence
for the resulting uncertainty about the sum. The third panel likewise shows the
pinching for variable b. Both panels are annotated with the percent reduction in
the area of the p-box for the sum compared to the base case in the top panel. The
reduction in uncertainty from pinching the variable a in this situation is perhaps
surprisingly small. The sensitivity to changing b is more than three times greater
than that of a. The bottom panel shows the effect of pinching the dependence
from the Fréchet case of assuming nothing about dependence to assuming
independence. (The pinching could have specified any particular dependence.)
[Figure: four panels, each plotting cumulative probability against values from 0 to 20. Top: the base-case Fréchet sum a+b. Second: the sum after pinching a (9.1% reduction in area). Third: the sum after pinching b (30.53% reduction). Bottom: the sum after pinching the dependence to independence (32.78% reduction).]
Figure 16 Sensitivity analysis for the Fréchet case without
dependence assumptions.
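The base case in this figure uses the best-possible bounds on the distribution
of a sum under unknown dependence (Frank et al., 1987). Those bounds can be
evaluated directly on a grid, as in the following sketch, which ignores
left-limit technicalities at jump discontinuities:

    import numpy as np

    def frechet_sum_bounds(xs, Fa_lo, Fa_hi, Fb_lo, Fb_hi, zs):
        # Bounds on the CDF of a+b with no dependence assumption:
        #   upper(z) = min(1, min over x of [Fa_hi(x) + Fb_hi(z - x)])
        #   lower(z) = max(0, max over x of [Fa_lo(x) + Fb_lo(z - x) - 1])
        # CDF bounds are supplied on the grid xs; off-grid values
        # interpolate to 0 on the left and 1 on the right.
        lower = np.zeros(len(zs))
        upper = np.zeros(len(zs))
        for i, z in enumerate(zs):
            Fb_lo_at = np.interp(z - xs, xs, Fb_lo, left=0.0, right=1.0)
            Fb_hi_at = np.interp(z - xs, xs, Fb_hi, left=0.0, right=1.0)
            lower[i] = max(0.0, np.max(Fa_lo + Fb_lo_at - 1.0))
            upper[i] = min(1.0, np.min(Fa_hi + Fb_hi_at))
        return lower, upper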
Figure 17 shows a third hypothetical sensitivity study. The base case in the top
panel is identical to the base case shown in Figure 15, but in this study, the p-
boxes are pinched to scalar values. The second and third panels of Figure 17
depict the additions resulting from pinching one of the addends to a point value.
The observed percentage reduction in the area of each resulting sum compared to
the base case is shown beside its p-box. What would the reductions in uncertainty
have been if the base calculation had not assumed independence? The pinchings
would have yielded exactly the same results, simply because dependence
assumptions have no effect when either of the addends is a point. Thus, the lower
two panels of Figure 17 would look exactly the same. However, if the base
calculation had not assumed independence, then the base uncertainty about the
sum a+b would have been slightly greater (area = 3.05, compared to 2.12 under
independence). That would make the rightmost p-box in the top panel of Figure
17 noticeably wider. Therefore the reductions in uncertainty by pinching to a
point would have been somewhat greater than they were for the independent case.
Instead of 50.4% and 52.36% reductions, pinching the variables a and b to points
under no assumption about dependence would have respectively yielded 65.54%
and 66.9% reductions in uncertainty as measured by the area within the resulting
p-boxes.
[Figure: three panels, each plotting cumulative probability against values from 0 to 20. Top: the p-boxes a and b and their sum a+b under independence (base case). Middle: the sum after pinching a to a point, a 50.4% reduction in area. Bottom: the sum after pinching b to a point, a 52.36% reduction.]
Figure 17 Sensitivity analysis by pinching a p-box to a point value.
4.9 PROBABILITY BOUNDS ANALYSIS WITHIN THE TIERED APPROACH
The tiered approach is a central feature of EPA’s guidance for probabilistic risk
assessment (EPA 2001). Tier 1 uses point estimates that only qualitatively
express uncertainties. Higher tiers reflect increasing complexity and increasingly
comprehensive characterizations of variability and uncertainty about the risk
estimate. Sensitivity analyses play a focal role in all three tiers. Probability
bounds analysis was designed to be used within each of the three tiers.
In Tier 1, point estimates requiring minimal resources to achieve a basic
characterization of uncertainty are used to estimate exposures and risks. This
often involves estimating both a central tendency exposure (CTE) and a
reasonable maximum exposure (RME). In Tier 1, uncertainty in many of the
terms of the risk equations such as intake rates, exposure frequency and duration,
toxicity factors, etc., is often addressed by choosing input values that are more
likely to yield an overestimate than an underestimate of risk. For the
concentration term, the upper 95% confidence limit is usually selected for use as
the point value. Although not often articulated, performing calculations with such
non-central values is a form of interval analysis (Moore, 1966). For the most part
this is a degenerately simple analysis, but there are a few wrinkles. For instance,
although selecting large values for most quantities will lead to larger exposures
and risk, it is small values for quantities in the denominator of the risk expression
(e.g., body weight and averaging time) that lead to larger exposures and risks.
Because probability bounds analysis is a strict generalization of interval analysis,
it yields the same answers as interval analysis if provided with the same inputs.
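To make the connection to interval analysis explicit, consider a sketch of
interval evaluation for a generic intake expression of the usual
multiplicative form; the variable names are illustrative rather than
prescribed by the guidance:

    def interval_dose(C, IR, EF, ED, BW, AT):
        # Interval evaluation of C*IR*EF*ED / (BW*AT), with each
        # argument a (lo, hi) pair of positive endpoints.  The
        # expression increases in each numerator term and decreases in
        # each denominator term, so the upper bound of the dose takes
        # the SMALL endpoints of BW and AT, and vice versa.
        lo = (C[0] * IR[0] * EF[0] * ED[0]) / (BW[1] * AT[1])
        hi = (C[1] * IR[1] * EF[1] * ED[1]) / (BW[0] * AT[0])
        return lo, hi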
Tier-2 assessments make use of additional information about random variables
beyond their plausible ranges. Typically, this information is available from
literature reviews and initial empirical efforts. Tier-2 assessments generally
involve one-dimensional probabilistic assessment and probabilistic sensitivity
analysis. The one-dimensional assessments in Tier 2 are typically conducted via
Monte Carlo simulation, but this is not critical. Because probability bounds
analysis is a strict generalization of probability theory, it yields the same answers
as Monte Carlo simulation if it is provided with the same inputs and assumptions.
As discussed in earlier sections of this attachment, the results of Monte Carlo
simulation with a large number of iterations converge to the same results obtained
by probability bounds analysis with a large number of discretization levels, so
long as the inputs are precise distributions and correlations. The methods are
algorithmically different but equivalent tools⁶ to compute risk distributions.

⁶ Indeed, there are other tools available as well (Kaplan, 1981; Springer, 1979; Iman and Helton, 1985; inter alia).
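That convergence can be checked numerically. The sketch below, with arbitrary
hypothetical inputs, compares a Monte Carlo estimate of an exceedance
probability with the estimate from a quantile discretization of the same
precise, independent inputs:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    A = stats.lognorm(s=0.5, scale=np.exp(1.0))   # hypothetical inputs
    B = stats.norm(8.0, 2.0)

    # Monte Carlo estimate of P(a + b <= 12) under independence
    a = A.rvs(size=200_000, random_state=rng)
    b = B.rvs(size=200_000, random_state=rng)
    p_mc = np.mean(a + b <= 12.0)

    # Quantile-discretization estimate of the same probability
    n = 500
    p = (np.arange(n) + 0.5) / n
    p_disc = np.mean(np.add.outer(A.ppf(p), B.ppf(p)) <= 12.0)

    print(p_mc, p_disc)   # agree to within sampling/discretization error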
Sensitivity analysis in Tier 2 is used to direct further specific empirical efforts in
support of Tier-3 assessment. As explained in Section 4.8, probability bounds
analysis is itself a form of probabilistic sensitivity analysis and can serve as an
important part of the sensitivity analyses used in Tier 2 and emphasized in Tier 3.
In Tier 3, risk assessments typically include a full characterization of the
variability and uncertainty that attend the estimation of exposures and risks. This
accounting is quantitative and maintains a strict segregation of variability from
uncertainty so that the numerical results are maximally interpretable and
practically useful to risk managers. Because probability bounds analysis marries
the methods of probability theory and interval analysis, it can provide the
methodological underpinning for such an accounting. It does not confound
uncertainty and variability into a single distribution that tries to express both, but
rather uses a bounding strategy that captures the duality and is faithful to a purely
frequentist interpretation of probability. Because of its relative simplicity,
probability bounds analysis may be more convenient and easier to set up and
explain than a two-dimensional Monte Carlo simulation. Tier 3 often involves
more detailed information about correlations and dependencies among variables
in the risk expression. The tools of probability bounds analysis for modeling
correlations and dependencies are among the most flexible and sophisticated
methods for this purpose currently available to risk analysts.
The tiered approach is an iterative process intended to deliver reliable inferences
conveniently and with reasonable and proportional computational effort. There
are no absolute restrictions about where within the tiers information should or
should not be used. Nevertheless, it will usually be the case that assessments in
the three tiers follow broad patterns, to wit:
• In Tier 1, point values and/or intervals that represent uncertain variables are
combined together according to the basic risk expressions, dependencies are
ignored, and sensitivity analysis is used to identify parameters where
empirical effort would be most useful.
• In Tier 2, the intervals are tightened to more informative p-boxes that express
constraints and evidence about each random variable known from the literature
or gathered from initial local empirical study. Independence assumptions among
some variables are employed to tighten the exposure and risk results, and
dependency bounds analysis (the Fréchet case) is used to account for a lack of
information about dependence and correlation among some variables. Again,
sensitivity analysis is used to direct future empirical effort.
• In Tier 3, p-boxes describing input variables are tightened further based on
any new empirical data. More specific evidence about intervariable
cross-correlations and dependencies may be brought into the analysis to tighten
the results further still. This may involve qualitative information such as the
sign of the dependence or quantitative information about the magnitude of the
correlation. In some cases, the copula fully characterizing the dependence
among variables may be specified. Model uncertainty arising from the
expressions used to estimate exposures may be considered. For instance,
microexposure event modeling or geostatistical methods such as spatial
weighting may be employed to refine the estimate or at least to provide a
broader perspective about uncertainty in the estimate.
4.10 SYNOPSIS
Like Monte Carlo simulation, probability bounds analysis is a method for
conducting probabilistic risk analyses. Indeed, the two methods share many
commonalities of purpose, underlying theory, and interpretation. They differ in
terms of their computational mechanics: Monte Carlo methods are based on
simulation with pseudo-random numbers and probability bounds analysis is based
on an analytical approach implemented by discretization. But these differences
are largely a matter of computational convenience and are not of
fundamental importance. The essential distinction between the two methods is in
their approaches to the problem of computing in the face of uncertainty. They
exemplify the two basic strategies that are available for this problem:
approximation and bounding. Monte Carlo is effectively a way to approximate a
probability distribution of interest. Probability bounds analysis, in contrast,
provides bounds on that distribution. This difference means that the methods
provide complementary perspectives on the quantities computed in risk
assessments. The list below reviews the main tactics employed by these two
methods to address a variety of empirical circumstances commonly encountered
in risk assessments. It summarizes the differences that have been discussed and
illustrated in this attachment.
What is known empirically | Monte Carlo analysis | Probability bounds analysis
Know only range of variable | Assume uniform distribution | Assume interval
Know some constraints about random variable | Select distribution with largest entropy from class of distributions matching constraints | Form envelope around class of distributions matching constraints
Uncertainty about distribution family or shape | Repeat analysis for other plausible distribution shapes | Form distribution-free p-box as envelope of all plausible distribution shapes
Sample data | Form empirical distribution function (EDF) | Form Kolmogorov-Smirnov confidence limits around EDF
Variable follows known marginal distribution | Sample from particular distribution | Use particular distribution
Measurement uncertainty | Ignore it (usually), or perform sensitivity analysis | Express it in intervals and incorporate it into analysis
Non-detects | Replace non-detect with point value ½DL (detection limit) | Replace non-detect with interval [0, DL]
Know variables are independent | Assume independence | Assume (random-set) independence
Know magnitude of correlation | Simulate correlation from particular (but usually unspecified) copula | Simulate correlation if copula is also known, or bound result from all possible copulas with given correlation
Know only the general sign (+ or −) of dependence | Assume some correlation of appropriate sign, or repeat analysis for different correlations | Bound result assuming only the sign of the dependence and specific or all possible copulas
Do not know the nature of the dependence | Assume independence (usually), or repeat analysis for different correlations | Bound result for all possible dependencies (Fréchet case)
Model uncertainty | Form stochastic mixture (vertical average) of distribution functions | Form envelope of distribution functions
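The last row of the table can be illustrated in a few lines. Given two
candidate output distribution functions on a common grid (hypothetical
uniforms here), the Monte Carlo convention averages them vertically, while
probability bounds analysis envelopes them:

    import numpy as np

    xs = np.linspace(0.0, 20.0, 201)
    # Two competing model outputs as CDFs on a common grid
    # (hypothetical uniform(0, 10) and uniform(5, 15) distributions)
    F1 = np.clip(xs / 10.0, 0.0, 1.0)
    F2 = np.clip((xs - 5.0) / 10.0, 0.0, 1.0)

    w = 0.5                              # hypothetical model weight
    F_mixture = w * F1 + (1.0 - w) * F2  # Monte Carlo: vertical average

    F_upper = np.maximum(F1, F2)         # probability bounds: envelope
    F_lower = np.minimum(F1, F2)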
5. REFERENCES
Adams, E. and U. Kulisch (eds.). 1993. Scientific Computing with Automatic
Result Verification. Mathematics in Science and Engineering, Volume 189,
Academic Press, San Diego.
Apostolakis, G.E. 1995. A commentary on model uncertainty. Proceedings of
Workshop on Model Uncertainty, A. Mosleh, N. Siu, C. Smidts, and C. Lui
(eds.), Center for Reliability Engineering, University of Maryland, College
Park, Maryland.
Berger, J.O. 1985. Statistical Decision Theory and Bayesian Analysis. Springer-
Verlag, New York.
Berleant, D. 1993. Automatically verified reasoning with both intervals and
probability density functions. Interval Computations 1993 (2): 48-70.
Berleant, D. 1996. Automatically verified arithmetic on probability distributions
and intervals. In: Applications of Interval Computations, B. Kearfott and V.
Kreinovich, eds., Kluwer Academic Publishers, 227-244.
Berleant, D. and H. Cheng. 1998. A software tool for automatically verified
operations on intervals and probability distributions. Reliable Computing 4:
71-82.
Berleant, D. and C. Goodman-Strauss. 1998. Bounding the results of arithmetic
operations on random variables of unknown dependency using intervals.
Reliable Computing 4: 147-165.
Boole, G. 1854. An Investigation of the Laws of Thought, On Which Are Founded
the Mathematical Theories of Logic and Probability. Walton and Maberly,
London.
Brainard, J. and D.E. Burmaster. 1992. Bivariate distributions for height and
weight of men and women in the United States. Risk Analysis 12: 267-275.
Bratley, P., B.L. Fox, and L.E. Schrage. 1983. Guide to Simulations. Springer-
Verlag, New York.
Bukowski, J., L. Korn, and D. Wartenberg. 1995. Correlated inputs in quantitative
risk assessment: the effects of distributional shape. Risk Analysis 15: 215-
219.
Burmaster, D.E. 1997. Lognormal distributions for skin area as a function of body
weight. Risk Analysis 18: 27-32. Also available on line at
www.alceon.com/skinarea.pdf.
Burmaster, D.E. and E.A.C. Crouch. 1997. Lognormal distributions for body
weight as a function of age for males and females in the United States, 1976-
1980. Risk Analysis 17:499-505.
Burmaster, D.E. and R.H. Harris. 1993. The magnitude of compounding
conservatisms in superfund risk assessments. Risk Analysis 13: 131-134.
Cario, M.C. and B. L. Nelson. 1997. Modeling and generating random vectors
with arbitrary marginal distributions and correlation matrix.
http://www.iems.nwu.edu/%7Enelsonb/norta4.ps
.
Chebyshev [Tchebichef], P. 1874. Sur les valeurs limites des intégrales. Journal
de Mathématiques Pures et Appliquées, Ser. 2, 19: 157-160.
Clemen, R. and T. Reilly. 1999. Correlations and copulas for decision and risk
analysis. Management Science 45: 208-224.
Cullen, A.C., and H.C. Frey. 1999. Probabilistic Techniques in Exposure
Assessment: A Handbook for Dealing with Variability and Uncertainty in
Models and Inputs. Plenum Press: New York.
Donald, S. and S. Ferson. 1997. Human health risks from the Visalia Pole Yard: a
quality assurance study. Report to Southern California Edison, Rosemead,
California, and the Electric Power Research Institute, Palo Alto, California.
Draper, D. 1995. Assessment and propagation of model uncertainty. Journal of
the Royal Statistical Society Series B 57: 45-97.
EPA (U. S. Environmental Protection Agency) Risk Assessment Forum. 1994.
Peer Review Workshop Report on Ecological Risk Assessment Issue Papers.
U.S. Environmental Protection Agency, Washington, D.C.
EPA (U. S. Environmental Protection Agency). 1997. Exposure Factors
Handbook Volume 1 General Factors. August 1997. National Center for
Environmental Assessment, Washington DC EPA/600/P-95/002Fa.
EPA (U. S. Environmental Protection Agency). 2001. Risk Assessment Guidance
for Superfund (RAGS), Volume III - Part A: Process for Conducting
Probabilistic Risk Assessment. EPA 540-R-02-002, Office of Emergency and
Remedial Response, U.S. Environmental Protection Agency, Washington,
D.C. Also available on line at the EPA website at
http://www.epa.gov/superfund/programs/risk/rags3a/index.htm
Feller, W. 1968. An Introduction to Probability Theory and Its Applications.
Volume 1. John Wiley & Sons, New York.
Feller, W. 1971. An Introduction to Probability Theory and Its Applications.
Volume 2. John Wiley & Sons, New York.
Ferson, S. 1994. Naive Monte Carlo methods yield dangerous underestimates of
tail probabilities. Proceedings of the High Consequence Operations Safety
Symposium, Sandia National Laboratories, SAND94-2364, J.A. Cooper (ed.),
pp. 507-514.
Ferson, S. 1995. Quality assurance for Monte Carlo risk assessments. Proceedings
of the 1995 Joint ISUMA/NAFIPS Symposium on Uncertainty Modeling and
Analysis, IEEE Computer Society Press, Los Alamitos, California, pp. 14-19.
Ferson, S. 1996. What Monte Carlo methods cannot do. Human and Ecological
Risk Assessment 2: 990–1007.
Ferson, S. 1997. Probability bounds analysis. Computing in Environmental
Resource Management. Proceedings of the Conference, A. Gertler (ed.), Air
and Waste Management Association and the U.S. Environmental Protection
Agency, Pittsburgh, Pennsylvania. pp. 669–678.
Ferson, S. 2001. Probability bounds analysis solves the problem of incomplete
specification in probabilistic risk and safety assessments. Risk-Based
Decisionmaking in Water Resources IX, Y.Y. Haimes, D.A. Moser and E.Z.
Stakhiv (eds.), American Society of Civil Engineers, Reston, Virginia, pages
173-188.
Ferson, S. 2002. RAMAS Risk Calc 4.0 Software: Risk Assessment with Uncertain
Numbers. Lewis Publishers, Boca Raton, Florida.
Ferson, S. and M. Burgman. 1995. Correlations, dependency bounds and
extinction risks. Biological Conservation 73:101-105.
Ferson, S. and L.R. Ginzburg. 1996. Different methods are needed to propagate
ignorance and variability. Reliability Engineering and Systems Safety 54:
133–144.
Ferson, S. and T.F. Long. 1995. Conservative uncertainty propagation in
environmental risk assessments. Environmental Toxicology and Risk
Assessment, Third Volume, ASTM STP 1218, J.S. Hughes, G.R. Biddinger
and E. Mones (eds.), ASTM, Philadelphia, pp. 97–110.
Ferson et al. 2003a. Bounding uncertainty analyses. Proceedings from a
Workshop on the Application of Uncertainty Analysis to Ecological Risks of
Pesticides. A. Hart (ed.), Society for Environmental Toxicology and
Chemistry, Pensacola, Florida. [forthcoming].
Ferson, S., V. Kreinovich, L. Ginzburg, D.S. Myers, and K. Sentz. 2003b.
Constructing Probability Boxes and Dempster-Shafer Structures.
SAND2002-4015. Sandia National Laboratories, Albuquerque, NM.
Finkel, A.M. 1995. A second opinion on an environmental misdiagnosis: the risky
prescription of Breaking the Vicious Circle. New York University
Environmental Law Journal 3: 295-381.
Finley, B., D. Proctor, P. Scott, N. Harrington, D. Paustenbach, and P. Price.
1994. Recommended distributions for exposure factors frequently used in
health risk assessment. Risk Analysis 14: 533-553.
Fischer, H.-C. 1993. Automatic differentiation and applications. Scientific
Computing with Automatic Result Verification. E. Adams and U. Kulisch
(eds.), Mathematics in Science and Engineering, Volume 189, Academic
Press, San Diego.
Fisher, R.A. 1957. The underworld of probability. Sankhyā 18: 201-210.
Frank, M.J., R.B. Nelsen, and B. Schweizer. 1987. Best-possible bounds for the
distribution of a sum—a problem of Kolmogorov. Probability Theory and
Related Fields 74: 199-211.
Fréchet, M. 1935. Généralisations du théorème des probabilités totales.
Fundamenta Mathematica 25: 379–387.
Fréchet, M. 1951. Sur les tableaux de corrélation dont les marges sont données.
Annales de l’Université de Lyon. Section A: Sciences mathematiques et
astronomie 9: 53-77.
Goldwasser, L., L. Ginzburg, and S. Ferson. 2000. Variability and measurement
error in extinction risk analysis: the northern spotted owl on the Olympic
Peninsula. Pages 169-187 in Quantitative Methods for Conservation Biology,
S. Ferson and M. Burgman (eds.), Springer-Verlag, New York.
Good, I.J. 1965. The Estimation of Probabilities. MIT Press, Cambridge,
Massachusetts.
Grosof, B.N. 1986. An inequality paradigm for probabilistic knowledge: the logic
of conditional probability intervals. Uncertainty in Artificial Intelligence.
L.N. Kanal and J.F. Lemmer (eds.), Elsevier Science Publishers, Amsterdam.
Haas, C.N. 1997. Importance of distributional form in characterizing inputs to
Monte Carlo risk assessments. Risk Analysis 17: 107-113.
Haas, C.N. 1999. On modeling correlated random variables in risk assessment.
Risk Analysis 19: 1205-1214.
Hailperin, T. 1986. Boole’s Logic and Probability. North-Holland, Amsterdam.
Hart, A., S. Ferson, and J. Shaw. 2003 forthcoming. Problem formulation for
probabilistic ecological risk assessments. Proceedings from Workshop on the
Application of Uncertainty Analysis to Ecological Risks of Pesticides.
SETAC Press, Pensacola, Florida.
Helton, J.C. 1994. Treatment of uncertainty in performance assessments for
complex systems. Risk Analysis 14: 483-511.
Helton, J.C. and F.J. Davis. 2002. Latin Hypercube Sampling and the
Propagation of Uncertainty in Analyses of Complex Systems. SAND2001-
0417, Sandia National Laboratories, Albuquerque, New Mexico.
Hettmansperger, T. and M. Keenan. 1980. Tailweight, statistical inference, and
families of distributions a brief survey. Statistical Distributions in Scientific
Work, Volume 1, G.P. Patil et al. (eds.), Kluwer, Boston.
Hoeffding, W. 1940. Masstabinvariante Korrelationstheorie. Schriften des
Matematischen Instituts und des Instituts für Angewandte Mathematik der
Universität Berlin 5 (Heft 3): 179-233 [translated in Hoeffding, W. 1940.
Scale-invariant correlation theory. Collected works of Wassily Hoeffding,
N.I. Fisher and P.K. Sen (eds.), Springer-Verlag, New York].
Hoeting, J., D. Madigan, A. Raftery and C.T. Volinsky. 1999. Bayesian model
averaging: a tutorial (with discussion). Statistical Science 14: 382-417. A
corrected version can be found at
http://www.stat.washington.edu/www/research/online/hoetingl999.pdf.
Hoffman, F.O. and J.S. Hammonds. 1994. Propagation of uncertainty in risk
assessments: The need to distinguish between uncertainty due to lack of
knowledge and uncertainty due to variability. Risk Analysis 14: 707-712.
Iman, R.L. and W.J. Conover. 1982. A distribution-free approach to inducing rank
correlation among input variables. Communications in Statistics B11: 311.
Iman, R.L. and J.C. Helton. 1985. A Comparison of Uncertainty and Sensitivity
Analysis Techniques for Computer Models. NUREG/CR-3904, SAND84-
1461. U.S. Nuclear Regulatory Commission, Washington, DC.
Insua, D.R. and F. Ruggeri (eds.). 2000. Robust Bayesian Analysis. Lecture Notes
in Statistics, Volume 152. Springer-Verlag, New York.
Jaynes, E.T. 1957. Information theory and statistical mechanics. Physical Review
106: 620-630.
Kaplan, S. 1981. On the method of discrete probability distributions—application
to seismic risk assessment. Risk Analysis 1:189-198.
Kyburg, H.E., Jr. 1998. “Interval Valued Probabilities,” available on line at
http://ippserv.rug.ac.be/documentation/interval_prob/interval_prob.html in an
electronic collection, part of the Imprecise Probabilities Project
(http://ippserv.rug.ac.be/home/ipp.html), edited by G. de Cooman and P.
Walley.
Lambert, J.H., N.C. Matalas, C.W. Ling, Y.Y. Haimes, and D. Li. 1994. Selection
of probability distributions in characterizing risk of extreme events. Risk
Analysis 14: 731-742.
Langewisch, A.T. and F.F. Choobineh. 2003. Mean and variance bounds and
propagation for ill-specified random variables. (unpublished manuscript).
Lee, R.C. and W.E. Wright. 1994. Development of human exposure-factor
distributions using maximum-entropy inference. Journal of Exposure
Analysis and Environmental Epidemiology 4: 329-341.
Loui, R.P. 1986. Interval based decisions for reasoning systems. Uncertainty in
Artificial Intelligence. L.N. Kanal and J.F. Lemmer (eds.), Elsevier Science
Publishers, Amsterdam.
Lurie, P.M. and M.S. Goldberg. 1997. A method for simulating correlated random
variables from partially specified distributions. Institute for Defense Analysis
IDA paper P-2998, Alexandria, VA.
Madan, D.P. and J.C. Owings. 1988. Decision theory with complex uncertainties.
Synthese 75: 25-44.
Markov [Markoff], A. 1886. Sur une question de maximum et de minimum
proposée par M. Tchebycheff. Acta Mathematica 9: 57-70.
Moore, R.E. 1966. Interval Analysis. Prentice Hall, Englewood Cliffs, New
Jersey.
Morgan, M.G. and M. Henrion. 1990. Uncertainty: A Guide to Dealing with
Uncertainty in Quantitative Risk and Policy Analysis. Cambridge University
Press, New York.
Nelsen, R.B. 1986. Properties of a one-parameter family of bivariate distributions
with specified marginals. Communications in Statistics (Theory and Methods)
A15:3277-3285.
Nelsen, R.B. 1987. Discrete bivariate distributions with given marginals and
correlation. Communications in Statistics (Simulation and Computation)
B16:199-208.
Neumaier, A. 1990. Interval Methods for Systems of Equations. Cambridge
University Press, Cambridge.
Price, P.S., C.L. Curry, P.E. Goodrum, M.N. Gray, J.I. McCrodden, N.H.
Harrington, H. Carlson-Lynch, and R.E. Keenan. 1996. Monte Carlo
modeling of time-dependent exposures using a microexposure event
approach. Risk Analysis 16: 339–348.
Raftery, A.E., D. Madigan, and J.A. Hoeting. 1997. Bayesian model averaging for
linear regression models. Journal of the American Statistical Association 92:
179-191.
Regan, H.M., B.E. Sample, and S. Ferson. 2002a. Comparison of deterministic
and probabilistic calculation of ecological soil screening levels.
Environmental Toxicology and Chemistry 21: 882-890.
Regan, H.M., B.K. Hope, and S. Ferson. 2002b. Analysis and portrayal of
uncertainty in a food web exposure model. Human and Ecological Risk
Assessment 8: 1757-1777.
Scheuer, E.M. and D.S. Stoller. 1962. On the generation of normal random
vectors. Technometrics 4: 278.
Smith, A.E., P.B. Ryan and J.S. Evans. 1992. The effect of neglecting correlations
when propagating uncertainty and estimating the population distribution of
risk. Risk Analysis 12:467-474.
Smith, J.E. 1995. Generalized Chebychev inequalities: theory and applications in
decision analysis. Operations Research 43: 807-825.
Sokal, R.R. and F.J. Rohlf. 1994. Biometry (3rd Edition). Freeman and Company,
New York.
Solana, V. and N.C. Lind. 1990. Two principles for data based on probabilistic
system analysis. Proceedings of ICOSSAR '89, 5th International Conference
on Structural Safety and Reliability. American Society of Civil Engineers,
New York.
Spencer, M., N.S. Fisher, and W.-X. Wang. 1999. Exploring the effects of
consumer-resource dynamics on contaminant bioaccumulation by aquatic
herbivores. Environmental Toxicology and Chemistry 18: 1582-1590.
Spencer, M., N.S. Fisher, W.-X. Wang, and S. Ferson. 2001. Temporal variability
and ignorance in Monte Carlo contaminant bioaccumulation models: a case
study with selenium in Mytilus edulis. Risk Analysis 21: 383-394.
Springer, M.D. 1979. The Algebra of Random Variables. Wiley, New York.
Tessem, B. 1992. Interval probability propagation. International Journal of
Approximate Reasoning 7: 95-120.
Walley, P. 1991. Statistical Reasoning with Imprecise Probabilities. Chapman
and Hall, London.
Walley, P. and T.L. Fine. 1982. Towards a frequentist theory of upper and lower
probability. Annals of Statistics 10: 741-761.
Warren-Hicks, W.J. and D.R.J. Moore (eds.). 1998. Uncertainty Analysis in
Ecological Risk Assessment. Society of Environmental Toxicology and
Chemistry, Pensacola, Florida.
Whitt, W. 1976. Bivariate distributions with given marginals. The Annals of
Statistics 4:1280-1289.
Williamson, R.C. 1991. An extreme limit theorem for dependency bounds of
normalized sums of random variables. Information Sciences 56: 113-141.
Williamson, R.C. and T. Downs. 1990. Probabilistic arithmetic I: Numerical
methods for calculating convolutions and dependency bounds. International
Journal of Approximate Reasoning 4: 89-158.
Yager, R.R. 1986. Arithmetic and other operations on Dempster-Shafer structures.
International Journal of Man-machine Studies 25: 357-366.
Self-taught mathematician and father of Boolean algebra, George Boole (1815–1864) published An Investigation of the Laws of Thought in 1854. In this highly original investigation of the fundamental laws of human reasoning, a sequel to ideas he had explored in earlier writings, Boole uses the symbolic language of mathematics to establish a method to examine the nature of the human mind using logic and the theory of probabilities. Boole considers language not just as a mode of expression, but as a system one can use to understand the human mind. In the first 12 chapters, he sets down the rules necessary to represent logic in this unique way. Then he analyses a variety of arguments and propositions of various writers from Aristotle to Spinoza. One of history's most insightful mathematicians, Boole is compelling reading for today's student of intellectual history and the science of the mind.