Numerical accuracy and efficiency in the propagation of epistemic and aleatory uncertainty
November 17, 20099:41International Journal of General SystemsgGEN_ChojnackiAll
International Journal of General Systems
Vol. 00, No. 00, Month 200x, 1–22
RESEARCH ARTICLE
Numerical accuracy and efficiency in the propagation of epistemic
and aleatory uncertainty
Eric Chojnacki and Jean Baccou and Sébastien Destercke∗
Institut de radioprotection et de sûreté nucléaire (IRSN)
13115 St-Paul Lez Durance, France
(Received 00 Month 200x; final version received 00 Month 200x)
The need to differentiate between epistemic and aleatory uncertainty is now widely acknowledged by the risk analysis community. One way to do so is to model aleatory uncertainty by classical probability distributions and epistemic uncertainty by means of possibility distributions, and then propagate them by their respective calculi. The result of this propagation is a random fuzzy variable. When dealing with complex models, the computational cost of such a propagation quickly becomes too high. In this paper, we propose a numerical approach, the RaFu method, whose aim is to determine an optimal numerical strategy so that computational costs are reduced to their minimum, while using the theoretical framework mentioned above. We also give some means to take account of the resulting numerical error. The benefits of the RaFu method are shown by comparisons with previous methodologies.
Keywords: order statistics; epistemic uncertainty; sampling method; risk analysis; hybrid
calculus
1. Introduction
Taking uncertainties into account has become of prime importance in many industrial applications. This is particularly true in safety studies, where misleading representations of uncertainties can lead to incautious and therefore potentially dangerous decisions.
Nowadays, a large majority of uncertainty analysts use probabilistic models to represent uncertainties and Monte-Carlo simulations to propagate them through a model. In such approaches, both aleatory uncertainties (i.e. due to the natural variability or randomness of an observed phenomenon) and epistemic uncertainties (i.e. due to the imprecision or scarcity of available information) are modeled by probabilities. However, many arguments (Walley 1991, Helton and Oberkampf 2004, Ferson and Ginzburg 1996) converge to the conclusion that classical probabilities cannot adequately model epistemic uncertainties.
Therefore, recent works (Helton and Oberkampf 2004) have focused on methodologies able to handle both aleatory and epistemic uncertainties in a unified framework. One such method, proposed and justified by various authors (Bardossy and Fodor 2004, Ferson et al. 2003, Baudrit et al. 2006), consists in mixing probabilistic convolution (for aleatory uncertainty) with fuzzy calculus (for epistemic uncertainty) to model and propagate uncertainties. This theoretical approach, often referred to as the hybrid approach, is the one considered here. Recent work (Baudrit et al. 2008) shows that such methods provide results different from those of the classical 2-D Monte-Carlo simulations usually used to differentiate between aleatory and epistemic uncertainties in the classical probabilistic framework.
∗Corresponding author. Email: sdestercke@gmail.com
ISSN: 0308-1079 print/ISSN 1563-5104 online
© 200x Taylor & Francis
DOI: 10.1080/0308107YYxxxxxxxx
http://www.informaworld.com
As propagating uncertainties with this approach often involves high computational costs, its domain of application has so far been limited to relatively simple models. In order to apply it to fields (such as nuclear safety) where models can be very complex and where computational cost is an important issue, more efficient propagation methods are needed.
This is why we propose in this work a new numerical method to propagate uncertainties within the above methodology. This method, based on sampling techniques, aims at reducing computational costs. It also allows one to address numerical accuracy issues by using convergence results of Monte-Carlo methods and notions of order statistics (Lecoutre and Tassi 1987, Conover 1999). The key point of the method lies in pre-processing the information related to the final desired result of the propagation, rather than post-processing it (as is usually done).
Although this paper focuses on issues regarding safety studies, and thus on the estimation of uncertainties concerning threshold exceedance (i.e. cumulative distribution and so-called survival functions), the method presented here is not confined to this type of information.
This paper is organized as follows: Section 2 recalls the theoretical bases used for representing and propagating aleatory and epistemic uncertainties in hybrid methodologies. The resulting output is no longer a random variable (as in classical probabilistic modelling) but a random fuzzy variable. In Section 3, we recall existing post-processing methods that extract relevant information from the model output, and discuss their computational cost. Section 4 introduces the proposed numerical treatment of aleatory and epistemic uncertainties (called the RaFu method, RaFu standing for Random/Fuzzy), which improves computational efficiency by avoiding the construction of the whole random fuzzy variable when possible. Finally, the RaFu method is illustrated on a simplified application in Section 5.
2. Representation and propagation of aleatory and epistemic uncertainties
In this section, we first recall basics of probability and possibility theories, the former being used to represent aleatory uncertainty and the latter to represent epistemic uncertainty. Then, we explain how these two types of uncertainty are propagated through a model into a random fuzzy variable. Since our work focuses on the numerical treatment of hybrid approaches (i.e. combining probability and possibility calculi), we do not discuss in depth the advantages and limits of the two uncertainty theories. We refer to related works (Bardossy and Fodor 2004, Ferson et al. 2003, Baudrit et al. 2006) for detailed discussions of their theoretical justifications.
2.1 Representing uncertainty
As mentioned previously, one can distinguish two main kinds of uncertainty.
Aleatory uncertainty is due to the natural variability or randomness of an observed phenomenon. It can be, for instance, the variability inside a given population (e.g. a Gaussian distribution describing the weight of people of a given nationality, or an exponential distribution describing the failure times of some class of components) or the variability of observed outcomes for a particular situation (e.g. dice tossing).
Epistemic uncertainty results from a lack of knowledge or information. It can come from systematic error (e.g. a measurement that is not fully reliable), from a small amount of data, or from subjective uncertainty (e.g. an expert providing imprecise values).
Recent works (Walley 1991, Helton and Oberkampf 2004, Ferson and Ginzburg 1996) have shown that classical probabilities tend to confuse the two kinds of uncertainty and are not tailored to handle both of them properly. Other or more general frameworks thus need to be developed to treat the two kinds of uncertainty separately. As already mentioned, we consider here that aleatory uncertainty is modeled and propagated using classical probability theory (Feller 1971), while epistemic uncertainty is modeled and propagated with the help of possibility theory (Dubois and Prade 1988).
2.1.1 Aleatory uncertainty and probability theory
Given a probability space (Ω, F, P), a probability measure P is defined as a mapping from F to [0,1] such that P(Ω) = 1, P(∅) = 0 and, for all A, B ∈ F, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Here, we consider that F is either the power set of Ω (when Ω is discrete) or the Borel algebra when Ω = R, the real line; therefore we will not mention F further on. From a probability measure P, we can define its probability distribution function p as the mapping from the sample space Ω (e.g. {1, ..., 6} in the case of a die) to [0,1] such that, for any ω ∈ Ω, p(ω) = P({ω}) in the discrete case (in the continuous case, p is instead the probability density of P). For any measurable subset A ⊆ Ω, the probability measure is retrieved by

$$P(A) = \int_{A} p(w)\,dw \quad \text{(continuous case)}, \qquad P(A) = \sum_{w \in A} p(w) \quad \text{(discrete case)},$$

and P(A) measures the likelihood of the event A.
If X is a real random variable associated with P, the cumulative distribution function of X is the mapping $F_X : \mathbb{R} \to [0,1]$ defined for all x ∈ R as

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} p(w)\,dw,$$

which has a quasi-inverse given by $F_X^{-1}(\alpha) = \inf\{x \in \mathbb{R} : F_X(x) \ge \alpha\}$. If α is a uniform random variable on [0,1], then it is well known that the random variable $X = F_X^{-1}(\alpha)$ is distributed according to $F_X$. This means that we can simulate a random variable X by simulating a uniform law on [0,1] and associating to each sampled value $\alpha_i$ the corresponding element $x = F_X^{-1}(\alpha_i)$.
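The inversion principle can be sketched in a few lines of Python. The exponential law, its closed-form quantile function and the helper names below are illustrative assumptions, not part of the original method; any distribution with a computable quasi-inverse $F_X^{-1}$ could be substituted.

```python
import math
import random

def exponential_quantile(alpha, rate):
    """Quasi-inverse F^{-1}(alpha) of the exponential CDF F(x) = 1 - exp(-rate * x)."""
    return -math.log(1.0 - alpha) / rate

def sample_by_inversion(quantile, n, rng):
    """Draw n values by mapping uniform levels alpha_i in [0, 1) through the quantile function."""
    return [quantile(rng.random()) for _ in range(n)]

rng = random.Random(42)
rate = 2.0
xs = sample_by_inversion(lambda a: exponential_quantile(a, rate), 100_000, rng)
sample_mean = sum(xs) / len(xs)  # should be close to the theoretical mean 1 / rate
```

With 100 000 draws, the sample mean should be close to 1/rate = 0.5.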
2.1.2 Epistemic uncertainty and possibility theory
Imprecise knowledge about a variable having a precise value can be described by means of possibility theory (Dubois et al. 2000). In particular, possibility distributions are well suited to represent information about a variable given in terms of nested confidence intervals (a natural way to express uncertainty about variables, already considered by Cox (Cox 1958) and Birnbaum (Birnbaum 1961)). A possibility distribution is defined as a mapping π : Ω → [0,1], which is here upper semi-continuous and normalized (∃x ∈ Ω s.t. π(x) = 1). It is formally equivalent to the fuzzy set with membership function µ(x) = π(x). Distribution π describes the more or less plausible values of some uncertain variable X. Two measures are associated with a possibility distribution, namely the possibility (Π) and necessity (N) measures, which read:

$$\Pi(A) = \sup_{x \in A} \pi(x), \qquad N(A) = \inf_{x \notin A} \big(1 - \pi(x)\big).$$

The possibility measure indicates to which extent the event A is plausible, while the necessity measure indicates to which extent it is certain. They are dual, in the sense that $\Pi(A) = 1 - N(\bar{A})$, with $\bar{A}$ the complement of A. They obey the following axioms:

$$\Pi(A \cup B) = \max\big(\Pi(A), \Pi(B)\big), \qquad N(A \cap B) = \min\big(N(A), N(B)\big).$$

An α-cut of π is the interval $[\underline{x}_\alpha, \overline{x}_\alpha] = \{x : \pi(x) \ge \alpha\}$. The degree of certainty that $[\underline{x}_\alpha, \overline{x}_\alpha]$ contains the true value of X is $N([\underline{x}_\alpha, \overline{x}_\alpha]) = 1 - \alpha$. Conversely, a collection of nested sets $A_i$ with (lower) confidence levels $\lambda_i$ can be modeled as a possibility distribution, since the α-cut of a (continuous) possibility distribution can be understood as the probabilistic constraint $P(X \in [\underline{x}_\alpha, \overline{x}_\alpha]) \ge 1 - \alpha$, thus linking possibility distributions with imprecise probabilities (Dubois and Prade 1992, de Cooman and Aeyels 1999). In this setting, degrees of necessity are equated with lower probability bounds, and degrees of possibility with upper probability bounds.

As there is a one-to-one correspondence between levels α ∈ [0,1] and the corresponding α-cuts $[\underline{x}_\alpha, \overline{x}_\alpha]$, a possibility distribution can be simulated, similarly to probability distributions, by sampling values from a uniform law on [0,1] and associating to each sampled value $\alpha_i$ the corresponding α-cut $[\underline{x}_{\alpha_i}, \overline{x}_{\alpha_i}]$.
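A minimal sketch of these definitions, assuming a triangular possibility distribution with support [a, b] and mode m (a hypothetical choice made only for illustration, as are the helper names). It evaluates Π and N on intervals, checks that the α-cut of level α has necessity 1 − α, and simulates the distribution by drawing α-cuts at uniformly sampled levels.

```python
import random

def tri_possibility(x, a, m, b):
    """Triangular possibility distribution: support [a, b], pi(m) = 1, assuming a < m < b."""
    if a < x <= m:
        return (x - a) / (m - a)
    if m < x < b:
        return (b - x) / (b - m)
    return 0.0

def alpha_cut(alpha, a, m, b):
    """Alpha-cut {x : pi(x) >= alpha}; alpha = 0 is taken as the closed support [a, b]."""
    return (a + alpha * (m - a), b - alpha * (b - m))

def possibility(lo, hi, a, m, b):
    """Pi([lo, hi]): sup of pi over the interval (pi is unimodal, so check m, else the nearest end)."""
    if lo <= m <= hi:
        return 1.0
    nearest = hi if hi < m else lo
    return tri_possibility(nearest, a, m, b)

def necessity(lo, hi, a, m, b):
    """N([lo, hi]) = 1 - Pi(complement), using the continuity of pi at lo and hi."""
    left = 1.0 if lo > m else tri_possibility(lo, a, m, b)
    right = 1.0 if hi < m else tri_possibility(hi, a, m, b)
    return 1.0 - max(left, right)

# Simulating the distribution: uniform level alpha_i on [0, 1) -> its alpha-cut.
rng = random.Random(0)
cuts = [alpha_cut(rng.random(), 0.0, 1.0, 2.0) for _ in range(5)]
```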
2.2 P-boxes
The main question of safety studies is often whether or not, given uncertainties on inputs, the output value exceeds a given threshold. In a purely probabilistic framework, if the value of this threshold is x, the uncertainty on exceeding this threshold is given by the cumulative distribution function (CDF) F(x) = P((−∞, x]).
If epistemic uncertainty is taken into account, the uncertainty on exceeding a threshold is no longer precise, and is given by a pair of lower and upper cumulative distribution functions $[\underline{F}, \overline{F}]$, usually called probability boxes (Ferson et al. 2003)¹ (p-boxes for short). The uncertainty on exceeding a threshold x is then expressed by a pair of values $[\underline{F}(x), \overline{F}(x)]$ bounding the potential values of F(x) = P((−∞, x]). The width of this interval reflects our lack of information concerning some input parameters.
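As an illustration (an assumed example, not taken from the paper), a p-box arises when a normal distribution with known standard deviation σ has a mean only known to lie in an interval [mean_lo, mean_hi]: for every x, the possible CDF values range between those obtained with the two extreme means. The sketch relies on Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

def pbox_normal_imprecise_mean(x, mean_lo, mean_hi, sigma):
    """[F_low(x), F_up(x)] for a normal CDF whose mean is only known to lie in [mean_lo, mean_hi]."""
    # A normal CDF evaluated at x decreases as the mean increases,
    # so the largest mean gives the lower bound and the smallest mean the upper bound.
    lower = NormalDist(mean_hi, sigma).cdf(x)
    upper = NormalDist(mean_lo, sigma).cdf(x)
    return lower, upper

low, up = pbox_normal_imprecise_mean(0.0, -1.0, 1.0, 1.0)   # wide p-box
precise = pbox_normal_imprecise_mean(0.0, 0.0, 0.0, 1.0)    # no epistemic uncertainty
```

When the mean is known exactly, the two bounds coincide and the p-box reduces to a single CDF.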
2.3 Propagating both uncertainties into a random fuzzy variable
Hybrid numbers (i.e. random fuzzy variables), as a means to jointly express epistemic and aleatory uncertainty, were first proposed by Kaufmann and Gupta (Kaufmann and Gupta 1985). Later on, methods based on this idea were proposed by Baudrit et al. (Baudrit et al. 2006), by Ferson and Ginzburg (Ferson and Ginzburg 1996) and by Cooper et al. (Cooper et al. 1996).
We consider that the uncertainty bearing on input variables $X_1,\dots,X_N$ has to be propagated through a model $Y = T(X_1,\dots,X_N)$, where Y is the real-valued output.
¹ It must be noted that, in the imprecise case, different sets of probabilities can be represented by the same p-box, whereas in the precise case, one cumulative distribution corresponds to one precise probability distribution, and conversely.
We consider that $X_1,\dots,X_k$ are random variables described by precise probability distributions $p_1,\dots,p_k$, and that $X_{k+1},\dots,X_N$ are fuzzy variables (i.e. imprecisely known variables) described by possibility distributions $\pi_{k+1},\dots,\pi_N$, all taking values on the real line. Given this model, Kaufmann and Gupta originally proposed to propagate both types of uncertainty according to their respective calculi: probabilities by probabilistic convolution, and possibility distributions by means of the extension principle (Dubois et al. 2000). When variables $X_1,\dots,X_k$ take values $x_1,\dots,x_k$, the extension principle reads, for any y ∈ R,

$$\pi^T(y) = \sup_{x_{k+1},\dots,x_N \,:\, T(x_1,\dots,x_N) = y} \min\big(\pi_{k+1}(x_{k+1}),\dots,\pi_N(x_N)\big). \tag{1}$$
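Equation (1) can be evaluated by brute force when the fuzzy inputs are discretized on grids; this is only practical for very small problems, but it makes the sup-min structure explicit. The sketch assumes a hypothetical model T(x1, x2, x3) = x1 + x2 + x3 with x1 = 0 fixed and two triangular fuzzy inputs on [0, 2] with mode 1; the grid and tolerance are arbitrary choices.

```python
import itertools

def tri_possibility(x, a, m, b):
    """Triangular possibility distribution (support [a, b], mode m), assuming a < m < b."""
    if a < x <= m:
        return (x - a) / (m - a)
    if m < x < b:
        return (b - x) / (b - m)
    return 0.0

def extension_principle(y, model, x_random, grids, pis, tol=1e-9):
    """Grid approximation of Eq. (1) for fixed values x_random of the random inputs.

    grids[j] lists candidate values of the j-th fuzzy input and pis[j] is its
    possibility distribution; the sup runs over grid points with model output y (up to tol).
    """
    best = 0.0
    for xs in itertools.product(*grids):
        if abs(model(x_random, xs) - y) <= tol:
            best = max(best, min(pi(x) for pi, x in zip(pis, xs)))
    return best

def toy_model(x_random, x_fuzzy):
    """Hypothetical model T(x1, x2, x3) = x1 + x2 + x3."""
    return x_random[0] + x_fuzzy[0] + x_fuzzy[1]

def pi_tri(x):
    return tri_possibility(x, 0.0, 1.0, 2.0)

grid = [i / 100 for i in range(201)]  # 0.00, 0.01, ..., 2.00
pis = [pi_tri, pi_tri]
pi_T_at_1 = extension_principle(1.0, toy_model, (0.0,), [grid, grid], pis)
pi_T_at_2 = extension_principle(2.0, toy_model, (0.0,), [grid, grid], pis)
pi_T_at_3 = extension_principle(3.0, toy_model, (0.0,), [grid, grid], pis)
```

The values π^T(1) = π^T(3) = 0.5 agree with adding the two 0.5-cuts, [0.5, 1.5] + [0.5, 1.5] = [1, 3].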
This extension principle extends classical interval computation in the following way: the distribution $\pi^T$ can also be obtained by level-wise interval computation (Moore 1979, Jaulin et al. 2001), since we have

$$[\underline{y}_\alpha, \overline{y}_\alpha] = T\big(x_1,\dots,x_k, [\underline{x}_\alpha, \overline{x}_\alpha]_{k+1},\dots,[\underline{x}_\alpha, \overline{x}_\alpha]_N\big), \quad \forall \alpha \in [0,1]. \tag{2}$$

This shows that the extension principle assumes a complete correlation between α-cuts (i.e. between confidence levels), and does not generally encompass the result of classical probabilistic convolution. There exist other extensions of interval computations (Regan et al. 2004) that deal with epistemic uncertainty by means of imprecise probabilities. They usually provide more conservative results than the extension principle and, when applied to complex models, have an even higher computational complexity than the method considered here. Such extensions are not studied here. We also assume that dependencies between probability distributions $p_1,\dots,p_k$ are well known, so that the joint distribution $p^{(1:k)}$ of $\times_{i=1,\dots,k} X_i$ is well defined.

As finding the exact analytical solution of the propagation is impossible in most situations, propagation is usually performed by the following procedure:
(1) Generate $M_p$ samples $x^{(1:k)}_i := \{x_{1,i},\dots,x_{k,i}\}$, $i = 1,\dots,M_p$, from the joint distribution $p^{(1:k)}$ of $\times_{i=1,\dots,k} X_i$ by usual sampling techniques (Monte-Carlo, LHS, MCMC, ...).
(2) For each sample $x^{(1:k)}_i$, $i = 1,\dots,M_p$, build a discretized approximation $\tilde{\pi}^T_i$ of the propagated possibility distribution $\pi^T_i$ (see Equation (1)) by computing (2) for a finite collection $0 \le \alpha_1 < \dots < \alpha_{M_\pi} \le 1$ of $M_\pi$ α-cuts.
(3) Assign a probability mass of $1/M_p$ to each obtained distribution $\tilde{\pi}^T_i$, $i = 1,\dots,M_p$.
Values and intervals sampled from probability and possibility distributions are illustrated in Figure 1. The result of the whole procedure is a hybrid number, that is, a probability distribution bearing on possibility distributions $\tilde{\pi}^T_i$, formally equivalent to a random fuzzy variable. It is illustrated in Figure 2. For simplicity of notation, we denote this random fuzzy variable, which describes our uncertainty on Y resulting from the propagation, by $(p^{(1:k)}, \tilde{\pi})^T$. Note that this procedure requires $M_p \times M_\pi$ interval propagations, with $M_\pi$ usually around 20.
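The three-step procedure can be sketched as follows, under loudly stated assumptions: a single random input X1 ~ N(0, 1), a single triangular fuzzy input X2, and a hypothetical model T(x1, x2) = x1 + 2 x2 that is non-decreasing in x2, so that each interval propagation reduces to evaluating T at the two cut endpoints (a general model would require an optimization per α-cut). The counter makes the $M_p \times M_\pi$ cost explicit.

```python
import random

def alpha_cut(alpha, a=0.0, m=1.0, b=2.0):
    """Alpha-cut of a triangular possibility distribution (hypothetical fuzzy input X2)."""
    return (a + alpha * (m - a), b - alpha * (b - m))

def model(x1, x2):
    """Hypothetical model T(x1, x2) = x1 + 2 * x2, non-decreasing in x2."""
    return x1 + 2.0 * x2

def interval_propagation(x1, cut):
    """One interval propagation; endpoint evaluation is only valid for monotone models."""
    lo, hi = cut
    return (model(x1, lo), model(x1, hi))

def hybrid_propagation(m_p, alphas, rng):
    """Steps (1)-(3): returns the random fuzzy variable and the number of interval propagations."""
    random_fuzzy = []
    n_propagations = 0
    for _ in range(m_p):
        x1 = rng.gauss(0.0, 1.0)                      # step (1): sample the random input
        cuts = {}
        for alpha in alphas:                           # step (2): level-wise computation of Eq. (2)
            cuts[alpha] = interval_propagation(x1, alpha_cut(alpha))
            n_propagations += 1
        random_fuzzy.append((1.0 / m_p, cuts))         # step (3): probability mass 1/M_p
    return random_fuzzy, n_propagations

alphas = [j / 20 for j in range(21)]                   # M_pi = 21 levels: 0, 0.05, ..., 1
rfv, n_propagations = hybrid_propagation(100, alphas, random.Random(1))
```

With $M_p = 100$ and $M_\pi = 21$, the counter reaches 2100 interval propagations.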
3. Information extraction: existing post-processing techniques
Now that imprecision is explicitly modeled in our uncertainty representations, the probability of an event resulting from this propagation is no longer precise, but is instead delimited by lower and upper bounds. As analyzing the information conveyed by the full random fuzzy variable is very difficult, some way is needed to summarize or extract the useful information from the random fuzzy variable $(p^{(1:k)}, \tilde{\pi})^T$. For this reason, Ferson and Ginzburg (Ferson and Ginzburg 1996) and Baudrit et al. (Baudrit et al. 2006) have proposed different post-processings of $(p^{(1:k)}, \tilde{\pi})^T$ so that the resulting summary takes the form of one or multiple p-boxes.
Figure 1. Sampling of random and fuzzy variables (panels: aleatory uncertainty, with the CDFs $F_{X_1},\dots,F_{X_k}$ of the k random variables; epistemic uncertainty, with α-cuts at levels $\alpha_{k+1},\dots,\alpha_N$ of the N − k fuzzy variables $X_{k+1},\dots,X_N$).
Figure 2. Random fuzzy variable (a possibility distribution $\tilde{\pi}^T_i$ with its α-cuts $[\underline{y}_{\alpha_1}, \overline{y}_{\alpha_1}]_i$ at levels $\alpha_1$, $\alpha_2$).
Denote $[\underline{y}_\alpha, \overline{y}_\alpha]_i$ the α-cut of the i-th fuzzy set $\tilde{\pi}^T_i$, with $i = 1,\dots,M_p$. For each value α ∈ [0,1], we thus have a collection of $M_p$ intervals. If we order and reindex the $M_p$ lower values $\underline{y}_\alpha$ such that $\underline{y}^i_\alpha \le \underline{y}^j_\alpha$ iff $i \le j$, and assign to each of them a probability mass $1/M_p$, we can build the associated cumulative distribution function $\overline{F}_\alpha$ such that $\overline{F}_\alpha(\underline{y}^i_\alpha) = i/M_p$. Upper values $\overline{y}^j_\alpha$ can be treated likewise to obtain a lower cumulative distribution function $\underline{F}_\alpha$. This can be done for every value α ∈ [0,1] and, since the α-cuts of a fuzzy set are nested, we have $\underline{y}^i_\alpha \ge \underline{y}^i_\beta$ ($\overline{y}^i_\alpha \le \overline{y}^i_\beta$) if α ≥ β, implying that $\underline{F}_\alpha(x) \ge \underline{F}_\beta(x)$ ($\overline{F}_\alpha(x) \le \overline{F}_\beta(x)$) if α ≥ β. This shows how we can extract a collection of p-boxes $[\underline{F}_\alpha, \overline{F}_\alpha]$ from the random fuzzy variable $(p^{(1:k)}, \tilde{\pi})^T$ (see Figure 3).
Figure 3. Pairs of lower and upper cumulative distribution functions ($\underline{F}_{\alpha_1}$, $\overline{F}_{\alpha_1}$, $\underline{F}_{\alpha_2}$, $\overline{F}_{\alpha_2}$) extracted from the random fuzzy variable of Figure 2 ($\alpha_1 \ge \alpha_2$).
Although the information conveyed by the collection of p-boxes $[\underline{F}_\alpha, \overline{F}_\alpha]$ is poorer than the information contained in the whole random fuzzy variable (information is lost by projecting the structure on events of the type (−∞, x]), it is sufficient in most applications encountered in safety or reliability studies.
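The construction of $[\underline{F}_\alpha, \overline{F}_\alpha]$ can be sketched as follows. The intervals are hypothetical α-cuts of three output fuzzy sets at two levels; counting the endpoints below x is equivalent to the ordering and reindexing described above. The final check illustrates the monotonicity in α stated in the text.

```python
def pbox_at_level(intervals, x):
    """Empirical [F_low_alpha(x), F_up_alpha(x)] from M_p equally weighted alpha-cuts.

    Counting how many lower (upper) endpoints are <= x is equivalent to sorting them
    and reading a step CDF that jumps by 1/M_p at each sorted value.
    """
    m_p = len(intervals)
    f_up = sum(1 for lo, _ in intervals if lo <= x) / m_p    # built from lower endpoints
    f_low = sum(1 for _, hi in intervals if hi <= x) / m_p   # built from upper endpoints
    return f_low, f_up

# Hypothetical alpha-cuts of M_p = 3 output fuzzy sets, nested across levels (1.0 >= 0.5).
cuts_by_level = {
    1.0: [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)],
    0.5: [(0.0, 2.0), (1.0, 3.0), (2.0, 4.0)],
}
grid = [i / 10 for i in range(-10, 51)]
# Nestedness implies F_low_1 >= F_low_0.5 and F_up_1 <= F_up_0.5 at every x.
ordered = all(
    pbox_at_level(cuts_by_level[1.0], x)[0] >= pbox_at_level(cuts_by_level[0.5], x)[0]
    and pbox_at_level(cuts_by_level[1.0], x)[1] <= pbox_at_level(cuts_by_level[0.5], x)[1]
    for x in grid
)
```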
Nevertheless, the whole collection of p-boxes $[\underline{F}_\alpha, \overline{F}_\alpha]$ is still a complex representation and, in order to be useful to a decision maker, it should be summarized further. This is the objective of the post-treatments proposed by Ferson and Ginzburg (Ferson and Ginzburg 1996) and by Baudrit et al. (Baudrit et al. 2006), recalled in the next subsections.
3.1 Ferson’s post-treatment
Ferson proposes to fix one or multiple confidence levels α and then to build the lower and upper cumulative distributions $[\underline{F}_\alpha, \overline{F}_\alpha]$ associated with this (these) particular value(s).
For example, choosing α = 1 and the p-box $[\underline{F}_1, \overline{F}_1]$ corresponds to an "optimistic" behavior regarding epistemic uncertainty, since the imprecision of the result is minimized, while choosing α = 0 and the p-box $[\underline{F}_0, \overline{F}_0]$ corresponds to a "pessimistic" behavior, imprecision being maximized in this case. All other p-boxes $[\underline{F}_\alpha, \overline{F}_\alpha]$ lie between these two pairs and represent intermediate behaviors (Figure 4).
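Figure 4. Pairs of lower and upper cumulative distribution functions associated with Ferson’s post-treatment: $[\underline{F}_0, \overline{F}_0]$, $[\underline{F}_\alpha, \overline{F}_\alpha]$ (0 < α < 1) and $[\underline{F}_1, \overline{F}_1]$.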
Note that, even in the most optimistic case, some imprecision can remain, unless the α-cut of level 1 of every possibility distribution $\pi_{k+1},\dots,\pi_N$ is a single number (e.g. reference value, mode, ...).
3.2 Baudrit et al.’s post-treatment
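In the post-processing of Baudrit et al., called "homogeneous" post-processing, only one lower and one upper cumulative distribution, here respectively denoted $\underline{F}_{av}$ and $\overline{F}_{av}$, are built. The resulting p-box $[\underline{F}_{av}, \overline{F}_{av}]$ corresponds to the average taken over all p-boxes $[\underline{F}_\alpha, \overline{F}_\alpha]$, namely:

$$\underline{F}_{av} = \int_0^1 \underline{F}_\alpha \, d\alpha, \qquad \overline{F}_{av} = \int_0^1 \overline{F}_\alpha \, d\alpha,$$

and the p-box $[\underline{F}_{av}, \overline{F}_{av}]$ always lies between the p-boxes $[\underline{F}_0, \overline{F}_0]$ and $[\underline{F}_1, \overline{F}_1]$ (Figure 5).
Figure 5. Pair of lower and upper cumulative distribution functions associated with Baudrit et al.’s post-treatment (the p-boxes $[\underline{F}_\alpha, \overline{F}_\alpha]$, from α = 0 to α = 1, are averaged over α to give $[\underline{F}_{av}, \overline{F}_{av}]$).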
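Both post-treatments can be sketched from the same empirical p-boxes. The output fuzzy sets are assumed (hypothetically) to be symmetric triangles of support [c − 1, c + 1] centred at each random sample c, and the integral over α is approximated by the trapezoidal rule on 101 levels, an arbitrary discretization choice.

```python
def pbox_at_level(intervals, x):
    """Empirical [F_low_alpha(x), F_up_alpha(x)] from equally weighted alpha-cuts."""
    m_p = len(intervals)
    f_up = sum(1 for lo, _ in intervals if lo <= x) / m_p
    f_low = sum(1 for _, hi in intervals if hi <= x) / m_p
    return f_low, f_up

def cuts_at(alpha, centers):
    """Hypothetical outputs: symmetric triangular fuzzy sets with support [c - 1, c + 1]."""
    return [(c - 1.0 + alpha, c + 1.0 - alpha) for c in centers]

def ferson(centers, alpha, x):
    """Ferson's post-treatment: the p-box of a single, fixed level alpha."""
    return pbox_at_level(cuts_at(alpha, centers), x)

def baudrit(centers, x, n_levels=101):
    """Baudrit et al.'s average p-box; the integral over alpha uses the trapezoidal rule."""
    h = 1.0 / (n_levels - 1)
    values = [ferson(centers, j / (n_levels - 1), x) for j in range(n_levels)]

    def trapezoid(ys):
        return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

    return trapezoid([v[0] for v in values]), trapezoid([v[1] for v in values])

centers = [0.0, 1.0, 2.0]   # one center per random sample, M_p = 3
f0 = ferson(centers, 0.0, 1.5)
f1 = ferson(centers, 1.0, 1.5)
fav = baudrit(centers, 1.5)
```

For this toy case, the exact averages are $\underline{F}_{av}(1.5) = 0.5$ and $\overline{F}_{av}(1.5) = 5/6$, and the computed p-box lies between the α = 0 and α = 1 p-boxes.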
Both Ferson’s and Baudrit et al.’s post-treatments require first building the whole random fuzzy variable (as described in Section 2.3). As mentioned before, this strategy can be computationally costly: suppose that $M_p = 100$ samples are drawn for the first k parameters and that, for each of them, the corresponding fuzzy number is approximated by taking $M_\pi = 21$ α-cuts (α = 0, 0.05, ..., 1). Then, 2100 interval calculations are needed to build the final result.
In many applications, assuming that one can afford so many computations is unrealistic. This is particularly true in fields such as nuclear safety, space exploration or aeronautics (Oberguggenberger et al. 2007), where very complex models are often used. Moreover, although they both propose to deal with complex models by using numerical discretization, neither Baudrit et al. nor Ferson consider the question of numerical accuracy. That is why we propose in the next section a new numerical method (called the RaFu method, for Random/Fuzzy) that addresses the problem of evaluating numerical accuracy and allows one to reduce the number of computations required to reach a given result by pre-processing, rather than post-processing, part of the information.
4. The RaFu method
The RaFu method uses the same theoretical framework as the one considered in Section 2. It aims at minimizing the number of computations required to reach a given response. As pointed out above, building the whole random fuzzy variable can be very costly and, in situations where we are only interested in some specific features of the information it contains, unnecessary.
Briefly, given the input distributions $p_1,\dots,p_k,\pi_{k+1},\dots,\pi_N$ and a final desired response, the method consists in sampling from these distributions in an optimized way, so that a minimal number of samples is propagated to reach the desired response with a given numerical accuracy. The method is detailed in the sequel.
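$$
\begin{pmatrix}
F^{-1}_{X_1}(\alpha_{1,1}) & \cdots & F^{-1}_{X_k}(\alpha_{k,1}) & [\underline{x}_{\alpha_{k+1,1}}, \overline{x}_{\alpha_{k+1,1}}]_{k+1} & \cdots & [\underline{x}_{\alpha_{N,1}}, \overline{x}_{\alpha_{N,1}}]_N \\
\vdots & & \vdots & \vdots & & \vdots \\
F^{-1}_{X_1}(\alpha_{1,i}) & \cdots & F^{-1}_{X_k}(\alpha_{k,i}) & [\underline{x}_{\alpha_{k+1,i}}, \overline{x}_{\alpha_{k+1,i}}]_{k+1} & \cdots & [\underline{x}_{\alpha_{N,i}}, \overline{x}_{\alpha_{N,i}}]_N \\
\vdots & & \vdots & \vdots & & \vdots \\
F^{-1}_{X_1}(\alpha_{1,M}) & \cdots & F^{-1}_{X_k}(\alpha_{k,M}) & [\underline{x}_{\alpha_{k+1,M}}, \overline{x}_{\alpha_{k+1,M}}]_{k+1} & \cdots & [\underline{x}_{\alpha_{N,M}}, \overline{x}_{\alpha_{N,M}}]_N
\end{pmatrix}
\overset{T}{\Longrightarrow}
\begin{pmatrix}
[\underline{y}, \overline{y}]_1 \\ \vdots \\ [\underline{y}, \overline{y}]_i \\ \vdots \\ [\underline{y}, \overline{y}]_M
\end{pmatrix}
$$

with

$$[\underline{y}, \overline{y}]_i = T\big(F^{-1}_{X_1}(\alpha_{1,i}),\dots,F^{-1}_{X_k}(\alpha_{k,i}), [\underline{x}_{\alpha_{k+1,i}}, \overline{x}_{\alpha_{k+1,i}}]_{k+1},\dots,[\underline{x}_{\alpha_{N,i}}, \overline{x}_{\alpha_{N,i}}]_N\big).$$

Figure 6. Illustration of the sample matrix.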
4.1 Pre-processing rather than post-processing
In practical applications, the quantity to be evaluated is often known before propagating. It can be the mean value of the output, the value of a particular percentile or the probability of exceeding a given value. In other cases, one has an idea of the behavior to adopt. For example, if the safety study concerns a critical issue (e.g. transfer of dangerous polluting elements), it is natural to adopt a very conservative and cautious attitude (i.e. in our case, an α close to zero), while in situations where the behavior to adopt is more ambiguous, one can choose a balanced attitude (i.e. use Baudrit et al.’s post-processing).
The key point of the RaFu method is to replace the classical post-processing step by a pre-processing one. In this pre-processing, the decision maker (DM)¹ provides a triplet of parameters (γS, γE, γA) specifying the quantity of interest and the numerical precision to be reached.
Once these parameters (γS, γE, γA) have been specified, the RaFu method consists in building an optimal sampling strategy that allows one to reach the desired quantity with a minimal number of calculations. This strategy corresponds to a number M of specific samples. Each sample consists of N values: the first k are single values sampled from $p_1,\dots,p_k$ according to γS, while the N − k other values are α-cuts (generally intervals) sampled from $\pi_{k+1},\dots,\pi_N$ according to parameter γE. The result, illustrated by Figure 6, is a matrix of M samples that must then be propagated through the model T.
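The structure of the sample matrix can be sketched as follows. Everything in the example is an assumption chosen for illustration: two independent random inputs (a uniform law and an exponential law, sampled by inversion), one triangular fuzzy input cut at a fixed level α = 0.5, and a hypothetical non-decreasing model T = X1 + X2 + X3 propagated through the cut endpoints.

```python
import math
import random

def uniform_quantile(alpha, lo=0.0, hi=1.0):
    """F^{-1} of a uniform law on [lo, hi] (hypothetical input X1)."""
    return lo + alpha * (hi - lo)

def exponential_quantile(alpha, rate=1.0):
    """F^{-1} of an exponential law (hypothetical input X2); alpha must be < 1."""
    return -math.log(1.0 - alpha) / rate

def triangular_cut(alpha, a=0.0, m=1.0, b=2.0):
    """Alpha-cut of a triangular possibility distribution (hypothetical fuzzy input X3)."""
    return (a + alpha * (m - a), b - alpha * (b - m))

def build_sample_matrix(prob_levels, poss_levels, quantiles, cut_functions):
    """Row i: k values F^{-1}_{X_j}(alpha_{j,i}) followed by N - k alpha-cuts."""
    matrix = []
    for a_prob, a_poss in zip(prob_levels, poss_levels):
        values = [q(a) for q, a in zip(quantiles, a_prob)]
        cuts = [cut(a) for cut, a in zip(cut_functions, a_poss)]
        matrix.append((values, cuts))
    return matrix

def propagate_row(values, cuts):
    """Hypothetical model T = X1 + X2 + X3, non-decreasing, so cut endpoints bound the output."""
    base = sum(values)
    return (base + sum(c[0] for c in cuts), base + sum(c[1] for c in cuts))

rng = random.Random(3)
M = 10
prob_levels = [(rng.random(), rng.random()) for _ in range(M)]  # independence (gamma_Si)
poss_levels = [(0.5,) for _ in range(M)]                         # fixed alpha = 0.5 (gamma_E)
matrix = build_sample_matrix(prob_levels, poss_levels,
                             [uniform_quantile, exponential_quantile], [triangular_cut])
outputs = [propagate_row(values, cuts) for values, cuts in matrix]
```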
4.1.1 Parameter γS
γS is the parameter related to aleatory uncertainty. It consists of two sub-parameters:
• γSi concerns the dependence structure between input variables $X_1,\dots,X_k$ and determines how values are to be sampled from $p_1,\dots,p_k$. In other words, it specifies the values $\alpha_{1,i},\dots,\alpha_{k,i}$, $i = 1,\dots,M$, in Figure 6.
Usual dependence structures that can be reproduced by numerical sampling and can be specified in γSi are:
• stochastic independence between $X_1,\dots,X_k$,
• rank correlations between variables (Iman and Conover 1982),
• specifications of copulas (Nelsen 2005),
• direct specification of joint distributions.
The first k values of each sample are then sampled (or reordered after sampling) according to the specified dependence structure. Note that the computational cost of applying the above types of dependence structures is negligible, especially when compared to the calculation time of complex models.
¹ This decision maker can take many forms: it can be a committee, an official guideline, a single individual, ...
• γSo concerns the stochastic quantity the DM is interested in. It specifies which information must be extracted from the propagated data $[\underline{y}, \overline{y}]_i$, $i = 1,\dots,M$.
In safety studies, the stochastic quantities that the DM usually wants to evaluate are typically:
• the mean and/or the variance (γSo := {E(Y), V(Y)}),
• the value of a given percentile (γSo := {q%}),
• the uncertainty on exceeding a given threshold (γSo := {F(x)}, with x the threshold value),
• the whole cumulative distribution function (γSo := {F(x), ∀x}).
γS thus corresponds to the information used in usual sampling methods where uncertainty is modeled entirely by classical probabilities.
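As one example of γSi, a Gaussian copula (one of the copula families covered by Nelsen 2005) can couple the levels $\alpha_{1,i}, \alpha_{2,i}$ before they are mapped through the quantile functions. The copula choice, the parameter ρ = 0.8 and the helper names are assumptions for illustration; the standard-library `statistics.NormalDist` supplies the normal CDF.

```python
import random
from statistics import NormalDist

def gaussian_copula_levels(n, rho, rng):
    """Draw n pairs (alpha_1, alpha_2) with uniform margins coupled by a Gaussian copula."""
    std = NormalDist()
    levels = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)  # corr(z1, z2) = rho
        levels.append((std.cdf(z1), std.cdf(z2)))                       # map to uniforms on [0, 1]
    return levels

def pearson(xs, ys):
    """Pearson correlation; on uniform levels it estimates the Spearman rank correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

levels = gaussian_copula_levels(20_000, 0.8, random.Random(11))
rank_corr = pearson([a for a, _ in levels], [b for _, b in levels])
```

For a Gaussian copula with parameter ρ, the Spearman rank correlation is (6/π) arcsin(ρ/2), about 0.786 for ρ = 0.8, which the estimate should approach. Any increasing quantile functions $F^{-1}_{X_1}, F^{-1}_{X_2}$ applied to these levels preserve this rank correlation.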
4.1.2 Parameter γE
γE is the parameter related to epistemic uncertainty and to the DM’s behavior with regard to this uncertainty. It determines how α-cuts or values from $\pi_{k+1},\dots,\pi_N$ are to be sampled.
First, we assume (without loss of generality) that $M_p$ values have been sampled from the joint distribution $p^{(1:k)}$ of $\times_{i=1,\dots,k} X_i$, and we denote them $x^{(1:k)}_i := \{F^{-1}_{X_1}(\alpha_{1,i}),\dots,F^{-1}_{X_k}(\alpha_{k,i})\}$, $i = 1,\dots,M_p$. Then, typical strategies used in the RaFu method for the choice of parameter γE are, for instance:
Strategy 1 Fixed α (γE := {α}): associate with each $x^{(1:k)}_i$ the cuts of fixed level α of distributions $\pi_{k+1},\dots,\pi_N$. This amounts to considering, in Figure 6, that $\alpha_{j,i} = \alpha$ for $j = k+1,\dots,N$, $i = 1,\dots,M_p$. For the DM, this choice is equivalent to adopting a given behavior with respect to epistemic uncertainty, ranging from total pessimism (γE := {α = 0}) to total optimism (γE := {α = 1}).
Strategy 2 Vector $\tilde{\alpha} = (\alpha_1,\dots,\alpha_{M_\pi})$: duplicate the $M_p$ samples $x^{(1:k)}_i$ $M_\pi$ times, thus resulting in $M_p \times M_\pi$ samples $x^{(1:k)}_{i,j}$, $i = 1,\dots,M_p$, $j = 1,\dots,M_\pi$. For a fixed i, $x^{(1:k)}_{i,j}$ is constructed by associating with sample $x^{(1:k)}_i$ the α-cuts $[\underline{x}_{\alpha_{k+1,j}}, \overline{x}_{\alpha_{k+1,j}}]_{k+1},\dots,[\underline{x}_{\alpha_{N,j}}, \overline{x}_{\alpha_{N,j}}]_N$ with $\alpha_{k+1,j} = \dots = \alpha_{N,j} = \alpha_j$, the j-th element of vector $\tilde{\alpha}$. This is equivalent to applying the previous strategy $M_\pi$ times. For example, the vector of values can be the couple total pessimism/optimism (γE := {$\tilde{\alpha}$ = (0, 1)}) or a given number of discretization steps of each distribution (γE := {$\tilde{\alpha}$ = (0/n, ..., i/n, ..., n/n = 1)}, with n the number of steps). In this last case, we retrieve the usual propagation method described in Section 2.3.
Strategy 3 Partially randomized α: for each sample $x^{(1:k)}_i$, sample (independently) a value $\alpha_r$ from a uniform law on [0,1] and associate with $x^{(1:k)}_i$ the α-cuts $[\underline{x}_{\alpha_{k+1,i}}, \overline{x}_{\alpha_{k+1,i}}]_{k+1},\dots,[\underline{x}_{\alpha_{N,i}}, \overline{x}_{\alpha_{N,i}}]_N$ such that $\alpha_{k+1,i} = \dots = \alpha_{N,i} = \alpha_r$. As we shall see later, this kind of sampling allows one to "average" over all α-cuts.
Strategy 4 Totally randomized α: for each sample $x^{(1:k)}_i$, let $\alpha_{r_1},\dots,\alpha_{r_{N-k}}$ be N − k values sampled from independent uniform laws on [0,1]; this strategy associates with $x^{(1:k)}_i$ the cuts $[\underline{x}_{\alpha_{k+1,i}}, \overline{x}_{\alpha_{k+1,i}}]_{k+1},\dots,[\underline{x}_{\alpha_{N,i}}, \overline{x}_{\alpha_{N,i}}]_N$ with $\alpha_{k+1,i} = \alpha_{r_1},\dots,\alpha_{N,i} = \alpha_{r_{N-k}}$. This kind of sampling simulates the so-called notion of random set independence in imprecise probabilities. It can be interpreted as an assumption of independence between the sources evaluating the epistemic uncertainties (sensors, experts, ...) or, if possibility distributions are interpreted as sets of probabilities (Dubois and Prade 1992), as a means to simulate stochastic independence among the probabilities in these sets (Couso et al. 2000). Some dependencies can also be assumed between the uniform laws from which α-cuts are sampled (Alvarez 2006). Nevertheless, how such dependency structures between possibility distributions should be interpreted is still unclear and requires further research.
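The four strategies differ only in how the levels $\alpha_{j,i}$, j = k+1,...,N, of the fuzzy columns of Figure 6 are generated. A minimal sketch, with hypothetical function names, returning one row of levels per sample to be propagated:

```python
import random

def strategy_fixed(m_p, n_fuzzy, alpha):
    """Strategy 1: the same level alpha for every fuzzy input of every sample."""
    return [[alpha] * n_fuzzy for _ in range(m_p)]

def strategy_vector(m_p, n_fuzzy, alphas):
    """Strategy 2: each of the M_p samples is repeated once per level of the vector alphas."""
    return [[a] * n_fuzzy for _ in range(m_p) for a in alphas]

def strategy_partially_random(m_p, n_fuzzy, rng):
    """Strategy 3: one uniform level per sample, shared by all fuzzy inputs."""
    rows = []
    for _ in range(m_p):
        a = rng.random()
        rows.append([a] * n_fuzzy)
    return rows

def strategy_totally_random(m_p, n_fuzzy, rng):
    """Strategy 4: independent uniform levels for each fuzzy input of each sample."""
    return [[rng.random() for _ in range(n_fuzzy)] for _ in range(m_p)]

rng = random.Random(5)
m_p, n_fuzzy = 100, 3
alphas = [j / 20 for j in range(21)]  # M_pi = 21
rows = {
    "fixed": strategy_fixed(m_p, n_fuzzy, 0.0),
    "vector": strategy_vector(m_p, n_fuzzy, alphas),
    "partial": strategy_partially_random(m_p, n_fuzzy, rng),
    "total": strategy_totally_random(m_p, n_fuzzy, rng),
}
```

The row counts are $M_p$ for Strategies 1, 3 and 4, and $M_p \times M_\pi$ for Strategy 2.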
Each of the above strategies corresponds to a different choice of $\{\alpha_{i,j}\}_{i=k+1,\dots,N;\, j=1,\dots,M}$ in the matrix of Figure 6. The choice of one of them also influences the final number of samples to be propagated: this number will be $M_p$ for Strategies 1, 3 and 4, and $M_p \times M_\pi$ for Strategy 2 (where $M_\pi$ is the number of different α-levels chosen by the DM). Selecting the strategy before the propagation is one of the main advantages of the RaFu method. In many situations, it leads to fewer propagations than the usual $M_p \times M_\pi$ propagations required to build the whole random fuzzy variable (see Section 2.3).
Note that, since Monte-Carlo sampling is primarily a numerical tool for estimating complex integrals, any quantity that can be expressed in terms of an integral over $p_1,\dots,p_k,\pi_{k+1},\dots,\pi_N$ can, in principle, be estimated by the right sampling strategy.
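For instance, Strategy 3 estimates Baudrit et al.'s average p-box directly: since $\alpha_r$ is uniform and independent of the random inputs, the expectation of the indicator $1\{\underline{y}_{\alpha_r} \le x\}$ equals $\int_0^1 \overline{F}_\alpha(x)\,d\alpha = \overline{F}_{av}(x)$ (and, with the upper endpoint $\overline{y}_{\alpha_r}$, $\underline{F}_{av}(x)$), at the cost of a single interval propagation per sample. The sketch assumes a hypothetical output Y = X1 + X2 with X1 ~ U(0, 1) and X2 triangular on [0, 2] with mode 1, for which the exact values at x = 1.5 are 0.125 and 0.875.

```python
import random

def output_cut(x1, alpha):
    """Alpha-cut of Y = x1 + X2, X2 triangular on [0, 2] with mode 1 (hypothetical model)."""
    return (x1 + alpha, x1 + 2.0 - alpha)

def strategy3_average_pbox(x, m_p, rng):
    """Estimate [F_low_av(x), F_up_av(x)] with one uniform alpha per sample (Strategy 3)."""
    count_low = count_up = 0
    for _ in range(m_p):
        x1 = rng.random()       # X1 ~ U(0, 1): the random input
        alpha = rng.random()    # alpha_r ~ U(0, 1), independent of X1
        lo, hi = output_cut(x1, alpha)
        count_up += int(lo <= x)    # lower endpoint below x: counts toward the upper CDF
        count_low += int(hi <= x)   # upper endpoint below x: counts toward the lower CDF
    return count_low / m_p, count_up / m_p

f_low_av, f_up_av = strategy3_average_pbox(1.5, 100_000, random.Random(7))
```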
Remark 1: γSi and γE define two separate “dependence” structures, related to aleatory and epistemic uncertainties respectively. Considering dependencies between random and fuzzy variables remains an open question and is not considered in this paper.
4.1.3 Parameter γA
Numerical approximation always implies approximation error. One of the interests of Monte-Carlo sampling is that convergence theorems allow one to quantify this error. Parameter γA is related to this numerical error and has a direct effect on the number $M_p$ of samples to be propagated. It can be used in two ways: either the DM specifies a goal in terms of numerical accuracy and the number of samples required to reach this accuracy is then determined, or the DM specifies a maximal number of samples that can be afforded (in accordance with available resources), and the reachable numerical accuracy is evaluated.
Sometimes it is possible to determine, before propagating, the number of samples required to reach a given accuracy, or the accuracy reachable with a given number of samples. In this case, the DM can fix his or her final choice before anything is done. When numerical accuracy can only be determined after the propagation, a simple strategy consists in making a first propagation with a low number of samples, and then increasing this number until the numerical accuracy satisfies the DM or until the maximal number of affordable propagations is reached. In this paper, we focus on a method evaluating numerical accuracy by the use of order statistics (Lecoutre and Tassi 1987, Conover 1999)¹, which pertains to the cases where numerical accuracy can be determined beforehand.
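The a posteriori strategy just described can be sketched as a simple loop. This is a minimal illustration under stated assumptions. `run_model` and `accuracy` are hypothetical callbacks standing for the code propagation and for whatever accuracy measure the DM uses (smaller is better). Doubling the sample size at each step is our assumption, because the text only says that the number of samples is increased.

```python
from typing import Callable, List


def propagate_until_accurate(run_model: Callable[[int], List[float]],
                             accuracy: Callable[[List[float]], float],
                             target: float, n_start: int = 10, n_max: int = 1000) -> List[float]:
    """Propagate a first small batch, then add samples until accuracy <= target
    or until the affordable budget of n_max model runs is exhausted."""
    outputs = list(run_model(n_start))
    while accuracy(outputs) > target and len(outputs) < n_max:
        # Assumption: double the sample size, never exceeding the budget n_max.
        n_new = min(max(1, len(outputs)), n_max - len(outputs))
        outputs.extend(run_model(n_new))
    return outputs


# Toy usage: a stand-in "model" and an accuracy measure that improves as 1/n.
runs = propagate_until_accurate(lambda n: [0.0] * n, lambda out: 1.0 / len(out), target=0.01)
print(len(runs))  # 160
```

With a target that can never be met, the loop stops at the budget `n_max`. This corresponds to the second stopping condition mentioned in the text.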
Let $X_q$ denote the $q$ percentile of a random variable $X$. Given a sample of size $N$, the use of order statistics consists in considering the ordered values $x_{(1)} \le \dots \le x_{(N)}$ drawn from the random variable $X$. If the $N$ values are drawn randomly and independently, the following equation

$$P(X_{(K)} < X_q) = \sum_{i=K}^{N} \binom{N}{i} q^{i} (1-q)^{N-i} \qquad (3)$$

holds. This is equivalent to saying that the random variable $F_X(X_{(K)})$ follows a beta law of parameters $K$ and $N-K+1$. The interest of this result is that $F_X(X_{(K)})$ does not depend on the distribution of $X$ (as long as $F_X$ is continuous). This allows the derivation of confidence intervals bounding a percentile with a given numerical accuracy, without knowing either the values $X_{(i)}$ or the distribution of $X$. For instance, suppose a DM wants a conservative upper bound of the 95% percentile that covers it with a confidence of at least 95%. Taking the sample maximum as the bound ($K = N$ in equation (3)), we see that at least 59 computations will be required. With 58 samples, $P(X_{(58)} < X_{95}) = (0.95)^{58} = 5.1\%$ (i.e. a confidence of 94.9%), while with 59 samples, $P(X_{(59)} < X_{95}) = (0.95)^{59} = 4.8\%$ (i.e. a confidence of 95.2%).

¹ Often quoted as the use of Wilks' formula (Wilks 1962).
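Equation (3) is straightforward to evaluate numerically. The following Python sketch uses the standard library only. The function name is ours. The sketch reproduces the figures of the 95%/95% example above.

```python
from math import comb


def prob_order_stat_below(K: int, N: int, q: float) -> float:
    """Equation (3): P(X_(K) < X_q) = sum_{i=K}^{N} C(N, i) q^i (1 - q)^(N - i)."""
    return sum(comb(N, i) * q ** i * (1 - q) ** (N - i) for i in range(K, N + 1))


# Upper bound of the 95% percentile taken as the sample maximum (K = N):
for n in (58, 59):
    p = prob_order_stat_below(n, n, 0.95)
    print(n, round(100 * p, 1), round(100 * (1 - p), 1))
# 58 5.1 94.9
# 59 4.8 95.2
```

Choosing $K < N$ (for example, the second largest value) gives bounds that are less conservative but require more samples for the same confidence.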
In the above case, numerical accuracy is expressed as the confidence that the estimate computed from the samples covers the true value. In other situations, the above results cannot be used, and numerical accuracy cannot be evaluated beforehand. This is the case, for example, when the DM expresses the desired numerical accuracy as a maximal width for a confidence interval bounding a statistical quantity. This statistical quantity can be a percentile, but also, for example, the mean value. In that case, the MC estimate converges towards the true value at rate $\sigma/\sqrt{N}$, where $\sigma$ is the standard deviation of the output.
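The mean-value case can be sketched as follows. The text does not prescribe how the confidence interval is built. The normal (central-limit) approximation below is our assumption, and the Gaussian "model outputs" are synthetic placeholders. Because $\sigma$ has to be estimated from the propagated samples, the interval width is only known after propagation.

```python
import random
from statistics import NormalDist, fmean, stdev


def mean_with_ci(samples, confidence=0.95):
    """Monte-Carlo mean with a normal-approximation (CLT) confidence interval.

    The half-width z * s / sqrt(N) uses the sample standard deviation s,
    which is only available once the samples have been propagated.
    """
    n = len(samples)
    m = fmean(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / n ** 0.5
    return m, m - half_width, m + half_width


rng = random.Random(1)
outputs = [rng.gauss(10.0, 2.0) for _ in range(400)]  # stand-in for model outputs
mean, low, high = mean_with_ci(outputs)
print(f"mean = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Halving the interval width requires roughly four times as many samples. This is the practical meaning of the $\sigma/\sqrt{N}$ rate.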
Since, in the RaFu method, each propagated sample results in an interval with lower and upper bounds, numerical accuracy and confidence intervals have to be given for both of them. After numerical accuracy has been integrated in the process by means of $\gamma_A$, we thus end up with two confidence intervals, bounding respectively a lower and an upper estimation of the statistical quantity defined by $\gamma_S^o$.
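A minimal sketch of this double bound follows. It assumes that the $M_p$ propagated intervals are independent and identically distributed, so that equation (3) applies separately to the sample of lower bounds and to the sample of upper bounds. The function name and the toy intervals are hypothetical.

```python
import random


def kth_smallest_bounds(intervals, K):
    """Apply the order-statistics bound separately to the lower and to the upper
    bounds of the propagated intervals [lo_j, hi_j]: returns (lo_(K), hi_(K))."""
    lows = sorted(lo for lo, _ in intervals)
    highs = sorted(hi for _, hi in intervals)
    return lows[K - 1], highs[K - 1]


# 59 synthetic propagated intervals; with K = N = 59, each maximum bounds the 95%
# percentile of the corresponding (lower or upper) output with 95% confidence.
rng = random.Random(0)
intervals = []
for _ in range(59):
    centre, spread = rng.gauss(0.0, 1.0), rng.uniform(0.0, 0.5)
    intervals.append((centre - spread, centre + spread))
print(kth_smallest_bounds(intervals, K=59))
```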
Figure 7 illustrates the whole procedure as a flowchart. It shows where the DM can act upon the values of the parameters and fix them according to the desired final result. Propagation is then done accordingly, with the minimal number of samples meeting the DM's requirements. As said before, we focus here on the case where numerical accuracy or the number of samples can be determined beforehand (i.e. the "yes" branch of the first decision diamond).
In order to illustrate our methodology, Table 1 gives the minimal sample size derived by the RaFu method for various choices of $(\gamma_S,\gamma_E,\gamma_A)$. As stated previously, we focus on percentiles, since they are the most relevant statistical quantities in many safety studies. The minimal sample size is derived using order statistics. For example, suppose the DM wants an upper limit of the 95% percentile of the response, assuming stochastic independence between the $k$ random variables. Suppose also that the DM wants to be hyper-cautious about epistemic uncertainty (i.e. to concentrate on the α-cuts $[\underline{x}_0,\overline{x}_0]$) and wants a numerical certainty of 99% of covering the true value. He or she then chooses the triplet $(\gamma_S,\gamma_E,\gamma_A) = (0.95/\text{stochastic independence}, 0, 0.99)$. The RaFu method derives the minimal sample size satisfying the DM's choice, here 90, and the nature of this sampling. If 90 calculations are too costly, the DM can choose to lower the numerical accuracy to 95%, thus reducing the number of required computations to 59. These two examples correspond to the row ($\gamma_S$ = 95%, $\gamma_E = \alpha$) of Table 1.
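The sample sizes quoted above and reported in Table 1 follow from equation (3) with $K = N$. The smallest $N$ is sought such that $1 - q^N \ge \gamma_A$, where $q$ is the percentile level $\gamma_S$. The Python sketch below recovers them. The function name is ours. For the $(\alpha_1,\dots,\alpha_{M_\pi})$ choice of $\gamma_E$, the count is multiplied by $M_\pi$, since each α-level requires its own batch of samples.

```python
from math import ceil, log


def min_sample_size(q: float, confidence: float) -> int:
    """Smallest N such that the sample maximum X_(N) exceeds the q-percentile with
    probability >= confidence, i.e. 1 - q**N >= confidence (equation (3) with K = N)."""
    n = max(1, ceil(log(1 - confidence) / log(q)))
    while 1 - q ** n < confidence:          # guard against floating-point rounding
        n += 1
    while n > 1 and 1 - q ** (n - 1) >= confidence:
        n -= 1
    return n


for q in (0.90, 0.95, 0.99):                 # gamma_S: percentile level
    print(q, [min_sample_size(q, g) for g in (0.90, 0.95, 0.99)])  # gamma_A
# 0.9 [22, 29, 44]
# 0.95 [45, 59, 90]
# 0.99 [230, 299, 459]
```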
[Figure 7: flowchart of the RaFu method. Inputs $p_1,\dots,p_k$, $\pi_{k+1},\dots,\pi_N$; choose $\gamma_E$, $\gamma_S$; choose $\gamma_A$ (# samples and/or desired num. acc.); decision "Num. acc. or required # samples can be evaluated before propagation?". Yes: determine num. acc. or # samples corresponding to $\gamma_A$; decision "DM wants to revise $\gamma_A$?"; propagation with fixed # samples. No: propagation; determine confidence int.; decision "DM satisfied with obtained accuracy?"; if not, increase # samples and propagate them. Final step: build result (according to $\gamma_S^o$, $\gamma_E$).]
Figure 7. RaFu method: flowchart (# samples: number of samples).
4.2 Relations with previous post-treatments
The results of the post-processing methods of (Ferson and Ginzburg 1996, Baudrit et al. 2006) recalled in Section 3 amount to evaluating particular integrals over $p_1,\dots,p_k,\pi_{k+1},\dots,\pi_N$. They can therefore be reached by specific instances of parameter $\gamma_E$. We begin with the post-treatment proposed by Baudrit et al.
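| $\gamma_S$ | Dependence ($\gamma_S^i$) | $\gamma_E$ | # samples, $\gamma_A = 90\%$ | # samples, $\gamma_A = 95\%$ | # samples, $\gamma_A = 99\%$ |
|---|---|---|---|---|---|
| 90% | Stochastic independence | $\alpha$ | 22 | 29 | 44 |
| | | $(\alpha_1,\dots,\alpha_{M_\pi})$ | $22 \times M_\pi$ | $29 \times M_\pi$ | $44 \times M_\pi$ |
| | | Randomized $\alpha$ for each sample | 22 | 29 | 44 |
| | | Randomized $\alpha$ for each α-cut | 22 | 29 | 44 |
| 95% | Stochastic independence | $\alpha$ | 45 | 59 | 90 |
| | | $(\alpha_1,\dots,\alpha_{M_\pi})$ | $45 \times M_\pi$ | $59 \times M_\pi$ | $90 \times M_\pi$ |
| | | Randomized $\alpha$ for each sample | 45 | 59 | 90 |
| | | Randomized $\alpha$ for each α-cut | 45 | 59 | 90 |
| 99% | Stochastic independence | $\alpha$ | 230 | 299 | 459 |
| | | $(\alpha_1,\dots,\alpha_{M_\pi})$ | $230 \times M_\pi$ | $299 \times M_\pi$ | $459 \times M_\pi$ |
| | | Randomized $\alpha$ for each sample | 230 | 299 | 459 |
| | | Randomized $\alpha$ for each α-cut | 230 | 299 | 459 |

Table 1. Minimal sample size derived by the RaFu method for various choices of $(\gamma_S,\gamma_E,\gamma_A)$. The statistical quantity $\gamma_S$ is a percentile.

Proposition 4.1: The result of the post-treatment giving $[\underline{F}_{av},\overline{F}_{av}]$ can be interpreted as the following choices over $\gamma_S,\gamma_E$:
• $\gamma_S^o = F(x), \forall x$ (whole cumulative distribution) and $\gamma_S^i$ = stochastic independence between $X_1,\dots,X_k$.
• $\gamma_E$ = randomized α for each sample.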
Proof: Let us consider the model $T$ and the lower probability on $Y$, $\underline{P}([-\infty,y]) = \underline{F}_Y(y)$, associated with Baudrit et al.'s post-treatment. This lower probability corresponds to the lower expectation (also called lower prevision in (Walley 1991)) of the indicator function of the event $[-\infty,y]$. This lower expectation is given by the following formula:
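$$\underline{P}([-\infty,y]) = \int_{\kappa=0}^{1}\int_{\alpha_1=0}^{1}\cdots\int_{\alpha_k=0}^{1} I\Big(T\big(F_{X_1}^{-1}(\alpha_1),\dots,F_{X_k}^{-1}(\alpha_k),[\underline{x}_\kappa,\overline{x}_\kappa]_{k+1},\dots,[\underline{x}_\kappa,\overline{x}_\kappa]_N\big)\subset[-\infty,y]\Big)\,d\kappa\,d\alpha_1\cdots d\alpha_k \qquad (4)$$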
where distributions $P_1,\dots,P_k$ are independent (cf. $\gamma_S^i$) and $I(A)$ is the indicator function of the event $A$. Note that the possible dependencies mentioned in Section 4.1.1 can easily be integrated and do not modify the present result.
Performing a Monte-Carlo sampling with parameters $\gamma_S^o = F(x), \forall x$ and $\gamma_E$ = randomized α for each sample, propagating, and then computing the associated lower probability $\underline{P}([-\infty,y])$ is equivalent to a numerical evaluation of the integral given by equation (4). As both Baudrit et al.'s approach and the RaFu method are discretized numerical evaluations of the same integral, they converge to the same value.
Since this holds for all values y ∈ Y , the two resulting p-boxes will converge to
the same p-box, thus showing that the two methods converge towards the same
final result.
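For the upper probability on $Y$, $\overline{P}([-\infty,y]) = \overline{F}_Y(y)$, associated with Baudrit et al.'s post-treatment, the reasoning is similar, except that equation (4) becomes

$$\overline{P}([-\infty,y]) = \int_{\kappa=0}^{1}\int_{\alpha_1=0}^{1}\cdots\int_{\alpha_k=0}^{1} I\Big(T\big(F_{X_1}^{-1}(\alpha_1),\dots,F_{X_k}^{-1}(\alpha_k),[\underline{x}_\kappa,\overline{x}_\kappa]_{k+1},\dots,[\underline{x}_\kappa,\overline{x}_\kappa]_N\big)\cap[-\infty,y]\neq\emptyset\Big)\,d\kappa\,d\alpha_1\cdots d\alpha_k \qquad (5)$$

This ends the proof. □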
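To make the equivalence concrete, the following Python sketch gives a Monte-Carlo estimate of equations (4) and (5) on a toy problem. Everything in it is hypothetical and chosen only for illustration. The model is $T(x_1, x_2) = x_1 + x_2$, with $X_1 \sim \text{Exp}(1)$ sampled through its inverse CDF. $X_2$ is described by a triangular possibility distribution with support $[0, 3]$ and core $\{1\}$. Because $T$ is increasing in $x_2$, the image of an α-cut is obtained from its endpoints. Passing a fixed `kappa` instead of drawing one per sample gives the fixed-level bounds discussed in Proposition 4.2 below.

```python
import math
import random


def triangular_cut(kappa, a, b, c):
    """kappa-cut [lo, hi] of a triangular possibility distribution with support [a, c] and core {b}."""
    return a + kappa * (b - a), c - kappa * (c - b)


def mc_lower_upper(y, m_p, kappa=None, seed=0):
    """Monte-Carlo estimates of the lower and upper probabilities of [-inf, y].

    Hypothetical model: T(x1, x2) = x1 + x2, X1 ~ Exp(1), X2 triangular (0, 1, 3).
    kappa=None draws one kappa per sample (randomized alpha for each sample, eqs (4)-(5));
    a fixed kappa keeps the same alpha-cut for all samples.
    """
    rng = random.Random(seed)
    inside = intersects = 0
    for _ in range(m_p):
        x1 = -math.log(1.0 - rng.random())          # F_X1^{-1}(alpha_1), alpha_1 ~ U[0, 1)
        k = rng.random() if kappa is None else kappa
        lo2, hi2 = triangular_cut(k, 0.0, 1.0, 3.0)
        lo_t, hi_t = x1 + lo2, x1 + hi2              # T increasing in x2: image of the cut
        if hi_t <= y:                                # T(...) contained in [-inf, y]
            inside += 1
        if lo_t <= y:                                # T(...) intersects [-inf, y]
            intersects += 1
    return inside / m_p, intersects / m_p


print(mc_lower_upper(y=3.0, m_p=10_000))            # (lower, upper), randomized kappa
```

Repeating the estimate over a grid of $y$ values yields the estimated p-box. Increasing `m_p` makes it converge to the one defined by equations (4) and (5).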
We now consider the post-treatment proposed by Ferson. We assume that a single value α has been chosen (the extension to any number of different values of α is straightforward).
Proposition 4.2: The result of the post-treatment giving $[\underline{F}_\kappa,\overline{F}_\kappa]$ can be interpreted as the following choices over $\gamma_S,\gamma_E$:
• $\gamma_S^o = F(x), \forall x$ (whole cumulative distribution) and $\gamma_S^i$ = stochastic independence between $X_1,\dots,X_k$.
• $\gamma_E = \kappa$.
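Proof: The reasoning is similar to that of the previous proof, except that the integral now becomes

$$\underline{P}([-\infty,y]) = \int_{\alpha_1=0}^{1}\cdots\int_{\alpha_k=0}^{1} I\Big(T\big(F_{X_1}^{-1}(\alpha_1),\dots,F_{X_k}^{-1}(\alpha_k),[\underline{x}_\kappa,\overline{x}_\kappa]_{k+1},\dots,[\underline{x}_\kappa,\overline{x}_\kappa]_N\big)\subset[-\infty,y]\Big)\,d\alpha_1\cdots d\alpha_k \qquad (6)$$

and, in particular,

$$\underline{P}([-\infty,y]) = \int_{\alpha_1=0}^{1}\cdots\int_{\alpha_k=0}^{1} I\Big(T\big(F_{X_1}^{-1}(\alpha_1),\dots,F_{X_k}^{-1}(\alpha_k),[\underline{x}_0,\overline{x}_0]_{k+1},\dots,[\underline{x}_0,\overline{x}_0]_N\big)\subset[-\infty,y]\Big)\,d\alpha_1\cdots d\alpha_k \qquad (7)$$

for $[\underline{F}_0,\overline{F}_0]$, and

$$\underline{P}([-\infty,y]) = \int_{\alpha_1=0}^{1}\cdots\int_{\alpha_k=0}^{1} I\Big(T\big(F_{X_1}^{-1}(\alpha_1),\dots,F_{X_k}^{-1}(\alpha_k),[\underline{x}_1,\overline{x}_1]_{k+1},\dots,[\underline{x}_1,\overline{x}_1]_N\big)\subset[-\infty,y]\Big)\,d\alpha_1\cdots d\alpha_k \qquad (8)$$

for $[\underline{F}_1,\overline{F}_1]$.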
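Comparing equations (4) and (6) makes the link between the two post-treatments explicit. The integrand is a non-negative indicator function, so the order of integration can be exchanged (Tonelli's theorem). For every $y$, the lower bound of Baudrit et al.'s post-treatment is therefore the κ-average of the lower bounds $\underline{F}_\kappa$ of Proposition 4.2. The same argument applied to equation (5) gives the analogous identity for the upper bounds.

```latex
\underline{F}_{av}(y) = \underline{P}([-\infty,y])
  = \int_{\kappa=0}^{1}\left[\int_{\alpha_1=0}^{1}\!\cdots\!\int_{\alpha_k=0}^{1}
    I\Big(T\big(F_{X_1}^{-1}(\alpha_1),\dots,F_{X_k}^{-1}(\alpha_k),
    [\underline{x}_\kappa,\overline{x}_\kappa]_{k+1},\dots,[\underline{x}_\kappa,\overline{x}_\kappa]_N\big)
    \subset[-\infty,y]\Big)\,d\alpha_1\cdots d\alpha_k\right]d\kappa
  = \int_{0}^{1}\underline{F}_\kappa(y)\,d\kappa,
\qquad
\overline{F}_{av}(y) = \int_{0}^{1}\overline{F}_\kappa(y)\,d\kappa .
```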
Even though the RaFu method treats aleatory and epistemic uncertainty within the same theoretical framework as the approaches of Baudrit et al. and Ferson, the number of samples (i.e. of computations) required to reach the same results can be very different. Table 2 compares the numerical requirements of the various approaches for the particular example given at the end of Section 3, in order to compute the resulting p-boxes of each post-processing. This table illustrates the main advantage of the proposed method over the usual post-processing methods. Since it concentrates exclusively on the desired final answer, it only propagates the core information needed to reach this answer. In this situation, it divides the number of required propagations by 21 (resp. 10.5) compared to Baudrit et al. (resp. Ferson), or by a number proportional to $M_\pi$ in a more general case. From a computational efficiency standpoint, this is an important improvement, keeping in mind that industrial applications involve complex computer codes.
| Post-processing (with fixed $\gamma_S$, $\gamma_A$) | Baudrit et al. | Ferson |
|---|---|---|
| Usual propagation (build the whole RFV) | # samples: 2100 ($M_p = 100$, $M_\pi = 21$) | # samples: 2100 ($M_p = 100$, $M_\pi = 21$) |
| RaFu method | $\gamma_E$ (Strat. 3); # samples: 100 ($M_p = 100$) | $\gamma_E$ (Strat. 2) $= \{\alpha = (0,1)\}$; # samples: 200 ($M_p = 100$, $M_\pi = 2$) |

Table 2. Comparison between classical post-processings and the RaFu method.