
A comprehensive comparison of total-order estimators for

global sensitivity analysis

Arnald Puy∗1,2, William Becker3, Samuele Lo Piano4, and Andrea Saltelli5

1Department of Ecology and Evolutionary Biology, M31 Guyot Hall, Princeton University, New Jersey 08544, USA. E-Mail:

apuy@princeton.edu

2Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen, Parkveien 9, PB 7805, 5020 Bergen, Norway.

3European Commission, Joint Research Centre, Via Enrico Fermi, 2749, 21027 Ispra VA, Italy

4School of the Built Environment, JJ Thompson Building, University of Reading, Whiteknights Campus, Reading, RG6 6AF, United

Kingdom

5Open Evidence Research, Universitat Oberta de Catalunya (UOC), Barcelona, Spain.

Abstract

Sensitivity analysis helps identify which model inputs convey the most uncertainty to the model output. One of the most authoritative measures in global sensitivity analysis is the Sobol' total-order index, which can be computed with several different estimators. Although previous comparisons exist, it is hard to know which estimator performs best, since the results are contingent on the benchmark setting defined by the analyst (the sampling method, the distribution of the model inputs, the number of model runs, the test function or model and its dimensionality, the weight of higher-order effects, or the performance measure selected). Here we compare several total-order estimators in an eight-dimension hypercube where these benchmark parameters are treated as random parameters. This arrangement significantly relaxes the dependency of the results on the benchmark design. We observe that the most accurate estimators are Razavi and Gupta's, Jansen's or Janon/Monod's for factor prioritization, and Jansen's, Janon/Monod's or Azzini and Rosati's for approaching the "true" total-order indices. The rest lag considerably behind. Our work helps analysts navigate the myriad of total-order formulae by reducing the uncertainty in the selection of the most appropriate estimator.

Keywords: Uncertainty analysis; sensitivity analysis; modeling; Sobol' indices; variance decomposition; benchmarking analysis

1 Introduction

Sensitivity analysis, i.e. the assessment of how much uncertainty in a given model output is conveyed by

each model input, is a fundamental step to judge the quality of model-based inferences [1–3]. Among the

∗Corresponding author


many sensitivity indices available, variance-based indices are widely regarded as the gold standard because they are model-free (no assumptions are made about the model), global (they account for interactions between the model inputs) and easy to interpret [4–6]. Given a model of the form $y = f(\mathbf{x})$, $\mathbf{x} = (x_1, x_2, \dots, x_i, \dots, x_k) \in \mathbb{R}^k$, where $y$ is a scalar output and $x_1, \dots, x_k$ are the $k$ independent model inputs, the variance of $y$ is decomposed into conditional terms as

$$V(y) = \sum_{i=1}^{k} V_i + \sum_{i} \sum_{i<j} V_{ij} + \dots + V_{1,2,\dots,k}, \quad (1)$$

where

$$V_i = V_{x_i}\!\left[E_{\mathbf{x}_{\sim i}}(y \mid x_i)\right], \qquad V_{ij} = V_{x_i,x_j}\!\left[E_{\mathbf{x}_{\sim i,j}}(y \mid x_i, x_j)\right] - V_{x_i}\!\left[E_{\mathbf{x}_{\sim i}}(y \mid x_i)\right] - V_{x_j}\!\left[E_{\mathbf{x}_{\sim j}}(y \mid x_j)\right] \quad (2)$$

and so on up to the $k$-th order. The notation $\mathbf{x}_{\sim i}$ means all-but-$x_i$. By dividing each term in Equation 1 by the unconditional model output variance $V(y)$, we obtain the first-order indices for single inputs ($S_i$), pairs of inputs ($S_{ij}$), and for all higher-order terms. First-order indices thus provide the proportion of $V(y)$ caused by each term and are widely used to rank model inputs according to their contribution to the model output uncertainty, a setting known as factor prioritization [1].

Homma and Saltelli [7] also proposed the calculation of the total-order index $T_i$, which measures the first-order effect of a model input jointly with its interactions up to the $k$-th order:

$$T_i = 1 - \frac{V_{\mathbf{x}_{\sim i}}\!\left[E_{x_i}(y \mid \mathbf{x}_{\sim i})\right]}{V(y)} = \frac{E_{\mathbf{x}_{\sim i}}\!\left[V_{x_i}(y \mid \mathbf{x}_{\sim i})\right]}{V(y)}. \quad (3)$$

When $T_i \approx 0$, it can be concluded that $x_i$ has a negligible contribution to $V(y)$. For this reason, total-order indices have been applied to distinguish influential from non-influential model inputs and reduce the dimensionality of the uncertain space, a setting known as factor fixing [1].

The most direct computation of $T_i$ is via Monte Carlo (MC) estimation because it does not impose any assumption on the functional form of the response function, unlike metamodeling approaches [8,9]. The Fourier Amplitude Sensitivity Test (FAST) may also be used to calculate $T_i$; it involves transforming input variables into periodic functions of a single frequency variable, sampling the model and analysing the sensitivity of input variables using Fourier analysis in the frequency domain [10,11]. While an innovative approach, FAST is sensitive to the characteristic frequencies assigned to input variables and is not a very intuitive method; for these reasons it has mostly been superseded by Monte Carlo approaches, or by metamodels when computational expense is a serious issue. In this work we focus on the former.

MC methods require generating a $(N, 2k)$ base sample matrix with either random or quasi-random numbers (e.g. Latin Hypercube Sampling, Sobol' quasi-random numbers [12,13]), where each row is a sampling point and each column a model input. The first $k$ columns are allocated to an $A$ matrix and the remaining $k$ columns to a $B$ matrix, which are known as the "base sample matrices". Any point in either $A$ or $B$ can be indicated as $x_{vi}$, where $v$ and $i$ respectively index the row (from 1 to $N$) and the column (from 1 to $k$). Then, $k$ additional $A_B^{(i)}$ ($B_A^{(i)}$) matrices are created, where all columns come from $A$ ($B$) except the $i$-th column, which comes from $B$ ($A$). The numerator in Equation 3 is finally estimated using the model evaluations obtained from the $A$ ($B$) and $A_B^{(i)}$ ($B_A^{(i)}$) matrices. Some estimators may also use a third or further base sample matrices (i.e. $A, B, C, \dots, X$), although the use of more than three matrices has recently been proven inefficient by Lo Piano et al. [14].
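The sampling scheme just described can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes crude Monte Carlo sampling, inputs uniform on (0, 1), a toy additive model, and uses the Jansen [15] formula (estimator 1 in Table 1) for the estimate.

```python
import numpy as np

def jansen_total_order(f, k, N, rng):
    """Total-order indices via the Jansen [15] formula (estimator 1, Table 1).

    Sketch assuming crude Monte Carlo sampling and inputs uniform on (0, 1).
    """
    base = rng.random((N, 2 * k))        # the (N, 2k) base sample matrix
    A, B = base[:, :k], base[:, k:]      # first k columns -> A, remaining k -> B
    fA = f(A)
    VY = np.var(fA)                      # unconditional variance V(y) from the A runs
    T = np.empty(k)
    for i in range(k):
        ABi = A.copy()
        ABi[:, i] = B[:, i]              # A_B^(i): all columns from A except the i-th
        T[i] = np.mean((fA - f(ABi)) ** 2) / (2 * VY)
    return T

# Toy additive model y = x1 + 2*x2: analytically T1 = 0.2 and T2 = 0.8
rng = np.random.default_rng(0)
T = jansen_total_order(lambda X: X[:, 0] + 2 * X[:, 1], k=2, N=2**14, rng=rng)
```

For this additive toy model the total-order and first-order indices coincide, so the estimates can be checked against the analytical values.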

1.1 Total-order estimators and uncertainties in the benchmark settings

The search for efficient and robust total-order estimators is an active field of research [1,7,15–20]. Although some works have compared their asymptotic properties (i.e. [16]), most studies have promoted empirical comparisons where different estimators are benchmarked against known test functions and specific sample sizes. However valuable these empirical studies may be, Becker [21] observed that their results are very much conditional on the choice of model, its dimensionality and the selected number of model runs. It is hard to say from previous studies whether an estimator outperforming another truly reflects its higher accuracy or simply its better performance under the narrow statistical design of the study. Below we extend the list of factors which Becker [21] regards as influential in a given benchmarking exercise and discuss how they affect the relative performance of sensitivity estimators.

• The sampling method: the creation of the base sample matrices can be done using Monte Carlo (MC) or quasi-Monte Carlo (QMC) methods [12,13]. Compared to MC, QMC maps the input space more effectively, as it leaves smaller unexplored volumes (Fig. S1). However, Kucherenko et al. [22] observed that MC methods might help obtain more accurate sensitivity indices when the model under examination has important high-order terms. Both MC and QMC have been used when benchmarking sensitivity indices [15,23].

• The form of the test function: some of the most commonly used functions in SA are Ishigami and Homma [24]'s, the Sobol' G and its variants [23,25], Bratley and Fox [26]'s, or the set of functions presented in Kucherenko et al. [22] [14,16,18,23]. Despite being analytically tractable, these functions capture only one possible interval of model behaviour, and the effects of nonlinearities and nonadditivities are typically unknown in real models. This black-box nature of models has become more of a concern in the last decades due to the increase in computational power and code complexity (which prevents the analyst from intuitively grasping the model's behaviour [27]), and to the higher demand for model transparency [3,28,29]. This renders the functional form of the model similar to a random variable [21], something not accounted for by previous works [14,16,18,23].

• The function dimensionality: many studies focus on low-dimensional problems, either by using test functions that only require a few model inputs (e.g. the Ishigami function, where k = 3), or by using test functions with a flexible dimensionality but setting k at a small value, e.g. k ≤ 8 (Sobol' [25]'s G or the Bratley and Fox [26] functions). This approach trades comprehensiveness for computational manageability: by neglecting higher dimensions, it is difficult to tell which estimator might work best in models with tens or hundreds of parameters. Examples of such models can be readily found in the Earth and Environmental Sciences domain [30], including the Soil and Water Assessment Tool (SWAT) model, where k = 50 [31], or the Modélisation Environmentale-Surface et Hydrologie (MESH) model, where k = 111 [32].

• The distribution of the model inputs: the large majority of benchmarking exercises assume uniformly distributed inputs, p(x) ∈ U(0, 1)^k [14,16,23,33]. However, there is evidence that the accuracy of T_i estimators might be sensitive to the underlying model input distributions, to the point of overturning the model input ranks [34,35]. Furthermore, in uncertainty analysis – e.g. in decision theory – the analyst may use distributions with peaks at the most likely values, derived for instance from an expert elicitation stage.

• The number of model runs: sensitivity test functions are generally not computationally expensive and can be run without much concern for computational time. This is frequently not the case for real models, whose high dimensionality and complexity might set a constraint on the total number of model runs available. Under such restrictions, the performance of the estimators of the total-order index depends on their efficiency (how accurate they are given the budget of runs that can be allocated to each model input). There are no specific guidelines as to which total-order estimator might work best under these circumstances [21].

• The performance measure selected: typically, a sensitivity estimator has been considered to outperform the rest if, on average, it displays a smaller mean absolute error (MAE), computed as

$$\mathrm{MAE} = \frac{1}{p} \sum_{v=1}^{p} \left( \frac{\sum_{i=1}^{k} |T_i - \hat{T}_i|}{k} \right), \quad (4)$$

where $p$ is the number of replicas of the sample matrix, and $T_i$ and $\hat{T}_i$ are the analytical and the estimated total-order index of the $i$-th input. The MAE is appropriate when the aim is to assess which estimator better approaches the true total-order indices, because it averages the error for both influential and non-influential indices. However, the analyst might be more interested in using the estimated indices $\hat{T} = \{\hat{T}_1, \hat{T}_2, \dots, \hat{T}_i, \dots, \hat{T}_k\}$ to accurately rank parameters or screen influential from non-influential model inputs [1]. In such a context, the MAE may be best substituted or complemented with a measure of rank concordance between the vectors $r$ and $\hat{r}$, which reflect the ranks in $T$ and $\hat{T}$ respectively, such as Spearman's $\rho$ or Kendall's $W$ coefficient [21,36,37]. It can also be the case that disagreements on the exact ranking of low-ranked parameters have no practical importance because the interest lies in the correct identification of the top ranks only [30]. Savage [38] scores or other measures that emphasize this top-down correlation are then a more suitable choice.
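As a minimal illustration of these two families of performance measures, the snippet below computes the MAE of Equation 4 (for a single replica, p = 1) and Spearman's ρ, obtained as the Pearson correlation between rank vectors. The index values T_true and T_hat are invented for the example.

```python
import numpy as np

# Hypothetical analytical (T_true) and estimated (T_hat) total-order indices
# for k = 4 inputs; the two least influential inputs are swapped in the estimate.
T_true = np.array([0.60, 0.25, 0.10, 0.05])
T_hat  = np.array([0.55, 0.30, 0.02, 0.08])

# Mean absolute error (Equation 4 with a single replica, p = 1)
mae = np.mean(np.abs(T_true - T_hat))

def ranks(x):
    """Rank 1 = most influential input; no ties assumed."""
    r = np.empty(len(x), dtype=int)
    r[np.argsort(-x)] = np.arange(1, len(x) + 1)
    return r

# Spearman's rho = Pearson correlation between the two rank vectors
rho = np.corrcoef(ranks(T_true), ranks(T_hat))[0, 1]
```

Here the estimate swaps only the two least influential inputs: the MAE stays small while ρ drops to 0.8, which is why top-down measures such as Savage scores can be preferable when only the top ranks matter.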

Here we benchmark the performance of eight different MC-based formulae available to estimate $T_i$ (Table 1). While the list is not exhaustive, they reflect the research conducted on $T_i$ over the last 20 years: from the classic estimators of Saltelli et al. [1], Homma and Saltelli [7], and Jansen [15] up to the new contributions by Janon et al. [16], Glen and Isaacs [17], Azzini and Rosati [33] and Razavi and Gupta [20,39]. In order to reduce the influence of the benchmarking design in the assessment of the estimators' accuracy, we treat the sampling method τ, the underlying model input distribution φ, the number of model runs N_t, the test function ε, its dimensionality and degree of non-additivity (k, k_2, k_3) and the performance measure δ as random parameters. This better reflects the diversity of models and sensitivity settings available to the analyst. By relaxing the dependency of the results on these benchmark parameters¹, we define an unprecedentedly large setting where all formulae can prove their accuracy. We therefore extend Becker [21]'s approach by testing a wider set of Monte Carlo estimators, by exploring a wider range of benchmarking assumptions and by performing a formal SA on these assumptions. The aim is therefore to provide a much more global comparison of available MC estimators than is available in the existing literature, and to investigate how the benchmarking parameters may affect the relative performance of estimators. Such information can help point to estimators that are not only efficient on a particular case study, but efficient and robust over a wide range of practical situations.

2 Assessment of the uncertainties in the benchmarking parameters

In this section we formulate the benchmarking parameters as random variables and assess how the performance of estimators depends on them by performing a sensitivity analysis. In essence this is a sensitivity analysis of sensitivity analyses [42], and a natural extension of a similar uncertainty analysis in recent work by Becker [21]. The use of global sensitivity analysis tools to better understand the properties of estimators can give insights into how estimators behave in different scenarios that are not available through analytical approaches.

2.1 The setting

The variability in the benchmark settings (τ, N_t, k, k_2, k_3, φ, ε, δ) is described by probability distributions (Table 2). We assign uniform distributions (discrete or continuous) to each parameter. In particular, we choose τ ∼ DU(1, 2) to check how the performance of T_i estimators is conditioned by the use of Monte Carlo (τ = 1) or quasi-Monte Carlo (τ = 2) methods in the creation of the base sample matrices. For τ = 2 we use the Sobol' sequence scrambled according to Owen [43] to avoid repeated coordinates at the beginning of the sequence. The total number of model runs and inputs are respectively described as N_t ∼ DU(10, 1000) and k ∼ DU(3, 100) to explore the performance of the estimators over a wide range of (N_t, k) combinations. Given the sampling constraints set by the estimators' reliance on either a $B$, $B_A^{(i)}$, $A_B^{(i)}$ or $C_B^{(i)}$ matrix (Table 1), we modify the space defined by (N_t, k) to a non-rectangular domain (we provide more information on this adjustment in Section 2.2).

For φ we set φ ∼ DU(1, 8) to ensure an adequate representation of the most common shapes in the (0, 1)^k domain. Besides the normal distribution truncated at (0, 1) and the uniform distribution, we also take

¹ We refer to the set of benchmarking assumptions as benchmarking parameters or parameters. This is intended to distinguish them from the inputs of each test function generated by the metafunction, which we refer to as inputs.


Table 1: Formulae to compute $T_i$. $f_0$ and $V(y)$ are estimated according to the original papers. For estimators 2 and 5, $f_0 = \frac{1}{N}\sum_{v=1}^{N} f(A)_v$. For estimators 1, 2 and 5, $V(y) = \frac{1}{N}\sum_{v=1}^{N} \left[ f(A)_v - f_0 \right]^2$ [1, Eq. 4.16; 7, Eqs. 15, 20]. For estimator 3, $f_0 = \frac{1}{N}\sum_{v=1}^{N} \frac{f(A)_v + f(A_B^{(i)})_v}{2}$ and $V(y) = \frac{1}{N}\sum_{v=1}^{N} \frac{f(A)_v^2 + f(A_B^{(i)})_v^2}{2} - f_0^2$ [16, Eq. 15]. In estimator 4, $\langle f(A)_v \rangle$ is the mean of $f(A)_v$. We use a simplified version of the Glen and Isaacs estimator because spurious correlations are zero by design. As for estimator 7, we refer to it as pseudo-Owen given its use of a $C$ matrix and its identification with Owen [40] in Iooss et al. [41], where we retrieve the formula from. $V(y)$ in estimator 7 is computed as in estimator 3 following Iooss et al. [41], whereas $V(y)$ in estimator 8 is computed as in estimator 1.

Nº | Estimator | Author
1 | $\dfrac{\frac{1}{2N}\sum_{v=1}^{N} \left[ f(A)_v - f(A_B^{(i)})_v \right]^2}{V(y)}$ | Jansen [15]
2 | $\dfrac{V(y) - \frac{1}{N}\sum_{v=1}^{N} f(A)_v\, f(A_B^{(i)})_v + f_0^2}{V(y)}$ | Homma and Saltelli [7]
3 | $1 - \dfrac{\frac{1}{N}\sum_{v=1}^{N} f(A)_v\, f(A_B^{(i)})_v - f_0^2}{V(y)}$ | Janon et al. [16], Monod et al. [19]
4 | $1 - \dfrac{\frac{1}{N-1}\sum_{v=1}^{N} \left[ f(A)_v - \langle f(A)_v \rangle \right] \left[ f(A_B^{(i)})_v - \langle f(A_B^{(i)})_v \rangle \right]}{\sqrt{V\!\left[ f(A)_v \right] V\!\left[ f(A_B^{(i)})_v \right]}}$ | Glen and Isaacs [17]
5 | $1 - \dfrac{\frac{1}{N}\sum_{v=1}^{N} f(B)_v\, f(B_A^{(i)})_v - f_0^2}{V(y)}$ | Saltelli et al. [1]
6 | $\dfrac{\sum_{v=1}^{N} \left[ f(B)_v - f(B_A^{(i)})_v \right]^2 + \left[ f(A)_v - f(A_B^{(i)})_v \right]^2}{\sum_{v=1}^{N} \left[ f(A)_v - f(B)_v \right]^2 + \left[ f(B_A^{(i)})_v - f(A_B^{(i)})_v \right]^2}$ | Azzini et al. [18] and Azzini and Rosati [33]
7 | $\dfrac{V(y) - \frac{1}{N}\sum_{v=1}^{N} \left[ f(B)_v - f(C_B^{(i)})_v \right] \left[ f(B_A^{(i)})_v - f(A)_v \right]}{V(y)}$ | pseudo-Owen
8 | $\dfrac{E_{\mathbf{x}^*_{\sim i}}\!\left[ \gamma_{\mathbf{x}^*_{\sim i}}(h_i) \right] + E_{\mathbf{x}^*_{\sim i}}\!\left[ C_{\mathbf{x}^*_{\sim i}}(h_i) \right]}{V(y)}$ | Razavi and Gupta [20,39] (see SM)

into account four beta distributions parametrized with distinct α and β values, and a logitnormal distribution (Fig. 1a). The aim is to check the response of the estimators under a wide range of probability distributions, including U-shaped distributions and distributions with different degrees of skewness.

We link each distribution in Fig. 1a to an integer value from 1 to 7. For instance, if φ = 1, the joint probability distribution of the model inputs is described as p(x_1, ..., x_k) = U(0, 1)^k. If φ = 8, we create a vector φ = {φ_1, φ_2, ..., φ_i, ..., φ_k} by randomly sampling the seven distributions in Fig. 1a, and use the i-th distribution in the vector to describe the uncertainty of the i-th input. This last case examines the behavior of the estimators when several distributions are used to characterize the uncertainty in the model input space.

6

Table 2: Summary of the parameters and their distributions. DU stands for discrete uniform.

Parameter | Description | Distribution
τ | Sampling method | DU(1, 2)
N_t | Total number of model runs | DU(10, 1000)
k | Number of model inputs | DU(3, 100)
φ | Probability distribution of the model inputs | DU(1, 8)
ε | Randomness in the test function | DU(1, 200)
k_2 | Fraction of pairwise interactions | U(0.3, 0.5)
k_3 | Fraction of three-wise interactions | U(0.1, 0.3)
δ | Selection of the performance measure | DU(1, 2)

2.1.1 The test function

The parameter ε operationalizes the randomness in the form and execution of the test function. Our test function is an extended version of Becker [21]'s metafunction, which randomly combines p univariate functions into a multivariate function of dimension k. Here we consider the 10 univariate functions listed in Fig. 1b, which represent common responses observed in physical systems and in classic SA test functions (see Becker [21] for a discussion on this point). We note that an alternative approach would be to construct orthogonal basis functions which could allow analytical evaluation of the true sensitivity indices for each generated function; however, this extension is left for future work.

We construct the test function as follows:

1. Let us consider a sample matrix such as

$$M = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1i} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2i} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{v1} & x_{v2} & \cdots & x_{vi} & \cdots & x_{vk} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Ni} & \cdots & x_{Nk} \end{pmatrix} \quad (5)$$

where every point $x_v = (x_{v1}, x_{v2}, \dots, x_{vk})$ represents a given combination of values for the $k$ inputs and $x_i$ is a model input whose distribution is defined by φ.

2. Let $u = \{u_1, u_2, \dots, u_k\}$ be a $k$-length vector formed by randomly sampling with replacement the ten functions in Fig. 1b. The $i$-th function in $u$ is then applied to the $i$-th model input: for instance, if $k = 4$ and $u = \{u_3, u_4, u_8, u_1\}$, then $f_3(x_1) = \frac{e^{x_1} - 1}{e - 1}$, $f_4(x_2) = \left(10 - \frac{1}{1.1}\right)^{-1} (x_2 + 0.1)^{-1}$, $f_8(x_3) = \frac{\sin(2\pi x_3)}{2}$, and $f_1(x_4) = x_4^3$. The elements in $u$ thus represent the first-order effects of each model input.


[Figure 1: The metafunction approach. a) Probability distributions included in φ: U(0, 1), N_T(0.5, 0.15, 0, 1) (N_T stands for truncated normal), Beta(8, 2), Beta(2, 8), Beta(2, 0.8), Beta(0.8, 2) and Logitnormal(0, 3.16). b) Univariate functions included in the metafunction: f_1(x) = x^3 (cubic); f_2(x) = 1 if x > 1/2, otherwise 0 (discontinuous); f_3(x) = (e^x − 1)/(e − 1) (exponential); f_4(x) = (10 − 1/1.1)^{−1}(x + 0.1)^{−1} (inverse); f_5(x) = x (linear); f_6(x) = 0 (no effect); f_7(x) = 4(x − 0.5)^2 (non-monotonic); f_8(x) = sin(2πx)/2 (periodic); f_9(x) = x^2 (quadratic); f_10(x) = cos(x) (trigonometric).]

3. Let $V$ be a $(n, 2)$ matrix, for $n = \frac{k!}{2!(k-2)!}$, the number of pairwise combinations between the $k$ inputs of the model. Each row in $V$ thus specifies an interaction between two columns in $M$. In the case of $k = 4$ and the same elements in $u$ as defined in the previous example,

$$V = \begin{pmatrix} 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 2 & 3 \\ 2 & 4 \\ 3 & 4 \end{pmatrix} \quad (6)$$

e.g., the first row promotes $f_3(x_1) \cdot f_4(x_2)$, the second row $f_3(x_1) \cdot f_8(x_3)$, and so on until the $n$-th row. In order to follow the sparsity-of-effects principle (most variations in a given model output should be explained by low-order interactions [44]), the metafunction activates only a fraction of these effects: it randomly samples $\lceil k_2 n \rceil$ rows from $V$ and computes the corresponding interactions in $M$. $\lceil k_2 n \rceil$ is thus the number of pairwise interactions present in the function. We make $k_2$ an uncertain parameter described as $k_2 \sim U(0.3, 0.5)$ in order to randomly activate only between 30% and 50% of the available second-order effects in $M$.

4. Same as before, but for third-order effects: let $W$ be a $(m, 3)$ matrix, for $m = \frac{k!}{3!(k-3)!}$, the number of three-wise combinations between the $k$ inputs in $M$. For $k = 4$ and $u$ as before,

$$W = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 4 \\ 1 & 3 & 4 \\ 2 & 3 & 4 \end{pmatrix} \quad (7)$$

e.g. the first row leads to $f_3(x_1) \cdot f_4(x_2) \cdot f_8(x_3)$, and so on until the $m$-th row. The metafunction then randomly samples $\lceil k_3 m \rceil$ rows from $W$ and computes the corresponding interactions in $M$. $\lceil k_3 m \rceil$ is therefore the number of three-wise interaction terms in the function. We also make $k_3$ an uncertain parameter described as $k_3 \sim U(0.1, 0.3)$ to activate only between 10% and 30% of all third-order effects in $M$. Note that $k_2 > k_3$ because third-order effects tend to be less dominant than second-order effects (Table 2).

5. Three vectors of coefficients (α, β, γ) of lengths k, n and m are defined to represent the weights of the first, second and third-order effects respectively. These coefficients are generated by sampling from a mixture of two normal distributions, Ψ = 0.3 N(0, 5) + 0.7 N(0, 0.5). This coerces the metafunction into replicating the Pareto [45] principle (around 80% of the effects are due to 20% of the parameters), found to widely apply in SA [1,46].

6. The metafunction can thus be formalized as

$$y = \sum_{i=1}^{k} \alpha_i f^{u_i}_{\varphi_i}(x_i) + \sum_{i=1}^{\lceil k_2 n \rceil} \beta_i f^{u_{V_{i,1}}}_{\varphi_i}(x_{V_{i,1}}) f^{u_{V_{i,2}}}_{\varphi_i}(x_{V_{i,2}}) + \sum_{i=1}^{\lceil k_3 m \rceil} \gamma_i f^{u_{W_{i,1}}}_{\varphi_i}(x_{W_{i,1}}) f^{u_{W_{i,2}}}_{\varphi_i}(x_{W_{i,2}}) f^{u_{W_{i,3}}}_{\varphi_i}(x_{W_{i,3}}). \quad (8)$$

Note that there is randomness in the sampling of φ, the univariate functions in u and the coefficients in (α, β, γ). The parameter ε assesses the influence of this randomness by fixing the starting point of the pseudo-random number sequence used for sampling the parameters just mentioned. We use ε ∼ DU(1, 200) to ensure that the same seed does not overlap with the same value of N_t, k or any other parameter, an issue that might introduce determinism in a process that should be stochastic. In Figs. S2–S3 we show the type of T_i indices generated by this metafunction.
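The steps above can be sketched compactly in Python. This is a simplified, illustrative version only (not the authors' implementation): it assumes uniform inputs (the per-input distributions φ_i are omitted), fixed k_2 and k_3 values, and draws the coefficients from the mixture Ψ = 0.3 N(0, 5) + 0.7 N(0, 0.5) described in step 5.

```python
import numpy as np
from itertools import combinations

# The ten univariate functions of Fig. 1b
F = [lambda x: x**3,                                  # cubic
     lambda x: (x > 0.5).astype(float),               # discontinuous
     lambda x: (np.exp(x) - 1) / (np.e - 1),          # exponential
     lambda x: (10 - 1 / 1.1)**-1 * (x + 0.1)**-1,    # inverse
     lambda x: x,                                     # linear
     lambda x: 0 * x,                                 # no effect
     lambda x: 4 * (x - 0.5)**2,                      # non-monotonic
     lambda x: np.sin(2 * np.pi * x) / 2,             # periodic
     lambda x: x**2,                                  # quadratic
     lambda x: np.cos(x)]                             # trigonometric

def metafunction(X, k2=0.4, k3=0.2, seed=0):
    """Simplified metafunction: random first-order functions plus a random
    subset of pairwise and three-wise interaction terms (uniform inputs)."""
    rng = np.random.default_rng(seed)
    N, k = X.shape
    u = rng.integers(0, len(F), k)            # sample functions with replacement
    G = np.stack([F[u[i]](X[:, i]) for i in range(k)], axis=1)
    pairs = list(combinations(range(k), 2))
    triples = list(combinations(range(k), 3))
    act2 = rng.choice(len(pairs), int(np.ceil(k2 * len(pairs))), replace=False)
    act3 = rng.choice(len(triples), int(np.ceil(k3 * len(triples))), replace=False)
    def coeffs(n):                            # mixture 0.3*N(0,5) + 0.7*N(0,0.5)
        return np.where(rng.random(n) < 0.3,
                        rng.normal(0, 5, n), rng.normal(0, 0.5, n))
    y = G @ coeffs(k)                         # first-order effects
    for w, p in zip(coeffs(len(act2)), act2): # active second-order effects
        i, j = pairs[p]
        y += w * G[:, i] * G[:, j]
    for w, t in zip(coeffs(len(act3)), act3): # active third-order effects
        i, j, l = triples[t]
        y += w * G[:, i] * G[:, j] * G[:, l]
    return y

y = metafunction(np.random.default_rng(1).random((256, 5)))
```

Each call with a different seed produces a different random test function, which is the role played by ε in the benchmark.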

Finally, we describe the parameter δ as δ ∼ DU(1, 2). If δ = 1, we compute the Kendall τ-b correlation coefficient between $\hat{r}$ and $r$, the estimated and the "true" ranks calculated from $\hat{T}$ and $T$ respectively. This aims at evaluating how well the estimators in Table 1 rank all model inputs. If δ = 2, we compute the Pearson correlation between $r$ and $\hat{r}$ after transforming the ranks into Savage scores [38]. This setting examines the performance of the estimators when the analyst is interested in ranking only the most important model inputs. Savage scores are given as

$$Sa_i = \sum_{j=i}^{k} \frac{1}{j}, \quad (9)$$

where $j$ is the rank assigned to the $j$-th element of a vector of length $k$. If $x_1 > x_2 > x_3$, the Savage scores would then be $Sa_1 = 1 + \frac{1}{2} + \frac{1}{3}$, $Sa_2 = \frac{1}{2} + \frac{1}{3}$, and $Sa_3 = \frac{1}{3}$. The parameter δ thus assesses the accuracy of the estimators in properly ranking the model inputs; in other words, when they are used in a factor prioritization setting [1].
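Savage scores are straightforward to compute from a rank vector; the sketch below reproduces the k = 3 example from the text using a single cumulative sum.

```python
import numpy as np

def savage_scores(ranks):
    """Savage score Sa_i = sum_{j=i}^{k} 1/j for a vector of ranks,
    where rank 1 denotes the most important input (Equation 9)."""
    k = len(ranks)
    # tail[i] holds sum_{j=i+1}^{k} 1/j (0-based i), computed once for all ranks
    tail = np.cumsum((1.0 / np.arange(1, k + 1))[::-1])[::-1]
    return tail[np.asarray(ranks) - 1]

# k = 3 example from the text: x1 > x2 > x3, i.e. ranks (1, 2, 3)
sa = savage_scores([1, 2, 3])   # [1 + 1/2 + 1/3, 1/2 + 1/3, 1/3]
```

The Pearson correlation between two Savage-score vectors then gives the δ = 2 performance measure.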

In order to examine also how accurate the estimators are in approaching the "true" indices, we run an extra round of simulations with the MAE as the only performance measure, which we compute as

$$\mathrm{MAE} = \frac{\sum_{i=1}^{k} |T_i - \hat{T}_i|}{k}. \quad (10)$$

Note that, unlike Equation 4, Equation 10 does not make use of replicas. This is because the effect of the sampling is averaged out in our design by simultaneously varying all parameters in many different simulations.

2.2 The execution of the algorithm

We examine how sensitive the performance of total-order estimators is to the uncertainty in the benchmark parameters τ, N_t, k, k_2, k_3, φ, ε, δ by means of a global SA. We create an $A$, a $B$ and $k - 1$ $A_B^{(i)}$ matrices, each of dimension $(2^{11}, k)$, using Sobol' quasi-random numbers. In these matrices each column is a benchmark parameter described with the probability distributions of Table 2 and each row is a simulation with a specific combination of τ, N_t, k, ... values. Note that we use $k - 1$ $A_B^{(i)}$ matrices because we group N_t and k and treat them as a single benchmark parameter given their correlation (see below).

Our algorithm runs rowwise over the $A$, $B$ and $k - 1$ $A_B^{(i)}$ matrices, for $v = 1, 2, \dots, 18{,}432$ rows. In the $v$-th row it does the following:

1. It creates five $(N_{t_v}, k_v)$ matrices using the sampling method defined by $\tau_v$. The need for these five sub-matrices responds to the five specific sampling designs requested by the estimators of our study (Table 1). We use these matrices to compute the vector of estimated indices $\hat{T}_i$ for each estimator:

(a) An $A$ matrix and $k_v$ $A_B^{(i)}$ matrices, each of size $(N_v, k_v)$, $N_v = \lfloor N_{t_v} / (k_v + 1) \rfloor$ (estimators 1–4 in Table 1).

(b) An $A$, $B$ and $k_v$ $A_B^{(i)}$ matrices, each of size $(N_v, k_v)$, $N_v = \lfloor N_{t_v} / (k_v + 2) \rfloor$ (estimator 5 in Table 1).

(c) An $A$, $B$ and $k_v$ $A_B^{(i)}$ and $B_A^{(i)}$ matrices, each of size $(N_v, k_v)$, $N_v = \lfloor N_{t_v} / (2k_v + 2) \rfloor$ (estimator 6 in Table 1).

(d) An $A$, $B$ and $k_v$ $B_A^{(i)}$ and $C_B^{(i)}$ matrices, each of size $(N_v, k_v)$, $N_v = \lfloor N_{t_v} / (2k_v + 2) \rfloor$ (estimator 7 in Table 1).

(e) A matrix formed by $N_v$ stars, each of size $k_v(\frac{1}{\Delta h} - 1) + 1$. Given that we set $\Delta h$ at 0.2 (see Supplementary Materials), $N_v = \lfloor N_{t_v} / (4k_v + 1) \rfloor$ (estimator 8 in Table 1).

The different sampling designs and the value of $k_v$ constrain the total number of runs $N_{t_v}$ that can be allocated to each estimator. Furthermore, given the probability distributions selected for $N_t$ and $k$ (Table 2), specific combinations of $(N_{t_v}, k_v)$ lead to $N_v \leq 1$, which is computationally unfeasible. To minimize these issues we force the comparison between estimators to approximate the same $N_{t_v}$ value. Since the sampling design structure of Razavi and Gupta is the most constraining, we use $N_v = \frac{2(4k + 1)}{k + 1}$ (for estimators 1–4), $N_v = \frac{2(4k + 1)}{k + 2}$ (for estimator 5) and $N_v = \frac{2(4k + 1)}{2k + 2}$ (for estimators 6–7) when $N_v \leq 1$ in the case of Razavi and Gupta. This compels all estimators to explore a very similar portion of the $(N_t, k)$ space, but $N_t$ and $k$ become correlated, which contradicts the requirement of independent inputs characterizing variance-based sensitivity indices [1]. This is why we treat $(N_t, k)$ as a single benchmark parameter in the SA.

2. It creates a sixth matrix, formed by an $A$ and $k_v$ $A_B^{(i)}$ matrices, each of size $(2^{11}, k_v)$. We use this sub-matrix to compute the vector of "true" indices $T$, which could not be calculated analytically due to the wide range of possible functional forms created by the metafunction. Following Becker [21], we assume that a fairly accurate approximation to $T$ can be achieved with a large Monte Carlo estimation.

3. The distribution of the model inputs in these six sample matrices is defined by $\varphi_v$.

4. The metafunction runs over these six matrices simultaneously, with its functional form and degree of active second and third-order effects set by $\varepsilon_v$, $k_{2_v}$ and $k_{3_v}$ respectively.

5. It computes the estimated sensitivity indices $\hat{T}_v$ for each estimator and the "true" sensitivity indices $T_v$ using the Jansen [15] estimator, which is currently best practice in SA.

6. It checks the performance of the estimators. This is done in two ways:

(a) If δ = 1, we compute the correlation between $\hat{r}_v$ and $r_v$ (obtained respectively from $\hat{T}_v$ and $T_v$) with Kendall tau, and if δ = 2 we compute the correlation between $\hat{r}_v$ and $r_v$ on Savage scores. The model output in both cases is the correlation coefficient $r$, with higher $r$ values indicating a better performance in properly ranking the model inputs.

(b) We compute the MAE between $\hat{T}_v$ and $T_v$. In this case the model output is the MAE, with lower values indicating a better performance in approaching the "true" total-order indices.
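The run-budget arithmetic of step 1(a)–(e) above can be sketched as follows; the labels est1_4–est8 are our own shorthand for the five sampling designs, and the floor divisions mirror the ⌊·⌋ expressions in the text.

```python
def rows_per_design(Nt, k, dh=0.2):
    """Affordable rows N_v per sampling design for a budget of Nt total
    model runs (floor division; labels est1_4 ... est8 are shorthand)."""
    star_size = round(k * (1 / dh - 1) + 1)   # points per star; dh = 0.2 -> 4k + 1
    return {
        "est1_4": Nt // (k + 1),       # A and k A_B^(i) matrices
        "est5":   Nt // (k + 2),       # A, B and k A_B^(i) matrices
        "est6":   Nt // (2 * k + 2),   # A, B, k A_B^(i) and B_A^(i) matrices
        "est7":   Nt // (2 * k + 2),   # A, B, k B_A^(i) and C_B^(i) matrices
        "est8":   Nt // star_size,     # N_v stars of size k(1/dh - 1) + 1
    }

alloc = rows_per_design(Nt=1000, k=10)
```

For a budget of 1,000 runs and k = 10, for example, the Razavi and Gupta design (est8) affords only 24 stars while estimators 1–4 afford 90 rows, which illustrates why the star-based design is the most constraining.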


3 Results

3.1 Uncertainty analysis

Under a factor prioritization setting (i.e. when the aim is to rank the model inputs in terms of their contribution to the model output variance), the most accurate estimators are Jansen, Razavi and Gupta, Janon/Monod, and Azzini and Rosati. The distribution of r values (the correlation between estimated and "true" ranks) when these estimators are used is highly negatively skewed, with median values of ≈ 0.9. Glen and Isaacs, Homma and Saltelli, Saltelli and pseudo-Owen lag behind and display median r values of ≈ 0.35, with pseudo-Owen ranking last (r ≈ 0.2). The range of values obtained with these formulae is much more spread out and includes a significant number of negative r values, suggesting that they overturned the true ranks in several simulations (Figs. 2a, S4).

[Figure 2: Boxplots summarizing the results of the simulations for each estimator (Jansen; Razavi and Gupta; Janon/Monod; Azzini and Rosati; Glen and Isaacs; Homma and Saltelli; Saltelli; pseudo-Owen). a) Correlation coefficient between $\hat{r}$ and $r$, the vectors of estimated and "true" ranks. b) Mean Absolute Error (MAE), on a logarithmic scale.]

When the goal is to approximate the “true” indices, Janon/Monod, Jansen and Azzini and Rosati also261

oﬀer the best performance. The median MAE obtained with these estimators is generally smaller than262

Glen and Isaacs’ and pseudo-Owen’s, and the distribution of MAE values is much more narrower than263

that obtained with Homma and Saltelli, Saltelli or Razavi and Gupta. These three estimators are the least264

accurate and produce several MAE values larger than 102in several simulations (Fig. 2b). The volatility265

of Razavi and Gupta under the MAE is reﬂected in the numerous outliers produced and sharply contrasts266

with its very good performance in a factor prioritization setting (Fig. 2a).267

To obtain a finer insight into the structure of these results, we plot the total number of model runs Nt against the function dimensionality k (Fig. 3). This maps the performance of the estimators in the input space formed by all possible combinations of Nt and k given the specific design constraints of each formula. Under a factor prioritization setting, almost all estimators perform reasonably well at a very small dimensionality (k ≤ 10, r > 0.7), regardless of the total number of model runs available. However, some differences unfold at higher dimensions: Saltelli, Homma and Saltelli, Glen and Isaacs and especially pseudo-Owen swiftly become inaccurate for k > 10, even with large values of Nt. Azzini and Rosati display a very good performance overall except in the upper Nt, k boundary, where most of the orange dots concentrate. The estimators of Jansen, Janon/Monod and Razavi and Gupta rank the model inputs almost flawlessly regardless of the region explored in the Nt, k domain (Fig. 3a).

With regards to the MAE, Janon/Monod, Jansen and Azzini and Rosati maintain their high performance regardless of the Nt, k region explored. The accuracy of Razavi and Gupta, however, drops at the upper-leftmost part of the Nt, k boundary, where most of the largest MAE scores are located (MAE > 10). In the case of Saltelli and Homma and Saltelli, the largest MAE values concentrate in the region of small k regardless of the total number of model runs, a domain in which they achieved a high performance when the focus was on properly ranking the model inputs.

The presence of a non-negligible proportion of model runs with r < 0 suggests that some estimators significantly overturned the true ranks (Figs. 3a, S4). To better examine this phenomenon, we re-plot Fig. 3b with just the simulations yielding r < 0 (Fig. S5). We observe that r < 0 values not only appear in the region of small Nt, a foreseeable miscalculation derived from allocating an insufficient number of model runs to each model input: they also emerge at a relatively large Nt and low k in the case of pseudo-Owen, Saltelli and Homma and Saltelli. The Saltelli estimator actually concentrates most of the simulations with the lowest negative r values in the k < 10 zone (Fig. S5). This suggests that rank reversing is not an artifact of our study design as much as a by-product of the volatility of these estimators when stressed by the sources of computational uncertainty listed in Table 2. Such strain may lead these estimators to produce a significant fraction of negative indices or indices beyond 1, thus effectively promoting r < 0.

We calculate the proportion of Ti < 0 and Ti > 1 in each simulation that yielded r < 0. In the case of Glen and Isaacs and Homma and Saltelli, r < 0 values are caused by the production of a large proportion of Ti < 0 (25%–75%, the x axis in Fig. 4). Pseudo-Owen and Saltelli suffer this bias too, and in several simulations they also generate a large proportion of Ti > 1 (up to 100% of the model inputs, the y axis in Fig. 4). The production of Ti < 0 and Ti > 1 is caused by numerical errors and fostered by the values generated at the numerator of Equation 3: Ti < 0 may either derive from E_{x∼i}[V_{x_i}(y | x∼i)] < 0 (e.g. Homma and Saltelli and pseudo-Owen) or V_{x∼i}[E_{x_i}(y | x∼i)] > V(y) (e.g. Saltelli), whereas Ti > 1 may derive from E_{x∼i}[V_{x_i}(y | x∼i)] > V(y) (e.g. Homma and Saltelli and pseudo-Owen) or V_{x∼i}[E_{x_i}(y | x∼i)] < 0 (e.g. Saltelli).
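The structural difference behind these negative estimates can be illustrated with a toy Monte Carlo experiment (a sketch in Python, with an arbitrary additive test function rather than our metafunction): Jansen's estimator computes its numerator as a mean of squared differences and is therefore nonnegative by construction, whereas a covariance-type numerator of the Homma and Saltelli form can be pushed below zero by sampling error when an input is only weakly influential:

```python
import numpy as np

rng = np.random.default_rng(42)
N, R = 64, 200  # deliberately small base sample; R replicated estimations

def f(x):
    # Additive toy function: the third input is only weakly influential,
    # so its total-order index is close to zero.
    return x[:, 0] + 0.3 * x[:, 1] + 0.02 * x[:, 2]

neg_jansen = neg_hs = 0
for _ in range(R):
    A, B = rng.uniform(size=(N, 3)), rng.uniform(size=(N, 3))
    ABi = A.copy(); ABi[:, 2] = B[:, 2]  # A with the weak column taken from B
    yA, yABi = f(A), f(ABi)
    V, f0 = yA.var(), yA.mean()
    # Jansen: numerator is a mean of squared differences, hence >= 0 always.
    T_jansen = np.mean((yA - yABi) ** 2) / (2 * V)
    # Homma and Saltelli-type: numerator is a covariance-like term that
    # Monte Carlo error can drive below zero for near-inactive inputs.
    T_hs = (V - np.mean(yA * yABi) + f0 ** 2) / V
    neg_jansen += T_jansen < 0
    neg_hs += T_hs < 0

print(neg_jansen)  # 0: negative estimates are impossible by construction
print(neg_hs > 0)  # True: a share of the replicates go negative
```

The same mechanism, amplified by skewed distributions and higher dimensionalities, underlies the Ti < 0 proportions in Fig. 4.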

To better examine the efficiency of the estimators, we summarized their performance as a function of the number of runs available per model input, Nt/k [21] (Figs. 5, S6). This information is especially relevant to take an educated decision on which estimator to use in the context of a high-dimension, computationally expensive model. Even when the budget of runs per input is low [(Nt/k) ∈ [2, 20]], Razavi and Gupta, Jansen and Janon/Monod are very good at properly ranking model inputs (r ≈ 0.9), and are followed very

Figure 3: Number of runs Nt against the function dimensionality k. Each dot is a simulation with a specific combination of the benchmark parameters in Table 2. The greener (blacker) the color, the better (worse) the performance of the estimator. a) Accuracy of the estimators when the goal is to properly rank the model inputs, e.g. a factor prioritization setting. b) Accuracy of the estimators when the goal is to approach the "true" total-order indices.

close by Azzini and Rosati (r ≈ 0.8). Saltelli, Homma and Saltelli and Glen and Isaacs come after (r ≈ 0.3), with pseudo-Owen scoring last (r ≈ 0.2). When the Nt/k ratio is increased, all estimators improve their ranking accuracy and some quickly reach the asymptote: this is the case of Razavi and Gupta, Janon/Monod and Jansen, whose performance becomes almost flawless from (Nt/k) ∈ [40, 60] onwards, and of Azzini

Figure 4: Scatterplot of the proportion of Ti < 0 against the proportion of Ti > 1, mapped against the model output r. Each dot is a simulation. Only simulations with r < 0 are displayed.

and Rosati, which reaches its optimum at (Nt/k) ∈ [60, 80]. The accuracy of the other estimators does not seem to fully stabilize within the range of ratios examined. In the case of Homma and Saltelli and Saltelli, their performance oscillates before plummeting at (Nt/k) ∈ [200, 210], (Nt/k) ∈ [240, 260] and (Nt/k) ∈ [260, 280] due to several simulations yielding large r < 0 values (Fig. 5a).

Janon/Monod and Jansen are also the most efficient estimators when the MAE is the measure of choice, followed closely by Azzini and Rosati, Razavi and Gupta and Glen and Isaacs. Saltelli and Homma and Saltelli gain accuracy at higher Nt/k ratios, yet their precision diminishes all the same from (Nt/k) ∈ [200, 210] onwards (Fig. 5b).

3.2 Sensitivity analysis

When the aim is to rank the model inputs, the selection of the performance measure (δ) has the highest first-order effect on the accuracy of the estimators (Fig. 6a). The parameter δ is responsible for between 20% (Azzini and Rosati) and 30% (Glen and Isaacs) of the variance in the final r value. On average, all estimators perform better when the ranking is conducted on Savage scores (δ = 2), i.e. when the focus is on ranking the most important model inputs only (Figs. S8–S15). As for the distribution of the model inputs (φ), it has a first-order effect on the accuracy of Azzini and Rosati (≈ 10%), Jansen and Janon/Monod (≈ 15%) and Razavi and Gupta (≈ 20%), regardless of whether the aim is factor prioritization (r) or approaching the "true" indices (MAE). The performance of these estimators drops perceptibly when the model inputs are distributed as Beta(8, 2) or Beta(2, 8) (φ = 3 and φ = 4, Figs. S8–S23), suggesting that

Figure 5: Scatterplot of the model output r against the number of model runs allocated per model input (Nt/k). See Fig. S6 for a visual display of all simulations and Fig. S7 for an assessment of the number of model runs that each estimator has in each Nt/k compartment.

they may be especially stressed by skewed distributions. The selection of random or quasi-random numbers during the construction of the sample matrix (τ) also directly conditions the accuracy of several estimators. If the aim is to approach the "true" indices (MAE), τ conveys from 17% (Azzini and Rosati) to ≈ 30% (Glen and Isaacs) of the model output variance, with all estimators except Razavi and Gupta performing better on quasi-random numbers (τ = 2, Figs. S16–S23). In a factor prioritization setting, τ is mostly influential through interactions. Interestingly, the proportion of active second- and third-order interactions (k2, k3) does not alter the performance of any estimator in any of the settings examined.
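Savage scores [38] assign rank j (out of k) the value SS_j = Σ_{i=j}^{k} 1/i, so the top-ranked inputs carry the largest weights and a correlation computed on these scores mostly rewards getting the most important inputs right. A minimal sketch of this standard definition (illustrative Python; the function name is ours):

```python
import numpy as np

def savage_scores(k):
    """Savage score for ranks 1..k: SS_j = sum_{i=j}^{k} 1/i.
    Scores decrease steeply with rank, so swaps among the least
    important inputs barely affect a score-based correlation."""
    inv = 1.0 / np.arange(1, k + 1)      # [1, 1/2, ..., 1/k]
    return np.cumsum(inv[::-1])[::-1]    # tail sums, largest first

# For k = 4: SS_1 = 1 + 1/2 + 1/3 + 1/4, ..., SS_4 = 1/4.
print(savage_scores(4))
```

Replacing plain ranks by Savage scores before computing the correlation is what the δ = 2 option amounts to.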

To better understand the structure of the sensitivities, we compute Sobol' indices after grouping individual parameters in three clusters, which we define based on their commonalities: the first group includes (δ, τ) and reflects the influence of those parameters that can be defined by the sensitivity analyst during the setting of the benchmark exercise. The second combines (ε, k2, k3, φ) and examines the overall impact of the model functional form, referred to as f(x), which is often beyond the analyst's grasp. Finally, the third group includes (Nt, k) only and assesses the influence of the sampling design on the accuracy of the estimators (we assume that the total number of model runs, besides being conditioned by the computing resources at hand, is also partially determined by the joint effect of the model dimensionality and the use of either B, A_B^(i), B_A^(i) or C_B^(i) matrices) (Fig. 6b).

The uncertainty in the functional form of the model [f(x)] is responsible for approximately 20% of the variance in the performance of Azzini and Rosati, Janon/Monod or Jansen in a factor prioritization setting. For Glen and Isaacs, Homma and Saltelli, pseudo-Owen or Saltelli, f(x) is influential only through interactions with the other clusters. When the MAE is the performance measure of interest, f(x) has a

Figure 6: Sobol' indices (Si and Ti). a) Individual parameters. b) Clusters of parameters. The cluster f(x) includes all parameters that describe the uncertainty in the functional form of the model (ε, k2, k3, φ). Nt and k are assessed simultaneously due to their correlation. Note that the MAE facet does not include the group (δ, τ) because δ (the performance measure used) is no longer an uncertain parameter in this setting.

much stronger influence on the accuracy of the estimators than the couple (Nt, k), especially in the case of Glen and Isaacs (≈ 40%). In any case, the accuracy of the estimators is significantly conditioned by interactions between the benchmark parameters. The sum of all individual Si indices plus the Si index of the (Nt, k) cluster only explains from ≈ 45% (Saltelli) to ≈ 70% (Glen and Isaacs) of the estimators' variance in ranking the model inputs, and from ≈ 24% (pseudo-Owen) to ≈ 60% (Razavi and Gupta) of the variance in approaching the "true" indices.

4 Discussion and conclusions

Here we design an eight-dimension background for variance-based total-order estimators to confront and prove their value in an unparalleled range of SA scenarios. By randomizing the parameters that condition their performance, we obtain a comprehensive picture of the advantages and disadvantages of each estimator and identify which particular benchmark factors make them more prone to error. Our work thus provides a thorough empirical assessment of state-of-the-art total-order estimators and contributes to defining best practices in variance-based SA. The study also aligns with previous works focused on testing the robustness of the tools available to sensitivity analysts, a line of inquiry that can be described as a sensitivity analysis of a sensitivity analysis (SA of SA) [42].

Our results support the assumption that the scope of previous benchmark studies is limited by the plethora of non-unique choices taken during the setting of the analysis [21]. We have observed that almost all decisions have a non-negligible effect: from the selection of the sampling method to the choice of the performance measure, the design prioritized by the analyst can influence the performance of the estimator in a non-obvious way, namely through interactions. The importance of non-additivities in conditioning performance suggests that the benchmarking of sensitivity estimators should no longer rely on statistical designs that change one parameter at a time (usually the number of model runs and, more rarely, the test function [14,16,18,20,23,33,39,40,42]). Such a setting reduces the uncertain space to a minimum and misses the effects that the interactions between the benchmark parameters have on the final accuracy of the estimator. If global SA is the recommended practice to fully explore the uncertainty space of models, sensitivity estimators, being algorithms themselves, should be likewise validated [42].

Our approach also compensates for the lack of studies on the theoretical properties of estimators in the sensitivity analysis literature (see for instance [15,47]), and allows a more detailed examination of their performance than theoretical comparisons. Empirical studies like ours mirror the numerical character of sensitivity analysis when the indices cannot be analytically calculated, which is most of the time in "real-world" mathematical modeling.

Two recommendations emerge from our work: the estimators by Razavi and Gupta, Jansen, Janon/Monod or Azzini and Rosati should be preferred when the aim is to rank the model inputs. Jansen, Janon/Monod or Azzini and Rosati should also be prioritized if the goal is to estimate the "true" total-order indices. The drop in performance of Razavi and Gupta in the second setting may be explained by a bias at lower sample sizes, i.e. a consistent over-estimation of all total-order indices. This is because their estimator relies on a constant-mean assumption whose validity degrades with larger values of ∆h [20,39]. In order to remove this bias, ∆h should take very small values (e.g., ∆h = 0.01), which may not be computationally feasible. Since the direction of this bias is the same for all parameters, it only affects the calculation of the "true" total-order indices, not the capacity of the estimator to properly rank the model inputs.

It is also worth stating that Razavi and Gupta's is the only estimator studied here that requires the analyst to define a tuning parameter, ∆h. In this paper we have set ∆h = 0.2 after some preliminary trials with the estimator; other works have used different values (e.g. ∆h = 0.002, ∆h = 0.1, ∆h = 0.3; [20,21,39]). Selecting the most appropriate value for a given tuning parameter is not an obvious choice, and this uncertainty can make an estimator volatile, as shown by Puy et al. [42] in the case of the PAWN index.

The fact that Glen and Isaacs, Homma and Saltelli, Saltelli and pseudo-Owen do not perform as well in properly ranking the model inputs and approaching the "true" total-order indices may be partially explained by their less efficient computation of elementary effects: by allowing the production of negative terms in the numerator, these estimators also permit the production of negative total-order indices, thus leading to biased rankings or sensitivity indices. In the case of Saltelli, the use of a B matrix at the numerator and an A matrix at the denominator exacerbates its volatility (Table 1, Nº 5). Such inconsistency was corrected in Saltelli et al. [23].

The consistent robustness of Jansen, Janon/Monod and Azzini and Rosati makes their sensitivity to the uncertain parameters studied here almost negligible. They are already highly optimized estimators with not much room for improvement. Most of their performance is conditioned by the first- and total-order effects of the model form jointly with the underlying probability distributions (f(x) in Fig. 6b), as well as by their sampling design (Nt, k), which are in any case beyond the analyst's control. As for the rest, their accuracy might be enhanced by allocating a larger number of model runs per input (if computationally affordable), and especially in the case of Homma and Saltelli, Saltelli and Glen and Isaacs, by restricting their use to low-dimensional models (k < 10) and sensitivity settings that only require ranking the most important parameters (a restricted factor prioritization setting; [1]). Nevertheless, their substantial volatility is considerably driven by non-additivities, a combination that makes them hard to tame and should raise caution about their use in any modeling exercise.

Our results slightly differ from Becker's [21], who observed that Jansen outperformed Janon/Monod under a factor prioritization setting. We did not find any significant difference between these estimators. Although our metafunction approach is based on Becker's [21], our study tests the accuracy of estimators in a larger uncertain space, as we also account for the stress introduced by changes in the sampling method τ, the underlying probability distributions φ or the performance measure selected δ. These differences may account for the slightly different results obtained between the two papers.

Our analysis can be extended to other sensitivity estimators (i.e. moment-independent measures such as the entropy-based ones [48], the δ-measure [49] or the PAWN index [50,51]). Moreover, it holds potential to be used overall as a standard crash test every time a new sensitivity estimator is introduced to the modeling community. One of its advantages is its flexibility: Becker's [21] metafunction can be easily extended with new univariate functions or probability distributions, and the settings modified to check performance under different degrees of non-additivity or in a larger (Nt, k) space. With some slight modifications it should also allow producing functions with dominant low-order or high-order terms, labeled as Type B and C by Kucherenko et al. [22]. This should prompt developers of sensitivity indices to severely stress their estimators so that the modeling community and decision-makers fully appraise how they deal with uncertainties.

5 Code availability

The R code to replicate our results is available in Puy [52] and on GitHub (https://github.com/arnaldpuy/battle_estimators). The uncertainty and sensitivity analyses have been carried out with the R package sensobol [53], which also includes the test function used in this study.

6 Acknowledgements

We thank Saman Razavi for his insights on the behavior of the Razavi and Gupta estimator. This work has been funded by the European Commission (Marie Skłodowska-Curie Global Fellowship, grant number 792178 to A.P.).

References

[1] A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, and S. Tarantola. Global Sensitivity Analysis. The Primer. Chichester, UK: John Wiley & Sons, Ltd, 2008. doi: 10.1002/9780470725184.

[2] A. Jakeman, R. Letcher, and J. Norton. "Ten iterative steps in development and evaluation of environmental models". Environmental Modelling & Software 21.5 (2006), 602–614. doi: 10.1016/j.envsoft.2006.01.004.

[3] S. Eker, E. Rovenskaya, M. Obersteiner, and S. Langan. "Practice and perspectives in the validation of resource management models". Nature Communications 9.1 (2018), 1–10. doi: 10.1038/s41467-018-07811-9.

[4] A. Saltelli. "Sensitivity analysis for importance assessment". Risk Analysis 22.3 (2002), 579–590. doi: 10.1111/0272-4332.00040.

[5] B. Iooss and P. Lemaître. "A review on global sensitivity analysis methods". Uncertainty Management in Simulation-Optimization of Complex Systems. Operations Research/Computer Science Interfaces Series, vol. 59. Ed. by G. Dellino and C. Meloni. Boston: Springer, 2015, 101–122. doi: 10.1007/978-1-4899-7547-8_5. arXiv: 1404.2405.

[6] W. Becker and A. Saltelli. "Design for sensitivity analysis". Handbook of Design and Analysis of Experiments. Ed. by A. Dean, M. Morris, J. Stufken, and D. Bingham. Boca Ratón: CRC Press, Taylor & Francis, 2015, 627–674. doi: 10.1201/b18619.

[7] T. Homma and A. Saltelli. "Importance measures in global sensitivity analysis of nonlinear models". Reliability Engineering & System Safety 52 (1996), 1–17. doi: 10.1016/0951-8320(96)00002-6.

[8] L. Le Gratiet, S. Marelli, and B. Sudret. "Metamodel-based sensitivity analysis: polynomial chaos expansions and Gaussian processes". Handbook of Uncertainty Quantification. Cham: Springer International Publishing, 2017, 1289–1325. doi: 10.1007/978-3-319-12385-1_38.

[9] A. Saltelli, S. Tarantola, and K. P.-S. Chan. "A quantitative model-independent method for global sensitivity analysis of model output". Technometrics 41.1 (1999), 39. doi: 10.2307/1270993.

[10] R. I. Cukier, C. M. Fortuin, K. E. Shuler, A. G. Petschek, and J. H. Schaibly. "Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I Theory". The Journal of Chemical Physics 59.8 (1973), 3873–3878.

[11] R. I. Cukier, H. B. Levine, and K. E. Shuler. "Nonlinear sensitivity analysis of multiparameter model systems". Journal of Computational Physics 26.1 (1978), 1–42.

[12] I. M. Sobol'. "On the distribution of points in a cube and the approximate evaluation of integrals". USSR Computational Mathematics and Mathematical Physics 7.4 (1967), 86–112. doi: 10.1016/0041-5553(67)90144-9.

[13] I. M. Sobol'. "Uniformly distributed sequences with an additional uniform property". USSR Computational Mathematics and Mathematical Physics 16.5 (1976), 236–242. doi: 10.1016/0041-5553(76)90154-3.

[14] S. Lo Piano, F. Ferretti, A. Puy, D. Albrecht, and A. Saltelli. "Variance-based sensitivity analysis: the quest for better estimators and designs between explorativity and economy". Reliability Engineering & System Safety 206 (2021), 107300. doi: 10.1016/j.ress.2020.107300.

[15] M. Jansen. "Analysis of variance designs for model output". Computer Physics Communications 117.1-2 (1999), 35–43. doi: 10.1016/S0010-4655(98)00154-4.

[16] A. Janon, T. Klein, A. Lagnoux, M. Nodet, and C. Prieur. "Asymptotic normality and efficiency of two Sobol index estimators". ESAIM: Probability and Statistics 18 (2014), 342–364. doi: 10.1051/ps/2013040. arXiv: 1303.6451.

[17] G. Glen and K. Isaacs. "Estimating Sobol sensitivity indices using correlations". Environmental Modelling and Software 37 (2012), 157–166. doi: 10.1016/j.envsoft.2012.03.014.

[18] I. Azzini, T. Mara, and R. Rosati. "Monte Carlo estimators of first- and total-order Sobol' indices" (2020). arXiv: 2006.08232.

[19] H. Monod, C. Naud, and D. Makowski. Uncertainty and sensitivity analysis for crop models. Ed. by D. Wallach, D. Makowski, and J. W. Jones. Elsevier, 2006, 35–100.

[20] S. Razavi and H. V. Gupta. "A new framework for comprehensive, robust, and efficient global sensitivity analysis: 2. Application". Water Resources Research 52.1 (2016), 440–455. doi: 10.1002/2015WR017558.

[21] W. Becker. "Metafunctions for benchmarking in sensitivity analysis". Reliability Engineering and System Safety 204 (2020), 107189. doi: 10.1016/j.ress.2020.107189.

[22] S. Kucherenko, B. Feil, N. Shah, and W. Mauntz. "The identification of model effective dimensions using global sensitivity analysis". Reliability Engineering & System Safety 96.4 (2011), 440–449. doi: 10.1016/j.ress.2010.11.003.

[23] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola. "Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index". Computer Physics Communications 181.2 (2010), 259–270. doi: 10.1016/j.cpc.2009.09.018.

[24] T. Ishigami and T. Homma. "An importance quantification technique in uncertainty analysis for computer models". Proceedings. First International Symposium on Uncertainty Modeling and Analysis (1990), 398–403.

[25] I. M. Sobol'. "On quasi-Monte Carlo integrations". Mathematics and Computers in Simulation 47.2-5 (1998), 103–112. doi: 10.1016/S0378-4754(98)00096-2.

[26] P. Bratley and B. L. Fox. "ALGORITHM 659: implementing Sobol's quasirandom sequence generator". ACM Transactions on Mathematical Software (TOMS) 14.1 (1988), 88–100.

[27] E. Borgonovo and E. Plischke. "Sensitivity analysis: a review of recent advances". European Journal of Operational Research 248.3 (2016), 869–887. doi: 10.1016/j.ejor.2015.06.032.

[28] A. Saltelli. "A short comment on statistical versus mathematical modelling". Nature Communications 10.1 (2019), 8–10. doi: 10.1038/s41467-019-11865-8.

[29] A. Saltelli, G. Bammer, I. Bruno, E. Charters, M. Di Fiore, E. Didier, W. Nelson Espeland, J. Kay, S. Lo Piano, D. Mayo, R. Pielke Jr, T. Portaluri, T. M. Porter, A. Puy, I. Rafols, J. R. Ravetz, E. Reinert, D. Sarewitz, P. B. Stark, A. Stirling, J. van der Sluijs, and P. Vineis. "Five ways to ensure that models serve society: a manifesto". Nature 582.7813 (2020), 482–484. doi: 10.1038/d41586-020-01812-9.

[30] R. Sheikholeslami, S. Razavi, H. V. Gupta, W. Becker, and A. Haghnegahdar. "Global sensitivity analysis for high-dimensional problems: how to objectively group factors and measure robustness and convergence while reducing computational cost". Environmental Modelling and Software 111 (2019), 282–299. doi: 10.1016/j.envsoft.2018.09.002.

[31] F. Sarrazin, F. Pianosi, and T. Wagener. "Global sensitivity analysis of environmental models: convergence and validation". Environmental Modelling and Software 79 (2016), 135–152. doi: 10.1016/j.envsoft.2016.02.005.

[32] A. Haghnegahdar, S. Razavi, F. Yassin, and H. Wheater. "Multicriteria sensitivity analysis as a diagnostic tool for understanding model behaviour and characterizing model uncertainty". Hydrological Processes 31.25 (2017), 4462–4476. doi: 10.1002/hyp.11358.

[33] I. Azzini and R. Rosati. "The IA-estimator for Sobol' sensitivity indices". Ninth International Conference on Sensitivity Analysis of Model Output. Barcelona, 2019.

[34] M. J. Shin, J. H. A. Guillaume, B. F. W. Croke, and A. J. Jakeman. "Addressing ten questions about conceptual rainfall-runoff models with global sensitivity analyses in R". Journal of Hydrology 503 (2013), 135–152. doi: 10.1016/j.jhydrol.2013.08.047.

[35] L. Paleari and R. Confalonieri. "Sensitivity analysis of a sensitivity analysis: we are likely overlooking the impact of distributional assumptions". Ecological Modelling 340 (2016), 57–63. doi: 10.1016/j.ecolmodel.2016.09.008.

[36] C. Spearman. "The proof and measurement of association between two things". The American Journal of Psychology 15.1 (1904), 72. doi: 10.2307/1412159.

[37] M. G. Kendall and B. B. Smith. "The problem of m rankings". The Annals of Mathematical Statistics 10.3 (1939), 275–287. doi: 10.1214/aoms/1177732186.

[38] I. R. Savage. "Contributions to the theory of rank order statistics - the two sample case". Annals of Mathematical Statistics 27 (1956), 590–615.

[39] S. Razavi and H. V. Gupta. "A new framework for comprehensive, robust, and efficient global sensitivity analysis: 1. Theory". Water Resources Research 52.1 (2016), 423–439. doi: 10.1002/2015WR017559.

[40] A. B. Owen. "Better estimation of small Sobol' sensitivity indices". ACM Transactions on Modeling and Computer Simulation 23.2 (2013), 1–17. doi: 10.1145/2457459.2457460.

[41] B. Iooss, A. Janon, G. Pujol, with contributions from B. Broto, K. Boumhaout, S. D. Veiga, T. Delage, R. E. Amri, J. Fruth, L. Gilquin, J. Guillaume, L. Le Gratiet, P. Lemaitre, A. Marrel, A. Meynaoui, B. L. Nelson, F. Monari, R. Oomen, O. Rakovec, B. Ramos, O. Roustant, E. Song, J. Staum, R. Sueur, T. Touati, and F. Weber. sensitivity: Global Sensitivity Analysis of Model Outputs. R package version 1.22.1, 2020.

[42] A. Puy, S. Lo Piano, and A. Saltelli. "A sensitivity analysis of the PAWN sensitivity index". Environmental Modelling and Software 127 (2020), 104679. doi: 10.1016/j.envsoft.2020.104679. arXiv: 1904.04488.

[43] A. B. Owen. "Randomly permuted (t, m, s)-nets and (t, s)-sequences". Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing. Lecture Notes in Statistics, vol. 106. 1995, 299–317.

[44] G. E. P. Box, J. S. Hunter, and W. G. Hunter. Statistics for Experimenters: Design, Innovation, and Discovery. Wiley, 2005.

[45] V. Pareto. Manuale di Economia Politica. Vol. 13. Società Editrice, 1906.

[46] G. E. P. Box and R. D. Meyer. "An analysis for unreplicated fractional factorials". Technometrics 28.1 (1986), 11–18. doi: 10.1080/00401706.1986.10488093.

[47] I. Azzini, G. Listorti, T. A. Mara, and R. Rosati. "Uncertainty and sensitivity analysis for policy decision making. An introductory guide". Luxembourg, 2020.

[48] H. Liu, W. Chen, and A. Sudjianto. "Relative entropy based method for probabilistic sensitivity analysis in engineering design". Journal of Mechanical Design 128.2 (2006), 326–336. doi: 10.1115/1.2159025.

[49] E. Borgonovo. "A new uncertainty importance measure". Reliability Engineering and System Safety 92.6 (2007), 771–784. doi: 10.1016/j.ress.2006.04.015.

[50] F. Pianosi and T. Wagener. "A simple and efficient method for global sensitivity analysis based on cumulative distribution functions". Environmental Modelling and Software 67 (2015), 1–11. doi: 10.1016/j.envsoft.2015.01.004.

[51] F. Pianosi and T. Wagener. "Distribution-based sensitivity analysis from a generic input-output sample". Environmental Modelling and Software 108 (2018), 197–207. doi: 10.1016/j.envsoft.2018.07.019.

[52] A. Puy. R code of "A comprehensive comparison of total-order estimators for global sensitivity analysis". 2020. doi: 10.5281/zenodo.4946559.

[53] A. Puy, S. Lo Piano, A. Saltelli, and S. A. Levin. "sensobol: an R package to compute variance-based sensitivity indices" (2021). arXiv: 2101.10103.

A comprehensive comparison of total-order estimators for global sensitivity analysis

Supplementary Materials

Arnald Puy*1,2, William Becker3, Samuele Lo Piano4, and Andrea Saltelli2,3

1Department of Ecology and Evolutionary Biology, M31 Guyot Hall, Princeton University, New Jersey 08544, USA. E-Mail: apuy@princeton.edu

2Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen, Parkveien 9, PB 7805, 5020 Bergen, Norway.

3European Commission, Joint Research Centre, Via Enrico Fermi, 2749, 21027 Ispra VA, Italy

4University of Reading, School of the Built Environment, JJ Thompson Building, Whiteknights Campus, Reading, RG6 6AF, United Kingdom

*Corresponding author

Contents

1 Razavi and Gupta's estimator (VARS) 2

2 Figures 3

1 Razavi and Gupta's estimator (VARS)

Unlike the other total-order estimators examined in our paper, Razavi and Gupta's VARS (Variogram Analysis of Response Surfaces [1, 2]) relies on the variogram $\gamma(\cdot)$ and covariogram $C(\cdot)$ functions to compute what they call VARS-TO, the VARS total-order index.

Let us consider a function of $k$ factors $\mathbf{x} = (x_1, x_2, \ldots, x_k) \in \mathbb{R}^k$. If $\mathbf{x}_A$ and $\mathbf{x}_B$ are two generic points separated by a distance $h$, the variogram is calculated as
\[
\gamma(\mathbf{x}_A - \mathbf{x}_B) = \frac{1}{2} V\left[ y(\mathbf{x}_A) - y(\mathbf{x}_B) \right] \tag{1}
\]
and the covariogram as
\[
C(\mathbf{x}_A - \mathbf{x}_B) = \operatorname{COV}\left[ y(\mathbf{x}_A), y(\mathbf{x}_B) \right] \tag{2}
\]
Note that
\[
V\left[ y(\mathbf{x}_A) - y(\mathbf{x}_B) \right] = V\left[ y(\mathbf{x}_A) \right] + V\left[ y(\mathbf{x}_B) \right] - 2 \operatorname{COV}\left[ y(\mathbf{x}_A), y(\mathbf{x}_B) \right] \tag{3}
\]
and, since $V[y(\mathbf{x}_A)] = V[y(\mathbf{x}_B)]$,
\[
\gamma(\mathbf{x}_A - \mathbf{x}_B) = V[y(\mathbf{x})] - C(\mathbf{x}_A, \mathbf{x}_B) \tag{4}
\]

To obtain the total-order effect $T_i$, the variogram and covariogram are computed on all pairs of points spaced $h_i$ apart along the $x_i$ axis, with all other factors kept fixed. Equation 4 thus becomes
\[
\gamma_{\mathbf{x}^*_{\sim i}}(h_i) = V(y \mid \mathbf{x}^*_{\sim i}) - C_{\mathbf{x}^*_{\sim i}}(h_i) \tag{5}
\]
where $\mathbf{x}^*_{\sim i}$ is a fixed point in the space of the non-$x_i$ factors. Razavi and Gupta [1, 2] suggest taking the mean value across the factors' space on both sides of Equation 5, thus obtaining
\[
E_{\mathbf{x}^*_{\sim i}}\left[ \gamma_{\mathbf{x}^*_{\sim i}}(h_i) \right] = E_{\mathbf{x}^*_{\sim i}}\left[ V(y \mid \mathbf{x}^*_{\sim i}) \right] - E_{\mathbf{x}^*_{\sim i}}\left[ C_{\mathbf{x}^*_{\sim i}}(h_i) \right] \tag{6}
\]
which can also be written as
\[
E_{\mathbf{x}^*_{\sim i}}\left[ \gamma_{\mathbf{x}^*_{\sim i}}(h_i) \right] = V(y)\, T_i - E_{\mathbf{x}^*_{\sim i}}\left[ C_{\mathbf{x}^*_{\sim i}}(h_i) \right] \tag{7}
\]
and therefore
\[
T_i = \frac{E_{\mathbf{x}^*_{\sim i}}\left[ \gamma_{\mathbf{x}^*_{\sim i}}(h_i) \right] + E_{\mathbf{x}^*_{\sim i}}\left[ C_{\mathbf{x}^*_{\sim i}}(h_i) \right]}{V(y)} \tag{8}
\]
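Equation 8 can be approximated numerically from star-based samples. The following Python sketch is only an illustration under stated assumptions (a regular within-section grid, empirical moment estimators at the first lag only; the function and variable names are ours, not those of the VARS reference implementation):

```python
import numpy as np

def vars_to(Y, var_y):
    """Rough sketch of the VARS-TO estimator of Eq. 8 for one input x_i.

    Y:     (N, m) array; row j holds the model outputs along the x_i
           cross section of star j, at m points spaced h apart.
    var_y: estimate of the unconditional variance V(y).
    """
    # Directional variogram at lag h, averaged over pairs and stars:
    # gamma(h) = 0.5 * E[(y(x + h) - y(x))^2].
    gamma_h = 0.5 * np.mean((Y[:, 1:] - Y[:, :-1]) ** 2)
    # Covariogram at lag h, conditional on each star center x*_{~i},
    # then averaged over stars.
    cov_h = np.mean([np.cov(row[:-1], row[1:])[0, 1] for row in Y])
    # Eq. 8: T_i = (E[gamma] + E[C]) / V(y).
    return (gamma_h + cov_h) / var_y
```

In practice [1, 2] average over all pairs at every available lag; this sketch uses only the first lag for brevity.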

The sampling scheme of VARS does not rely on the $\mathbf{A}$, $\mathbf{B}$, $\mathbf{A}_B^{(i)}, \ldots$ matrices, but on star centers and cross sections. Star centers are $N$ random points sampled across the input space. For each star center, $k$ cross sections of points spaced $h$ apart are generated, including and passing through the star center. Overall, the computational cost of VARS amounts to $N_t = N \left[ k \left( (1/h) - 1 \right) + 1 \right]$.
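As a quick illustration of this bookkeeping, a hypothetical sketch (the exact point placement is our simplification of the scheme in [1, 2], not the reference implementation) that generates one such design and reproduces the cost formula:

```python
import numpy as np

def vars_sample(N, k, h, seed=0):
    """Hypothetical sketch of star-based sampling for VARS.

    Each of the N star centers spawns k cross sections: the center
    with its i-th coordinate replaced by every grid value
    h, 2h, ..., 1 - h along axis i, i.e. (1/h) - 1 new points per
    axis, plus the star center itself.
    """
    rng = np.random.default_rng(seed)
    centers = rng.uniform(size=(N, k))
    points = []
    for c in centers:
        points.append(c.copy())            # the star center itself
        for i in range(k):
            for v in np.arange(h, 1, h):   # (1/h) - 1 grid values
                p = c.copy()
                p[i] = v
                points.append(p)
    return np.array(points)

# Total cost matches N_t = N [k((1/h) - 1) + 1]:
X = vars_sample(N=10, k=8, h=0.1)
```

With $N = 10$, $k = 8$ and $h = 0.1$, the design holds $10 \times (8 \times 9 + 1) = 730$ points.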


2 Figures

Figure S1: Examples of Monte-Carlo and Quasi Monte-Carlo sampling in two dimensions ($X_1$, $X_2$). $N = 200$.


Figure S2: Proportion of the total sum of first-order effects ($\sum_{i=1}^{k} S_i$) and of the active model inputs ($\frac{1}{k} \sum_{i=1}^{k} (T_i > 0.05)$, with active inputs defined as $T_i > 0.05$) after 1000 random metafunction calls with $k \in (3, 100)$. Note how the sum of first-order effects clusters around 0.8 (thus evidencing the production of non-additivities) and how, on average, the number of active model inputs revolves around 10–20%, thus reproducing the Pareto principle.

Figure S3: Sobol' $S_i$ and $T_i$ indices obtained after a run of the metafunction with the following parameter settings: $N = 10^4$, $k = 17$, $k_2 = 0.5$, $k_3 = 0.2$, $\varepsilon = 666$. The error bars reflect the 95% confidence intervals after bootstrapping ($R = 10^2$). The indices have been computed with the Jansen [3] estimator.
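The Jansen [3] total-order estimator used for these indices is commonly written as $T_i = \frac{1}{2N} \sum_{v=1}^{N} \big( y_A^{(v)} - y_{A_B^{(i)}}^{(v)} \big)^2 / V(y)$. A minimal Python sketch, purely for illustration (the authors provide R code [52]; normalising by the variance of the $\mathbf{A}$ runs alone is our simplification):

```python
import numpy as np

def jansen_ti(y_A, y_ABi):
    """Jansen [3] total-order estimator: half the mean squared
    difference between runs of matrix A and runs of A_B^(i)
    (A with column i taken from B), divided by an estimate of V(y)."""
    return 0.5 * np.mean((y_A - y_ABi) ** 2) / np.var(y_A)
```

For a dummy input that does not affect the output, $y_A = y_{A_B^{(i)}}$ and the estimator returns 0; for an input carrying all the variance it approaches 1.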


Figure S4: Proportion of model runs yielding $r < 0$, by estimator.

Figure S5: Scatter of the total number of model runs $N_t$ against the function dimensionality $k$, only for $r < 0$, by estimator.


Figure S6: Scatterplot of (a) the correlation between $T_i$ and $\hat{T}_i$ ($r$) and (b) the MAE against the number of model runs allocated per model input ($N_t/k$), by estimator.


Figure S7: Bar plot with the number of simulations conducted in each of the $N_t/k$ compartments assessed. All estimators have approximately the same number of simulations in each compartment.


Figure S8: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output ($r$) for the Azzini and Rosati estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S9: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output ($r$) for the Glen and Isaacs estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S10: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output ($r$) for the Homma and Saltelli estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S11: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output ($r$) for the Janon/Monod estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S12: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output ($r$) for the Jansen estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S13: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output ($r$) for the pseudo-Owen estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S14: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output ($r$) for the Razavi and Gupta estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S15: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output ($r$) for the Saltelli estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S16: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output (MAE) for the Azzini and Rosati estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S17: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output (MAE) for the Glen and Isaacs estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S18: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output (MAE) for the Homma and Saltelli estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S19: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output (MAE) for the Janon/Monod estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S20: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output (MAE) for the Jansen estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S21: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output (MAE) for the pseudo-Owen estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S22: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output (MAE) for the Razavi and Gupta estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


Figure S23: Scatterplots of the model inputs ($\phi$, $\delta$, $\tau$, $k_2$, $k_3$, $\varepsilon$) against the model output (MAE) for the Saltelli estimator. The red dots show the mean value in each bin (we have set the number of bins arbitrarily at 30).


References

[1] S. Razavi and H. V. Gupta. "A new framework for comprehensive, robust, and efficient global sensitivity analysis: 2. Application". Water Resources Research 52.1 (Jan. 2016), 440–455. doi:10.1002/2015WR017558.

[2] S. Razavi and H. V. Gupta. "A new framework for comprehensive, robust, and efficient global sensitivity analysis: 1. Theory". Water Resources Research 52.1 (Jan. 2016), 423–439. doi:10.1002/2015WR017559.

[3] M. Jansen. "Analysis of variance designs for model output". Computer Physics Communications 117.1-2 (Mar. 1999), 35–43. doi:10.1016/S0010-4655(98)00154-4.
