Preprint (PDF available)

The battle of total-order sensitivity estimators


Abstract

Sensitivity analysis helps decision-makers understand how a given model output responds to variation in the model inputs. One of the most authoritative measures in global sensitivity analysis is the Sobol' total-order index ($T_i$), which can be computed with several different estimators. Although previous comparisons exist, it is hard to know which estimator performs best, since the results are contingent on several benchmark settings: the sampling method ($\tau$), the distribution of the model inputs ($\phi$), the number of model runs ($N_t$), the test function or model ($\varepsilon$) and its dimensionality ($k$), the weight of higher-order effects (e.g., second and third order, $k_2, k_3$), or the performance measure selected ($\delta$). Here we overcome these limitations and simultaneously assess all total-order estimators in an eight-dimensional hypercube where $(\tau, \phi, N_t, \varepsilon, k, k_2, k_3, \delta)$ are treated as random parameters. This design allows us to create an unprecedentedly large range of benchmark scenarios. Our results indicate that, in general, the preferred estimator should be Razavi and Gupta's, followed by Jansen's or Janon/Monod's. The remaining estimators lag significantly behind in performance. Our work helps analysts navigate the myriad of total-order formulae by effectively eliminating the uncertainty in the selection of the best estimator.
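To make the quantity being benchmarked concrete, below is a minimal sketch of one of the estimators the abstract mentions, Jansen's, using the standard two-matrix $(A, A_B^{(i)})$ design. The helper name `jansen_total_order`, the uniform-input assumption, and the toy model are ours for illustration; they are not the paper's benchmark setup.

```python
import numpy as np

def jansen_total_order(f, k, N, seed=None):
    """Monte Carlo estimate of the Sobol' total-order indices T_i with
    Jansen's estimator: T_i = E[(f(A) - f(A_B^(i)))^2] / (2 Var(y)).
    Illustrative helper (not from the paper); inputs assumed ~ U(0, 1).
    Total cost: N * (k + 1) model runs."""
    rng = np.random.default_rng(seed)
    A = rng.random((N, k))              # base sample matrix
    B = rng.random((N, k))              # independent resample matrix
    fA = f(A)
    var_y = np.var(fA, ddof=1)          # unconditional output variance
    T = np.empty(k)
    for i in range(k):
        ABi = A.copy()
        ABi[:, i] = B[:, i]             # replace column i of A with B's
        T[i] = np.mean((fA - f(ABi)) ** 2) / (2.0 * var_y)
    return T

# Additive toy model y = x1 + 2*x2: no interactions, so T_i = S_i,
# with analytic values T_1 = 0.2 and T_2 = 0.8.
model = lambda X: X[:, 0] + 2.0 * X[:, 1]
T = jansen_total_order(model, k=2, N=2 ** 14, seed=0)
```

With an additive model the total-order and first-order indices coincide, which makes the estimates easy to check against the analytic values.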
Article
Variance-based sensitivity indices have established themselves as a reference among practitioners of sensitivity analysis of model outputs. A variance-based sensitivity analysis typically produces the first-order sensitivity indices S_j and the so-called total-effect sensitivity indices T_j for the uncertain factors of the mathematical model under analysis. Computational cost is critical in sensitivity analysis. This cost depends upon the number of model evaluations needed to obtain stable and accurate values of the estimates. While efficient estimation procedures are available for S_j (Tarantola et al., 2006), this is less the case for T_j (Iooss and Lemaître, 2015). When estimating these indices, one can either use a sample-based approach, whose computational cost depends on the number of factors, or use approaches based on metamodelling/emulators (e.g., Gaussian processes). The present work focuses on sample-based estimation procedures for T_j for independent inputs and tests different avenues to achieve an algorithmic improvement over the existing best practices. To improve the exploration of the space of the input factors (design) and the formula to compute the indices (estimator), we propose strategies based on the concepts of economy and explorativity. We then discuss how several existing estimators perform along these characteristics. Numerical results are presented for a set of seven test functions corresponding to different settings (few important factors with low cross-factor interactions, all factors equally important with low cross-factor interactions, and all factors equally important with high cross-factor interactions).
We conclude the following from these experiments: a) sample-based approaches based on the use of multiple matrices to enhance the economy are outperformed by designs using fewer matrices but with better explorativity; b) among the latter, asymmetric designs perform the best and outperform symmetric designs having corrective terms for spurious correlations; c) improving on the existing best practices is fraught with difficulties; and d) ameliorating the results comes at the cost of introducing extra design parameters.
Article
Comparison studies of global sensitivity analysis (GSA) approaches are limited in that they are performed on a single model or a small set of test functions, with a limited set of sample sizes and dimensionalities. This work introduces a flexible ‘metafunction’ framework to benchmarking which randomly generates test problems of varying dimensionality and functional form using random combinations of plausible basis functions, and a range of sample sizes. The metafunction is tuned to mimic the characteristics of real models, in terms of the type of model response and the proportion of active model inputs. To demonstrate the framework, a comprehensive comparison of ten GSA approaches is performed in the screening setting, considering functions with up to 100 dimensions and up to 1000 model runs. The methods examined range from recent metamodelling approaches to elementary effects and Monte Carlo estimators of the Sobol’ total effect index. The results give a comparison in unprecedented depth, and show that on average and in the setting investigated, Monte Carlo estimators, particularly the VARS estimator, outperform metamodels. Indicatively, metamodels become competitive at around 10-20 runs per model input, but at lower ratios sampling-based approaches are more effective as a screening tool.
Article
Pandemic politics highlight how predictions need to be transparent and humble to invite insight, not blame.
Article
The PAWN index is gaining traction among the modelling community as a sensitivity measure. However, its robustness to its own design parameters has not yet been scrutinized: the size (N) and sampling (ε) of the model output, the number of conditioning intervals (n) or the summary statistic (θ). Here we fill this gap by running a sensitivity analysis of a PAWN-based sensitivity analysis. We compare the results with the design uncertainties of the Sobol’ total-order index (STi∗). Unlike in STi∗, the design uncertainties in PAWN create non-negligible chances of producing biased results when ranking or screening inputs. The dependence of PAWN upon (N,n,ε,θ) is difficult to tame, as these parameters interact with one another. Even in an ideal setting in which the optimum choice for (N,n,ε,θ) is known in advance, PAWN might not allow the analyst to distinguish an influential, non-additive model input from a truly non-influential one.
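The design parameters named in this abstract (the sample size N, the number of conditioning intervals n, the summary statistic θ) can be seen in a minimal "given-data" PAWN sketch. The function `pawn_index` and the toy model below are our own illustration, not the authors' reference implementation.

```python
import numpy as np
from scipy.stats import ks_2samp

def pawn_index(X, y, i, n=10, stat=np.median):
    """Minimal given-data PAWN sketch: split input i into n
    equal-count conditioning intervals, compute the Kolmogorov-Smirnov
    distance between each conditional output sample and the
    unconditional one, and summarize the distances with `stat`.
    Illustrative only; not the authors' reference implementation."""
    edges = np.quantile(X[:, i], np.linspace(0.0, 1.0, n + 1))
    ks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (X[:, i] >= lo) & (X[:, i] <= hi)
        if mask.sum() > 1:                       # skip empty strata
            ks.append(ks_2samp(y[mask], y).statistic)
    return stat(ks)

# Toy check: y depends only on x1, so x1 should score high and the
# dummy input x2 should score close to zero.
rng = np.random.default_rng(1)
X = rng.random((2000, 2))
y = 4.0 * X[:, 0]
pawn_x1 = pawn_index(X, y, 0)
pawn_x2 = pawn_index(X, y, 1)
```

Even in this toy setting, changing n or swapping the median for the maximum visibly shifts the indices, which is exactly the design sensitivity the abstract scrutinizes.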
Article
While the crisis of statistics has made it to the headlines, that of mathematical modelling has not. Something can be learned by comparing the two and by looking at other instances of the production of numbers. The sociology of quantification and post-normal science can help.
Article
Quantitative modelling is commonly used to assist the policy dimension of sustainability problems. Validation is an important step to make models credible and useful. To investigate existing validation viewpoints and approaches, we analyse a broad academic literature and conduct a survey among practitioners. We find that empirical data plays an important role in the validation practice in all main areas of sustainability science. Qualitative and participatory approaches that can enhance usefulness and public reliability are much less visible. Data-oriented validation is prevalent even when models are used for scenario exploration. Usefulness regarding a given task is more important for model developers than for users. As the experience of modellers and users increases, they tend to better acknowledge the decision makers’ demand for clear communication of assumptions and uncertainties. These findings provide a reflection on current validation practices and are expected to facilitate communication at the modelling and decision-making interface.
Article
Dynamical earth and environmental systems models are typically computationally intensive and highly parameterized with many uncertain parameters. Together, these characteristics severely limit the applicability of Global Sensitivity Analysis (GSA) to high-dimensional models because very large numbers of model runs are typically required to achieve convergence and provide a robust assessment. Paradoxically, only 30 percent of GSA applications in the environmental modelling literature have investigated models with more than 20 parameters, suggesting that GSA is under-utilized on problems for which it should prove most useful. We develop a novel grouping strategy, based on bootstrap-based clustering, that enables efficient application of GSA to high-dimensional models. We also provide a new measure of robustness that assesses GSA stability and convergence. For two models, having 50 and 111 parameters, we show that grouping-enabled GSA provides results that are highly robust to sampling variability, while converging with a much smaller number of model runs.
Article
Complex hydrological models are being increasingly used nowadays for many purposes such as studying the impact of climate and land-use change on water resources. However, building a high-fidelity model, particularly at large scales, remains a challenging task, due to complexities in model functioning and behavior and uncertainties in model structure, parameterization, and data. Global Sensitivity Analysis (GSA), which characterizes how the variation in the model response is attributed to variations in its input factors (e.g., parameters, forcing data), provides an opportunity to enhance the development and application of these complex models. In this paper, we advocate using GSA as an integral part of the modelling process by discussing its capabilities as a tool for diagnosing model structure and detecting potential defects, identifying influential factors, characterizing uncertainty, and selecting calibration parameters. Accordingly, we conduct a comprehensive GSA of a complex land surface-hydrology model, Modélisation Environmentale–Surface et Hydrologie (MESH), which combines the Canadian Land Surface Scheme (CLASS) with a hydrological routing component, WATROUTE. Various GSA experiments are carried out using a new technique, called Variogram Analysis of Response Surfaces (VARS), for alternative hydroclimatic conditions in Canada using multiple criteria, various model configurations, and a full set of model parameters. Results from this study reveal that, in addition to different hydroclimatic conditions and SA criteria, model configurations can also have a major impact on the assessment of sensitivity. GSA can identify aspects of the model internal functioning that are counter-intuitive, and thus, help the modeler to diagnose possible model deficiencies and make recommendations for improving development and application of the model. As a specific outcome of this work, a list of the most influential parameters for the MESH model is developed. 
This list, along with some specific recommendations, is expected to assist the wide community of MESH and CLASS users, to enhance their modelling applications.
Article
In a previous paper we introduced a distribution-based method for Global Sensitivity Analysis (GSA), called PAWN, which uses cumulative distribution functions of model outputs to assess their sensitivity to the model's uncertain input factors. Over the last three years, PAWN has been employed in the environmental modelling field as a useful alternative or complement to more established variance-based methods. However, a major limitation of PAWN up to now was the need for a tailored sampling strategy to approximate the sensitivity indices. Furthermore, this strategy required three tuning parameters whose optimal choice was rather unclear. In this paper, we present an alternative approximation procedure that tackles both issues and makes PAWN applicable to a generic sample of inputs and outputs while requiring only one tuning parameter. The new implementation therefore allows the user to estimate PAWN indices as complementary metrics in multi-method GSA applications without additional computational cost.
Article
Two types of sampling plans are examined as alternatives to simple random sampling in Monte Carlo studies. These plans are shown to be improvements over simple random sampling with respect to variance for a class of estimators which includes the sample mean and the empirical distribution function.
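The sampling plans this abstract examines include Latin hypercube designs. A minimal one-dimensional sketch (our own toy setup: estimating E[x²] for x ~ U(0, 1), true value 1/3) shows the variance improvement of the sample mean over simple random sampling:

```python
import numpy as np

def latin_hypercube_1d(n, rng):
    """One point per equal-probability stratum, randomly placed within
    the stratum and randomly ordered: a minimal 1-D Latin hypercube."""
    return (rng.permutation(n) + rng.random(n)) / n

# Repeat both plans many times and compare the spread of the
# sample-mean estimator of E[x^2] (true value 1/3).
rng = np.random.default_rng(7)
n, reps = 50, 500
srs = [np.mean(rng.random(n) ** 2) for _ in range(reps)]
lhs = [np.mean(latin_hypercube_1d(n, rng) ** 2) for _ in range(reps)]
var_srs, var_lhs = np.var(srs), np.var(lhs)
```

Because the integrand is monotone, stratifying the input space makes `var_lhs` much smaller than `var_srs`, consistent with the variance result stated above for estimators such as the sample mean.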