Figure 3 - uploaded by Korbinian Breinl


# Box plots of percentage relative error for MS, MA, MM and all candidate models, with Wakeby-3 as parent model. Symbols as in Figure 1.

## Source publication

This study compares model averaging and model selection methods to estimate design floods, while accounting for the observation error that is typically associated with annual maximum flow data. Model selection refers to methods where a single distribution function is chosen based on prior knowledge or by means of selection criteria. Model averaging...

## Similar publications

The system identification toolbox in MATLAB has been successfully used to compare model identification of a first-order system subjected to high and low disturbances. The model structures used are FIR, ARX, AMX, OE and BJ. The obtained model was validated using data generated from the actual process. It shows that the more the variance of the noise...

## Citations

... The stacking model also showed good performance, especially because the WKS data varied greatly in the different numerical ranges. Furthermore, the stacking model utilizes a more flexible and extensive selection and combination of base classifiers [15,34,35]. In conclusion, the ensemble model for the WKS prediction (i.e., the stacking model) supersedes the three other base models despite the limitations of the dataset size and the considerable differences in the spatial distribution of each sample because of geographic, economic, and political factors. ...

The frequent occurrence of extreme weather and increasing urbanization have led to continuously worsening climate-related disaster losses. Socioeconomic exposure is crucial in disaster risk assessment. Social assets at risk mainly include buildings, machinery and equipment, and infrastructure. In this study, the wealth capital stock (WKS) was selected as an indicator for measuring social wealth. However, existing WKS estimates have not been accurately gridded, which limits further disaster assessment. Hence, multisource remote sensing and POI data were used to disaggregate the 2012 prefecture-level WKS data into 1000 m × 1000 m grids. Ensemble models were then built via the stacking method, and their performance was verified by evaluating and comparing the three base models with the stacking model. The stacking model attained more robust prediction results (RMSE = 0.34, R² = 0.9025), and its predictions presented a spatially realistic asset distribution. The 1000 m × 1000 m gridded WKS data produced by this research offer a more reasonable and accurate socioeconomic exposure map than existing ones, thereby providing an important reference for disaster assessment. The approach may also be adopted in ensemble learning models for refining the spatialization of socioeconomic data.

... Model uncertainties can be tackled by using all candidate probability distributions for estimating design floods [13], where the final estimate is an average of all individual estimates. The model averaging can be performed either by taking the arithmetic mean of the design floods from the candidate probabilistic models (MM), or by attributing weights to the design floods of each individual candidate probability distribution (MA), depending on how well each probability model fits the data. ...
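The two averaging schemes can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of the cited works. The candidate names, the 100-year estimates and the weights below are hypothetical. The weights are simply taken as given; in practice they would be derived from how well each distribution fits the data.

```python
def arithmetic_model_average(estimates):
    """MM: plain arithmetic mean of the candidate design-flood estimates."""
    return sum(estimates.values()) / len(estimates)


def weighted_model_average(estimates, weights):
    """MA: weighted mean; weights are normalised so that they sum to one."""
    total = sum(weights[name] for name in estimates)
    return sum(estimates[name] * weights[name] / total for name in estimates)


# Hypothetical 100-year design-flood estimates (m^3/s) and hypothetical weights.
estimates = {"Gumbel": 900.0, "GEV": 1000.0, "LogNormal": 1100.0}
weights = {"Gumbel": 0.5, "GEV": 0.3, "LogNormal": 0.2}

mm = arithmetic_model_average(estimates)
ma = weighted_model_average(estimates, weights)
```

With these numbers MM gives 1000 m³/s and MA gives about 970 m³/s. The difference comes entirely from the weights favouring the Gumbel estimate.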

... The model averaging can be performed either by taking the arithmetic mean of the design floods from the candidate probabilistic models (MM), or by attributing weights to the design floods of each individual candidate probability distribution (MA), depending on how well each probability model fits the data. A modification of the arithmetic averaging method presented in Okoli et al. [13] was proposed by Bento et al. [14]. This method considers only the distributions that ensure good performance on both goodness-of-fit tests and graphical methods in the definition of the design floods (hereinafter referred to as modified MM). ...

The scouring phenomenon can pose a serious threat to bridge serviceability and to users' safety. In extreme circumstances, it can lead to the bridge's structural collapse. Despite efforts to reduce scour's unfavorable effects in the vicinity of bridge foundations, this issue remains a significant challenge. Many uncertainties affect the design process of bridge foundations, namely the associated hydrological and hydraulic parameters. Past and recent flood records highlight bridges' vulnerability and can help reduce scour estimation uncertainties. Therefore, the present study applies a semi-quantitative methodology of scour risk assessment to a Portuguese bridge case study, accounting for those uncertainties. The risk-based methodology comprises three main steps towards assigning the bridge's scour risk rating. The methodology constitutes a potential key tool for risk management activities, assisting bridge owners and managers in decision-making.

... Although it appears to be an easy and well-established task, this procedure may be associated with a high degree of uncertainty (Salinas et al. 2014a, 2014b). Several factors can affect the performance of this analysis, such as the quality of the observed data, the choice of the sampling technique (annual maximum series (AMS) or peak-over-threshold (POT)), and the selection of a suitable probability model and its parameter estimation methodology (Gaume 2018, Okoli et al. 2018). Regarding model selection, many probability distributions are used to represent hydrological extreme events, with no general agreement on the best model (…, Nguyen et al. 2017, Papalexiou 2018). ...
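The difference between the two sampling techniques can be illustrated with a short sketch.

- AMS keeps the single largest flow of each year.
- POT keeps every independent peak above a threshold.

The flows below are hypothetical. The run-based declustering (one peak per run of consecutive exceedances) is a simplifying assumption. Practical POT analyses usually add an explicit independence criterion, such as a minimum time between peaks.

```python
from collections import defaultdict


def annual_maximum_series(records):
    """AMS: the largest flow in each year; records are (year, flow) pairs."""
    maxima = defaultdict(lambda: float("-inf"))
    for year, flow in records:
        maxima[year] = max(maxima[year], flow)
    return [maxima[year] for year in sorted(maxima)]


def peaks_over_threshold(flows, threshold):
    """POT: one peak per run of consecutive exceedances (simplified declustering)."""
    peaks, current = [], None
    for q in flows:
        if q > threshold:
            current = q if current is None else max(current, q)
        elif current is not None:
            peaks.append(current)
            current = None
    if current is not None:  # the record ends during an exceedance
        peaks.append(current)
    return peaks


# Hypothetical chronological flow records (year, flow in m^3/s).
records = [(2000, 10.0), (2000, 50.0), (2000, 30.0),
           (2001, 20.0), (2001, 80.0), (2001, 75.0),
           (2002, 15.0)]
ams = annual_maximum_series(records)
pot = peaks_over_threshold([flow for _, flow in records], threshold=40.0)
```

In this toy record the dry year 2002 still contributes its small maximum (15) to the AMS. POT keeps only genuinely high flows and could keep several peaks from a single wet year.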

... The common practice is to select a suitable distribution according to its performance on one or more goodness-of-fit statistical tests that assess its descriptive ability (Nguyen et al. 2017, Okoli et al. 2018). This approach, however, is associated with two main issues. ...
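A minimal sketch of this common practice fits each candidate and keeps the one with the best goodness-of-fit statistic. The Kolmogorov-Smirnov statistic is used as the criterion, following the rule "smallest D wins". Several choices here are assumptions:

- The data are hypothetical.
- Only two candidates (Gumbel and Normal) are used, both fitted by the method of moments.
- The selection rule is illustrative.

Because the parameters are estimated from the same sample, standard KS critical values would not be valid for a formal test.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_cdf_from_moments(sample):
    """Gumbel (EV1) CDF with method-of-moments parameters."""
    scale = statistics.stdev(sample) * math.sqrt(6) / math.pi
    loc = statistics.mean(sample) - EULER_GAMMA * scale
    return lambda x: math.exp(-math.exp(-(x - loc) / scale))


def normal_cdf_from_moments(sample):
    """Normal CDF with the sample mean and standard deviation."""
    return statistics.NormalDist(statistics.mean(sample), statistics.stdev(sample)).cdf


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D: largest gap between the empirical and fitted CDFs."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))


# Hypothetical annual maximum flows (m^3/s).
annual_maxima = [120.0, 135.0, 150.0, 160.0, 175.0, 190.0, 210.0, 240.0, 280.0, 350.0]
candidates = {"Gumbel": gumbel_cdf_from_moments, "Normal": normal_cdf_from_moments}

d_stats = {name: ks_statistic(annual_maxima, fit(annual_maxima))
           for name, fit in candidates.items()}
selected = min(d_stats, key=d_stats.get)  # model selection (MS): a single "best" model
```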

The popular approach to select a suitable distribution to characterize extreme rainfall events relies on the assessment of its descriptive performance. This study examines an alternative approach to this task that evaluates, in addition to the descriptive performance of the models, their performance in estimating out-of-sample events (predictive performance). With a numerical experiment and a case study in São Paulo state, Brazil, we evaluated the adequacy of seven probability distributions widely used in hydrological analysis to characterize extreme events in the region and compared the selection process of both the popular and the alternative frameworks. The results indicate that (1) the popular approach is not capable of selecting distributions with good predictive performance and (2) combining different predictive and descriptive tests can improve the reliability of extreme event prediction. The proposed framework allowed the assessment of model suitability from a regional perspective, identifying the Generalized Extreme Value (GEV) distribution as the most adequate to characterize extreme rainfall events in the region.
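The distinction between descriptive and predictive performance can be sketched with a simple split-sample check. The same fitted model is scored on the years used to fit it (descriptive) and on held-out years (predictive). This is not the study's framework, which used seven distributions and its own set of tests. Here the record is hypothetical, only two method-of-moments fits are compared, and the mean log-density is an assumed scoring choice.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_pdf(sample):
    """Gumbel (EV1) density with method-of-moments parameters."""
    scale = statistics.stdev(sample) * math.sqrt(6) / math.pi
    loc = statistics.mean(sample) - EULER_GAMMA * scale

    def pdf(x):
        z = (x - loc) / scale
        return math.exp(-z - math.exp(-z)) / scale

    return pdf


def fit_normal_pdf(sample):
    """Normal density with the sample mean and standard deviation."""
    return statistics.NormalDist(statistics.mean(sample), statistics.stdev(sample)).pdf


def mean_log_score(pdf, data):
    """Average log-density of the data under a fitted model (higher is better)."""
    return sum(math.log(pdf(x)) for x in data) / len(data)


# Hypothetical annual maxima; the last four years are held out for validation.
record = [142.0, 98.0, 175.0, 120.0, 210.0, 133.0, 160.0, 188.0,
          115.0, 250.0, 140.0, 199.0]
calibration, validation = record[:8], record[8:]

scores = {}
for name, fit in {"Gumbel": fit_gumbel_pdf, "Normal": fit_normal_pdf}.items():
    pdf = fit(calibration)
    scores[name] = {
        "descriptive": mean_log_score(pdf, calibration),  # in-sample fit
        "predictive": mean_log_score(pdf, validation),    # out-of-sample skill
    }
```

A model can rank first on the descriptive score and not on the predictive one. This is the situation the alternative framework is designed to detect.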

... Lin and Kuo (2016) state that AMA is appropriate if all the candidate models have similar predictive power. Okoli et al. (2018), in a study estimating design floods, obtained the same performance for AMA and weighted MA when the AIC values of the candidate models were nearly equal. ...
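One way to see why nearly equal AIC values make weighted and arithmetic averaging agree is through Akaike weights, a common (assumed here) way of turning AIC values into MA weights. The formula is w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), with Δ_i = AIC_i - AIC_min. The AIC values below are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: exp(-delta/2), normalised to sum to one."""
    best = min(aic_values.values())
    relative = {m: math.exp(-(a - best) / 2) for m, a in aic_values.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical AIC values for three candidate distributions.
similar = akaike_weights({"A": 100.0, "B": 100.2, "C": 100.4})
distinct = akaike_weights({"A": 100.0, "B": 110.0, "C": 120.0})
```

When the AIC values differ by only a few tenths, the weights stay close to 1/3 each (about 0.37, 0.33 and 0.30), so MA behaves almost like AMA. When they differ by ten units, nearly all weight goes to the best model.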

... In multi-model ensemble learning, outputs from multiple classifiers are combined to improve classification accuracy. The ensemble developed by model averaging addresses uncertainty in the choice of probability distribution function by combining all model estimates (Okoli et al. 2018). Several researchers have used model averaging to deal with model structure uncertainty (Bodo and Unny 1976; Tung and Mays 1981a, b; Laio et al. 2009; Najafi et al. 2011; Najafi and Moradkhani 2015; Yan and Moradkhani 2016). ...

Avalanche forecasting is carried out using physical as well as statistical models. All these models have limitations associated with their mathematical formulation, which cause their performance to vary when forecasting an avalanche event and the associated danger. To overcome the limitations of each individual model, a multi-model decision support system (MM-DSS) has been developed for forecasting avalanche danger in the Chowkibal–Tangdhar (C-T) region of the North-West Himalaya. The MM-DSS has been developed for two different altitude zones of the C-T region by integrating four avalanche forecasting models (Hidden Markov model (HMM), nearest neighbour (NN), artificial neural network (ANN) and the snow cover model HIM-STRAT) to deliver avalanche forecasts with a lead time of three days. Weather variables for these models have been predicted using ANN, and the root mean square error of the predicted weather variables is computed using leave-one-out cross-validation. Snow and meteorological data of 22 winters (1992–2014) of the lower C-T region and 8 winters (2008–2016) of the upper C-T region have been used to develop avalanche forecasting models for these two sub-regions. All the avalanche forecasting models have been validated by true skill score (TSS), Heidke skill score (HSS), per cent correct (PC), probability of detection (POD), bias and false alarm rate (FAR) using data of five winters (2014–19) for the lower C-T region and three winters (2016–19) for the upper C-T region. In both C-T regions, for day-1, day-2 and day-3 forecasts, the HSS of MM-DSS lies between 0.26 and 0.4 and the POD between 0.64 and 0.86.
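The verification scores listed above are all computed from a 2×2 contingency table of forecasts against observations:

- hits (a)
- false alarms (b)
- misses (c)
- correct negatives (d)

The sketch below uses the standard definitions. The counts are hypothetical. "FAR" is implemented as the false alarm ratio b/(a+b), which is an assumption, since some studies define a false alarm rate as b/(b+d).

```python
def skill_scores(hits, false_alarms, misses, correct_negatives):
    """Standard categorical verification scores from a 2x2 contingency table."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    n = a + b + c + d
    pod = a / (a + c)   # probability of detection
    pofd = b / (b + d)  # probability of false detection
    return {
        "PC": (a + d) / n,        # per cent correct (as a fraction)
        "POD": pod,
        "FAR": b / (a + b),       # false alarm ratio (assumed definition)
        "bias": (a + b) / (a + c),
        "TSS": pod - pofd,        # true skill score
        "HSS": 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
    }


# Hypothetical counts for 100 forecast days.
scores = skill_scores(hits=20, false_alarms=10, misses=5, correct_negatives=65)
```

For these counts POD is 0.8, HSS is 0.625, and a bias of 1.2 means avalanche days are forecast slightly more often than they occur.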

... Statistical tests are commonly used to select the model that best fits the given time series data from a number of candidate probabilistic distribution models. The selection of a single best distribution model (i.e., model selection, MS) represents an implicit assumption that the selected model can adequately describe the frequency of the observed and future flood flow events (Okoli et al., 2018). Despite the well-established practice of using model selection (MS) in the field of flood frequency analysis, the technique itself does not take into account the inherent uncertainties (Okoli et al., 2019). ...

... Despite the good performance of the Gamma (2p) distribution, the current study also considered a modification of the arithmetic average (MM method, Okoli et al., 2018) of the considered candidate models in order to reduce model uncertainty. The available record length of only 31 hydrological years, the uncommon adoption of the Gamma (2p) distribution for flood peaks (Rizwan et al., 2018) and the good performance of the other three distribution models highlighted the need for the hydrological methodology proposed herein. ...

... In the present investigation, a modification of the original MM method (Okoli et al., 2018) is introduced and proposed, which consists of considering only the distributions that ensured good performance on both goodness-of-fit tests and graphical methods. The modified MM method considered the arithmetic mean of the design flood estimates from the four aforementioned probabilistic distributions: the two- and three-parameter LogNormal (2p and 3p), Gumbel and Gamma (2p). ...
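The modified MM step amounts to screening the candidates and then taking a plain arithmetic mean over the survivors. In the sketch below the four distribution names follow the text. The design-flood values, the pass/fail outcomes and the fifth rejected candidate are all hypothetical, and the screening checks themselves are reduced to boolean flags.

```python
def modified_mm(estimates, passed_gof, passed_graphical):
    """Arithmetic mean over candidates that pass both screening checks."""
    retained = [name for name in estimates
                if passed_gof[name] and passed_graphical[name]]
    if not retained:
        raise ValueError("no candidate distribution passed both checks")
    return sum(estimates[name] for name in retained) / len(retained), retained


# Hypothetical 100-year design-flood estimates (m^3/s).
estimates = {
    "LogNormal (2p)": 2000.0,
    "LogNormal (3p)": 2100.0,
    "Gumbel": 1900.0,
    "Gamma (2p)": 1800.0,
    "Hypothetical candidate X": 2600.0,
}
# Hypothetical screening outcomes: candidate X fails the graphical check.
passed_gof = {name: True for name in estimates}
passed_graphical = {name: name != "Hypothetical candidate X" for name in estimates}

design_flood, retained = modified_mm(estimates, passed_gof, passed_graphical)
plain_mm = sum(estimates.values()) / len(estimates)  # original MM, no screening
```

Here the modified MM gives 1950 m³/s against 2080 m³/s for the unscreened MM, because the poorly fitting candidate no longer pulls the average up.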

Understanding the risks associated with the likelihood of extreme events and their respective consequences for the stability of hydraulic infrastructures is essential for flood forecasting and engineering design purposes. Accordingly, a hydrological methodology for providing reliable estimates of extreme discharge flows approaching hydraulic infrastructures was developed. It is composed of a preliminary assessment of missing data, data quality and reliability for statistically assessing the frequency of flood flows, combined with parametric and non-parametric methods. Model and parameter uncertainties in the prediction of extreme hydrological events are accounted for by the introduced and proposed modified model averaging (modified MM) approach. The accuracy of the parametric methods was assessed by using the non-parametric Kernel Density Estimate (KDE) as a benchmark model. For demonstration and validation purposes, this methodology was applied to estimate the design floods approaching the case study ‘new Hintze Ribeiro bridge’, located on the Douro river, one of the three main rivers in Portugal, which has one of Europe's largest river flood flows. Given the obtained results, the modified MM is considered a better estimation method.
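A KDE benchmark can be turned into a design-flood estimate by inverting the kernel CDF at the non-exceedance probability 1 - 1/T. The sketch below uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth. Both are assumed choices; the source does not state its kernel or bandwidth. The annual maxima are hypothetical.

```python
import statistics


def gaussian_kde_cdf(sample, bandwidth=None):
    """CDF of a Gaussian-kernel density estimate of the sample."""
    n = len(sample)
    if bandwidth is None:
        # Silverman's rule of thumb: an assumed choice, not the source's.
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    std_normal = statistics.NormalDist()

    def cdf(x):
        return sum(std_normal.cdf((x - xi) / bandwidth) for xi in sample) / n

    return cdf, bandwidth


def kde_design_flood(sample, return_period, tol=1e-6):
    """Flow with non-exceedance probability 1 - 1/T, found by bisection."""
    p = 1 - 1 / return_period
    cdf, h = gaussian_kde_cdf(sample)
    lo, hi = min(sample) - 10 * h, max(sample) + 10 * h  # brackets the quantile
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical annual maximum flows (m^3/s).
annual_maxima = [120.0, 135.0, 150.0, 160.0, 175.0, 190.0, 210.0, 240.0, 280.0, 350.0]
q10 = kde_design_flood(annual_maxima, 10)
q100 = kde_design_flood(annual_maxima, 100)
```

Beyond the observed range a KDE extrapolates only through its kernel tails. This is one reason it serves as a benchmark rather than as the design method itself.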

... Hydrological modeling aims at obtaining reliable estimates of extreme discharge flows and their occurrence probabilities [35] that might occur at a given location, namely at a bridge site. To estimate such extreme discharge flows (hereinafter referred to as "design floods"), statistical methods, generally referred to as "flood frequency analysis", are commonly used [36-38, and references therein]. ...
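In flood frequency analysis, the T-year design flood is the quantile with annual non-exceedance probability 1 - 1/T of the fitted distribution. For the Gumbel distribution this has the closed form x_T = μ - β ln(-ln(1 - 1/T)). The location and scale values below are hypothetical.

```python
import math


def gumbel_design_flood(loc, scale, return_period):
    """T-year flood: Gumbel quantile with non-exceedance probability 1 - 1/T."""
    if return_period <= 1:
        raise ValueError("return period must exceed 1 year")
    p = 1 - 1 / return_period
    return loc - scale * math.log(-math.log(p))


loc, scale = 1000.0, 300.0  # hypothetical Gumbel parameters (m^3/s)
q10 = gumbel_design_flood(loc, scale, 10)
q100 = gumbel_design_flood(loc, scale, 100)
```

With these parameters the 100-year flood is about 2380 m³/s. The same pattern applies to any fitted candidate distribution by swapping in its inverse CDF.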

... The selection of a single best distribution function (i.e. the model selection, MS) in the field of flood frequency analysis, represents an implicit assumption that the selected model can adequately describe the frequency of observed and future floods, including the extreme events [35]. Nevertheless, any model faces uncertainty and its quantification is crucial for ensuring data quality and usability [45]. ...

... Nevertheless, any model faces uncertainty and its quantification is crucial for ensuring data quality and usability [45]. According to Okoli et al. [35], one of the possibilities to deal with model uncertainty is to use all candidate probability distributions for the estimation of the design floods, where the final estimate is an average of all individual estimates, known as model averaging [46,47]. The model averaging can be performed either by taking the arithmetic mean of the design flood estimates from the candidate probabilistic models (i.e. the arithmetic model averaging, MM) or by attributing weights to the design floods of each individual candidate probability distribution (i.e. the weighted model averaging, MA) depending on how well each probability model fits the data [48,49]. ...

The collapse of bridges inevitably leads to economic losses and may also be responsible for human fatalities. A bridge may fail for several reasons, with local scouring around its foundation being the most common. Despite decades of scour research, many uncertainties still affect the design process of bridge piers; the most critical and least explored are the hydrological and hydraulic variables. The recent intensification of floods may also increase the vulnerability of bridges to scour effects. Therefore, the present work proposes a risk-based methodology for considering scour at bridge foundations. It is composed of three main steps: (i) assessing extreme hydrological events (hazards); (ii) modeling river behavior through the computation of flow characteristics and bridge scour depths; and (iii) assessing bridge scour risk by associating the ratio of scour depth to foundation depth with the priority factor (vulnerability) and assigning a qualitative scour risk rating (level of risk). The hydrological modeling incorporates uncertainty through an averaging approach in the definition of the design floods. The flow characteristics are simulated with the HEC-RAS model, which also contains a scour module for bridge scour assessment; however, other empirical estimates are considered for simple and pile-supported foundations. This study ends with a qualitative assessment of how the scouring phenomenon affects bridge vulnerability and safety. The proposed risk-based methodology, validated through a case study of the new Hintze Ribeiro bridge in Portugal, can potentially be incorporated into regular bridge inspection schedules as a useful tool for risk management, assisting in the prevention of catastrophic events.

... Among the methods discussed therein that are appropriate for probabilistic hydrological modelling are PDF combination methods. Simple PDF averaging has been exploited to some degree in hydrological contexts (see e.g., Okoli et al. 2018). ...
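Simple PDF averaging produces an equal-weight mixture distribution, which is not the same as averaging the candidates' quantiles (as in MM). The two can give quite different design values, especially far into the upper tail. The sketch below compares them using two hypothetical normal candidates; the distributions and the 0.99 probability level are illustrative assumptions.

```python
import statistics

# Two hypothetical candidate distributions for the same flood variable.
candidates = [statistics.NormalDist(100, 10), statistics.NormalDist(140, 30)]


def averaged_pdf_cdf(x):
    """CDF of the equal-weight mixture obtained by averaging the candidate PDFs."""
    return sum(d.cdf(x) for d in candidates) / len(candidates)


def mixture_quantile(p, lo=-1e4, hi=1e4, tol=1e-9):
    """Invert the mixture CDF by bisection (it is continuous and increasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if averaged_pdf_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p = 0.99
quantile_average = sum(d.inv_cdf(p) for d in candidates) / len(candidates)  # MM-style
pdf_average_quantile = mixture_quantile(p)                                  # PDF averaging
```

Here averaging the two 0.99 quantiles gives about 167, while the 0.99 quantile of the averaged PDF is about 202. The wider candidate dominates the mixture's upper tail.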

This thesis falls into the scientific areas of stochastic hydrology, hydrological modelling and hydroinformatics. It contributes with new practical solutions, new methodologies and large-scale results to predictive modelling of hydrological processes, specifically to solving two interrelated technical problems with emphasis on the latter. These problems are:
(A) hydrological time series forecasting by exclusively using endogenous predictor variables (hereafter, referred to simply as “hydrological time series forecasting”); and
(B) stochastic process-based modelling of hydrological systems via probabilistic post-processing (hereafter, referred to simply as “probabilistic hydrological post-processing”).
For the investigation of these technical problems, the thesis forms and exploits a novel predictive modelling and benchmarking toolbox. This toolbox consists of:
(i) approximately 6 000 hydrological time series (sourced from larger freely available datasets),
(ii) over 45 ready-made automatic models and algorithms mostly originating from the four major families of stochastic, (machine learning) regression, (machine learning) quantile regression, and conceptual process-based models,
(iii) seven flexible methodologies (which, together with the ready-made automatic models and algorithms, form the basis of our modelling solutions), and
(iv) approximately 30 predictive performance evaluation metrics.
Novel model combinations coupled with different algorithmic argument choices result in numerous model variants, many of which could be perceived as new methods. All the utilized models (i.e., the ones already available in open software, as well as those automated and proposed in the context of the thesis) are flexible, computationally convenient and fast; thus, they are appropriate for large-sample (even global-scale) hydrological investigations. Such investigations are implied by the (mainly) algorithmic nature of the methodologies of the thesis. In spite of this nature, the thesis also provides innovative theoretical supplements to its practical and methodological contribution.
Technical problem (A) is examined in four stages. During the first stage, a detailed framework for assessing forecasting techniques in hydrology is introduced. Complying with the principles of forecasting and contrary to the existing hydrological (and, more generally, geophysical) time series forecasting literature (in which forecasting performance is usually assessed within case studies), the introduced framework incorporates large-scale benchmarking. The latter relies on big hydrological datasets, large-scale time series simulation by using classical stationary stochastic models, many automatic forecasting models and algorithms (including benchmarks), and many forecast quality metrics. The new framework is exploited (by utilizing part of the predictive modelling and benchmarking toolbox of the thesis) to provide large-scale results and useful insights on the comparison of stochastic and machine learning forecasting methods for the case of hydrological time series forecasting at large temporal scales (e.g., the annual and monthly ones), with emphasis on annual river discharge processes. The related investigations focus on multi-step ahead forecasting.
During the second stage of the investigation of technical problem (A), the work conducted during the previous stage is expanded by exploring the one-step ahead forecasting properties of its methods, when the latter are applied to non-seasonal geophysical time series. Emphasis is put on the examination of two real-world datasets, an annual temperature dataset and an annual precipitation dataset. These datasets are examined in both their original and standardized forms to reveal the most and least accurate methods for long-run one-step ahead forecasting applications, and to provide rough benchmarks for the one-year ahead predictability of temperature and precipitation.
The third stage of the investigation of technical problem (A) includes both the examination and quantification of the predictability of monthly temperature and monthly precipitation at global scale, and the comparison of a large number of (mostly stochastic) automatic time series forecasting methods for monthly geophysical time series. The related investigations focus on multi-step ahead forecasting by using the largest real-world data sample ever used so far in hydrology for assessing the performance of time series forecasting methods.
With the fourth (and last) stage of the investigation of technical problem (A), the multiple-case study research strategy is introduced, in its large-scale version, as an innovative alternative to conducting single- or few-case studies in the field of geophysical time series forecasting. To explore three sub-problems associated with hydrological time series forecasting using machine learning algorithms, an extensive multiple-case study is conducted. This multiple-case study is composed of a sufficient number of single-case studies, which exploit monthly temperature and monthly precipitation time series observed in Greece. The explored sub-problems are lagged variable selection, hyperparameter handling, and comparison of machine learning and stochastic algorithms.
Technical problem (B) is examined in three stages. During the first stage, a novel two-stage probabilistic hydrological post-processing methodology is developed by using a theoretically consistent probabilistic hydrological modelling blueprint as a starting point. The usefulness of this methodology is demonstrated by conducting toy model investigations. The same investigations also demonstrate how our understanding of the system to be modelled can guide us to achieve better predictive modelling when using the proposed methodology.
During the second stage of the investigation of technical problem (B), the probabilistic hydrological modelling methodology proposed during the previous stage is validated. The validation is made by conducting a large-scale real-world experiment at monthly timescale. In this experiment, the increased robustness of the investigated methodology with respect to the combined (by this methodology) individual predictors and, by extension, to basic two-stage post-processing methodologies is demonstrated. The ability to “harness the wisdom of the crowd” is also empirically proven.
Finally, during the third stage of the investigation of technical problem (B), the thesis introduces the largest range of probabilistic hydrological post-processing methods ever introduced in a single work, and conducts at daily timescale the largest benchmark experiment ever conducted in the field. It also assesses several theoretical and qualitative aspects of the examined problem and of the application of the proposed algorithms, to answer the following research question: Why and how to combine process-based models and machine learning quantile regression algorithms for probabilistic hydrological modelling?
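Quantile regression, named in the research question above, fits predictive quantiles by minimizing the quantile (pinball) loss, which is also a common way to score probabilistic predictions. The sketch implements that standard loss only. The toy observations and predictions are hypothetical, and nothing here should be read as the thesis's own metric set.

```python
def pinball_loss(observed, predicted, tau):
    """Mean quantile (pinball) loss of predicted tau-quantiles; lower is better."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie in (0, 1)")
    total = 0.0
    for y, q in zip(observed, predicted):
        u = y - q
        # Under-prediction (u > 0) costs tau*u; over-prediction costs (1 - tau)*|u|.
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(observed)


# Hypothetical observed flows and predicted 0.9-quantiles.
loss = pinball_loss([10.0, 20.0, 30.0], [12.0, 18.0, 30.0], tau=0.9)
```

For a high quantile such as 0.9, missing low costs nine times as much as overshooting by the same amount. This asymmetry is what pushes a quantile regression fit towards the upper part of the predictive distribution.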

... This is not a peculiarity of the examined records but a generalized statistical effect (Koutsoyiannis and Baloutsos 2000). We also applied model selection using the Akaike information criterion for short sample sizes (AICc) (e.g., Burnham and Anderson 2004; Okoli et al. 2018) for the Gumbel and GEV distributions. The AICc analysis can be found in appendix B. Based on the AICc analysis and the studies by Koutsoyiannis (2004a,b) and Koutsoyiannis and Baloutsos (2000), we used the GEV distribution for all stations (periods, durations, area sizes). ...
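AICc adds a small-sample correction to AIC: AICc = AIC + 2k(k+1)/(n - k - 1), with AIC = 2k - 2 ln L, where k is the number of parameters and n the sample size. The sketch compares a two-parameter Gumbel and a three-parameter GEV. The log-likelihoods and record length are invented for illustration and say nothing about the study's own AICc results.

```python
def aic(log_likelihood, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction; requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICc needs n > k + 1")
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


n_years = 30  # hypothetical record length
fits = {"Gumbel": (-150.0, 2), "GEV": (-149.0, 3)}  # hypothetical (log-likelihood, k)

scores = {name: aicc(ll, k, n_years) for name, (ll, k) in fits.items()}
preferred = min(scores, key=scores.get)
```

In this invented case both models tie on plain AIC (304). The correction penalizes the extra GEV shape parameter more heavily for a 30-year record, so the lower AICc goes to the Gumbel.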

We estimate areal reduction factors (ARFs, the ratio of catchment rainfall to point rainfall) varying in space and time using a fixed-area method for Austria, and link them to the dominant rainfall processes in the region. We particularly focus on two sub-regions in the west and east of the country, where stratiform and convective rainfall processes dominate, respectively. ARFs are estimated using a dataset of 306 rain gauges with hourly resolution for five durations between 1 hour and 1 day. Results indicate that ARFs decay faster with area in regions of increased convective activity than in regions dominated by stratiform processes. Low ARF values occur where and when lightning activity (as a proxy for convective activity) is high, but some areas with reduced lightning activity also exhibit rather low ARFs because, in summer, convective rainfall can occur in any part of the country. ARFs tend to decrease with increasing return period, possibly because the contribution of convective rainfall is higher. The results of this study are consistent with similar studies in humid climates and provide new insights into the relationship between ARFs and dominant rainfall processes.
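A basic fixed-area ARF compares the annual maximum of the areal (gauge-averaged) rainfall with the average of the gauges' own annual maxima. This sketch assumes that textbook-style formulation (sum of annual areal maxima divided by the sum of annual mean point maxima). The study's exact estimator may differ, and the gauge data are hypothetical and already aggregated to one duration.

```python
def fixed_area_arf(years):
    """
    years: list of years; each year is a list of station series of equal length,
    holding rainfall depths for one duration at successive time steps.
    Returns (sum of annual areal maxima) / (sum of annual mean point maxima).
    """
    areal_total, point_total = 0.0, 0.0
    for stations in years:
        n_steps = len(stations[0])
        areal_series = [sum(s[t] for s in stations) / len(stations)
                        for t in range(n_steps)]
        areal_total += max(areal_series)
        point_total += sum(max(s) for s in stations) / len(stations)
    return areal_total / point_total


# Hypothetical two-gauge examples (mm per time step).
convective_year = [[10.0, 0.0, 5.0], [0.0, 10.0, 5.0]]  # peaks at different times
stratiform_year = [[8.0, 2.0], [8.0, 2.0]]              # peaks coincide

arf_convective = fixed_area_arf([convective_year])
arf_stratiform = fixed_area_arf([stratiform_year])
arf_both = fixed_area_arf([convective_year, stratiform_year])
```

The toy numbers mirror the qualitative finding above. Localized, non-coincident convective peaks give a low ARF (0.5), while spatially uniform stratiform rain gives an ARF of 1.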

... Liu and Kuo (2016) state that AMA is appropriate if all the candidate models have similar predictive power. Okoli et al. (2018), in a study estimating design floods, obtained the same performance for AMA and weighted MA when the Akaike information criterion (AIC) values of the candidate models were nearly equal. ...