Figure 1 - uploaded by Korbinian Breinl

# The framework used for comparing two estimation methods (statistical versus continuous simulation) based on four different types of numerical experiments.

Source publication

We compare statistical and hydrological methods to estimate design floods by proposing a framework that is based on assuming a synthetic scenario considered as ‘truth’ and use it as a benchmark for analysing results. To illustrate the framework, we used probability model selection and model averaging as statistical methods, while continuous simulat...

## Similar publications

Robust model estimation aims to estimate the parameters of a given geometric model, and then separate the outliers and inliers belonging to different model instances into different groups based on the estimated parameters. Robust model estimation is a fundamental task in computer vision and artificial intelligence, and mainly contains two component...

Diabetes mellitus is a hyperglycemia-like chronic condition that is a troublesome disease. It is estimated that, according to the growing morbidity, by 2040, the world will cross 642 million diabetic patients. This means that each one of the ten adults will be diabetes-affected. Diabetes can also lead to other illnesses such as heart attacks, kidne...

In this paper, quantized residual preference is proposed to represent the hypotheses and the points for model selection and inlier segmentation in multi-structure geometric model fitting. First, a quantized residual preference is proposed to represent the hypotheses. Through a weighted similarity measurement and linkage clustering, similar hypothes...

Quantile regression is a statistical method for estimating conditional quantiles of a response variable. In addition, for mean estimation, it is well known that quantile regression is more robust to outliers than $l_2$-based methods. By using the fused lasso penalty over a $K$-nearest neighbors graph, we propose an adaptive quantile estimator in a...

## Citations

... Several algorithms have been proposed to complete the lost data, such as (a) linear regressions (LR), (b) non-parametric regression (NPR), (c) principal component analysis (PCA), (d) imputation-based methods, and (e) neural network algorithms (NNA), among others. Okoli et al. (2019) developed a systematic and comparative review of statistical methods and created a model using synthetic data, assigning an uncertainty parameter and limiting the error, which allows estimating flooding and involving flood risk reduction experts. Data captured by meteorological stations can be lost because of deficiencies in the sensors and in the connection and transmission of data, so algorithms are needed that can reconstruct the time series. ...

The time data series of weather stations are a source of information for floods. The study of the previous wintertime series allows knowing the behavior of the variables and the result that will be applied to analysis and simulation models that feed variables such as flow and level of a study area. One of the most common problems is the acquisition and transmission of data from weather stations due to atypical values and lost data; this generates difficulties in the simulation process. Consequently, it is necessary to propose a numerical strategy to solve this problem. The data source for this study is a real database where these problems are presented with different variables of weather. This study is based on comparing three methods of time series analysis to evaluate a multivariable process offline. For the development of the study, we applied a method based on the discrete Fourier transform (DFT), and we contrasted it with methods such as the average and linear regression without uncertainty parameters to complete missing data. The proposed methodology entails statistical values, outlier detection, and the application of the DFT. The application of DFT allows the time series completion, based on its ability to manage various gap sizes and replace missing values. In sum, DFT led to low error percentages for all the time series (1% average). This percentage reflects what would have likely been the shape or pattern of the time series behavior in the absence of misleading outliers and missing data.

... Generally speaking, common approaches for flood estimation can be categorized into statistical and deterministic (or hydrological) methods as well as combinations thereof (for an overview and evaluation, see e.g. Rogger et al., 2012; Okoli et al., 2019). Statistical approaches are widely used (see e.g. ...

Estimates for rare to very rare floods are limited by the relatively short streamflow records available. Often, pragmatic conversion factors are used to quantify such events based on extrapolated observations, or simplifying assumptions are made about extreme precipitation and resulting flood peaks. Continuous simulation (CS) is an alternative approach that better links flood estimation with physical processes and avoids assumptions about antecedent conditions. However, long-term CS has hardly been implemented to estimate rare floods (i.e. return periods considerably larger than 100 years) at multiple sites in a large river basin to date. Here we explore the feasibility and reliability of the CS approach for 19 sites in the Aare River basin in Switzerland (area: 17 700 km²) with exceedingly long simulations in a hydrometeorological model chain. The chain starts with a multi-site stochastic weather generator used to generate 30 realizations of hourly precipitation and temperature scenarios of 10 000 years each. These realizations were then run through a bucket-type hydrological model for 80 sub-catchments and finally routed downstream with a simplified representation of main river channels, major lakes and relevant floodplains in a hydrologic routing system. Comprehensive evaluation over different temporal and spatial scales showed that the main features of the meteorological and hydrological observations are well represented and that meaningful information on low-probability floods can be inferred. Although uncertainties are still considerable, the explicit consideration of important processes of flood generation and routing (snow accumulation, snowmelt, soil moisture storage, bank overflow, lake and floodplain retention) is a substantial advantage. The approach allows for comprehensively exploring possible but unobserved spatial and temporal patterns of hydrometeorological behaviour. This is of particular value in a large river basin where the complex interaction of flows from individual tributaries and lake regulations are typically not well represented in the streamflow observations. The framework is also suitable for estimating more frequent floods, as often required in engineering and hazard mapping.

... Generally, the characterization of extreme events relies on statistical frequency analysis (Nguyen and Nguyen 2019). This process involves the fit of a suitable probability distribution to the temporal series of the variable of interest (Baratti et al. 2012, Okoli et al. 2019). The fitted model is then extrapolated to associate events of a specific magnitude to a return period, which expresses the average recurrence time of the extreme event, representing its rareness (Volpi 2019). ...
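
The fit-then-extrapolate procedure described in this excerpt can be sketched in a few lines. This is a minimal illustration, not the method of any cited study: it assumes a Gumbel distribution fitted by the method of moments, and the data values and the function names `fit_gumbel_moments` and `return_level` are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments Gumbel fit (an assumed choice); returns (location, scale)."""
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)  # sample standard deviation
    scale = std * math.sqrt(6) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period):
    """Gumbel quantile whose annual exceedance probability is 1 / return_period."""
    non_exceedance = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maximum discharges in m^3/s, made up for illustration.
annual_maxima = [312.0, 405.0, 298.0, 521.0, 367.0,
                 449.0, 390.0, 610.0, 342.0, 478.0]

location, scale = fit_gumbel_moments(annual_maxima)
q100 = return_level(location, scale, 100)  # extrapolated 100-year event
print(f"location={location:.1f}, scale={scale:.1f}, Q100={q100:.1f}")
```

With only ten values the 100-year estimate lies well beyond the observed range, which is the extrapolation step the excerpt refers to.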

... LP3 is used if the transformed logarithm of the random variable (rainfall) is distributed according to PE3. Both distributions are widely used in hydrological studies (Kao and Govindaraju 2007, Okoli et al. 2019, Liu et al. 2020) and are considered the base method for frequency analysis in the US (Nguyen et al. 2017). ...

The popular approach to selecting a suitable distribution to characterize extreme rainfall events relies on the assessment of its descriptive performance. This study examines an alternative approach to this task that evaluates, in addition to the descriptive performance of the models, their performance in estimating out-of-sample events (predictive performance). With a numerical experiment and a case study in São Paulo state, Brazil, we evaluated the adequacy of seven probability distributions widely used in hydrological analysis to characterize extreme events in the region and compared the selection process of both popular and alternative frameworks. The results indicate that (1) the popular approach is not capable of selecting distributions with good predictive performance and (2) combining different predictive and descriptive tests can improve the reliability of extreme event prediction. The proposed framework allowed the assessment of model suitability from a regional perspective, identifying the Generalized Extreme Value (GEV) distribution as the most adequate to characterize extreme rainfall events in the region.

... Additionally, according to Michaelides et al. (2009), no theoretical distribution can be considered that can characterize exclusively the rainfall profile. Therefore, the selection of a probability distribution that gives the best fit to the observed rainfall or flood data is an important research topic in the field of statistical hydrology (Okoli, Mazzoleni, Breinl & Di Baldassarre, 2019). ...

Kyrenia region is in the northern part of Northern Cyprus that is environmentally fragile and susceptible to natural disasters. Thus, the study of frequency analysis is essential to find the most suitable model that could detect the region’s risk in certain natural phenomena such as rainfall, flood, and so on. The objective of this research is to determine the best fit probability distribution in the case of average daily rainfall and total rainfall using 22 years of data (1995-2016) from the Kyrenia region in Northern Cyprus by using 37 probability distribution models. The best-fit probability distribution in the case of maximum annual daily rainfall is determined using various distribution types. Three goodness-of-fit test statistics were applied. Beta, Dagum, Wakeby, Paretoa, Log-Pearson 3, Gen. Extreme Value, and Gen. Gamma (4P) showed the largest number of best-fit results. The results of this study can be used to develop more accurate models of flooding risk and damage. Keywords: Distribution function; goodness-of-fit tests; Northern Cyprus; rainfall.

... The selection of a single best distribution model (i.e., model selection, MS) represents an implicit assumption that the selected model can adequately describe the frequency of the observed and future flood flow events (Okoli et al., 2018). Despite the well-established practice of using model selection (MS) in the field of flood frequency analysis, the technique itself does not take into account the inherent uncertainties (Okoli et al., 2019). ...

... The model averaging can be undertaken by using the arithmetic mean of the design flood estimates (i.e., arithmetic model averaging, MM) or by attributing weights to the design floods of each individual candidate probability model (i.e., weighted model averaging, MA), depending on how best each model fits the time series data. According to Okoli et al. (2019), the selection of the best fitting probability distribution is associated with several outliers in comparison with multiple probability distributions, especially when facing short sample sizes (30-50 hydrological years). ...
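
The two averaging schemes named in this excerpt (MM and MA) can be illustrated as follows. This is a sketch under stated assumptions: the excerpt does not say how the weights are computed, so Akaike weights are used here as one common choice, and the design flood estimates, AIC values and function names are all hypothetical.

```python
import math


def arithmetic_average(estimates):
    """Arithmetic model averaging (MM): equal contribution from every model."""
    return sum(estimates.values()) / len(estimates)


def akaike_weights(aic_by_model):
    """Weights from AIC differences (an assumed weighting scheme)."""
    best = min(aic_by_model.values())
    raw = {name: math.exp(-0.5 * (aic - best)) for name, aic in aic_by_model.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def weighted_average(estimates, weights):
    """Weighted model averaging (MA): better-fitting models count more."""
    return sum(weights[name] * estimates[name] for name in estimates)


# Hypothetical 100-year design floods (m^3/s) and AIC values per candidate model.
design_floods = {"gumbel": 725.0, "gev": 790.0, "lognormal": 698.0}
aic_values = {"gumbel": 118.4, "gev": 119.9, "lognormal": 121.2}

weights = akaike_weights(aic_values)
mm_estimate = arithmetic_average(design_floods)
ma_estimate = weighted_average(design_floods, weights)
print(f"MM={mm_estimate:.1f}, MA={ma_estimate:.1f}")
```

Model selection (MS) would instead keep only the estimate of the best-fitting candidate, here the one with the lowest AIC.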

... In order to take into account the parameter uncertainties, all design floods were estimated at a 95% confidence interval. Contrary to the approach of Okoli et al. (2019), the modified MM method proposed herein considered solely the contribution of the probability distributions that fitted the Scenario 3 data well, attributing an equal contribution to each of the candidate probabilistic models in the estimation of the design flood events (four of the distribution models analysed). An additional evaluation of the parameter estimation methods, namely MoM and MLE, was also considered. ...

Understanding the risks associated with the likelihood of extreme events and their respective consequences for the stability of hydraulic infrastructures is essential for flood forecasting and engineering design purposes. Accordingly, a hydrological methodology for providing reliable estimates of extreme discharge flows approaching hydraulic infrastructures was developed. It is composed of a preliminary assessment of missing data, quality and reliability for statistically assessing the frequency of flood flows, allied to parametric and non-parametric methods. Model and parameter uncertainties are accounted for by the introduced and proposed modified model averaging (modified MM) approach in the extreme hydrological event's prediction. An assessment of the parametric methods accuracy was performed by using the non-parametric Kernel Density Estimate (KDE) as a benchmark model. For demonstration and validity purposes, this methodology was applied to estimate the design floods approaching the case study ‘new Hintze Ribeiro bridge’, located in the Douro river, one of the three main rivers in Portugal, and having one of Europe's largest river flood flows. Given the obtained results, the modified MM is considered a better estimation method.

... More generally, the elasticity of extremes may also be used as a benchmark tool for stochastic rainfall-runoff modelling frameworks, for example weather generators coupled with hydrological models (e.g. Bennett et al., 2019; Gao et al., 2020; Müller-Thomy & Sikorska, 2019; Okoli et al., 2019). ...

The aim of this paper is to explore how rainfall mechanisms and catchment characteristics shape the relationship between rainfall and flood probabilities. We propose a new approach of comparing intensity-duration-frequency statistics of maximum annual rainfall with those of maximum annual streamflow in order to infer the catchment behavior for runoff extremes. We calibrate parsimonious intensity-duration-frequency scaling models to data from 314 rain gauges and 428 stream gauges in Austria, and analyze the spatial patterns of the resulting distributions and model parameters. Results indicate that rainfall extremes tend to be more variable in the dry lowland catchments dominated by convective rainfall than in the mountainous catchments where annual rainfall is higher and rainfall mechanisms are mainly orographic. Flood frequency curves are always steeper than the corresponding rainfall frequency curves with the exception of glaciated catchments. Based on the proposed approach of combined intensity-duration-frequency statistics we analyze elasticities as the percent change of flood discharge for a 1% change in extreme rainfall through comparing rainfall and flood quantiles. In wet catchments, the elasticities tend to unity, i.e. rainfall and flood frequency curves have similar steepness, due to persistently high soil moisture levels. In dry catchments, elasticities are much higher, implying steeper frequency curves of floods than those of rainfall, which is interpreted in terms of more skewed distributions of event runoff coefficients. While regional differences in the elasticities can be attributed to both dominating regional rainfall mechanisms and regional catchment characteristics, our results suggest that catchment characteristics are the dominating controls. With increasing return period, elasticities tend towards unity, which is consistent with various runoff generation concepts. 
Our findings may be useful for process-based flood frequency extrapolation and climate impact studies, and further studies are encouraged to explore the tail behavior of elasticities.
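
The verbal definition of elasticity in this abstract can be written compactly. The notation is an assumption made here for illustration, not taken from the paper: $Q_T$ and $P_T$ denote the flood and rainfall quantiles at return period $T$.

$$
\varepsilon_T = \frac{\mathrm{d}Q_T / Q_T}{\mathrm{d}P_T / P_T} = \frac{\mathrm{d}\ln Q_T}{\mathrm{d}\ln P_T}
$$

Under this reading, $\varepsilon_T = 1$ corresponds to rainfall and flood frequency curves of similar steepness, while $\varepsilon_T > 1$ corresponds to a flood frequency curve steeper than the rainfall one.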

... • Improved extreme water management through a complex modeling approach to better prepare for the impact of extreme high and low flow changes are the best long- and short-term plans and strategies to combat and minimize the risk in water-related sectors of the local economy. The accuracy and reliability of river flow predictions are essential for extreme water management, policy-making, and water allocation as well as integrated and sustainable water resource management practices (disaster management, irrigation management, hydropower regulation, environmental flow) (Meresa, 2019; Okoli et al., 2019). The precision and trustworthiness of river flow model predictions and extreme frequencies are affected by different factors. ...

... In recent decades, various advanced statistical and stochastic approaches have been developed for climate predictions by incorporating the seasonal and annual fluctuations (Breinl et al., 2017; Khazaei & Ahmadi, 2013; Wu et al., 2011). These studies were highly exposed to uncertainty due to the complex stochastic approach (Vesely et al., 2019) and a number of parameters (Breinl et al., 2017; Okoli et al., 2019). The ensemble realization of precipitation and temperature were generated using a statistical approach (Equations 1 and 2) for assessing the input data uncertainty. ...

... The result confirms that the frequency distribution uncertainty range (from three distribution types) also significantly contributes to the flood design magnitude. Okoli et al. (2019) found a similar result. These uncertainty sources play an essential role in water resource management and planning, food security, flood risk reduction, poverty reduction, and biodiversity conservation. ...

Quantifying possible sources of uncertainty in simulations of hydrological extreme events is very important for better risk management in extreme situations and water resource planning. The main objective of this research work is to identify and address the role of input data quality and hydrological parameter sets, and uncertainty propagation in hydrological extremes estimation. This includes identifying and estimating their contribution to flood and low flow magnitude using two objective functions (NSE for flood and LogNSE for low flow), 20,000 Hydrologiska Byråns Vattenbalansavdelning (HBV) hydrological parameter sets, and three frequency distribution models (Log-Normal, Pearson-III, and Generalized Extreme Value). The influence of uncertainty on the simulated flow is not uniform across all the selected three catchments due to different flow regimes and runoff generation mechanisms. The result shows that the uncertainty in high flow frequency modeling mainly comes from the input data quality. In the modeling of low flow frequency, the main contributor to the total uncertainty is model parameterization. The total uncertainty of QT90 (extreme peak flow quantile at 90-year return period) quantile shows that the interaction of input data and hydrological parameter sets have a significant role in the total uncertainty. In contrast, in the QT10 (extreme low flow quantile at 10-year return period) estimation, the input data quality and hydrological parameters significantly impact the total uncertainty. This implies that the primary factors and their interactions may cause considerable risk in water resources management and flood and drought risk management. Therefore, neglecting these factors and their interaction in disaster risk management, water resource planning, and evaluation of environmental impact assessment is not feasible and may lead to considerable risk.
Recommendations for Water Resource Managers
• The role of hydrological parameters and climate input data is significant in flood and low flow estimations and significantly impacts water resources and extremes management.
• Input data dominantly controlled flood magnitude and frequency, whilst the low flow magnitude and frequency were dominantly affected by both input data quality and hydrological parameters.
• It is crucial to consider the main features that cause considerable risk in water resource management and extreme risk management.
• Neglecting the primary factors and their interaction in disaster risk management, water resource planning, and evaluation of environmental impact assessment is not feasible and may lead to considerable risk.
• Improved extreme water management through a complex modeling approach to better prepare for the impact of extreme high and low flow changes are the best long- and short-term plans and strategies to combat and minimize the risk in water-related sectors of the local economy.

... The accuracy and reliability of river flow extreme predictions are essential for extreme water management, policy-making, water allocation, and integrated and sustainable water resource management practices (Meresa 2019; Okoli et al. 2019). Different sources of uncertainty influence the precision and trustworthiness of river flow model predictions and extreme frequencies; among others, uncertainty may result from input variables (Bae et al. 2018), conceptual model structures (Mockler et al. 2016; Vetter et al. 2016), model parameters (Ajami et al. 2007), and extreme frequency distributions (Meresa and Romanowicz 2017). ...

... In recent decades, various advanced statistical and stochastic approaches have been developed for climate predictions by incorporating the seasonal and annual fluctuations (Breinl et al. 2017). These studies were highly exposed to uncertainty due to the complex stochastic approach (Vesely et al. 2019) and the number of parameters (Breinl et al. 2017; Okoli et al. 2019). Therefore, in this study, for the input uncertainty analysis, ensemble realization of precipitation and temperature were generated using a statistical approach (Eq. ...

... The result confirms that the frequency distribution uncertainty range (from three distribution types) also has a significant contribution to the flood design magnitude. Okoli et al. (2019) found a similar result. These uncertainty sources play a major role in water resource management and planning, food security, flood risk reduction, poverty reduction, and biodiversity conservation. ...

Evaluation of possible sources of uncertainty and their influence on water resource planning and extreme hydrological characteristics are very important for extreme risk reduction and management. The main objective is to identify and holistically address the uncertainty propagation from the input data to the frequency of hydrological extremes. This novel uncertainty estimation framework has four stages that comprise hydrological models, hydrological parameter sets, and frequency distribution types. The influence of uncertainty on the simulated flow is not uniform across all the selected eight catchments due to different flow regimes and runoff generation mechanisms. The result shows that uncertainty in peak flow frequency simulation mainly comes from the input data quality. Whereas, in the low flow frequency, the main contributor to the total uncertainty is model parameterization. The total uncertainty in the estimation of QT90 (extreme peak flow quantile at 90-year return period) quantile shows the interaction of input data and extreme frequency models has significant influence. In contrast, the hydrological models and hydrological parameters have a substantial impact on the QT10 (extreme low flow quantile at 10-year return period) estimation. This implies that the four factors and their interactions may cause significant risk in water resource management and flood and drought risk management. Therefore, neglecting these factors in disaster risk management, water resource planning, and evaluation of environmental impact assessment is not feasible and may lead to significant impact.

... Stochastic WGs were developed in the 1980s with the aim of generating long-enough synthetic weather time series with specific statistical properties to be used in hydrologic models and risk assessment studies, and to extend and simulate weather time series at locations with no observed data (Wilks and Wilby 1999). The long time series of WGs have been used in many applications such as flood type classification (Turkington et al. 2016) and simulation of extreme floods (Okoli et al. 2019, Výleta and Valent 2019, Winter et al. 2019, Sikorska-Senoner et al. 2020) up to the probable maximum flood (Chen et al. 2015), as well as in many climate change studies (Semenov and Barrow 1997, Keller et al. 2017, Steinschneider et al. 2019, Ahn 2020). Many other studies have focused on improving WGs to represent higher statistical moments of both univariate and multivariate statistics of observed weather data (Hayhoe 2000, Hansen and Mavromatis 2001, Kyselý and Dubrovský 2005, Brissette et al. 2007, Chen et al. 2012a). ...

Resampling historical time series remains one of the main approaches used to generate long-term probabilistic streamflow forecasts, while there is a need to develop more flexible approaches taking into account non-stationarities. One possible approach is to use a modelling chain consisting of a stochastic weather generator and a hydrological model. However, the ability of this modelling chain to generate adequate probabilistic streamflows must first be evaluated. The aim of this paper is to compare the performance of a stochastic weather generator against resampling historical meteorological time series in order to produce ensemble streamflow forecasts. The comparison framework is based on 30 years of forecasts for a single Canadian watershed. Forecasts resulting from the two methods are evaluated using the continuous ranked probability score (CRPS) and rank histograms. Results indicate that while there are differences between the methods, they nevertheless perform similarly, thus showing that weather generators can be used as substitutes for resampling the historical past.
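
The continuous ranked probability score mentioned in this abstract can be computed for a single ensemble forecast as sketched below. This uses the standard empirical-ensemble form of the CRPS, mean absolute error minus half the mean absolute difference between members; the function name and the example values are hypothetical, and the cited study may have used a different estimator.

```python
def ensemble_crps(members, observation):
    """CRPS of an ensemble forecast against one observation (lower is better)."""
    n = len(members)
    # Mean absolute error of the members with respect to the observation.
    mean_abs_error = sum(abs(m - observation) for m in members) / n
    # Mean absolute difference over all ordered pairs of members.
    mean_abs_spread = sum(abs(a - b) for a in members for b in members) / (n * n)
    return mean_abs_error - 0.5 * mean_abs_spread


# Hypothetical streamflow ensemble (m^3/s) and the flow that was later observed.
forecast_members = [120.0, 135.0, 128.0, 150.0, 142.0]
observed_flow = 138.0
print(ensemble_crps(forecast_members, observed_flow))
```

For a one-member ensemble the score reduces to the absolute error, which makes it directly comparable with deterministic forecasts.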

... However, it has also been established that the FEMA FIS estimates were all obtained using varying methods (FEMA, 2011, 2013, 2013), which could explain why the other methods in this research produced estimates either greater or less than those presented in the FIS. Research conducted by Okoli et al. (2019) compared statistical and hydrological methods for the estimation of design floods based on 10,000 years of synthetically generated weather and discharge data. Although their hydrologic modeling did not reflect any real applications and was intended as a baseline for discussion for comparison of results, their ultimate findings suggest that more than one flood estimate should be obtained and the maximum value (within reason) should be selected to minimize the likelihood of underestimating the design flood (Okoli, 2019). ...

The St. Johns River, located in northeast Florida, USA, is a large watershed characterized by relatively flat topography, porous soils, and increasing urbanization. The city of Jacksonville, Florida is located near the downstream terminus of the river near the Atlantic Ocean. The lower portion of the watershed located downstream of Lake George is subjected to tidal exchange and storm surge from tropical storms and hurricanes as well as extra-tropical winter storms. Extreme flood events in the Lower St. Johns River can be caused by rain-driven runoff, high tide, storm surge or any combination of the three. This study examines the range of potential extreme flood discharges caused by rain-driven runoff within six larger sub-basins located in the Lower St. Johns River with special emphasis on the Pablo Creek sub-basin. The study uses multiple methods including published flood insurance data, two statistical hydrology methods, and model simulations to estimate an array of flood discharges at varying return frequencies. The study also examines the potential effects on flood discharges from future land use changes and the temporal distribution of rainfall. The rain-driven flood discharge estimates are then fit to a normal distribution to convey the overall risk and uncertainty associated with the flood estimates. Overall, the study revealed that a wide range of reasonable rainfall-driven flood estimates are possible using the same data sets. The wide range of estimates will help inform future resiliency projects planned in the study area by providing a more realistic set of bounds with which planning can proceed. The estimates derived herein for the Pablo Creek sub-basin can be combined with the independent or dependent effects of tide and storm surge in order to characterize the total flood resiliency risk of the region.