Predictive hydrological uncertainty can be quantified by using ensemble methods. If properly formulated, these methods can offer improved predictive performance by combining multiple predictions. In this work, we use 50-year-long monthly time series observed in 270 catchments in the United States to explore the performance of an ensemble learning post-processing methodology for issuing probabilistic hydrological predictions. This methodology allows the utilization of flexible quantile regression models for exploiting information about the hydrological model’s error. Its key differences with respect to basic two-stage hydrological post-processing methodologies using the same type of regression models are that (a) instead of a single point hydrological prediction it generates a large number of “sister predictions” (yet using a single hydrological model), and that (b) it relies on the concept of combining probabilistic predictions via simple quantile averaging. A major hydrological modelling challenge is obtaining probabilistic predictions that are simultaneously reliable and associated with prediction bands that are as narrow as possible; therefore, we assess both of these desired properties of the predictions by computing their coverage probabilities, average widths and average interval scores. The results confirm the usefulness of the proposed methodology and its larger robustness with respect to basic two-stage post-processing methodologies. Finally, this methodology is empirically proven to harness the “wisdom of the crowd” in terms of average interval score, i.e., the average of the individual predictions combined by this methodology scores no worse (and usually better) than the average of the scores of the individual predictions.
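The three evaluation criteria named in the abstract (coverage probability, average interval width and average interval score) can be sketched in a few lines. The interval score below follows the standard definition for a central (1 − α) prediction interval (Gneiting and Raftery 2007); the observations and interval bounds are purely illustrative numbers, not data from the study:

```python
import numpy as np

def interval_score(obs, lower, upper, alpha=0.1):
    """Interval score for a central (1 - alpha) prediction interval
    (Gneiting and Raftery 2007); lower values are better."""
    obs, lower, upper = map(np.asarray, (obs, lower, upper))
    width = upper - lower
    penalty_low = (2.0 / alpha) * (lower - obs) * (obs < lower)
    penalty_high = (2.0 / alpha) * (obs - upper) * (obs > upper)
    return width + penalty_low + penalty_high

# Purely illustrative 90% central prediction intervals for three
# monthly streamflow observations (hypothetical numbers)
obs = np.array([1.0, 2.0, 3.0])
lo = np.array([0.5, 1.8, 3.2])   # the third interval misses the observation
hi = np.array([1.5, 2.4, 3.9])

coverage = np.mean((obs >= lo) & (obs <= hi))   # empirical coverage
avg_width = np.mean(hi - lo)                    # average interval width
scores = interval_score(obs, lo, hi, alpha=0.1)
```

The score rewards narrow intervals but charges a penalty proportional to 2/α for every missed observation, which is why reliability and sharpness are assessed jointly.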

... Indeed, we have recently witnessed the transfer of some notably useful machine learning concepts and methods to the field of probabilistic hydrological forecasting and, more precisely, to its sister field of probabilistic hydrological post-processing. This latter field comprises a wide range of methods for issuing probabilistic hydrological forecasts by using the best-guess outputs of process-based rainfall-runoff models (see, e.g., the review of such methods in Li et al. 2017), with a considerable part of these methods being (i) broadly referred to as methods for estimating the "global uncertainty" or the "total uncertainty" (or simply the "uncertainty") of the various hydrological predictions or simulations (see, e.g., the related definitions in Montanari 2011), and (ii) tested with the process-based rainfall-runoff model being run in simulation mode (see, e.g., the modelling works by Montanari and Brath 2004; Montanari and Grossi 2008; Solomatine and Shrestha 2009; Montanari and Koutsoyiannis 2012; Bourgin et al. 2015; Dogulu et al. 2015; Sikorska et al. 2015; Farmer and Vogel 2016; Bock et al. 2018; Papacharalampous et al. 2019; Tyralis et al. 2019a; Papacharalampous et al. 2020b; Sikorska-Senoner and Quilty 2021; Koutsoyiannis and Montanari 2022; Quilty et al. 2022; Romero-Cuellar et al. 2022). Notably, reviews, overviews and popularizations focusing on these existing and useful machine learning concepts and methods are currently missing from the probabilistic hydrological forecasting literature, despite the large efforts being made towards summarizing and fostering the use of machine learning in hydrology (see, e.g., the reviews by Solomatine and Ostfeld 2008; Raghavendra and Deka 2014; Mehr et al. 2018; Shen 2018; Tyralis et al. 2019b) and in best-guess hydrological forecasting (see, e.g., the reviews by Maier et al. 2010; Sivakumar and Berndtsson 2010; Abrahart et al. 2012; Yaseen et al. 2015; Zhang et al. 2018). ...

... Examples of studies utilizing such datasets to support a successful formulation and selection of skillful probabilistic hydrological forecasting methods also exist. Nonetheless, such examples mostly refer to single-method evaluations (or benchmarking) either in post-processing contexts (e.g., those in Farmer and Vogel 2016; Bock et al. 2018; Papacharalampous et al. 2020b) or in ensemble hydrological forecasting contexts (e.g., those in Pechlivanidis et al. 2020; Girons Lopez et al. 2021). In the latter, process-based rainfall-runoff models are utilized in forecast mode, with ensembles of meteorological forecasts as their input, to deliver an ensemble of point hydrological forecasts, which collectively constitute the output probabilistic hydrological forecast. ...

... Among them are the "wisdom of the crowd" and the "forecast combination puzzle". The "wisdom of the crowd" can be harnessed through simple averaging (Lichtendahl et al. 2013; Winkler et al. 2019) to increase robustness in probabilistic hydrological post-processing and forecasting using quantile regression algorithms (see related empirical proofs in Papacharalampous et al. 2020b) or, potentially, machine and statistical learning algorithms from the remaining families that are summarized in Section 3.1. By increasing robustness, one reduces the risk of delivering poor-quality forecasts in any single forecast attempt. ...

Probabilistic forecasting is receiving growing attention nowadays in a variety of applied fields, including hydrology. Several machine learning concepts and methods are notably relevant to formalizing and optimizing probabilistic forecasting implementations by addressing the relevant challenges. Nonetheless, practically-oriented reviews focusing on such concepts and methods are currently missing from the probabilistic hydrological forecasting literature. This absence holds despite the pronounced intensification in the research efforts for benefitting from machine learning in this same literature, and despite the substantial relevant progress that has recently emerged, especially in the field of probabilistic hydrological post-processing, which traditionally provides the hydrologists with probabilistic hydrological forecasting implementations. Herein, we aim to fill this specific gap. In our review, we emphasize key ideas and information that can lead to effective popularizations of the studied concepts and methods, as such an emphasis can support successful future implementations and further scientific developments in the field. In the same forward-looking direction, we identify open research questions and propose ideas to be explored in the future.

... As such, data uncertainty identification [17,21–24] and the assessment of its effects on hydrological-hydraulic modelling [25–28] are needed. Nevertheless, national and/or regional meteorological services rarely report data on the rating curve being used or the alternatives available to practitioners (and even researchers) depending on the modelling objectives at hand. ...

... Furthermore, it must be emphasised that although GLUE has been used in this study to outline the model prediction bands, the reader is free to use different and perhaps statistically more rigorous approaches (e.g., [2,27,28,59]). The suggested model evaluation approach is not bound to GLUE. ...

... Despite GLUE having received considerable criticism focused on the subjective assumptions it requires, particularly in choosing a likelihood measure, avoiding the adoption of a formal error model, and ending up with model prediction bands that are conditional on those subjective assumptions [27,28,56,60,61], the method was implemented in this current study precisely because of this feasibility of incorporating subjective assumptions into the process for estimating model prediction bands. Within this framework, likelihood measures were easily (i.e., subjectively) adapted to account for multi-site evaluation of predictions, as well as for discharge uncertainty. ...

The effect of stage–discharge (H-Q) data uncertainty on the predictions of a MIKE SHE-based distributed model was assessed by conditioning the analysis of model predictions at the outlet of a medium-size catchment and two internal gauging stations. The hydrological modelling was carried out through a combined deterministic–stochastic protocol based on Monte Carlo simulations. The approach considered to account for discharge uncertainty was statistically rather simple and based on (i) estimating the H-Q data uncertainty using prediction bands associated with rating curves; (ii) redefining the traditional concept of residuals to characterise model performance under H-Q data uncertainty conditions; and (iii) calculating a global model performance measure for all gauging stations in the framework of a multi-site (MS) test. The study revealed significant discharge data uncertainties on the order of 3 m³ s⁻¹ for the outlet station and 1.1 m³ s⁻¹ for the internal stations. In general, the consideration of the H-Q data uncertainty and the application of the MS test resulted in remarkably better parameterisations of the model, capable of simulating a particular peak event that was otherwise overestimated. The proposed model evaluation approach under discharge uncertainty is applicable to modelling conditions differing from the ones used in this study, as long as data uncertainty measures are available.

... As implied by aims 2–4, and as done e.g. by Krzysztofowicz (1999) and Stedinger et al. (2008), we herein present toy examples only. A companion paper by Papacharalampous et al. (2019b) is devoted to the validation of the methodology of the study using real-world data. In particular, in this latter paper we address a different set of research questions by conducting a large-scale experiment at the monthly time scale. ...

... Segments of the former datasets are used for training the benchmark schemes. In fact, the problems solved by each of the ensemble schemes and its corresponding benchmark seem to be of the same difficulty for toy experiments 1–3. We have also tested the prediction schemes using shorter series (see, e.g., the investigations of Appendix D and the large-sample experiment in Papacharalampous et al., 2019b). In that case, where much less historical information is provided, the prediction schemes differ from each other in terms of performance. ...

... In the toy problems examined herein, the risk of delivering a probabilistic prediction of bad quality for the period T3 (manifested in the magnitude of the relative improvements presented, e.g., in Figs. 7 and 8) is much lower than the respective risk that was found to be present in the rainfall-runoff problems examined by Papacharalampous et al. (2019b). In this latter study, the computed relative improvements in terms of average interval score when using the output of the methodology of the study, instead of using each of the individual predictions combined for obtaining this output, range from about −330% to about 90%. ...

We introduce an ensemble learning post-processing methodology for probabilistic hydrological modelling. This methodology generates numerous point predictions by applying a single hydrological model, yet with different parameter values drawn from the respective simulated posterior distribution. We call these predictions “sister predictions”. Each sister prediction extending in the period of interest is converted into a probabilistic prediction using information about the hydrological model’s errors. This information is obtained from a preceding period for which observations are available, and is exploited using a flexible quantile regression model. All probabilistic predictions are finally combined via simple quantile averaging to produce the output probabilistic prediction. The idea is inspired by the ensemble learning methods originating from the machine learning literature. The proposed methodology offers larger robustness in performance than basic post-processing methodologies using a single hydrological point prediction. It is also empirically proven to “harness the wisdom of the crowd” in terms of average interval score, i.e., the obtained quantile predictions score no worse –usually better− than the average score of the combined individual predictions. This proof is provided within toy examples, which can be used for gaining insight on how the methodology works and under which conditions it can optimally convert point hydrological predictions to probabilistic ones. A large-scale hydrological application is made in a companion paper.
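The simple quantile averaging step of the methodology (sometimes called Vincentization) can be sketched as follows; the sister predictions here are hypothetical random draws, used only to show the mechanics of averaging quantiles level by level:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: 5 sister probabilistic predictions for 12 time
# steps, each consisting of the 0.05, 0.50 and 0.95 predictive
# quantiles; shape (members, time steps, quantile levels)
sister_quantiles = np.sort(rng.gamma(2.0, 1.0, size=(5, 12, 3)), axis=2)

# Simple quantile averaging: the combined prediction's quantile at each
# level is the mean, over the members, of their quantiles at that level
combined = sister_quantiles.mean(axis=0)   # shape (12, 3)

# Averaging level by level preserves monotonicity, so the combined
# quantiles cannot cross
assert np.all(np.diff(combined, axis=1) >= 0)
```

Averaging in the quantile domain (rather than averaging distribution functions) keeps the combined prediction's bands well ordered, which is one practical reason this combination rule is attractive.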

... Advancing the implementation of machine-learning regression algorithms by conducting large-sample (and in-depth) hydrological investigations has been gaining prominence recently (see, e.g., references [42–46]), perhaps following a more general tendency for embracing large-scale hydrological analyses and model evaluations (see, e.g., references [47–51]). The key significance of such studies in improving the modelling of hydrological phenomena, especially when the modelling is data-driven, has been emphasized by several experts in the field (see, e.g., references [16,52–55]). ...

... In the present study, we exploit a large dataset for advancing the use of machine-learning algorithms within broader methodological approaches for quantifying the predictive uncertainty in hydrology. The hydrological modelling and hydro-meteorological forecasting literatures include a large variety of such methodologies (see, e.g., references [45,46,56–69]), reviewed in detail by Montanari [9] and Li et al. [70]. Deterministic "process-based" hydrological models are usually and preferably a core ingredient of probabilistic approaches of this family. ...

... We are explicitly interested in probabilistic hydrological post-processing methodologies whose models are estimated sequentially in more than one stage (see also Section 2.1; hereafter referred to as "multi-stage probabilistic hydrological post-processing methodologies") and machine-learning quantile regression algorithms, since the former can accommodate the latter naturally and effectively. The effectiveness of this accommodation has already been proven, for example, with the large-scale results by Papacharalampous et al. [45] and Tyralis et al. [46] for the monthly and daily timescales, respectively. Aiming at combining the advantages from both the above-outlined "streams of thought" in predictive hydrological modelling, these studies and a few earlier ones (to the best of our knowledge, those mentioned in Table 1) have integrated process-based hydrological models and data-driven algorithmic approaches (spanning from conditional distribution modelling approaches to regression algorithms) within multi-stage probabilistic hydrological post-processing methodologies for predictive uncertainty quantification purposes. ...

We conduct a large-scale benchmark experiment aiming to advance the use of machine-learning quantile regression algorithms for probabilistic hydrological post-processing “at scale” within operational contexts. The experiment is set up using 34-year-long daily time series of precipitation, temperature, evapotranspiration and streamflow for 511 catchments over the contiguous United States. Point hydrological predictions are obtained using the Génie Rural à 4 paramètres Journalier (GR4J) hydrological model and exploited as predictor variables within quantile regression settings. Six machine-learning quantile regression algorithms and their equal-weight combiner are applied to predict conditional quantiles of the hydrological model errors. The individual algorithms are quantile regression, generalized random forests for quantile regression, generalized random forests for quantile regression emulating quantile regression forests, gradient boosting machine, model-based boosting with linear models as base learners and quantile regression neural networks. The conditional quantiles of the hydrological model errors are transformed to conditional quantiles of daily streamflow, which are finally assessed using proper performance scores and benchmarking. The assessment concerns various levels of predictive quantiles and central prediction intervals, while it is made both independently of the flow magnitude and conditional upon this magnitude. Key aspects of the developed methodological framework are highlighted, and practical recommendations are formulated. In technical hydro-meteorological applications, the algorithms should be applied preferably in a way that maximizes the benefits and reduces the risks from their use. This can be achieved by (i) combining algorithms (e.g., by averaging their predictions) and (ii) integrating algorithms within systematic frameworks (i.e., by using the algorithms according to their identified skills), as our large-scale results point out.
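The core post-processing step described above (predicting conditional quantiles of the hydrological model errors, then transforming them back to streamflow quantiles) can be illustrated with one of the six listed algorithms, the gradient boosting machine, via scikit-learn's quantile loss. This is a sketch on synthetic data, not the paper's exact configuration: `q_sim`, `q_obs`, the error structure and all hyperparameter values are assumptions made here for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)

# Synthetic stand-in for the post-processing setting: a point
# streamflow prediction q_sim and observations q_obs whose error
# spread grows with the predicted flow (heteroscedastic errors)
q_sim = rng.uniform(1.0, 20.0, size=2000)
q_obs = q_sim + rng.normal(0.0, 0.2 * q_sim)

X = q_sim.reshape(-1, 1)   # predictor: the point hydrological prediction
y = q_obs - q_sim          # target: the hydrological model error

# One gradient boosting machine per quantile level of the error
levels = [0.05, 0.50, 0.95]
error_quantiles = {}
for tau in levels:
    gbm = GradientBoostingRegressor(loss="quantile", alpha=tau,
                                    n_estimators=200, max_depth=2)
    gbm.fit(X, y)
    error_quantiles[tau] = gbm.predict(X)

# Transform conditional quantiles of the error back to conditional
# quantiles of streamflow
flow_quantiles = {tau: q_sim + e for tau, e in error_quantiles.items()}
```

In a real application the regression models would be fitted on a calibration period and applied to a separate prediction period; here everything is done in-sample purely to keep the sketch short.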

... Examples of studies utilizing such datasets to support a successful formulation and selection of skillful probabilistic hydrological forecasting or, more generally, probabilistic hydrological prediction methods also exist. Nonetheless, such examples mostly refer to single-method evaluations (or benchmarking) either in post-processing contexts [e.g., those in Farmer and Vogel (2016), Bock et al. (2018), Papacharalampous et al. (2020b)] or in ensemble hydrological forecasting contexts [e.g., those in Pechlivanidis et al. (2020), Girons Lopez et al. (2021)]. ...

... Among them are the "wisdom of the crowd" and the "forecast combination puzzle". The "wisdom of the crowd" can be harnessed through simple averaging (Lichtendahl et al., 2013; Winkler et al., 2019) to increase robustness in probabilistic hydrological post-processing and forecasting using quantile regression algorithms [see related empirical proofs in Papacharalampous et al. (2020b)] or, potentially, machine and statistical learning algorithms from the remaining families that are summarized in the Section "Quantile, expectile, distributional and other regression algorithms". By increasing robustness, one reduces the risk of delivering poor-quality forecasts in any single forecast attempt. ...

Probabilistic forecasting is receiving growing attention nowadays in a variety of applied fields, including hydrology. Several machine learning concepts and methods are notably relevant toward addressing the major challenges of formalizing and optimizing probabilistic forecasting implementations, as well as the equally important challenge of identifying the most useful ones among these implementations. Nonetheless, practically-oriented reviews focusing on such concepts and methods, and on how these can be effectively exploited in the above-outlined essential endeavor, are currently missing from the probabilistic hydrological forecasting literature. This absence holds despite the pronounced intensification in the research efforts for benefitting from machine learning in this same literature. It also holds despite the substantial relevant progress that has recently emerged, especially in the field of probabilistic hydrological post-processing, which traditionally provides the hydrologists with probabilistic hydrological forecasting implementations. Herein, we aim to fill this specific gap. In our review, we emphasize key ideas and information that can lead to effective popularizations, as such an emphasis can support successful future implementations and further scientific developments. In the same forward-looking direction, we identify open research questions and propose ideas to be explored in the future.

... Furthermore, several recent studies have used HM outputs as input to DDMs, demonstrating that such an approach can be used to improve predictive performance compared to the standalone HM (Ghaith et al., 2019; Konapala et al., 2020a; Kumanlioglu and Fistikoglu, 2019; Lu et al., 2021; Quilty et al., 2022). Another combined (HM-DDM) approach gaining popularity is 'correcting' HM simulations by using DDMs to simulate the HM residuals (Cho and Kim, 2022; Li et al., 2021a, 2021b; Papacharalampous et al., 2020a, 2020b; Sharma et al., 2021; Shen et al., 2022; Sikorska-Senoner and Quilty, 2021). Incorporating model errors in hydrological simulations has traditionally been used for uncertainty estimation, for example, stochastic resampling from the HM error distribution for assessing predictive uncertainty in streamflow simulations (Montanari and Koutsoyiannis, 2012; Sikorska et al., 2014). ...

... The same method can retrieve the stochastic HM by disregarding the DDM component and resampling the model error from the HM, as described in the original blueprint paper (Montanari and Koutsoyiannis, 2012). While the KNN approach was adopted here, other approaches could be used to estimate the conditional PDF of the model error, such as those described in Papacharalampous et al. (2020a, 2020b) or Tyralis et al. (2019a). Finally, if the modeller prefers, the DDMs can use other available explanatory variables and/or disregard using previously observed streamflow time lags as input. ...

Recently, the conceptual data-driven approach (CDDA) was proposed to correct residuals of ensemble hydrological models (HMs) using data-driven models (DDMs), followed by the stochastic CDDA (SCDDA), which uses HM simulations as input to DDMs within a stochastic framework; both approaches improved the ensemble HMs' simulations. Here, a new SCDDA is introduced in which CDDA uncertainty is estimated (instead of DDM uncertainty, as in the original SCDDA). Using nine HM-DDM combinations for daily streamflow simulation in three Swiss catchments, the new SCDDA improved CDDA's mean continuous ranked probability score by up to 15% and performed similarly without a snow routine in a snowy catchment, suggesting that SCDDA may account for missing processes in HMs. The stochastic framework can convert unreliable ensemble models into more reliable (stochastic) models at the cost of simulation sharpness. The coverage probability plot is proposed as a diagnostic tool, predicting SCDDA's out-of-sample reliability using validation set data (CDDA simulations and observations).
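The mean continuous ranked probability score used for the evaluation above can be computed for an ensemble forecast with the standard empirical (kernel) formula; the ensemble members and the observation below are hypothetical numbers chosen only to show the computation:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast for a single observation:
    E|X - y| - 0.5 * E|X - X'| (the standard kernel representation)."""
    members = np.asarray(members, dtype=float)
    term_acc = np.mean(np.abs(members - obs))
    term_spread = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term_acc - term_spread

# A well-centred ensemble scores better (lower) than a biased one of
# the same spread; the numbers are hypothetical
obs = 10.0
crps_good = crps_ensemble([9.5, 10.0, 10.5], obs)
crps_bad = crps_ensemble([14.5, 15.0, 15.5], obs)
```

Averaging this quantity over all time steps gives the mean CRPS reported in studies such as the one summarized above; the accuracy term rewards closeness to the observation, while the spread term rewards sharpness.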

... factors (Reeves et al., 2014; Kobold and Sušelj, 2005; Xu et al., 2020b). The predictive uncertainty is a measure of the spread of the precipitation forecasts and indicates how much the forecasted precipitation values fluctuate around the mean (Papacharalampous et al., 2020). Therefore, the uncertainty range should be given when generating precipitation forecasting results. ...

... The higher the data and model uncertainties, the more divergent and less reliable the forecasting. Therefore, the data and model uncertainties should be jointly considered in the forecasting process (Gal, 2016;Kendall and Gal, 2017;Loquercio et al., 2020;Parrish et al., 2012). Although the predictive errors are close to each other among different forecasting methods in Fig. 6 and Table 2, the predictive uncertainty has some discrepancies. ...

Precipitation forecasting is an important task in weather science. In recent years, data-driven precipitation forecasting techniques have emerged to complement numerical prediction in tasks such as precipitation nowcasting, monthly precipitation projection and extreme precipitation event identification. In data-driven precipitation forecasting, the predictive uncertainty arises mainly from data and model uncertainties. Current deep learning forecasting methods can model the parametric uncertainty by random sampling from the parameters. However, the data uncertainty is usually ignored in the forecasting process, and the derivation of predictive uncertainty is therefore incomplete. In this study, the input data uncertainty, target data uncertainty and model uncertainty are jointly modeled in a deep learning precipitation forecasting framework to estimate the predictive uncertainty. Specifically, the data uncertainty is estimated a priori, and the input uncertainty is propagated forward through the model weights according to the law of error propagation. The model uncertainty is considered by sampling from the parameters and is coupled with the input and target data uncertainties in the objective function during the training process. Finally, the predictive uncertainty is produced by propagating the input uncertainty in the testing process. The experimental results indicate that the proposed joint uncertainty modeling framework exhibits better forecasting accuracy (improving RMSE by 1%–2% and R² by 1%–7% on average) relative to several existing methods, and reduces the predictive uncertainty by ~28% relative to the approach of Loquercio et al. (2020). Incorporating data uncertainty in the objective function changes the distributions of the model weights and slightly smooths them, leading to the reduction of predictive uncertainty relative to the method of Loquercio et al. (2020). The predictive accuracy is improved by incorporating the target data uncertainty, which reduces the forecasting error for extreme precipitation. The developed joint uncertainty modeling method can be regarded as a general approach to estimating predictive uncertainty from data and model in forecasting applications.
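The first-order law of error propagation invoked in this abstract can be illustrated for a single linear layer under an independence assumption on the input errors. This is a toy sketch of the general principle only, not the paper's full deep-learning framework; all weights, inputs and variances are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy illustration of first-order error propagation through a single
# linear layer y = W x + b, assuming independent input errors:
# Var(y_i) = sum_j W_ij**2 * Var(x_j)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)
var_x = np.full(4, 0.01)          # assumed input-data variances

y = W @ x + b                     # point forecast
var_y = (W ** 2) @ var_x          # propagated predictive variance

# Monte Carlo check of the propagation rule
noisy_inputs = x + rng.normal(0.0, np.sqrt(var_x), size=(200_000, 4))
mc_var = (noisy_inputs @ W.T + b).var(axis=0)
```

For a linear map the rule is exact; for a nonlinear network it becomes a first-order (Jacobian-based) approximation applied layer by layer.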

... This is possible through Monte Carlo or Bayesian-based methods (Beven and Binley 1992, Evin et al. 2014), or by adapting the loss function of the models aiming to directly estimate quantiles or expectiles of the probability distribution (Papacharalampous 2021, Tyralis et al. 2022). The second class of methods includes post-processor (two-stage) approaches (Montanari and Koutsoyiannis 2012, Sikorska et al. 2015, Li et al. 2017, Biondi and Todini 2018, Papacharalampous et al. 2019, Tyralis et al. 2019a, Papacharalampous et al. 2020a, 2020b, Sikorska-Senoner and Quilty 2021, Li et al. 2021a, Koutsoyiannis and Montanari 2022, Quilty et al. 2022). ...

... One post-processor class includes ensemble models (Quilty et al. 2019, Quilty and Adamowski 2020). Another post-processor class is based on the utilization of quantile regression algorithms, either the linear-in-parameters quantile regression (Dogulu et al. 2015, Papacharalampous et al. 2020a, 2020b) or quantile-based machine learning algorithms (Papacharalampous et al. 2019, Tyralis et al. 2019a). ...

Hydrological post-processing using quantile regression algorithms constitutes a prime means of estimating the uncertainty of hydrological predictions. Nonetheless, conventional large-sample theory for quantile regression does not apply sufficiently far in the tails of the probability distribution of the dependent variable. To overcome this limitation, which could be crucial when the interest lies in flood events, we here introduce hydrological post-processing through extremal quantile regression for estimating the extreme quantiles of hydrological responses. In summary, the new hydrological post-processing method exploits properties of Hill's estimator from extreme value theory to extrapolate quantile regression's predictions to high quantiles. As a proof of concept, the new method is here tested in post-processing daily streamflow simulations provided by three process-based hydrological models for 180 basins in the contiguous United States (CONUS) and is further compared to conventional quantile regression. With this large-scale comparison, it is demonstrated that hydrological post-processing using conventional quantile regression severely underestimates high quantiles (at the quantile level 0.9999) compared to hydrological post-processing using extremal quantile regression, although both methods are equivalent at lower quantiles (at the quantile level 0.9700). Moreover, it is shown that, in the same context, extremal quantile regression estimates the high predictive quantiles with efficiency that is, on average, equivalent across the three process-based hydrological models in the large-sample study.
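The two ingredients named in this abstract, Hill's estimator of the tail index and the extrapolation of an intermediate quantile to an extreme level, can be sketched on a synthetic heavy-tailed sample. The sample, the number of order statistics `k` and the quantile levels are arbitrary choices made here for illustration, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic heavy-tailed sample: classical Pareto with tail index 0.5
gamma_true = 0.5
x = np.sort(rng.pareto(1.0 / gamma_true, size=100_000) + 1.0)

# Hill's estimator of the tail index from the k largest order
# statistics (k is an arbitrary choice here)
k = 2_000
hill = np.mean(np.log(x[-k:])) - np.log(x[-k - 1])

# Weissman-type extrapolation: push an intermediate quantile (tau0)
# out to an extreme level (tau) using the estimated tail index
tau0, tau = 0.99, 0.9999
q_tau0 = np.quantile(x, tau0)
q_extreme = q_tau0 * ((1.0 - tau0) / (1.0 - tau)) ** hill
# For this Pareto sample the true 0.9999 quantile is 100.0
```

The empirical 0.9999 quantile of a finite sample is dominated by a handful of extreme observations, which is exactly why this tail-index-based extrapolation is preferred over direct quantile estimation far in the tail.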

... Ref. [46] reviewed and evaluated many available methods to perform deterministic forecasts. However, in an operational context, the experience of streamflow forecasters suggests that uncertainty bands can better support decision making for water management schemes [47–52]. The intention of a VEF approach is to use all available weighted or non-weighted components of a modelling chain (HFS) to generate streamflow forecasts. ...

... In the development of an operational streamflow forecasting paradigm, it must be determined how total hydrological uncertainty will be reduced to improve streamflow forecasts. This topic is still a matter of discussion among many hydrologists and researchers around the world [47–52]. For instance, standardized processes to decompose, quantify, or evaluate the meteorological or the total hydrological uncertainty propagated from a modelling paradigm are required (see, e.g., Figure 8). ...

The combination of Hydrological Models and high-resolution Satellite Precipitation Products (SPPs) or regional Climatological Models (RCMs) has provided the means to establish baselines for the quantification, propagation, and reduction in hydrological uncertainty when generating streamflow forecasts. This study aimed to improve operational real-time streamflow forecasts for the Upper Zambezi River Basin (UZRB), in Africa, utilizing the novel Variational Ensemble Forecasting (VEF) approach. In this regard, we describe and discuss the main steps required to implement, calibrate, and validate an operational hydrologic forecasting system (HFS) using VEF and Hydrologic Processing Strategies (HPS). The operational HFS was constructed to monitor daily streamflow and forecast them up to eight days in the future. The forecasting process called short- to medium-range (SR2MR) streamflow forecasting was implemented using real-time rainfall data from three Satellite Precipitation Products or SPPs (the real-time TRMM Multisatellite Precipitation Analysis TMPA-RT, the NOAA CPC Morphing Technique CMORPH, and the Precipitation Estimation from Remotely Sensed data using Artificial Neural Networks, PERSIANN) and rainfall forecasts from the Global Forecasting System (GFS). The hydrologic preprocessing (HPR) strategy considered using all raw and bias corrected rainfall estimates to calibrate three distributed hydrological models (HYMOD_DS, HBV_DS, and VIC 4.2.b). The hydrologic processing (HP) strategy considered using all optimal parameter sets estimated during the calibration process to increase the number of ensembles available for operational forecasting. Finally, inference-based approaches were evaluated during the application of a hydrological postprocessing (HPP) strategy. The final evaluation and reduction in uncertainty from multiple sources, i.e., multiple precipitation products, hydrologic models, and optimal parameter sets, was achieved through a fully operational implementation of VEF combined with several HPS. Finally, the main challenges and opportunities associated with operational SR2MR streamflow forecasting using VEF are evaluated and discussed.

... Several relevant examples can be found in the literature; see [7,8,9,10,11,16,20,36,37,38,39,40,41,44,45,46,49,50,61,62,68,69,76,83,90,91,92] and the review by [43]. ...

... Finally, similar methodological themes to those proposed in this work, including several for issuing point [78] and probabilistic [79] predictions of hydrological signatures, could be provided by exclusively using hydrological models, instead of relying on data-driven ones. Other similar themes for improved quantile-based predictions are those combining multiple hydrological models and more [62]. ...

Predictive uncertainty in hydrological modelling is quantified by using post-processing or Bayesian-based methods. The former methods are not straightforward, and the latter ones are not distribution-free. To alleviate possible limitations related to these specific attributes, in this work we propose the calibration of the hydrological model by using the quantile loss function. By following this methodological approach, one can directly simulate pre-specified quantiles of the predictive distribution of streamflow. As a proof of concept, we apply our method in the frameworks of three hydrological models to 511 river basins in the contiguous US. We illustrate the predictive quantiles and show how an honest assessment of the predictive performance of the hydrological models can be made by using proper scoring rules. We believe that our method can help advance the field of hydrological uncertainty quantification.
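The principle behind calibrating a model with the quantile (pinball) loss can be shown with a toy example: minimizing the pinball loss at level τ over a single free parameter recovers the τ-quantile of the observations. The data and the constant-prediction "model" below are synthetic stand-ins, not the hydrological models of the study:

```python
import numpy as np

def quantile_loss(obs, sim, tau):
    """Pinball (quantile) loss, minimized in expectation when sim is
    the tau-quantile of obs."""
    u = obs - sim
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

rng = np.random.default_rng(3)
obs = rng.exponential(5.0, size=50_000)   # synthetic "streamflow"

# Calibrating a single constant prediction c by minimizing the
# tau = 0.9 pinball loss recovers (approximately) the 0.9 quantile of
# the observations; the same principle underlies calibrating a
# hydrological model's parameters with this loss
grid = np.linspace(0.0, 30.0, 601)
losses = [quantile_loss(obs, c, 0.9) for c in grid]
c_best = grid[int(np.argmin(losses))]
target = np.quantile(obs, 0.9)   # theoretical value: 5 ln 10 ≈ 11.5
```

Repeating the calibration for each pre-specified level τ yields a set of quantile-calibrated parameter vectors, one per predictive quantile, which is the idea the abstract describes.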

... Smith et al., 2015) in process-based model simulations. Previous studies have shown that residual error models based on machine learning can provide both bias correction (i.e., improving the performance of the deterministic model) and uncertainty estimates (Papacharalampous et al., 2020;Shrestha & Solomatine, 2008;Sikorska et al., 2015;Weerts et al., 2011). It has been recognized widely that probabilistic predictions provide more information than point estimates for decision-makers in flood forecasting and water resources management (Duan et al., 2019;Han & Coulibaly, 2017;Krzysztofowicz, 1999). ...

... Previous studies have shown that the residuals, i.e., the difference between the modeled and observed streamflows, depend on the modeled streamflow (X. Jiang et al., 2019; Papacharalampous et al., 2020; Sikorska et al., 2015; Wani et al., 2017). This means the modeled streamflow can be chosen as a predictor of the Bayesian LSTM to predict the residuals. ...

Significant attention has recently been paid to deep learning as a method for improved catchment modeling. Compared with process-based models, deep learning is often criticized for its lack of interpretability. One solution is to combine a process-based hydrological model with a residual error model based on deep learning to give full scope to their respective advantages. In classical residual error models, Bayesian inference via Markov chain Monte Carlo (MCMC) is commonly used to provide an estimation of the uncertainty. However, deep neural networks tend to have excessively large numbers of parameters, making MCMC an unsuitable approach. Here, we introduce an alternative to Bayesian MCMC sampling called stochastic variational inference (SVI), which has recently been developed for Bayesian deep learning in Natural Language Processing. We implement SVI in a Long Short-Term Memory (LSTM) network and construct residual error models in process-based hydrological models. This approach is examined in two catchments in China with contrasting geographical and climatic characteristics: the Tangnaihai catchment and the Shiquan catchment. Compared with the Bayesian linear regression model, the Bayesian LSTM provides better uncertainty estimates. Specifically, the proposed method improves the Continuous Ranked Probability Score (CRPS) by over 10% in both catchments. In the Tangnaihai catchment, it provides more than 10% narrower uncertainty intervals in terms of Sharpness, with slightly superior Reliability. In the Shiquan catchment, it provides comparable uncertainty intervals with better Reliability. Further, our study highlights the scalability of SVI to high-dimensional parameter spaces in hydrological applications (e.g., distributed hydrological models, groundwater models).

... This type of error model is a kind of one-way coupling (Razavi, 2021) in which the output of hydrologic models is used as the input for machine learning, and is frequently applied in modeling scenarios where the computational expense of the pre-specified model makes it infeasible to use statistical residual error models that rely on high-frequency parameter sampling. Most studies of data-driven models focus on linear regression or linear variants of quantile regression (Dogulu et al., 2015; Papacharalampous et al., 2020a; Papacharalampous et al., 2020b; Tyralis et al., 2019; Wani et al., 2017), and non-linear regression techniques are less common. Some studies have indicated that traditional nonlinear regression techniques such as Artificial Neural Networks (ANN), M5 model trees and K-Nearest Neighbors (KNN) provide superior characterization than linear-based techniques (Shrestha and Solomatine, 2006; Wani et al., 2017). ...

... Four predictors including 3-day antecedent precipitation, simulated soil moisture, simulated streamflow, and previous residuals are used in the linear regression model and the LSTM network to predict the one-stage residuals with the one-time-step ahead prediction. The predictive uncertainty of each case is assessed with Reliability, Sharpness (Smith et al., 2015;Wu et al., 2019), and Average Interval Score statistics (Papacharalampous et al., 2020b;Gneiting and Raftery, 2007). The Reliability metric measures the proportion of observations that are captured by the prediction intervals; the Sharpness metric measures the mean width of the prediction intervals. ...
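The three verification statistics named in this excerpt can be sketched as follows; the function and data are an illustrative sketch under hypothetical values, not the code of the cited studies:

```python
import numpy as np

def interval_metrics(obs, lower, upper, alpha=0.05):
    """Reliability (coverage), Sharpness (mean width) and Average Interval
    Score (in the sense of Gneiting and Raftery, 2007) of central
    (1 - alpha) prediction intervals [lower, upper]."""
    obs = np.asarray(obs, float)
    lower = np.asarray(lower, float)
    upper = np.asarray(upper, float)
    width = upper - lower
    # Reliability: proportion of observations captured by the intervals.
    reliability = float(np.mean((obs >= lower) & (obs <= upper)))
    # Sharpness: mean width of the intervals.
    sharpness = float(np.mean(width))
    # Interval score: width plus a (2/alpha)-weighted penalty for misses.
    penalty = (2.0 / alpha) * (
        np.where(obs < lower, lower - obs, 0.0)
        + np.where(obs > upper, obs - upper, 0.0)
    )
    avg_interval_score = float(np.mean(width + penalty))
    return reliability, sharpness, avg_interval_score

# Toy 95% intervals: one of four observations falls outside its interval.
r, s, ais = interval_metrics(
    obs=[5.0, 7.0, 6.0, 9.0], lower=[4.0, 6.0, 5.0, 5.0], upper=[6.0, 8.0, 7.0, 8.0]
)
print(r, s, ais)  # 0.75 2.25 12.25
```

The interval score rewards intervals that are simultaneously narrow and well-calibrated: the single missed observation (9 above the upper bound 8) contributes a penalty of (2/0.05) × 1 = 40 to that term before averaging.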

Errors in hydrological simulations have impacted their applications in flood prediction and water resources management. Properly characterizing the properties of errors such as heteroscedasticity and autocorrelation can provide improved hydrological predictions. Here, we present a probabilistic Long Short-Term Memory (LSTM) network for modeling hydrological residual errors. Three steps are undertaken to characterize uncertainty using the LSTM: (i) the network is trained with gradient optimization to obtain optimal predictions; (ii) the distribution of the errors for optimal predictions is estimated using a Bayesian Markov chain Monte Carlo (MCMC) algorithm; (iii) probabilistic predictions are made using the inferred error distribution and optimal predictions. We examine the model in the source region of the Yellow River, China, over the period 1992–2015. A distributed hydrological model, MIKE SHE, is applied to simulate streamflows in the catchment. As benchmarks, we compare the results with a Bayesian linear regression model and a traditional probabilistic residual error model. Results show that the probabilistic LSTM network reduces the heteroscedasticity and removes almost all autocorrelations in residual errors. Moreover, compared with the other two methods, the proposed method produces more than 50% narrower uncertainty intervals with the best probability coverage. Our results highlight the potential ability of a deep learning approach integrated with a hydrological model to better characterize predictive uncertainty in hydrologic modeling.

... Therefore, some concessions should be accepted, since there is no free lunch in modelling [64], so that one obtains a model that provides acceptable predictions and is interpretable simultaneously. For instance, one could combine multiple models and obtain more accurate point [65,66] or probabilistic [67,68] predictions. To understand Figure 6, we note that attributes in the vertical axis are reported with regard to their type, i.e., topographic characteristics, climatic indices, land cover characteristics, soil characteristics and geological characteristics (from lower to upper). In general, topographic characteristics and climatic indices seem to be most important when predicting hydrological signatures. ...

... Therefore, some concessions should be accepted, since there is no free lunch in modelling [64], so that one obtains a model that provides acceptable predictions and is interpretable simultaneously. For instance, one could combine multiple models and obtain more accurate point [65,66] or probabilistic [67,68] predictions. However, in this case, interpretability would be lost for the sake of generalization. ...

Hydrological signatures, i.e., statistical features of streamflow time series, are used to characterize the hydrology of a region. A relevant problem is the prediction of hydrological signatures in ungauged regions using the attributes obtained from remote sensing measurements at ungauged and gauged regions together with estimated hydrological signatures from gauged regions. The relevant framework is formulated as a regression problem, where the attributes are the predictor variables and the hydrological signatures are the dependent variables. Here we aim to provide probabilistic predictions of hydrological signatures using statistical boosting in a regression setting. We predict 12 hydrological signatures using 28 attributes in 667 basins in the contiguous US. We provide formal assessment of probabilistic predictions using quantile scores. We also exploit the statistical boosting properties with respect to the interpretability of derived models. It is shown that probabilistic predictions at quantile levels 2.5% and 97.5% using linear models as base learners exhibit better performance compared to more flexible boosting models that use both linear models and stumps (i.e., one-level decision trees). On the contrary, boosting models that use both linear models and stumps perform better than boosting with linear models when used for point predictions. Moreover, it is shown that climatic indices and topographic characteristics are the most important attributes for predicting hydrological signatures.

... The uncertainty quantification of hydrologic models, which arises basically from the random nature of hydrological processes (Kong et al., 2017), is largely based on probabilistic methods. Stochastic methods are the most powerful ones for the uncertainty quantification of hydrologic models (Koutsoyiannis, 2010; Papacharalampous et al., 2020). Consequently, stochastic simulation is a vital tool for risk-based management of water resources systems. ...

Floods mostly vary from one region to another, and their severity is determined by a variety of factors, including unpredictable weather patterns and heavy rainfall occurrences (Pham Van and Nguyen-Van, 2020; Soulard et al., 2020). Although floods are common in many places of India during monsoon seasons, the Ganga basin is particularly vulnerable (Bhatt et al., 2021; Meena et al., 2021). Many areas in the state of Bihar are flooded due to the swelling of rivers in neighboring Nepal (Lal et al., 2020; Soulard et al., 2020; Wagle et al., 2020), which motivated the present research. The Ganga basin spans China, Nepal, India, and Bangladesh (Agnihotri et al., 2019; Ahmad and Goparaju, 2020; Prakash et al., 2017; Sinha and Tandon, 2014).

The global emergence of COVID-19 halted all activities, and it debuted as the deadliest disease with the longest nationwide lockdown, causing enormous disruption in all aspects of people's livelihoods. Major obstacles accumulated owing to the flooding event of July 2020, which added misery to people who were simultaneously trying to control the spread of COVID-19 and diverted disaster-risk mitigation efforts to other sectors. The only way to have an effective and prompt response is to have real-time information provided by space-based sensors. Using a cloud-based platform like Google Earth Engine (GEE), an automated technique is employed to analyze flood inundation with Synthetic Aperture Radar (SAR) images. The study exhibits the potential of automated techniques and algorithms applied to large datasets on cloud-based platforms. The results present flood-extent maps for the lower Ganga basin, comprising areas of the Indian subcontinent. Severe floods destroyed several parts of Bihar and West Bengal, affecting a large population. This study offers a prompt and precise estimation of inundated areas to facilitate a quick response for risk assessment, particularly at times of COVID-19.

The three states (Bihar, Jharkhand, and West Bengal), collectively known as the Lower Ganga Basin, are home to more than 30% of the population (Prakash et al., 2017). Rapid population growth and settlements resulted in changes in land use, increased soil erosion, increased siltation, and other related factors that augmented flood severity (Li et al., 2020; Pham Van and Nguyen-Van, 2020). Floods became the most frequent disaster in recent times, and what compounded the problem was the COVID-19 pandemic (Krämer et al., 2021; Lal et al., 2020). As a result, new measures were needed to manage the spread of COVID-19 as well as flood mitigation (Wang et al., 2020; Zoabi et al., 2021). Although ground data and field measurements are considered more accurate, they are time- and money-consuming. Furthermore, field surveys were impossible to conduct during this period, since social distancing became the norm, linked with significant health concerns and travel expenditures (Jian et al., 2020; Lattari et al., 2019). Flood mitigation strategies that are ineffective may result in more human deaths, property damage, and further spread of COVID-19 (Cornara et al., 2019; Shen et al., 2019). The floods had disastrous impacts in 149 districts throughout Bihar, Assam, and West Bengal. Since movement was halted owing to a sudden shutdown, the only way out was to employ robust flood control techniques based on real-time information (Das et al., 2018; Dong et al., 2020; Tang et al., 2016). The dramatic increase in flood occurrence in these locations prompted specialists to implement more structured and effective flood management to address the issues, while also adhering to all COVID-19 norms and regulations (Min et al., 2020; Wang et al., 2019).

... Some of the most popular scientific issues of modern hydrology are climate variability and change and their impact on the water regime, which is of primary importance for states with generally scarce water resources. Modelling and forecasting the time series of river flows are essential elements in the assessment of the hydrological regime and the management of water resources [1][2][3]. The results of analyses of multiyear flow measurement series enable an assessment of the reaction of rivers to supply factors or their limitation. ...

River-flow forecasts are important for the management and planning of water resources and their rational use. The present study, based on direct multistep-ahead forecasting with multiple time series specific to the XGBoost algorithm, estimates the long-term changes and forecasts the monthly flows of selected rivers in Ukraine. In a new, applied approach, a single multioutput model was proposed that forecasts over both short- and long-term horizons using grouped or hierarchical data series. Three forecast stages were considered: using train and test subsets, using a model with train-test data, and training with all data. The historical period included the measurements of the monthly flows, precipitation, and air temperature in the period 1961–2020. The forecast horizons of 12, 60, and 120 months into the future were selected for this dataset, i.e., December 2021, December 2025, and December 2030. The research was conducted for diverse hydrological systems: the Prut, a mountain river; the Styr, an upland river; and the Sula, a lowland river, in relation to the variability and forecasts of precipitation and air temperature. The results of the analyses showed a varying degree of sensitivity among rivers to changes in precipitation and air temperature and different projections for future time horizons of 12, 60, and 120 months. For all studied rivers, variable dynamics of flow was observed in the years 1961–2020, yet with a clearly marked decrease in monthly flows during the final decade, 2010–2020. The last decade of low flows on the Prut and Styr rivers was preceded by their noticeable increase in the earlier decade (2000–2010). In the case of the Sula River, a continuous decrease in monthly flows has been observed since the end of the 1990s, with a global minimum in the decade 2010–2020.
Two patterns were obtained in the forecasts: a decrease in flow for the Prut (6%) and the Styr (12–14%) rivers, accompanied by a decrease in precipitation and an increase in air temperature until 2030, and for the Sula River, an increase in flow (16–23%), with a slight increase in precipitation and an increase in air temperature. The predicted changes in the flows of the Prut, the Styr, and the Sula rivers correspond to forecasts in other regions of Ukraine and Europe. The performance of the models over a variety of available datasets over time was assessed, and the hyperparameters that minimize the forecast error over the relevant forecast horizons were selected. The obtained RMSE values indicate high variability in hydrological and meteorological data in the catchment areas and a not very good fit to retrospective data, regardless of the selected horizon length. The advantages of this model, which was used in the work for forecasting monthly river flows in Ukraine, include modelling multiple time series simultaneously with a single model, the simplicity of the modelling, potentially more robust results because of pooling data across time series, and solving the “cold start” problem when few data points are available for a given time series. The model, because of its universality, can be used in forecasting hydrological and meteorological parameters in other catchments, irrespective of their geographic location.

... In their blueprint, Montanari and Koutsoyiannis [12] provided a framework to upgrade a deterministic model into a stochastic one, which provides the probability distribution of the output given the inputs and the deterministic model output, considering the uncertainty in model parameters and in input variables. This work has been discussed [13,14] and advanced in other studies [15][16][17]. ...

Bluecat is a recently proposed methodology to upgrade a deterministic model (D-model) into a stochastic one (S-model), based on the hypothesis that the information contained in a time series of observations and the concurrent predictions by the D-model is sufficient to support this upgrade. Prominent characteristics of the methodology are its simplicity and transparency, which allow easy use in practical applications, without sophisticated computational means. Here we utilize the Bluecat methodology and expand it in order to be combined with climatic model outputs, which often require extrapolation out of the range of values covered by observations. We apply the expanded methodology to the precipitation and temperature processes in a large area, namely the entire territory of Italy. The results showcase the appropriateness of the method for hydroclimatic studies, as regards the assessment of the performance of the climatic projections, as well as their stochastic conversion with simultaneous bias correction and uncertainty quantification.

... In their blueprint, Montanari and Koutsoyiannis [12] provided a framework to upgrade a deterministic model into a stochastic one, which provides the probability distribution of the output given the inputs and the deterministic model output, considering the uncertainty in the model parameters and in input variables. This work has been discussed [13,14] and advanced in other studies [15][16][17]. ...

Bluecat is a recently proposed methodology to upgrade a deterministic model (D-model) into a stochastic one (S-model), based on the hypothesis that the information contained in a time series of observations and the concurrent predictions made by the D-model is sufficient to support this upgrade. The prominent characteristics of the methodology are its simplicity and transparency, which allow its easy use in practical applications, without sophisticated computational means. In this paper, we utilize the Bluecat methodology and expand it in order to be combined with climate model outputs, which often require extrapolation out of the range of values covered by observations. We apply the expanded methodology to the precipitation and temperature processes in a large area, namely the entire territory of Italy. The results showcase the appropriateness of the method for hydroclimatic studies, as regards the assessment of the performance of the climate projections, as well as their stochastic conversion with simultaneous bias correction and uncertainty quantification.

... As the time-series becomes more coarsely aggregated, it is observed that the relative nRMSE values' distribution for these methods becomes symmetrical (around the two standard deviation range) with a mean close to 0, which shows that it cannot be known a priori whether temporal hierarchical reconciliation will improve forecast performance at a given basin, highlighting the importance of carrying out large-sample experiments when testing and validating new forecasting approaches (Papacharalampous et al., 2020), such as temporal hierarchical reconciliation. The results are aligned with the Global Energy Forecasting Competition 2017 (GEFCom2017), where using hierarchical information did not result in a substantial improvement compared to utilizing hybrid forecast models (Hong et al., 2019). ...

Obtaining consistent forecasts at different timescales is important for reliable decision‐making. This study introduces and evaluates the benefits of utilizing temporal hierarchical reconciliation methods for water resources forecasting, with an application to precipitation. Original (precipitation) Forecasts (ORFs) were produced using “automatic” Exponential Time‐Series Smoothing (ETS), Artificial Neural Network (ANN), and Seasonal Auto‐Regressive Integrated Moving Average (SARIMA) models at six timescales, namely, monthly, 2‐monthly, quarterly, 4‐monthly, bi‐annual, and annual, for 84 basins extracted from the Canadian model parameter experiment. Temporal hierarchical reconciliation methods, including structural scaling‐based Weighted Least Squares (WLS), series variance scaling‐based WLS, and Ordinary Least Squares, along with the simple Bottom‐Up (BU) method, were applied to reconcile the forecasts. In general, ETS (direct forecasting) demonstrated better performance compared to ANN and SARIMA (recursive forecasting). The results confirmed that improvements in accuracy due to reconciliation is dependent on the basin, timescale, and the ORFs' accuracy. For different forecast models, the reconciliation methods showed different levels of performance. For ETS, BU was able to improve forecast accuracy to a greater extent than the temporal hierarchical reconciliation methods, while for ANN and SARIMA, forecast accuracy was improved through all temporal hierarchical reconciliation methods but not BU. The reconciled forecasts' accuracy was affected more by the ORFs' accuracy than by the reconciliation method. Different timescales showed dissimilar sensitivity to reconciliation. The presented results are anticipated to serve as a valuable benchmark for evaluating future developments in the promising area of temporal hierarchical reconciliation for water resources forecasting.
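Among the reconciliation methods compared above, the simple Bottom-Up (BU) method is the easiest to sketch: coarse-scale forecasts are formed by summing the finest-scale forecasts, so consistency across timescales holds by construction. The function below is an illustrative sketch, not the study's implementation; the WLS/OLS reconciliation methods additionally involve a weighting matrix and are omitted here:

```python
import numpy as np

def bottom_up(fine_forecasts, agg_factor):
    """Bottom-Up temporal reconciliation: sum blocks of `agg_factor`
    consecutive finest-scale (e.g., monthly) forecasts to obtain the
    coarse-scale (e.g., quarterly) forecasts."""
    f = np.asarray(fine_forecasts, float)
    n = (len(f) // agg_factor) * agg_factor  # drop an incomplete last block
    return f[:n].reshape(-1, agg_factor).sum(axis=1)

monthly = [10.0, 12.0, 8.0, 9.0, 11.0, 10.0]  # six hypothetical monthly forecasts
print(bottom_up(monthly, 3).tolist())  # quarterly: [30.0, 30.0]
```

By construction, the quarterly values equal the sums of their monthly components, which is exactly the cross-timescale consistency that reconciliation aims for; whether BU also improves accuracy depends on the quality of the finest-scale forecasts, as the study's results indicate.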

... Moreover, examining and evaluating hydrological post-processors in catchments with different climate and hydrological conditions ensures suitable comparisons and helps to generalise the obtained results [47,95]. Furthermore, the diverse climate conditions of catchments analysed allow us to deduce functional relationships between climatic indices and the post-processors' performance. ...

This research develops an extension of the Model Conditional Processor (MCP), which merges clusters with Gaussian mixture models to offer an alternative solution to manage heteroscedastic errors. The new method is called the Gaussian mixture clustering post-processor (GMCP). The results of the proposed post-processor were compared to the traditional MCP and MCP using a truncated Normal distribution (MCPt) by applying multiple deterministic and probabilistic verification indices. This research also assesses the GMCP’s capacity to estimate the predictive uncertainty of the monthly streamflow under different climate conditions in the “Second Workshop on Model Parameter Estimation Experiment” (MOPEX) catchments distributed in the SE part of the USA. The results indicate that all three post-processors showed promising results. However, the GMCP post-processor has shown significant potential in generating more reliable, sharp, and accurate monthly streamflow predictions than the MCP and MCPt methods, especially in dry catchments. Moreover, the MCP and MCPt provided similar performances for monthly streamflow and better performances in wet catchments than in dry catchments. The GMCP constitutes a promising solution to handle heteroscedastic errors in monthly streamflow, therefore moving towards a more realistic monthly hydrological prediction to support effective decision-making in planning and managing water resources.

... In particular, Sikorska et al. (2015) proposed a nearest-neighbour approach for reconstructing the probability distribution of the prediction error. Analogous solutions were considered by Papacharalampous et al. (2019), Papacharalampous et al. (2020), Papacharalampous et al. (2019a) and Tyralis et al. (2019, 2020). In light of the works cited above, this paper presents an innovative and transparent approach based on the concept proposed by Montanari and Koutsoyiannis (2012) for transforming a generic deterministic model into a corresponding stochastic formulation. ...

We present a new method for simulating and predicting hydrologic variables, and in particular river flows, which is rooted in probability theory and conceived in order to provide a reliable quantification of its uncertainty for operational applications. Our approach, which we refer to by the acronym "Bluecat", results from a theoretical and numerical development, and is conceived to make transparent and intuitive use of the observations. Therefore, Bluecat makes use of a rigorous theory while at the same time proving the concept that environmental resources should be managed by making the best use of empirical evidence and experience. We provide open and user-friendly software to apply the method to the simulation and prediction of river flows, and we test Bluecat's reliability for operational applications.

... Finally, similar methodological themes to those proposed in this work, including several for issuing point [90] and probabilistic [91] predictions of hydrological signatures, could be provided by exclusively using hydrological models instead of relying on data-driven ones. Other similar themes for improved quantile-based predictions are those combining multiple hydrological models and more [30]. ...

Predictive uncertainty in hydrological modelling is quantified by using post-processing or Bayesian-based methods. The former methods are not straightforward and the latter ones are not distribution-free (i.e., assumptions on the probability distribution of the hydrological model’s output are necessary). To alleviate possible limitations related to these specific attributes, in this work we propose the calibration of the hydrological model by using the quantile loss function. By following this methodological approach, one can directly simulate pre-specified quantiles of the predictive distribution of streamflow. As a proof of concept, we apply our method in the frameworks of three hydrological models to 511 river basins in the contiguous US. We illustrate the predictive quantiles and show how an honest assessment of the predictive performance of the hydrological models can be made by using proper scoring rules. We believe that our method can help towards advancing the field of hydrological uncertainty.

The highly nonlinear and nonstationary nature of runoff events in changing environments makes accurate and reliable runoff forecasting difficult. We propose a hybrid model by integrating an autoregressive model, Bayesian inference, a complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm, Bayesian optimization, and support vector regression. Two Bayesian inference methods (the No-U-Turn Sampler (NUTS) and variational inference) were used to calculate the parameters of the AR model to obtain a Bayesian AR (BAR) model. Credible intervals were used to analyze the uncertainty of the parameters and model prediction results. The above model is applied to the daily runoff predictions of hydrological stations in the Yellow River Basin of China. The results show that (1) the hybrid model can improve the prediction accuracy and (2) the NUTS algorithm-based model provides a narrower reliable interval and performs better in uncertainty analyses.

Due to the inherently unfavorable geological conditions coupled with various triggering factors, reservoir bank slopes often exhibit long-term and complex deformation processes with multiple accelerating phases, which poses constant threats to the safe operations of hydropower stations, as well as the safety of the residents, houses, and infrastructures both upstream and downstream of the reservoirs. In this study, the quantile regression has been used to develop a fundamental tool for the prediction and early warning of this type of complex slope deformation. The application of this tool is illustrated in the case study of the Duonuo near-dam slope at Baishui River in China. Quantile regression can estimate the parameters of the prediction model, such as the Fukui–Okubo model used in this study, based on different quantile levels. This enables the predictive uncertainty to be quantified and obtain a possible failure-time horizon. Further, two indicators, i.e., the reliability indicator and the strength indicator, have been defined to aggregate the predictive information of the quantile regression. As such, the reliability and intensity of failure precursory signals associated with the accelerating behavior can be diagnosed. Finally, a probabilistic warning indicator has been proposed to establish an early warning procedure, which can measure the risk levels of a slope failure in real time. Two case studies, i.e., the Duonuo slope and the Vajont landslide, have been used to verify the effectiveness of the proposed early warning indicator. Thus, the tool developed in this study provides valuable references for predictive decision-making and disaster management of reservoir bank slopes. It can also easily be transferred to unstable slopes with similar evolution characteristics in other hazardous areas.

Machine and statistical learning algorithms can be reliably automated and applied at scale. Therefore, they can constitute a considerable asset for designing practical forecasting systems, such as those related to urban water demand. Quantile regression algorithms are statistical and machine learning algorithms that can provide probabilistic forecasts in a straightforward way, and have not been applied so far for urban water demand forecasting. In this work, we fill this gap, thereby proposing a new family of probabilistic urban water demand forecasting algorithms. We further extensively compare seven algorithms from this family in practical one‐day ahead urban water demand forecasting settings. More precisely, we compare five individual quantile regression algorithms (i.e., the quantile regression, linear boosting, generalized random forest, gradient boosting machine and quantile regression neural network algorithms), their mean combiner and their median combiner. The comparison is conducted by exploiting a large urban water flow data set, as well as several types of hydrometeorological time series (which are considered as exogenous predictor variables in the forecasting setting). The results mostly favor the linear boosting algorithm, probably due to the presence of shifts (and perhaps trends) in the urban water flow time series. The forecasts of the mean and median combiners are also found to be skillful.
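The mean and median combiners described above amount to simple quantile averaging: at each quantile level, the members' predicted quantiles are averaged. A minimal sketch with hypothetical member forecasts (not the study's code):

```python
import numpy as np

def combine_quantiles(member_quantiles, combiner="median"):
    """Combine probabilistic forecasts by simple quantile averaging: at each
    quantile level, take the mean (or median) of the members' predicted
    quantiles (also known as Vincentization)."""
    q = np.asarray(member_quantiles, float)  # shape: (members, levels)
    fn = np.median if combiner == "median" else np.mean
    return fn(q, axis=0)

# Three hypothetical members' predicted quantiles at levels 0.025, 0.500, 0.975:
members = [[2.0, 5.0,  9.0],
           [3.0, 6.0, 11.0],
           [2.5, 5.5, 16.0]]
print(combine_quantiles(members, "mean").tolist())    # [2.5, 5.5, 12.0]
print(combine_quantiles(members, "median").tolist())  # [2.5, 5.5, 11.0]
```

Note how the median combiner is less sensitive to the one member with the very wide upper tail, which is one reason median combining can be more robust than mean combining.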

Predictions of hydrological models should be probabilistic in nature. Our aim is to introduce a method that estimates directly the uncertainty of hydrological simulations using expectiles, thus complementing previous quantile-based direct approaches. Expectiles are new risk measures in hydrology. They are least squares analogues of quantiles and can characterize the probability distribution in much the same way as quantiles do. To this end, we propose calibrating hydrological models using the expectile loss function, which is consistent for expectiles. We apply our method to 511 basins in the contiguous US and deliver predictive expectiles of hydrological simulations with the GR4J, GR5J and GR6J hydrological models at expectile levels 0.500, 0.900, 0.950 and 0.975. An honest assessment empirically proves that the GR6J model outperforms the other two models at all expectile levels. Great opportunities are offered for moving beyond the mean in hydrological modelling by simply adjusting the objective function.
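The expectile loss (asymmetric least squares) used as the calibration objective above can be sketched as follows; the function name and toy data are illustrative, not taken from the paper:

```python
import numpy as np

def expectile_loss(obs, sim, level):
    """Average asymmetric squared-error loss, consistent for the expectile
    at the given level; level 0.5 recovers (half) the mean squared error."""
    diff = np.asarray(obs, float) - np.asarray(sim, float)
    weight = np.where(diff >= 0, level, 1.0 - level)
    return float(np.mean(weight * diff ** 2))

obs = np.array([10.0, 12.0, 8.0])
print(expectile_loss(obs, obs - 2.0, 0.95))  # under-prediction: 0.95 * 4
print(expectile_loss(obs, obs + 2.0, 0.95))  # over-prediction:  0.05 * 4
```

Whereas the quantile loss weights absolute errors asymmetrically, the expectile loss weights squared errors asymmetrically, which is why expectiles are the least squares analogues of quantiles.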

In a companion paper, Sikorska-Senoner and Quilty (2021) introduced the ensemble-based conceptual-data-driven approach (CDDA) for improving hydrological simulations. This approach consists of an ensemble of hydrological model (HM) simulations (generated via different parameter sets) whose residuals are ‘corrected’ by a data-driven model (one per HM parameter set), resulting in an improved ensemble simulation. Through a case study involving three Swiss catchments, it was demonstrated that CDDA generates significantly improved ensemble streamflow simulations when compared to the ensemble HM. In this follow-up study, a stochastic version of CDDA (SCDDA) is developed that, in addition to parameter uncertainty, accounts for input data, input variable selection, and model output uncertainty. Using several deterministic and probabilistic performance metrics, it is shown that SCDDA results in significantly more accurate and reliable ensemble-based streamflow simulations than the CDDA, ensemble and stochastic HMs, and a quantile regression-based approach, improving the mean interval score by 26–79%.
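The mean interval score used above is the average of a standard proper score for central prediction intervals, which rewards narrow intervals that still cover the observations. A minimal sketch (names illustrative):

```python
import numpy as np

def interval_score(y, lower, upper, alpha):
    """Interval (Winkler) score for a central (1 - alpha) prediction
    interval; smaller is better. Width is penalized, and observations
    falling outside the interval incur a miss penalty scaled by 2/alpha."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    width = upper - lower
    below = (2.0 / alpha) * (lower - y) * (y < lower)
    above = (2.0 / alpha) * (y - upper) * (y > upper)
    return np.mean(width + below + above)
```

For example, `alpha = 0.1` scores a 90% central prediction interval; averaging over all time steps gives the mean interval score.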

We present a new method for simulating and predicting hydrologic variables with uncertainty assessment and provide example applications to river flows. The method is identified with the acronym “Bluecat” and is based on the use of a deterministic model which is subsequently converted to a stochastic formulation. The latter provides a statistical adjustment of the deterministic prediction along with its confidence limits. The distinguishing features of the proposed approach are the ability to infer the probability distribution of the prediction without requiring strong hypotheses on the statistical characterization of the prediction error (e.g. normality, homoscedasticity) and its transparent and intuitive use of the observations. Bluecat makes use of a rigorous theory to estimate the probability distribution of the predictand conditioned by the deterministic model output, by inferring the conditional statistics of observations. Therefore, Bluecat bridges the gaps between deterministic (possibly physically-based, or deep learning-based) and stochastic models as well as between rigorous theory and transparent use of data with an innovative and user-oriented approach. We present two examples of application to the case studies of the Arno River at Subbiano and Sieve River at Fornacina. The results confirm the distinguishing features of the method along with its technical soundness. We provide open software working in the R environment, along with help facilities and detailed instructions to reproduce the case studies presented here.
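The core idea of inferring the conditional statistics of observations given the deterministic model output can be illustrated by pooling, for each new prediction, the observations paired with the nearest calibration-period predictions. This is a deliberately simplified sketch, not the Bluecat implementation (which is available as R software):

```python
import numpy as np

def conditional_quantiles(sim_cal, obs_cal, sim_new, k=50,
                          probs=(0.05, 0.5, 0.95)):
    """For each new deterministic prediction, pool the observations paired
    with the k closest calibration-period predictions (in model-output
    space) and take their empirical quantiles as the stochastic adjustment
    of the prediction, with confidence limits."""
    sim_cal = np.asarray(sim_cal, float)
    obs_cal = np.asarray(obs_cal, float)
    out = []
    for s in np.atleast_1d(sim_new):
        idx = np.argsort(np.abs(sim_cal - s))[:k]   # nearest neighbours
        out.append(np.quantile(obs_cal[idx], probs))
    return np.array(out)
```

No distributional assumptions (normality, homoscedasticity) enter: the conditional distribution is read directly off the paired simulations and observations.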

A recent nonlinear and multiscale framework, the Wavelet Data-Driven Forecasting Framework (WDDFF), was proposed for water resources forecasting. The main objective of this study is to explore the WDDFF for short-term urban water demand (UWD) forecasting over multiple lead times (1, 2, 3, 6, 12, 18, and 24 h ahead) by focusing on two separate issues that have yet to be considered within the framework: 1) a comparison of artificial neural network (ANN), least squares support vector machines (LSSVM), regularized extreme learning machines (RELM), and random forest (RF) and 2) two dataset partitioning approaches for reducing overfitting in deterministic and probabilistic machine learning (ML) models, a permutation-based approach for the deterministic models and a bootstrap-based approach for the probabilistic models. The secondary objective is to evaluate the usefulness of an input variable selection approach based on RF (RFIVS) for identifying the most important inputs to use in the ML models. 
The results of a real-world UWD forecasting case study in Qom, Iran demonstrate several noteworthy findings: 1) the probabilistic RF and its 'best' wavelet-based version provided the most accurate and reliable forecasts, with average test set Nash Sutcliffe Efficiency Index (NASH) coefficients (i.e., across all lead times) of ∼ 0.80 and 0.81, respectively; 2) the permutation- and bootstrap-based dataset partitioning approaches demonstrated potential to reduce overfitting; 3) wavelet decomposition improved probabilistic and deterministic ML model performance, increasing test set NASH coefficients by 1–7% on average (across all lead times); 4) wavelet-based models provided approximately the same level of reliability as the non-wavelet-based models but the best performing wavelet-based models improved forecast sharpness by an average of 14–24% (across all lead times); and 5) RFIVS substantially reduced the number of input variables used in the ML models (e.g., the number of inputs used in the wavelet-based models was often reduced by 50%) while still leading to improved performance over the case where all input variables were used.

Traditional methods for streamflow forecasting include statistical models, which have outperformed machine learning algorithms at large timescales (i.e., monthly and annual) in the absence of informative exogenous variables. This chapter presents an overview as well as the theory of statistical models and methods for forecasting of streamflow time series. We show how statistical models (exponential smoothing and autoregressive fractionally integrated moving average models) can be used for streamflow forecasting, and we present large-scale studies where statistical models are utilized, followed by explanations for their observed behavior. We also outline, in this chapter, the role of large-scale studies in advancing hydrological forecasting in practice.

A novel framework, an ensemble-based conceptual-data-driven approach (CDDA), is developed that integrates a hydrological model (HM) with a data-driven model (DDM) to simulate an ensemble of HM residuals. Thus, a CDDA delivers an ensemble of ‘residual-corrected’ streamflow simulations. This framework is beneficial because it respects hydrological processes via the HM and it profits from the DDM’s ability to simulate the complex relationship between residuals and input variables. The CDDA enables exploring different DDMs to identify the most suitable model. Eight DDMs are explored: Multiple Linear Regression (MLR), k Nearest Neighbours Regression (kNN), Second-Order Volterra Series Model, Artificial Neural Networks (ANN), and two variants of eXtreme Gradient Boosting (XGB) and Random Forests (RF). The proposed CDDA, tested on three Swiss catchments, was able to improve the mean continuous ranked probability score by 16–29% over the standalone HM. Based on these results, XGB and RF are recommended for simulating the HM residuals.

Probabilistic hydrological modelling methodologies often comprise two-stage post-processing schemes, thereby allowing the exploitation of the information provided by conceptual or physically-based rainfall-runoff models. They might also require issuing an ensemble of rainfall-runoff model simulations by using the rainfall-runoff model with different input data and/or different parameters. For obtaining a large number of rainfall-runoff model parameters in this regard, Bayesian schemes can be adopted; however, such schemes are accompanied by computational limitations (that are well-recognized in the literature). Therefore, in this work, we investigate the replacement of Bayesian rainfall-runoff model calibration schemes by computationally convenient non-Bayesian schemes within probabilistic hydrological modelling methodologies of the above-defined family. For our experiments, we use a methodology of this same family that is additionally characterized by the following distinguishing features: It (a) is in accordance with a theoretically consistent blueprint, (b) allows the exploitation of quantile regression algorithms (which offer larger flexibility than parametric models), and (c) has been empirically proven to harness the “wisdom of the crowd” in terms of average interval score. We also use a parsimonious conceptual rainfall-runoff model and 50-year-long monthly time series observed in 270 catchments in the United States to apply and compare 12 variants of the selected methodology. Six of these variants simulate the posterior distribution of the rainfall-runoff model parameters (conditional on the observations of a calibration period) within a Bayesian Markov chain Monte Carlo framework (first category of variants), while the other six variants use a simple computationally efficient approach instead (second category of variants). 
Six indicative combinations of the remaining components of the probabilistic hydrological modelling methodology (i.e., its post-processing scheme and its error model) are examined, each being used in one variant from each of the above-defined categories. In this specific context, the two large-scale calibration schemes (each representing a different “modelling culture” in our tests) are compared using proper scores and large-scale benchmarking. Overall, our findings suggest that the compared “modelling cultures” can lead to mostly equally good probabilistic predictions.

We discuss possible pathways towards reducing uncertainty in predictive modelling contexts in hydrology. Such pathways may require big datasets and multiple models, and may include (but are not limited to) large-scale benchmark experiments, forecast combinations, and predictive modelling frameworks with hydroclimatic time series analysis and clustering inputs. Emphasis is placed on the newest concepts and the most recent methodological advancements for benefitting from diverse inferred features and foreseen behaviours of hydroclimatic variables, derived by collectively exploiting diverse essentials of studying and modelling hydroclimatic variability and change (from both the descriptive and predictive perspectives). Our discussions are supported by big data (including global-scale) investigations, which are conducted for several hydroclimatic variables at several temporal scales.

Machine and statistical learning algorithms can be reliably automated and applied at scale. Therefore, they can constitute a considerable asset for designing practical forecasting systems, such as those related to urban water demand. Quantile regression algorithms are statistical and machine learning algorithms that can provide probabilistic forecasts in a straightforward way, and have not been applied so far for urban water demand forecasting. In this work, we aim to fill this gap by automating and extensively comparing several quantile-regression-based practical systems for probabilistic one-day ahead urban water demand forecasting. For designing the practical systems, we use five individual algorithms (i.e., the quantile regression, linear boosting, generalized random forest, gradient boosting machine and quantile regression neural network algorithms), their mean combiner and their median combiner. The comparison is conducted by exploiting a large urban water flow dataset, as well as several types of hydrometeorological time series (which are considered as exogenous predictor variables in the forecasting setting). The results mostly favour the practical systems designed using the linear boosting algorithm, probably due to the presence of trends in the urban water flow time series. The forecasts of the mean and median combiners are also found to be skilful in general terms.

The use of neural networks in hydrology has been frequently undermined by limitations regarding the quantification of uncertainty in predictions. Many authors have proposed different methodologies to overcome these limitations, such as running Monte Carlo simulations, Bayesian approximations, bootstrapping training samples, and two-step approaches, among others, which come with computational limitations of their own. One less-frequently explored alternative is to repurpose the dropout scheme during inference. Dropout is commonly used during training to avoid overfitting. However, it may also be activated during the testing period to effortlessly provide an ensemble of multiple “sister” predictions. This study explores the predictive uncertainty in hydrological models based on neural networks by comparing a multiparameter ensemble to a dropout ensemble. The dropout ensemble shows more reliable coverage of prediction intervals, while the multiparameter ensemble results in sharper prediction intervals. Moreover, for neural network structures with optimal lookback series, both ensemble strategies result in similar average interval scores. The dropout ensemble, however, benefits from requiring only a single calibration run, i.e., a single set of parameters. In addition, it delivers important insight for engineering design and decision-making with no increase in computational cost. Therefore, the dropout ensemble can be easily included in uncertainty analysis routines and even be combined with multiparameter or multimodel alternatives.
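Keeping dropout active at inference time can be illustrated with a toy numpy network; the weights, layer sizes and drop rate below are arbitrary, and the point is only that a single fixed parameter set yields an ensemble of “sister” predictions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy one-hidden-layer network with fixed (already "calibrated") weights.
W1 = rng.normal(size=(8, 3)); b1 = np.zeros(8)
W2 = rng.normal(size=8);      b2 = 0.0

def forward(x, drop_rate=0.0):
    """One forward pass; dropout stays ACTIVE at inference time."""
    h = np.tanh(W1 @ x + b1)                       # hidden layer
    if drop_rate > 0.0:
        mask = rng.random(h.shape) >= drop_rate    # randomly silence units
        h = h * mask / (1.0 - drop_rate)           # inverted-dropout scaling
    return float(W2 @ h + b2)

# A single parameter set yields an ensemble of "sister" predictions.
x = np.array([1.0, 0.5, -0.2])
ensemble = np.array([forward(x, drop_rate=0.2) for _ in range(200)])
lower, upper = np.quantile(ensemble, [0.05, 0.95])  # 90% prediction band
```

In contrast, a multiparameter ensemble would require one calibration run per parameter set; here the spread comes entirely from the random dropout masks.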

Catchment models are conventionally evaluated in terms of their response surface or likelihood surface constructed from model runs using different sets of model parameters. Model evaluation methods are mainly based upon the concept of the equifinality of model structures or parameter sets. The operational definition of equifinality is that multiple model structures/parameters are equally capable of producing acceptable simulations of catchment processes such as runoff. Examining various aspects of this convention, in this thesis I demonstrate its shortcomings and introduce improvements including new approaches and insights for evaluating catchment models as multiple working hypotheses (MWH). First (Chapter 2), arguing that there is more to equifinality than just model structures/parameters, I propose a theoretical framework to conceptualise various facets of equifinality, based on a meta-synthesis of a broad range of literature across geosciences, system theory, and philosophy of science. I distinguish between process-equifinality (equifinality within the real-world systems/processes) and model-equifinality (equifinality within models of real-world systems), explain various aspects of each of these two facets, and discuss their implications for hypothesis testing and modelling of hydrological systems under uncertainty. Second (Chapter 3), building on this theoretical framework, I propose that characterising model-equifinality based on model internal fluxes — instead of model parameters, which is the current approach to account for model-equifinality — provides valuable insights for evaluating catchment models. I developed a new method for model evaluation — called flux mapping — based on the equifinality of runoff generating fluxes of large ensembles of catchment model simulations (1 million model runs for each catchment). 
Evaluating the model behaviour within the flux space is a powerful approach, beyond the convention, to formulate testable hypotheses for runoff generation processes at the catchment scale. Third (Chapter 4), I further explore the dependency of the flux map of a catchment model upon the choice of model structure and parameterisation, error metric, and data information content. I compare two catchment models (SIMHYD and SACRAMENTO) across 221 Australian catchments (known as Hydrologic Reference Stations, HRS) using multiple error metrics. I particularly demonstrate the fundamental shortcomings of two widely used error metrics — i.e. Nash–Sutcliffe efficiency and Willmott’s refined index of agreement — in model evaluation. I develop the skill score version of Kling–Gupta efficiency (KGEss), and argue it is a more reliable error metric than the other metrics. I also compare two strategies of random sampling (Latin Hypercube Sampling) and guided search (Shuffled Complex Evolution) for model parameterisation, and discuss their implications in evaluating catchment models as MWH. Finally (Chapter 5), I explore how catchment characteristics (physiographic, climatic, and streamflow response characteristics) control the flux map of catchment models (i.e. runoff generation hypotheses). To this end, I formulate runoff generating hypotheses from a large ensemble of SIMHYD simulations (1 million model runs in each catchment). These hypotheses are based on the internal runoff fluxes of SIMHYD — namely infiltration excess overland flow, interflow and saturation excess overland flow, and baseflow — which represent runoff generation at catchment scale. I examine the dependency of these hypotheses on 22 different catchment attributes across 186 of the HRS catchments with acceptable model performance and sufficient parameter sampling. 
The model performance of each simulation is evaluated using the KGEss metric benchmarked against the catchment-specific calendar-day average observed flow model, which is more informative than the conventional benchmark of the overall average observed flow. I identify catchment attributes that control the degree of equifinality of model runoff fluxes. A higher degree of flux equifinality implies larger uncertainties associated with the representation of runoff processes at catchment scale, and hence poses a greater challenge for reliable and realistic simulation and prediction of streamflow. The findings of this chapter provide insights into the functional connectivity of catchment attributes and the internal dynamics of model runoff fluxes.

This thesis falls into the scientific areas of stochastic hydrology, hydrological modelling and hydroinformatics. It contributes with new practical solutions, new methodologies and large-scale results to predictive modelling of hydrological processes, specifically to solving two interrelated technical problems with emphasis on the latter. These problems are:
(A) hydrological time series forecasting by exclusively using endogenous predictor variables (hereafter, referred to simply as “hydrological time series forecasting”); and
(B) stochastic process-based modelling of hydrological systems via probabilistic post-processing (hereafter, referred to simply as “probabilistic hydrological post-processing”).
For the investigation of these technical problems, the thesis forms and exploits a novel predictive modelling and benchmarking toolbox. This toolbox consists of:
(i) approximately 6 000 hydrological time series (sourced from larger freely available datasets),
(ii) over 45 ready-made automatic models and algorithms mostly originating from the four major families of stochastic, (machine learning) regression, (machine learning) quantile regression, and conceptual process-based models,
(iii) seven flexible methodologies (which, together with the ready-made automatic models and algorithms, constitute the basis of our modelling solutions), and
(iv) approximately 30 predictive performance evaluation metrics.
Novel model combinations coupled with different algorithmic argument choices result in numerous model variants, many of which could be perceived as new methods. All the utilized models (i.e., the ones already available in open software, as well as those automated and proposed in the context of the thesis) are flexible, computationally convenient and fast; thus, they are appropriate for large-sample (even global-scale) hydrological investigations. Such investigations are implied by the (mainly) algorithmic nature of the methodologies of the thesis. In spite of this nature, the thesis also provides innovative theoretical supplements to its practical and methodological contribution.
Technical problem (A) is examined in four stages. During the first stage, a detailed framework for assessing forecasting techniques in hydrology is introduced. Complying with the principles of forecasting and contrary to the existing hydrological (and, more generally, geophysical) time series forecasting literature (in which forecasting performance is usually assessed within case studies), the introduced framework incorporates large-scale benchmarking. The latter relies on big hydrological datasets, large-scale time series simulation by using classical stationary stochastic models, many automatic forecasting models and algorithms (including benchmarks), and many forecast quality metrics. The new framework is exploited (by utilizing part of the predictive modelling and benchmarking toolbox of the thesis) to provide large-scale results and useful insights on the comparison of stochastic and machine learning forecasting methods for the case of hydrological time series forecasting at large temporal scales (e.g., the annual and monthly ones), with emphasis on annual river discharge processes. The related investigations focus on multi-step ahead forecasting.
During the second stage of the investigation of technical problem (A), the work conducted during the previous stage is expanded by exploring the one-step ahead forecasting properties of its methods, when the latter are applied to non-seasonal geophysical time series. Emphasis is put on the examination of two real-world datasets, an annual temperature dataset and an annual precipitation dataset. These datasets are examined in both their original and standardized forms to reveal the most and least accurate methods for long-run one-step ahead forecasting applications, and to provide rough benchmarks for the one-year ahead predictability of temperature and precipitation.
The third stage of the investigation of technical problem (A) includes both the examination-quantification of predictability of monthly temperature and monthly precipitation at global scale, and the comparison of a large number of (mostly stochastic) automatic time series forecasting methods for monthly geophysical time series. The related investigations focus on multi-step ahead forecasting by using the largest real-world data sample ever used so far in hydrology for assessing the performance of time series forecasting methods.
With the fourth (and last) stage of the investigation of technical problem (A), the multiple-case study research strategy is introduced, in its large-scale version, as an innovative alternative to conducting single- or few-case studies in the field of geophysical time series forecasting. To explore three sub-problems associated with hydrological time series forecasting using machine learning algorithms, an extensive multiple-case study is conducted. This multiple-case study is composed of a sufficient number of single-case studies, which exploit monthly temperature and monthly precipitation time series observed in Greece. The explored sub-problems are lagged variable selection, hyperparameter handling, and comparison of machine learning and stochastic algorithms.
Technical problem (B) is examined in three stages. During the first stage, a novel two-stage probabilistic hydrological post-processing methodology is developed by using a theoretically consistent probabilistic hydrological modelling blueprint as a starting point. The usefulness of this methodology is demonstrated by conducting toy model investigations. The same investigations also demonstrate how our understanding of the system to be modelled can guide us to achieve better predictive modelling when using the proposed methodology.
During the second stage of the investigation of technical problem (B), the probabilistic hydrological modelling methodology proposed during the previous stage is validated. The validation is made by conducting a large-scale real-world experiment at monthly timescale. In this experiment, the increased robustness of the investigated methodology with respect to the combined (by this methodology) individual predictors and, by extension, to basic two-stage post-processing methodologies is demonstrated. The ability to “harness the wisdom of the crowd” is also empirically proven.
Finally, during the third stage of the investigation of technical problem (B), the thesis introduces the largest range of probabilistic hydrological post-processing methods ever introduced in a single work, and additionally conducts at daily timescale the largest benchmark experiment ever conducted in the field. Additionally, it assesses several theoretical and qualitative aspects of the examined problem and the application of the proposed algorithms to answer the following research question: Why and how to combine process-based models and machine learning quantile regression algorithms for probabilistic hydrological modelling?

Hydroclimatic time series analysis focuses on a few feature types (e.g., autocorrelations, trends, extremes), which describe a small portion of the entire information content of the observations. Aiming to exploit a larger part of the available information and, thus, to deliver more reliable results (e.g., in hydroclimatic time series clustering contexts), here we approach hydroclimatic time series analysis differently, i.e., by performing massive feature extraction. In this respect, we develop a big data framework for hydroclimatic variable behaviour characterization. This framework relies on approximately 60 diverse features and is completely automatic (in the sense that it does not depend on the hydroclimatic process at hand). We apply the new framework to characterize mean monthly temperature, total monthly precipitation and mean monthly river flow. The applications are conducted at the global scale by exploiting 40-year-long time series originating from over 13 000 stations. We extract interpretable knowledge on seasonality, trends, autocorrelation, long-range dependence and entropy, and on feature types that are met less frequently in the literature. We further compare the examined hydroclimatic variable types in terms of this knowledge and identify patterns related to the spatial variability of the features. For this latter purpose, we also propose and exploit a hydroclimatic time series clustering methodology. This new methodology is based on Breiman's random forests. The descriptive and exploratory insights gained by the global-scale applications prove the usefulness of the adopted feature compilation in hydroclimatic contexts. Moreover, the spatially coherent patterns characterizing the clusters delivered by the new methodology build confidence in its future exploitation.
Given this spatial coherence and the scale-independent nature of the features (which makes them particularly useful in forecasting and simulation contexts), we believe that this methodology could also be beneficial within regionalization frameworks, in which knowledge on hydrological similarity is exploited in technical and operative terms.

To evaluate models as hypotheses, we developed the method of Flux Mapping to construct a hypothesis space based on dominant runoff generating mechanisms. Acceptable model runs, defined as total simulated flow with similar (and minimal) model error, are mapped to the hypothesis space given their simulated runoff components. In each modeling case, the hypothesis space is the result of an interplay of factors: model structure and parameterization, chosen error metric, and data information content. The aim of this study is to disentangle the role of each factor in model evaluation. We used two model structures (SACRAMENTO and SIMHYD), two parameter sampling approaches (Latin Hypercube Sampling of the parameter space and guided search of the solution space), three widely used error metrics (Nash–Sutcliffe efficiency, NSE; Kling–Gupta efficiency skill score, KGEss; and Willmott's refined index of agreement, WIA), and hydrological data from a large sample of Australian catchments. First, we characterized how the three error metrics behave under different error types and magnitudes independent of any modeling. We then conducted a series of controlled experiments to unpack the role of each factor in runoff generation hypotheses. We show that KGEss is a more reliable metric compared to NSE and WIA for model evaluation. We further demonstrate that only changing the error metric, while other factors remain constant, can change the model solution space and hence vary model performance, parameter sampling sufficiency, and/or the flux map. We show how unreliable error metrics and insufficient parameter sampling impair model-based inferences, particularly runoff generation hypotheses.
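The Kling–Gupta efficiency and its skill-score form against a benchmark simulation can be sketched as follows; `kge_skill_score` is an illustrative helper, and the benchmark series (e.g., a calendar-day average observed flow model) is supplied by the user:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency, combining correlation, variability ratio
    and bias ratio; 1 is a perfect simulation."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)     # variability ratio
    beta = np.mean(sim) / np.mean(obs)    # bias ratio
    return 1.0 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)

def kge_skill_score(sim, obs, bench):
    """KGE skill score of a simulation against a benchmark simulation;
    positive values mean the model beats the benchmark."""
    kge_model, kge_bench = kge(sim, obs), kge(bench, obs)
    return (kge_model - kge_bench) / (1.0 - kge_bench)
```

Normalizing against a benchmark makes scores comparable across catchments, which raw KGE values are not.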

We propose a brisk method for uncertainty estimation in hydrology which maximizes the probabilistic efficiency of the estimated confidence bands over the whole range of the predicted variables. It is an innovative approach framed within the blueprint we proposed in 2012 for stochastic physically-based modelling of hydrological systems. We present the theoretical foundation which proves that global uncertainty can be estimated with an integrated approach by tallying the empirical joint distribution of predictions and predictands in the calibration phase. We also theoretically prove the capability of the method to correct the bias and to fit heteroscedastic uncertainty for any probability distribution of the modelled variable. The method allows the incorporation of physical understanding of the modelled process along with its sources of uncertainty. We present an application to a toy case to prove the capability of the method to correct the bias and the entire distribution function of the prediction model. We also present a case study of a real-world catchment. We prepare open-source software to allow reproducibility of the results and replicability to other catchments. We term the new approach BLUE CAT: Brisk Local Uncertainty Estimation by Conditioning And Tallying.

Recently, a stochastic data-driven framework was introduced for forecasting uncertain multiscale hydrological and water resources processes (e.g., streamflow, urban water demand (UWD)) that uses wavelet decomposition of input data to address multiscale change and stochastics to account for input variable selection, parameter, and model output uncertainty (Quilty et al., 2019). The former study considered all sources of uncertainty together. In contrast, this study explores how input variable selection uncertainty and wavelet decomposition impact probabilistic forecasting performance by considering eight variations of this framework that either include/ignore wavelet decomposition and varying levels of uncertainty: 1) none; 2) parameter; 3) parameter and model output; and 4) input variable selection, parameter, and model output. For a daily UWD forecasting case study in Montreal (Canada), substantial improvements in forecasting performance (e.g., 16–30% improvement in the mean interval score) were achieved when input variable selection uncertainty and wavelet decomposition were included within the framework.

Delivering useful hydrological forecasts is critical for urban and agricultural water management, hydropower generation, flood protection and management, drought mitigation and alleviation, and river basin planning and management, among others. In this work, we present and appraise a new methodology for hydrological time series forecasting. This methodology is based on simple combinations. The appraisal is made by using a big dataset consisting of 90-year-long mean annual river flow time series from approximately 600 stations. Covering large parts of North America and Europe, these stations represent various climate and catchment characteristics, and thus can collectively support benchmarking. Five individual forecasting methods and 26 variants of the introduced methodology are applied to each time series. The application is made in one-step ahead forecasting mode. The individual methods are the last-observation benchmark, simple exponential smoothing, complex exponential smoothing, automatic autoregressive fractionally integrated moving average (ARFIMA) and Facebook's Prophet, while the 26 variants are defined by all the possible combinations (per two, three, four or five) of the five afore-mentioned methods. The findings have both practical and theoretical implications. The simple methodology of the study is identified as well-performing in the long run. Our large-scale results are additionally exploited for finding an interpretable relationship between predictive performance and temporal dependence in the river flow time series, and for examining one-year ahead river flow predictability.

We conduct a large-scale benchmark experiment aiming to advance the use of machine-learning quantile regression algorithms for probabilistic hydrological post-processing “at scale” within operational contexts. The experiment is set up using 34-year-long daily time series of precipitation, temperature, evapotranspiration and streamflow for 511 catchments over the contiguous United States. Point hydrological predictions are obtained using the Génie Rural à 4 paramètres Journalier (GR4J) hydrological model and exploited as predictor variables within quantile regression settings. Six machine-learning quantile regression algorithms and their equal-weight combiner are applied to predict conditional quantiles of the hydrological model errors. The individual algorithms are quantile regression, generalized random forests for quantile regression, generalized random forests for quantile regression emulating quantile regression forests, gradient boosting machine, model-based boosting with linear models as base learners and quantile regression neural networks. The conditional quantiles of the hydrological model errors are transformed to conditional quantiles of daily streamflow, which are finally assessed using proper performance scores and benchmarking. The assessment concerns various levels of predictive quantiles and central prediction intervals, while it is made both independently of the flow magnitude and conditional upon this magnitude. Key aspects of the developed methodological framework are highlighted, and practical recommendations are formulated. In technical hydro-meteorological applications, the algorithms should be applied preferably in a way that maximizes the benefits and reduces the risks from their use. This can be achieved by (i) combining algorithms (e.g., by averaging their predictions) and (ii) integrating algorithms within systematic frameworks (i.e., by using the algorithms according to their identified skills), as our large-scale results point out.
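Predicted conditional quantiles such as those above are commonly assessed with the quantile (pinball) loss, a proper score for a given probability level. A minimal sketch, with illustrative names:

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average quantile (pinball) loss at probability level tau.

    Under-prediction is weighted by tau and over-prediction by
    (1 - tau), so the loss is minimized in expectation by the true
    conditional tau-quantile.
    """
    y, q_pred = np.asarray(y, dtype=float), np.asarray(q_pred, dtype=float)
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# For a high quantile (tau = 0.9), under-prediction costs more:
print(pinball_loss([10.0], [8.0], tau=0.9))   # 1.8
print(pinball_loss([10.0], [12.0], tau=0.9))  # 0.2
```

Averaging this loss over many probability levels approximates the continuous ranked probability score, which is one way such benchmark experiments summarize overall probabilistic skill.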

Uncertainty analysis is an integral part of any scientific modeling, particularly within the domain of hydrological sciences given the various types and sources of uncertainty. At the center of uncertainty rests the concept of equifinality, i.e. reaching a given endpoint (finality) through different pathways. The operational definition of equifinality in hydrological modeling is that various model structures and/or parameter sets (i.e. equal pathways) are equally capable of reproducing a similar (not necessarily identical) hydrological outcome (i.e. finality). Here we argue that there is more to model-equifinality than model structures/parameters, i.e. other model components can give rise to model-equifinality and/or could be used to explore equifinality within model space. We identified six facets of model-equifinality namely model structure, parameters, performance metrics, initial and boundary conditions, inputs, and internal fluxes. Focusing on model internal fluxes, we developed a methodology called Flux Mapping that has fundamental implications in understanding and evaluating model process-representation within the paradigm of multiple working hypotheses. To illustrate this, we examine the equifinality of runoff fluxes of a conceptual rainfall-runoff model for a number of different Australian catchments. We demonstrate how flux maps can give new insights into the model behavior that cannot be captured by conventional model evaluation methods. We discuss the advantages of flux space, as a sub-space of the model space not usually examined, over parameter space. We further discuss the utility of flux mapping in hypothesis generation and testing, extendable to any field of scientific modeling of open complex systems under uncertainty.

This study introduces a method to quantify the conditional predictive uncertainty in hydrological post-processing contexts when it is cumbersome to calculate the likelihood (intractable likelihood). It can be difficult to calculate the likelihood itself in hydrological modelling, especially when working with complex models or with ungauged catchments. Therefore, we propose the ABC post-processor, which replaces the requirement of calculating the likelihood function with the use of sufficient summary statistics and synthetic datasets. The aim is to show that the conditional predictive distribution is qualitatively similar whether produced by the exact predictive (MCMC post-processor) or the approximate predictive (ABC post-processor). We also use the MCMC post-processor as a benchmark to make the results more comparable with the proposed method. We test the ABC post-processor in two scenarios: (1) the Aipe catchment with tropical climate and a spatially-lumped hydrological model (Colombia) and (2) the Oria catchment with oceanic climate and a spatially-distributed hydrological model (Spain). The main finding of the study is that the approximate (ABC post-processor) conditional predictive uncertainty is almost equivalent to the exact predictive (MCMC post-processor) in both scenarios.

This paper is the outcome of a community initiative to identify major unsolved scientific problems in hydrology motivated by a need for stronger harmonisation of research efforts. The procedure involved a public consultation through on-line media, followed by two workshops through which a large number of potential science questions were collated, prioritised, and synthesised. In spite of the diversity of the participants (230 scientists in total), the process revealed much about community priorities and the state of our science: a preference for continuity in research questions rather than radical departures or redirections from past and current work. Questions remain focussed on process-based understanding of hydrological variability and causality at all space and time scales. Increased attention to environmental change drives a new emphasis on understanding how change propagates across interfaces within the hydrological system and across disciplinary boundaries. In particular, the expansion of the human footprint raises a new set of questions related to human interactions with nature and water cycle feedbacks in the context of complex water management problems. We hope that this reflection and synthesis of the 23 unsolved problems in hydrology will help guide research efforts for some years to come.

In this paper, we present a study of assessing regional water resources in a highly regulated river basin, the Dee river basin in the UK. The aims of this study include: 1) to address the issue of hydrological simulations for regulated river catchments; 2) to develop a new method revealing the trends of water resources for different scenarios (e.g. dry and wet) and 3) to facilitate water resources assessment under both climate change impacts and regulations. We use the SWAT model to model the hydrological process of the river basin with carefully designed configurations to isolate the impact from the water use regulations and practice. The spatially-distributed model simulations are then analysed with the quantile regression method to reveal the spatial and temporal patterns of regional water resources. The results show that this approach excels in presenting distributed, spatially focused trend information for extremely dry and wet scenarios, which can well address the needs of practitioners and decision-makers in dealing with long-term planning and climate change impact. The representation of the management practice in the modelling process helps identify the impact from both climate change and necessary regulatory practices, and as such lays a foundation for further study on how various management practices can mitigate the impact from other sources such as those from climate change. The novelty of the study lies in three aspects: 1) it devises a new way of isolating and representing management practice in the hydrological modelling process for regulated river basins; 2) it integrates the QR technique to study spatial-temporal trends of catchment water yield in a distributed fashion, for wet and dry scenarios instead of the mean; 3) the combination of these methods is able to reveal the impacts from various sources as well as their interactions with catchment water resources.

Random forests (RF) is a supervised machine learning algorithm, which has recently started to gain prominence in water resources applications. However, existing applications are generally restricted to the implementation of Breiman’s original algorithm for regression and classification problems, while numerous developments could be also useful in solving diverse practical problems in the water sector. Here we popularize RF and their variants for the practicing water scientist, and discuss related concepts and techniques, which have received less attention from the water science and hydrologic communities. In doing so, we review RF applications in water resources, highlight the potential of the original algorithm and its variants, and assess the degree of RF exploitation in a diverse range of applications. Relevant implementations of random forests, as well as related concepts and techniques in the R programming language, are also covered.

We introduce an ensemble learning post-processing methodology for probabilistic hydrological modelling. This methodology generates numerous point predictions by applying a single hydrological model, yet with different parameter values drawn from the respective simulated posterior distribution. We call these predictions "sister predictions". Each sister prediction extending in the period of interest is converted into a probabilistic prediction using information about the hydrological model's errors. This information is obtained from a preceding period for which observations are available, and is exploited using a flexible quantile regression model. All probabilistic predictions are finally combined via simple quantile averaging to produce the output probabilistic prediction. The idea is inspired by the ensemble learning methods originating from the machine learning literature. The proposed methodology offers larger robustness in performance than basic post-processing methodologies using a single hydrological point prediction. It is also empirically proven to "harness the wisdom of the crowd" in terms of average interval score, i.e., the obtained quantile predictions score no worse (usually better) than the average score of the combined individual predictions. This proof is provided within toy examples, which can be used for gaining insight on how the methodology works and under which conditions it can optimally convert point hydrological predictions to probabilistic ones. A large-scale hydrological application is made in a companion paper.
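The combination step described above, simple quantile averaging, averages the members' predictive quantiles level by level (sometimes called Vincentization). A minimal sketch, assuming all members report quantiles at the same probability levels (illustrative names, not the authors' code):

```python
import numpy as np

def quantile_average(member_quantiles):
    """Combine probabilistic predictions via simple quantile averaging.

    `member_quantiles` has shape (n_members, n_levels): each row holds
    one sister prediction's quantiles at common probability levels.
    Averaging level by level yields the combined predictive quantiles.
    """
    return np.asarray(member_quantiles, dtype=float).mean(axis=0)

# Three sister predictions, quantiles at levels (0.1, 0.5, 0.9):
members = [[2.0, 5.0, 9.0],
           [3.0, 6.0, 10.0],
           [1.0, 4.0, 8.0]]
print(quantile_average(members))  # [2. 5. 9.]
```

Averaging quantiles (rather than probabilities) keeps the combined intervals sharp, which is why the combined prediction can score no worse than the members' average score.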

Seasonal precipitation forecasts at regional or local areas can help guide agricultural practice and urban water resource management. The North American multi-model ensemble (NMME) is a seasonal forecasting system providing precipitation forecasts globally. Bias correction and downscaling of the NMME is a critical step before it is applied at local scales. Here, machine learning methods coupled with wavelets are used to correct the precipitation forecasts in NMME for 518 meteorological stations in China for eight models at 0.5–8.5 months leads. Compared with the traditional quantile mapping (QM) approach, the wavelet support vector machine (WSVM) and wavelet random forest (WRF) methods exhibit a clear advantage in downscaling, with an overall average improvement of Pearson's correlation coefficient increasing by 0.05–0.3 and root mean square error (RMSE) decreasing by 18–40 mm (21–33%) for individual models. Both the spatial and seasonal patterns of downscaled results demonstrate the superiority of wavelet machine learning methods over QM. A spatial analysis indicates that the corrected NMME precipitation forecasts show the best skill in South China, with an average RMSE of about 30 mm, and the worst skill in Central and Southwest China, with an RMSE of 80 mm. In spite of the correction, the uncertainties of seasonal precipitation forecasts in summer and extreme wet cases are still large. However, the WSVM and WRF methods may serve as an effective tool in the bias correction of NMME precipitation forecasts.

Research within the field of hydrology often focuses on the statistical problem of comparing stochastic to machine learning (ML) forecasting methods. The performed comparisons are based on case studies, while a study providing large-scale results on the subject is missing. Herein, we compare 11 stochastic and 9 ML methods regarding their multi-step ahead forecasting properties by conducting 12 extensive computational experiments based on simulations. Each of these experiments uses 2000 time series generated by linear stationary stochastic processes. We conduct each simulation experiment twice; the first time using time series of 100 values and the second time using time series of 300 values. Additionally, we conduct a real-world experiment using 405 mean annual river discharge time series of 100 values. We quantify the forecasting performance of the methods using 18 metrics. The results indicate that stochastic and ML methods may produce equally useful forecasts.

In water resources applications (e.g., streamflow, rainfall-runoff, urban water demand [UWD], etc.), ensemble member selection and ensemble member weighting are two difficult yet important tasks in the development of ensemble forecasting systems. We propose and test a stochastic data-driven ensemble forecasting framework that uses archived deterministic forecasts as input and results in probabilistic water resources forecasts. In addition to input data and (ensemble) model output uncertainty, the proposed approach integrates both ensemble member selection and weighting uncertainties, using input variable selection and data-driven methods, respectively. Therefore, it does not require one to perform ensemble member selection and weighting separately. We applied the proposed forecasting framework to a previous real-world case study in Montreal, Canada, to forecast daily UWD at multiple lead times. Using wavelet-based forecasts as input data, we develop the Ensemble Wavelet-Stochastic Data-Driven Forecasting Framework, the first multiwavelet ensemble stochastic forecasting framework that produces probabilistic forecasts. For the considered case study, several variants of the Ensemble Wavelet-Stochastic Data-Driven Forecasting Framework, produced using different input variable selection methods (partial correlation input selection and Edgeworth Approximations-based conditional mutual information) and data-driven models (multiple linear regression, extreme learning machines, and second-order Volterra series models), are shown to outperform wavelet- and nonwavelet-based benchmarks, especially during a heat wave (studied for the first time in the UWD forecasting literature).

We provide contingent empirical evidence on the solutions to three problems associated with univariate time series forecasting using machine learning (ML) algorithms by conducting an extensive multiple-case study. These problems are: (a) lagged variable selection, (b) hyperparameter handling, and (c) comparison between ML and classical algorithms. The multiple-case study is composed by 50 single-case studies, which use time series of mean monthly temperature and total monthly precipitation observed in Greece. We focus on two ML algorithms, i.e. neural networks and support vector machines, while we also include four classical algorithms and a naïve benchmark in the comparisons. We apply a fixed methodology to each individual case and, subsequently, we perform a cross-case synthesis to facilitate the detection of systematic patterns. We fit the models to the deseasonalized time series. We compare the one- and multi-step ahead forecasting performance of the algorithms. Regarding the one-step ahead forecasting performance, the assessment is based on the absolute error of the forecast of the last monthly observation. For the quantification of the multi-step ahead forecasting performance we compute five metrics on the test set (last year’s monthly observations), i.e. the root mean square error, the Nash-Sutcliffe efficiency, the ratio of standard deviations, the coefficient of correlation and the index of agreement. The evidence derived by the experiments can be summarized as follows: (a) the results mostly favour using less recent lagged variables, (b) hyperparameter optimization does not necessarily lead to better forecasts, (c) the ML and classical algorithms seem to be equally competitive.

Highlights The predictability of drought in China was examined using heterogeneous models. The 'low POD low POF' and 'high POD high POF' phenomena were presented. The performance of drought prediction remains poor, with low PODs and high POFs. Abstract The predictability of droughts in China was investigated using a series of statistical, dynamic and hybrid models. The results indicate that statistical models exhibit better skill than dynamic models in forecasting the six-month Standardized Precipitation Index (SPI6). Overall, the ensemble streamflow prediction (ESP) method and wavelet machine learning models outperform other statistical models in forecasting SPI6. The hybrid model can improve the performance of the SPI6 forecast by combining statistical and dynamic models using the Bayesian model averaging (BMA) method. As for drought onset detection, the 'low probability of detection (POD) low probability of false alarm (POF)' and 'high POD high POF' phenomena exist in statistical and dynamic models, respectively. On average, fewer than 20% of drought onsets can be detected by statistical models and fewer than 40% by dynamic models, with more than 40% false alarms appearing in statistical models and more than 75% in dynamic models. The hybrid model can slightly balance these, resulting in a POD of 20% and a POF of 50%. In spite of the low predictability, some stations with a high equitable threat score (ETS) can be used in early drought warning under certain requirements. These conclusions may help improve drought prediction at a local or national scale.

We assess the performance of random forests and Prophet in forecasting daily streamflow up to seven days ahead in a river in the US. Both the assessed forecasting methods use past streamflow observations, while random forests additionally use past precipitation information. For benchmarking purposes we also implement a naïve method based on the previous streamflow observation, as well as a multiple linear regression model utilizing the same information as random forests. Our aim is to illustrate important points about the forecasting methods when implemented for the examined problem. Therefore, the assessment is made in detail at a sufficient number of starting points and for several forecast horizons. The results suggest that random forests perform better in general terms, while Prophet outperforms the naïve method for forecast horizons longer than three days. Finally, random forests forecast the abrupt streamflow fluctuations more satisfactorily than the three other methods.

We assess the performance of the recently introduced Prophet model in multi-step ahead forecasting of monthly streamflow by using a large dataset. Our aim is to compare the results derived through two different approaches. The first approach uses past information about the time series to be forecasted only (standard approach), while the second approach uses exogenous predictor variables alongside the endogenous ones. The additional information used in the fitting and forecasting processes includes monthly precipitation and/or temperature time series, and their forecasts respectively. Specifically, the exploited exogenous (observed or forecasted) information considered at each time step exclusively concerns the time of interest. In total, four algorithms based on the Prophet model are used. Their forecasts are also compared with those obtained using two classical algorithms and two benchmarks. The comparison is performed in terms of four metrics. The findings suggest that the compared approaches are equally useful.

The simplest way to forecast geophysical processes, a widely recognised engineering challenge, is the so-called "univariate time series forecasting" that can be implemented using stochastic or machine learning regression models within a purely statistical framework. Regression models are in general fast-implemented, in contrast to the computationally intensive Global Circulation Models, which constitute the most frequently used alternative for precipitation and temperature forecasting. For their simplicity and easy applicability, the former have been proposed as benchmarks for the latter by forecasting scientists. Herein, we assess the one-step ahead forecasting performance of 20 univariate time series forecasting methods, when applied to a large number of geophysical and simulated time series of 91 values. We use two real-world annual datasets, a dataset composed by 112 time series of precipitation and another composed by 185 time series of temperature, as well as their respective standardized datasets, to conduct several real-world experiments. We further conduct large-scale experiments using 12 simulated datasets. These datasets contain 24 000 time series in total, which are simulated using stochastic models from the families of Autoregressive Moving Average and Autoregressive Fractionally Integrated Moving Average. We use the first 50, 60, 70, 80 and 90 data points for model-fitting and model-validation and make predictions corresponding to the 51st, 61st, 71st, 81st and 91st respectively. The total number of forecasts produced herein is 2 177 520, among which 47 520 are obtained using the real-world datasets. The assessment is based on eight error metrics and accuracy statistics. The simulation experiments reveal the most and least accurate methods for long-term forecasting applications, also suggesting that the simple methods may be competitive in specific cases.
Regarding the results of the real-world experiments using the original (standardized) time series, the minimum and maximum medians of the absolute errors are found to be 68 mm (0.55) and 189 mm (1.42) respectively for precipitation, and 0.23 °C (0.33) and 1.10 °C (1.46) respectively for temperature. Since there is an absence of relevant information in the literature, the numerical results obtained using the standardized real-world datasets could be used as rough benchmarks for the one-step ahead predictability of annual precipitation and temperature.

We investigate the predictability of monthly temperature and precipitation by applying automatic univariate time series forecasting methods to a sample of 985 40-year-long monthly temperature and 1552 40-year-long monthly precipitation time series. The methods include a naïve one based on the monthly values of the last year, as well as the random walk (with drift), AutoRegressive Fractionally Integrated Moving Average (ARFIMA), exponential smoothing state-space model with Box–Cox transformation, ARMA errors, Trend and Seasonal components (BATS), simple exponential smoothing, Theta and Prophet methods. Prophet is a recently introduced model inspired by the nature of time series forecasted at Facebook and has not been applied to hydrometeorological time series before, while the use of random walk, BATS, simple exponential smoothing and Theta is rare in hydrology. The methods are tested in performing multi-step ahead forecasts for the last 48 months of the data. We further investigate how different choices of handling the seasonality and non-normality affect the performance of the models. The results indicate that: (a) all the examined methods apart from the naïve and random walk ones are accurate enough to be used in long-term applications; (b) monthly temperature and precipitation can be forecasted to a level of accuracy which can barely be improved using other methods; (c) the externally applied classical seasonal decomposition results mostly in better forecasts compared to the automatic seasonal decomposition used by the BATS and Prophet methods; and (d) Prophet is competitive, especially when it is combined with externally applied classical seasonal decomposition.

The long-range dependence (LRD) is considered an inherent property of geophysical processes, whose presence increases uncertainty. Here we examine the spatial behaviour of LRD in precipitation by regressing the Hurst parameter estimates of mean annual precipitation instrumental data, which span the period 1916–2015 and cover a large part of the earth's surface, on the location characteristics of the instrumental data stations. Furthermore, we apply the Mann-Kendall test under the LRD assumption (MKt-LRD) to reassess the significance of observed trends. To summarize the results, the LRD is spatially clustered and seems to depend mostly on the location of the stations, while the predictive value of the regression model is good. Thus, when investigating LRD properties, we recommend that the local characteristics be considered. The application of the MKt-LRD suggests that no significant monotonic trend appears in global precipitation, excluding the climate type D (snow) regions, in which positive significant trends appear.
Supplementary information files are hosted at: https://doi.org/10.6084/m9.figshare.4892447.v1

One of the important issues in hydrological modelling is to specify the initial conditions of the catchment, since they have a major impact on the response of the model. Although this issue should be a high priority among modelers, it has remained unaddressed by the community. The typical suggested warm-up period for hydrological models has ranged from one to several years, which may lead to an underuse of data. The model warm-up is an adjustment process through which internal stores (e.g., soil moisture) move from their estimated initial conditions to an 'optimal' state. This study explores the warm-up period of two conceptual hydrological models, HYMOD and IHACRES, in a southwestern England catchment. A series of hydrologic simulations were performed for different initial soil moisture conditions and different rainfall amounts to evaluate the sensitivity of the warm-up period. Evaluation of the results indicates that both initial wetness and rainfall amount affect the time required for model warm-up, although this depends on the structure of the hydrological model. Approximately one and a half months are required for HYMOD to warm up in our study catchment and climatic conditions. In addition, less time is required to warm up under wetter initial conditions (i.e., saturated initial conditions). On the other hand, approximately six months are required for warm-up in IHACRES, and wet or dry initial conditions have little effect on the warm-up period. Instead, initial values that are close to the optimal value result in less warm-up time. These findings have implications for hydrologic model development, specifically in determining soil moisture initial conditions and warm-up periods to make full use of the available data, which is very important for catchments with short hydrological records.

Time series forecasting using machine learning algorithms has gained popularity recently. Random forests (RF) is a machine learning algorithm that has been implemented in time series forecasting; however, most of its forecasting properties have remained unexplored. Here we focus on assessing the performance of random forests in one-step forecasting using two large datasets of short time series, with the aim to suggest an optimal set of predictor variables. Furthermore, we compare its performance to benchmarking methods. The first dataset is composed by 16,000 simulated time series from a variety of Autoregressive Fractionally Integrated Moving Average (ARFIMA) models. The second dataset consists of 135 mean annual temperature time series. The highest predictive performance of RF is observed when using a small number of recent lagged predictor variables. This outcome could be useful in relevant future applications, with the prospect of achieving higher predictive accuracy.
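A one-step random forest forecast built from a few recent lagged values, the setup found to perform best above, can be sketched as follows (using scikit-learn's RandomForestRegressor; an illustrative sketch, not the study's implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def one_step_rf_forecast(series, n_lags=3, **rf_kwargs):
    """One-step ahead random forest forecast from lagged observations.

    Builds a design matrix whose columns are the n_lags most recent
    lagged values, fits a random forest, and predicts the next value
    from the last n_lags observations.
    """
    x = np.asarray(series, dtype=float)
    X = np.column_stack([x[i:len(x) - n_lags + i] for i in range(n_lags)])
    y = x[n_lags:]
    model = RandomForestRegressor(random_state=0, **rf_kwargs).fit(X, y)
    return model.predict(x[-n_lags:].reshape(1, -1))[0]

# Demo on a simulated AR(1) series (value printed depends on the seed):
rng = np.random.default_rng(1)
ar1 = [0.0]
for _ in range(200):
    ar1.append(0.7 * ar1[-1] + rng.normal())
print(round(one_step_rf_forecast(ar1, n_lags=3, n_estimators=100), 2))
```

Because random forest predictions are averages of training targets, forecasts stay within the observed range, which makes the method robust for short series but unable to extrapolate trends.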

A non-parametric method is applied to quantify residual uncertainty in hydrologic streamflow forecasting. This method acts as a post-processor on deterministic model forecasts and generates a residual uncertainty distribution. Based on instance-based learning, it uses a k-nearest-neighbour search for similar historical hydrometeorological conditions to determine uncertainty intervals from a set of historical errors, i.e. discrepancies between past forecasts and observations. The performance of this method is assessed using test cases of hydrologic forecasting in two UK rivers: the Severn and Brue. Forecasts in retrospect were made and their uncertainties were estimated using kNN resampling and two alternative uncertainty estimators: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Results show that kNN uncertainty estimation produces accurate and narrow uncertainty intervals with good probability coverage. Analysis also shows that the performance of this technique depends on the choice of search space. Nevertheless, the accuracy and reliability of uncertainty intervals generated using kNN resampling are at least comparable to those produced by QR and UNEEC. It is concluded that kNN uncertainty estimation is an interesting alternative to other post-processors, like QR and UNEEC, for estimating forecast uncertainty. Apart from its concept being simple and well understood, an advantage of this method is that it is relatively easy to implement.
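The kNN resampling idea can be sketched as follows: find the k most similar historical conditions and take empirical quantiles of their forecast errors as the uncertainty band (Euclidean distance and illustrative names; a simplification of the method described above):

```python
import numpy as np

def knn_error_interval(query, past_features, past_errors, k=5, coverage=0.9):
    """Residual uncertainty interval from the k nearest historical analogues.

    Selects the k past forecast instances whose hydrometeorological
    feature vectors are closest to `query`, then returns empirical
    quantiles of their errors spanning the requested coverage.
    """
    past_features = np.asarray(past_features, dtype=float)
    d = np.linalg.norm(past_features - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    errs = np.asarray(past_errors, dtype=float)[nearest]
    lo = (1.0 - coverage) / 2.0
    return np.quantile(errs, lo), np.quantile(errs, 1.0 - lo)

# Toy history: one feature per instance, with the matching forecast errors.
feats = [[0.1], [0.2], [0.9], [1.0], [0.15]]
errs = [-1.0, -0.5, 3.0, 4.0, 0.5]
print(knn_error_interval([0.12], feats, errs, k=3, coverage=0.5))
```

Adding the returned error quantiles to the deterministic forecast yields the predictive interval; the choice of features defining "similar conditions" is the search space the abstract notes the method is sensitive to.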

A problem frequently met in engineering hydrology is the forecasting of hydrologic variables conditional on their historical observations and the hindcasts and forecasts of a deterministic model. By contrast, it is common practice for climatologists to use the output of general circulation models (GCMs) for the prediction of climatic variables, despite their inability to quantify the uncertainty of the predictions. Here we apply the well-established Bayesian Processor of Forecasts (BPF) for forecasting hydroclimatic variables using stochastic models through coupling them with GCMs. We extend the BPF to cases where long-term persistence appears, using the Hurst-Kolmogorov process (HKp, also known as fractional Gaussian noise), and we investigate its properties analytically. We apply the framework to calculate the distributions of the mean annual temperature and precipitation stochastic processes for the time period 2016–2100 in the United States of America, conditional on historical observations and the respective output of GCMs.

Over the years, Standard Least Squares (SLS) has been the most commonly adopted criterion for the calibration of hydrological models, despite the fact that these models generally do not fulfill the assumptions made by the SLS method: very often errors are autocorrelated, heteroscedastic, biased and/or non-Gaussian. In line with recent papers that suggest more appropriate models for the errors in hydrological modeling, this paper addresses the challenging problem of jointly estimating hydrological and error model parameters (joint inference) in a Bayesian framework, trying to solve some of the problems found in previous related research. The paper performs Bayesian joint inference through the application of different inference models, such as the known SLS or WLS and the new GL++ and GL++Bias error models. These inferences were carried out on two lumped hydrological models forced with daily hydrometeorological data from a basin of the MOPEX project. The main finding is that a joint inference, to be statistically correct, must take into account the joint probability distribution of the state variable to be predicted and its deviation from the observations (the errors). Consequently, the relationship between the marginal and conditional distributions of this joint distribution must be taken into account in the inference process. This relationship is defined by two general statistical expressions called the Total Laws (TLs): the Total Expectation and the Total Variance Laws. Only simple error models, such as SLS, do not explicitly need the TLs implementation. An important consequence of enforcing the TLs is a reduction of the degrees of freedom in the inference problem, namely a reduction of the dimension of the parameter space. This research demonstrates that non-fulfillment of the TLs produces incorrect error and hydrological parameter estimates and unreliable predictive distributions.
The target of a (joint) inference must be to fulfill the error model hypotheses rather than to achieve the best fit to the observations. Consequently, for a given hydrological model, the resulting predictive performance, the reliability of the predictive uncertainty, and the robustness of the parameter estimates will be conditioned exclusively by the degree to which the errors fulfill the error model hypotheses.
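The Total Laws invoked here are the standard laws of total expectation and total variance. A small Monte Carlo check on a hypothetical two-level model (my own toy example, not the paper's hydrological setup) illustrates what they assert:

```python
import random
import statistics

random.seed(1)
# Hypothetical hierarchy: X ~ N(0, 1), then Y | X ~ N(2X, 1).
xs = [random.gauss(0, 1) for _ in range(200_000)]
ys = [random.gauss(2 * x, 1) for x in xs]

# Total Expectation Law: E[Y] = E[E[Y|X]] = E[2X] = 0.
mean_y = statistics.fmean(ys)
# Total Variance Law: Var[Y] = E[Var[Y|X]] + Var[E[Y|X]] = 1 + 4 = 5.
var_y = statistics.pvariance(ys)
```

Ignoring the second variance term, the kind of mistake the TLs guard against, would understate Var[Y] by a factor of five in this toy case.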

The last decade has seen growing research in producing probabilistic hydro-meteorological forecasts and in increasing their reliability. This followed the promise that, supplied with information about uncertainty, people would make better risk-based decisions. In recent years, therefore, research and operational developments have also started focusing attention on ways of communicating probabilistic forecasts to decision-makers. Communicating probabilistic forecasts includes preparing tools and products for visualisation, but it also requires understanding how decision-makers perceive and use uncertainty information in real time. At the EGU General Assembly 2012, we conducted a laboratory-style experiment in which several cases of flood forecasts and a choice of actions to take were presented, as part of a game, to participants who acted as decision-makers. Answers were collected and analysed. In this paper, we present the results of this exercise and discuss whether we indeed make better decisions on the basis of probabilistic forecasts.

Model averaging is a statistical method that is widely used to quantify the conceptual uncertainty of environmental system models and to improve the sharpness and skill of forecast ensembles of multi-model prediction systems. Here, I present a MATLAB toolbox for post-processing of forecast ensembles. This toolbox, called MODELAVG, implements many different model averaging techniques, including methods that provide point forecasts only and methods that produce a forecast distribution of the variable(s) of interest. MCMC simulation with DREAM is used for averaging methods without a direct closed-form solution for their point forecasts. The toolbox returns to the user (among others) a vector (or matrix of posterior samples) of weights and (if appropriate) standard deviation(s) of the members' forecast distribution, a vector of averaged forecasts (and performance metrics thereof), (if appropriate) estimates of the width and coverage of the forecast distribution, and convergence diagnostics of the DREAM algorithm. The toolbox also creates many different figures with the results of each method. Three case studies illustrate the capabilities of the MODELAVG toolbox.
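As a concrete, deliberately simple stand-in for the toolbox's weighting schemes, the snippet below computes inverse-MSE weights from a calibration period and the corresponding weighted point forecast. MODELAVG's MCMC-based methods are considerably more elaborate; all names here are my own:

```python
import statistics

def inverse_mse_weights(forecasts, obs):
    """One simple weighting scheme (not MODELAVG's MCMC-based ones):
    weights proportional to each member's inverse mean squared error."""
    inv = [1.0 / statistics.fmean((f - y) ** 2 for f, y in zip(fc, obs))
           for fc in forecasts]
    total = sum(inv)
    return [w / total for w in inv]

def averaged_forecast(forecasts, weights):
    """Weighted-average point forecast of the ensemble."""
    return [sum(w * f for w, f in zip(weights, col))
            for col in zip(*forecasts)]

# Toy example: member 1 is far more accurate, so it dominates the weights.
obs = [1.0, 2.0, 3.0]
members = [[1.1, 2.1, 3.1], [2.0, 3.0, 4.0]]
w = inverse_mse_weights(members, obs)
avg = averaged_forecast(members, w)
```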

Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. Bayes' theorem states that the posterior probability p(H|Y~) of a hypothesis H is proportional to the product of the prior probability p(H) of this hypothesis and the likelihood L(H|Y~) of the same hypothesis given the new observations Y~, or p(H|Y~) ∝ p(H)L(H|Y~). In science and engineering, H often constitutes some numerical model ℱ(x), which summarizes, in algebraic and differential equations, state variables and fluxes, all knowledge of the system of interest, and the unknown parameter values x are subject to inference using the data Y~. Unfortunately, for complex system models the posterior distribution is often high dimensional and analytically intractable, and sampling methods are required to approximate the target. In this paper I review the basic theory of Markov chain Monte Carlo (MCMC) simulation and introduce a MATLAB toolbox of the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm developed by Vrugt et al. (2008a, 2009a) and used for Bayesian inference in fields ranging from physics, chemistry and engineering to ecology, hydrology and geophysics. This MATLAB toolbox provides scientists and engineers with an arsenal of options and utilities to solve posterior sampling problems involving (among others) bimodality, high dimensionality, summary statistics, bounded parameter spaces, dynamic simulation models, formal/informal likelihood functions (GLUE), diagnostic model evaluation, data assimilation, Bayesian model averaging, distributed computation, and informative/noninformative prior distributions. The DREAM toolbox supports parallel computing and includes tools for convergence analysis of the sampled chain trajectories and post-processing of the results. Seven different case studies illustrate the main capabilities and functionalities of the MATLAB toolbox.
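DREAM belongs to the family of MCMC samplers; a minimal random-walk Metropolis sampler conveys the core accept/reject idea, while DREAM itself runs multiple interacting chains with adaptive differential-evolution proposals. This sketch is illustrative only:

```python
import math
import random
import statistics

def metropolis(log_post, x0, n, step=2.0, seed=0):
    """Minimal random-walk Metropolis sampler (illustrative, not DREAM)."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(n):
        xp = x + rng.gauss(0, step)          # symmetric proposal
        lpp = log_post(xp)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < lpp - lp:
            x, lp = xp, lpp
        chain.append(x)
    return chain

# Sample a standard normal "posterior" and check its moments.
chain = metropolis(lambda x: -0.5 * x * x, 0.0, 50_000)
post_mean = statistics.fmean(chain)
post_var = statistics.pvariance(chain)
```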

Many watershed models used within the hydrologic research community assume (by default) stationary conditions; that is, the key watershed properties that control water flow are considered to be time-invariant. This assumption is rather convenient and pragmatic and opens up the wide arsenal of (multivariate) statistical and nonlinear optimization methods for inference of the (temporally fixed) model parameters. Several contributions to the hydrologic literature have brought into question the continued usefulness of this stationary paradigm for hydrologic modeling. This paper builds on the likelihood-free diagnostics approach of Vrugt and Sadegh [2013] and uses a diverse set of hydrologic summary metrics to test the stationarity hypothesis and detect changes in the watershed's response to hydro-climatic forcing. Models with fixed parameter values cannot adequately simulate temporal variations in the summary statistics of the observed catchment data, and consequently the DREAM(ABC) algorithm cannot find solutions that sufficiently honor the observed metrics. We demonstrate that the presented methodology successfully differentiates between watersheds that are classified as stationary and those that have undergone significant changes in land use, urbanization and/or hydro-climatic conditions, and thus are deemed nonstationary.

Benchmarking the quality of river discharge data and understanding its information content for hydrological analyses is an important task for hydrologic science. There is a wide variety of techniques to assess discharge uncertainty; however, few studies have developed generalized approaches to quantify it. This study presents a generalized framework for estimating discharge uncertainty at many gauging stations with different errors in the stage-discharge relationship. The methodology utilizes a nonparametric LOWESS regression within a novel framework that accounts for uncertainty in the stage-discharge measurements, scatter in the stage-discharge data and multisection rating curves. The framework was applied to 500 gauging stations in England and Wales, and we evaluated the magnitude of discharge uncertainty at low, mean and high flow points on the rating curve. The framework was shown to be robust, versatile and able to capture place-specific uncertainties for a number of different examples. Our study revealed a wide range of discharge uncertainties (10–397% discharge uncertainty interval widths), but the majority of the gauging stations (over 80%) had mean and high flow uncertainty intervals of less than 40%. We identified some regional differences in the stage-discharge relationships; however, the results show that local conditions dominated in determining the magnitude of discharge uncertainty at a gauging station. This highlights the importance of estimating discharge uncertainty for each gauging station prior to using those data in hydrological analyses.
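The LOWESS smoother at the core of that framework is easy to sketch: each fitted value comes from a locally weighted linear fit with tricube weights. The toy implementation below is my own; the paper's framework additionally propagates stage-discharge measurement uncertainty and handles multisection ratings, which this sketch omits:

```python
def lowess(xs, ys, frac=0.5):
    """Minimal LOWESS: a local linear fit with tricube weights per point."""
    n = len(xs)
    k = max(2, int(frac * n))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest neighbour of x0.
        d = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / d, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        sx = sum(wi * x for wi, x in zip(w, xs))
        sy = sum(wi * y for wi, y in zip(w, ys))
        sxx = sum(wi * x * x for wi, x in zip(w, xs))
        sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        denom = sw * sxx - sx * sx
        if abs(denom) < 1e-12:
            fitted.append(sy / sw)  # degenerate case: weighted mean
        else:
            slope = (sw * sxy - sx * sy) / denom
            fitted.append((sy - slope * sx) / sw + slope * x0)
    return fitted

# On exactly linear stage-discharge data the smoother reproduces the line.
stage = [float(i) for i in range(10)]
discharge = [2.0 * h + 1.0 for h in stage]
fit = lowess(stage, discharge)
```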

In operational hydrology, estimation of the predictive uncertainty of hydrological models used for flood modelling is essential for risk-based decision making for flood warning and emergency management. In the literature, there exists a variety of methods for analysing and predicting uncertainty. However, studies devoted to comparing the performance of the methods in predicting uncertainty are limited. This paper focuses on methods predicting model residual uncertainty that differ in methodological complexity: quantile regression (QR) and UNcertainty Estimation based on local Errors and Clustering (UNEEC). The comparison of the methods is aimed at investigating how well a simpler method using fewer input data performs relative to a more complex method with more predictors. We test these two methods on several catchments from the UK that vary in hydrological characteristics and in the models used. Special attention is given to the methods' performance under different hydrological conditions. Furthermore, the normality of model residuals in data clusters (identified by UNEEC) is analysed. It is found that basin lag time and forecast lead time have a large impact on the quantification of uncertainty and on the presence of normality in the distribution of the model residuals. In general, both methods give similar results. At the same time, it is also shown that the UNEEC method performs better than QR for small catchments with changing hydrological dynamics, i.e. rapid-response catchments. It is recommended that more case studies of catchments with distinct hydrologic behaviour, diverse climatic conditions, and various hydrological features be considered.
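UNEEC's central idea, estimating residual quantiles locally within clusters of similar hydrological conditions, can be sketched with a crude binning of residuals by simulated flow level. The real method uses fuzzy clustering of multiple inputs; all names and data below are illustrative:

```python
def empirical_quantile(xs, tau):
    """Linear-interpolation empirical quantile of a sample."""
    s = sorted(xs)
    pos = tau * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def binned_residual_bands(sim, resid, edges, taus=(0.05, 0.95)):
    """Per-bin residual quantile bands: residuals are grouped by the
    simulated flow level, and each group gets its own uncertainty band
    (a crude stand-in for UNEEC's fuzzy clustering)."""
    bands = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        grp = [r for s, r in zip(sim, resid) if lo <= s < hi]
        if grp:
            bands[(lo, hi)] = tuple(empirical_quantile(grp, t) for t in taus)
    return bands

# Heteroscedastic toy errors: small at low flows, large at high flows.
sim = [1.0] * 50 + [10.0] * 50
resid = ([-1 + 2 * i / 49 for i in range(50)]
         + [-5 + 10 * i / 49 for i in range(50)])
bands = binned_residual_bands(sim, resid, edges=[0.0, 5.0, 15.0])
```

The high-flow band comes out several times wider than the low-flow band, which is the kind of condition-dependent uncertainty both QR and UNEEC aim to capture.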

We introduce an ensemble learning post-processing methodology for probabilistic hydrological modelling. This methodology generates numerous point predictions by applying a single hydrological model, yet with different parameter values drawn from the respective simulated posterior distribution. We call these predictions “sister predictions”. Each sister prediction extending over the period of interest is converted into a probabilistic prediction using information about the hydrological model’s errors. This information is obtained from a preceding period for which observations are available, and is exploited using a flexible quantile regression model. All probabilistic predictions are finally combined via simple quantile averaging to produce the output probabilistic prediction. The idea is inspired by the ensemble learning methods originating from the machine learning literature. The proposed methodology offers larger robustness in performance than basic post-processing methodologies using a single hydrological point prediction. It is also empirically proven to “harness the wisdom of the crowd” in terms of average interval score, i.e., the obtained quantile predictions score no worse (usually better) than the average score of the combined individual predictions. This proof is provided within toy examples, which can be used for gaining insight into how the methodology works and under which conditions it can optimally convert point hydrological predictions to probabilistic ones. A large-scale hydrological application is made in a companion paper.
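The combination rule named here, simple quantile averaging, is a one-liner: the combined prediction's quantile at each level is the arithmetic mean of the members' quantiles at that level. A minimal sketch:

```python
def quantile_average(member_quantiles):
    """Simple quantile averaging: the combined prediction's tau-quantile
    is the mean of the members' tau-quantiles at the same level.
    Each member is a list of quantiles at common tau levels."""
    m = len(member_quantiles)
    return [sum(qs) / m for qs in zip(*member_quantiles)]
```

For example, two sister predictions with (0.05, 0.5, 0.95)-quantiles [1, 2, 3] and [3, 4, 5] combine to [2, 3, 4]; because each level is averaged separately, the combined quantiles inherit the members' monotonic ordering.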

Post-processing of hydrological model simulations using machine learning algorithms can be applied to quantify the uncertainty of hydrological predictions. Combining multiple diverse machine learning algorithms (referred to as base-learners) using stacked generalization (stacking, i.e. a type of ensemble learning) is considered to improve predictions relative to the base-learners. Here we propose stacking of quantile regression and quantile regression forests. Stacking is performed by minimising the interval score of the quantile predictions provided by the ensemble learner, which is a linear combination of quantile regression and quantile regression forests. The proposed ensemble learner post-processes simulations of the GR4J hydrological model for 511 basins in the contiguous US. We illustrate its significantly improved performance relative to the base-learners used and a less prominent improvement relative to the “hard to beat in practice” equal-weight combiner.
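The interval score minimised during stacking rewards narrow intervals and penalises missed observations in proportion to how far they fall outside. A sketch of the standard formula for a central (1 - alpha) interval (the stacking machinery itself is not reproduced here):

```python
def interval_score(lower, upper, y, alpha=0.1):
    """Interval (Winkler) score of a central (1 - alpha) prediction
    interval [lower, upper] for observation y; lower scores are better."""
    score = upper - lower  # width term rewards sharpness
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # penalty for undershooting
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # penalty for overshooting
    return score
```

A covered observation costs only the interval width, e.g. interval_score(0, 1, 0.5) = 1, while a miss one unit outside costs 1 + (2/0.1) × 1 = 21, so shrinking intervals only pays off while coverage holds.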

The finding of important explanatory variables for the location and scale parameters of the generalized extreme value (GEV) distribution, when the latter is used for modelling annual streamflow maxima, is known to reduce the uncertainties in inferences estimated through regional flood frequency analysis frameworks. However, important explanatory variables have not been found for the GEV shape parameter, despite its critical significance, which stems from the fact that it determines the behaviour of the upper tail of the distribution. Here we examine the nature of the shape parameter by revealing its relationships with basin attributes. We use a dataset that comprises information about daily streamflow and forcing, climatic indices, and topographic, land cover, soil and geological characteristics of 591 basins with minimal human influence in the contiguous United States. We propose a framework that uses random forests and linear models to find (a) important predictor variables of the shape parameter and (b) an interpretable model with high predictive performance. The study comprises assessing the predictive performance of the models, selecting a parsimonious prediction model and interpreting the results in an ad hoc manner. The findings suggest that the median of the shape parameter is 0.19, that the shape parameter mostly depends on climatic indices, and that the selected prediction model is a linear one yielding more than 20% higher accuracy in terms of RMSE compared to a naïve approach. The implications are important, since it is shown that incorporating the regression model into regional flood frequency analysis frameworks can considerably reduce the predictive uncertainties.
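Given a fitted shape parameter, GEV return levels follow from the closed-form quantile function, which is why the shape controls the upper tail so strongly. The sketch below uses the study's reported median shape of 0.19 with illustrative location and scale values (mu = 0, sigma = 1 are my assumptions, not the paper's estimates):

```python
import math

def gev_quantile(p, mu=0.0, sigma=1.0, xi=0.19):
    """GEV quantile (return level) at non-exceedance probability p.
    mu and sigma are illustrative; xi = 0.19 is the median shape
    reported in the study."""
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(-math.log(p))  # Gumbel limit (xi = 0)
    return mu + (sigma / xi) * ((-math.log(p)) ** (-xi) - 1.0)

# 100-year return level of annual maxima (p = 0.99):
q100 = gev_quantile(0.99)
```

A positive shape makes the tail heavier: with xi = 0.19 the 100-year level is well above its Gumbel (xi = 0) counterpart, which is why a misestimated shape parameter propagates directly into flood quantile errors.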

One important component of continental-scale hydrologic modeling is quantifying the level of uncertainty in long-term hydrologic simulations and providing a range of possible simulated streamflow and/or runoff values for gaged and ungaged locations. In this paper, uncertainty was quantified for simulated streamflow and runoff generated from a monthly water balance model (MWBM) at 1575 streamgages and 109,951 hydrologic response units (HRUs), which span the conterminous United States (CONUS). A stochastic approach, which incorporated the properties of the modeled streamflow residuals back into the simulated model output, was used to create time series of upper and lower uncertainty intervals (UIs) around the simulated monthly time series. This approach was applied to an existing hydrologic regionalization implementation. Metrics used to evaluate the UIs across the CONUS (the coverage ratio, average width index, and interval skill score) indicated that on average the UIs were reliable, skillful, and sharp: they were able both to contain measured streamflow observations and to reduce estimates of uncertainty based on expected model predictions. These uncertainty evaluation metrics can complement each other in characterizing model skill and uncertainty over large-scale domains.
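The first two evaluation metrics are straightforward to compute from the interval bounds and the observations; a minimal sketch (function and variable names are mine, not the paper's code):

```python
def coverage_and_width(lowers, uppers, obs):
    """Coverage ratio (fraction of observations falling inside their
    interval) and the average interval width, per time series."""
    n = len(obs)
    coverage = sum(l <= y <= u for l, u, y in zip(lowers, uppers, obs)) / n
    avg_width = sum(u - l for l, u in zip(lowers, uppers)) / n
    return coverage, avg_width
```

Reliability asks that coverage match the nominal level (e.g. about 0.95 for 95% UIs), while sharpness asks for the smallest average width that still achieves it; reporting the two together is what makes the metrics complementary.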

The objective of this study was to understand whether spatial differences in runoff generation mechanisms affect the magnitudes of diurnal streamflow fluctuations during low flow periods and which part of the catchment induces the diurnal streamflow signal. The spatiotemporal variability of the streamflow fluctuations observed at 12 locations in the 66-ha Hydrological Open Air Laboratory experimental catchment in Austria was explained by differences in the vegetation cover and runoff generation mechanisms. Almost a quarter of the volume associated with diurnal streamflow fluctuations at the catchment outlet was explained by transpiration from vegetation along the tributaries; more than three quarters was due to transpiration by the riparian forest along the main stream. The lag times between radiative forcing and evapotranspiration estimated by a solar radiation-driven model increased from 3 to 11 hr from spring to autumn. The recession time scales increased from 21 days in spring to 54 days in autumn. Observations and model simulations suggest that a separation of scales in transpiration effects on low flows exists both in time and space; that is, the diurnal streamflow fluctuations are induced by transpiration from the riparian vegetation, while most of the catchment evapotranspiration, such as evapotranspiration from the crop fields further away from the stream, does not influence the diurnal signal in streamflow.

Quantile regression quantifies the association of explanatory variables with a conditional quantile of a dependent variable without assuming any specific conditional distribution. It hence models the quantiles, instead of the mean as done in standard regression. In cases where either the requirements for mean regression, such as homoscedasticity, are violated or interest lies in the outer regions of the conditional distribution, quantile regression can explain dependencies more accurately than classical methods. However, many quantile regression papers are rather theoretical, so the method has still not become a standard tool in applications. In this article, we explain quantile regression from an applied perspective. In particular, we illustrate the concept, advantages and disadvantages of quantile regression using two datasets as examples.
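Quantile regression rests on the check (pinball) loss: minimising its average pushes the estimate towards the tau-quantile rather than the mean. A minimal demonstration with a constant model, restricted to the sample points for simplicity:

```python
def pinball_loss(tau, y, q):
    """Check (pinball) loss of quantile estimate q for observation y:
    under-prediction is weighted by tau, over-prediction by 1 - tau."""
    return tau * (y - q) if y >= q else (1 - tau) * (q - y)

def best_constant_quantile(ys, tau):
    """The constant minimising average pinball loss over a sample is an
    empirical tau-quantile - the building block of quantile regression
    (here the search is restricted to the sample values)."""
    return min(ys, key=lambda q: sum(pinball_loss(tau, y, q) for y in ys))
```

On the sample 1, 2, ..., 9, tau = 0.5 recovers the median 5, while tau = 0.9 moves the optimum to the upper tail; replacing the constant with a linear function of covariates gives linear quantile regression.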