FIGURE 2 - uploaded by Henning Oppel
Case study basins of the Upper Main (upper left) and Regen (lower right) in south-east Germany. Five gauges (triangles) were used for local application and as training data for (trans-)regional application (circles).


Source publication
Article
Can machine learning effectively lower the effort necessary to extract important information from raw data for hydrological research questions? Using the example of a typical water-management task, the extraction of direct runoff flood events from continuous hydrographs, we demonstrate how machine learning can be used to automate the application of ex...

Contexts in source publication

Context 1
... the spatial arrangement of the chosen gauges shows (Figure 2), training and validation gauges have been selected to cover similar relationships of neighboring and nested catchments. Additionally, the training and validation sets have been compiled to cover the same ranges of catchment area. ...
Context 2
... each data set from the catchments marked as training gauges in Table 1 and Figure 2, we tested whether flood hydrograph separation could be automated by means of ML. As in the previous section, we randomly chose 50% of the available flood event data for training of the algorithms. ...
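The random 50% split mentioned in this context could be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' code: the function name `split_events`, the placeholder event labels, and the fixed seed are all hypothetical.

```python
import random


def split_events(events, train_fraction=0.5, seed=42):
    """Randomly split flood events into training and validation subsets.

    `train_fraction` and `seed` are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)      # local generator keeps the split reproducible
    shuffled = list(events)        # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical event labels standing in for separated flood events.
events = [f"event_{i:02d}" for i in range(10)]
train, validation = split_events(events)
print(len(train), len(validation))
```

Fixing the seed makes the split repeatable. Repeating the split with different seeds would be one way to check how sensitive the results are to the choice of training events.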
Context 3
... a regional transfer has been tested. Here, we used the data sets from the local application (sections 3.2, 4.1) to train the ML algorithms and validated their performance at five new gauges in the same basin, i.e., regional neighborhood (Figure 2). As in the procedure of section 4.1, we analyzed the impact of training data on the performance. ...
Context 4
... on these results, we asked if the algorithms could be applied to catchments of another basin, i.e., if the trained algorithms could be used in a trans-regional application. As in the regional application, trained algorithms were used to estimate the time stamps of event begin and end of the flood events, but in this case for catchments in the Regen basin (Figure 2). The results of the trans-regional applications confirmed our findings of the regional application (Figure 10). ...

Similar publications

Article
Urban pluvial flooding is a threatening natural hazard in urban areas all over the world, especially in recent years given its increasing frequency of occurrence. In order to prevent flood occurrence and mitigate the subsequent aftermath, urban water managers aim to predict precipitation characteristics, including peak intensity, arrival time and d...

Citations

... In hydrological sciences, machine learning has been used in applications such as precipitation analysis (Sun and Tang, 2020), rainfall-runoff processes (Hsu et al., 1995; Minns and Hall, 1996; Dawson and Wilby, 1998; Abrahart and See, 2000; Duan et al., 2020; Oppel and Mewes, 2020), groundwater hydrology (Karandish and Šimůnek, 2016; Sahu et al., 2020), reservoir hydrology (Bai et al., 2016; Mital et al., 2020), hydraulic networks (Dibike et al., 1999), river basin management (Solomatine and Ostfeld, 2008), flow mapping (Zhu and Guo, 2014), land use analysis (Loukika et al., 2021), and disaster risk management (Whitehurst et al., 2021). Explainable artificial intelligence (XAI) is a subdomain of machine learning that aids in the interpretability of machine learning models by helping users understand how their 'black-box' models operate (Maksymiuk et al., 2020; Althoff et al., 2021). ...
Article
The Prediction in Ungauged Basins (PUB) initiative set out to improve the understanding of hydrological processes with an aim of improving hydrologic models for application in ungauged basins. With a majority of basins around the world essentially ungauged, this suggests the need to shift from calibration-based models that rely on observed streamflow data to models based on process understanding. This is especially important in natural infrastructure planning projects such as investments in the conservation of wetlands across the watershed, where the lack of streamflow data hinders the quantification of their benefits (such as flood attenuation), resulting in a difficulty in prioritization. This research sought to contribute to this growing body of literature by (a) developing visual tools and metrics for assessing flow dynamics and flood attenuation benefits of wetlands in relation to their position in the watershed, (b) examining distribution-based topographic metrics in regard to their efficacy in predicting hydrologic response and providing a methodology for examining other metrics in future studies, (c) building robust functional forms for two important catchment metrics: the width function and hypsometric curve, and (d) devising a hierarchical clustering approach to assess hydrological similarity and find analogous basins that is computationally efficient and has a potential for large-scale applications. Taken together, this study paves the way toward an analytical formulation of the geomorphological instantaneous unit hydrograph (GIUH) that can be used to assess the hydrological behavior in ungauged or data-scarce basins.
... On top of this, a peak-over-threshold criterion can be applied to consider only larger events (Norbiato et al., 2009; Tang & Carey, 2017). Recently, different studies have developed methodologies to avoid the baseflow separation (Fischer et al., 2021; Oppel & Mewes, 2020; Thiesen et al., 2019; Towler & McCreight, 2021). However, these methodologies still require the calibration of parameters or the manual training of machine learning algorithms. ...
Article
Methodologies for rainfall-runoff event identification from continuous time series suffer from significant subjectivity. In particular, whether they initiate the identification from the rainfall or from the streamflow time series, they usually require baseflow separation, and they need substantial modifications and parameter recalibration when the temporal resolution of the data changes. Therefore, we propose a novel objective methodology for event identification that is easily transferable across sites and temporal resolutions, without having to make subjective choices and adjust multiple parameters. The proposed method to identify rainfall-runoff events is based on a time series analysis technique that simultaneously considers rainfall and streamflow time series and does not make any a priori assumptions about baseflow separation. The novel method also allows a baseflow separation to be produced a posteriori by connecting the delimiters of identified streamflow events. Moreover, the proposed method can be applied at any time resolution as long as the resolution is high enough to capture the time delay between precipitation and runoff response. When comparing the results between the proposed and the traditional baseflow-based event identification approach, we observe a good agreement in terms of event properties at both hourly and daily scale (correlation of runoff ratios between the two methods equal to 0.78 [daily data] and 0.84 [hourly data]). The analysis comparing hourly and daily event identifications with the proposed method also reveals that the novel method produces coherent events across different temporal resolutions (correlation of runoff ratios between daily and hourly data equal to 0.71).
... Busico et al. (2018) have used the groundwater major constituents and heavy metals to perform three steps of FA separately for identifying the governing source of groundwater pollution in their study region and the obtained results precisely justified the study. Statistical techniques are widely applied by researchers all over the world in studies dealing with the Earth systems, including hydrological events, water management tasks, climate changes, etc. (Aslam et al. 2018; Flowers 2018; Murray et al. 2018; Rhein 2019; Oppel and Mewes 2020). However, some drawbacks of these statistical approaches have been reported, such as flaws in uncertainty analysis, limited precision of the results, high computational cost, and the requirement of large amounts of data (Akpoti et al. 2019; Ardabili et al. 2020). ...
... In this regard, data-driven ML models are well applied in water resources studies, including Geographic Information System (GIS) and Remote Sensing (RS) platforms, viz., hydrological flow series prediction/streamflow simulation (Atiquzzaman and Kandasamy 2018; Tongal and Booij 2018), rainfall-runoff modeling/rainfall forecasting (Yu et al. 2017; Kratzert et al. 2019), interpretation of RS images (Cresson 2018; Li et al. 2019), modeling of evapotranspiration (ET) (Granata 2019; Valipour et al. 2019), flood event prediction (Mosavi et al. 2018; Oppel and Mewes 2020; Schmidt et al. 2020), forecasting of water demand in urban areas (Antunes et al. 2018; Bata et al. 2020), groundwater level predictions (Yoon et al. 2016), prediction of various chemical/heavy metal concentrations in groundwater (Bhagat et al. 2020; Bui et al. 2020), water quality index prediction (Singha et al. 2021), and many more. There is no doubt that ML has, thus, gained a lot of attention (Shen 2018), with the majority of research studies implementing ML models for prediction purposes. ...
Article
The present study aims to assess the spatial variation of groundwater quality based on the influencing hydrogeological parameters in the mining areas surrounding one of India’s largest coal fields, the Korba coal field, Chhattisgarh, Central India. To achieve this goal, a knowledge-driven approach with the aid of a Machine Learning (ML) decision tree-based model, i.e., the Classification and Regression Tree (CART) model, was developed to predict possible factors contributing to the degradation of groundwater quality in the selected regions. A total of five influencing factors were selected as the important input variables, viz., water table depth (WTD), groundwater drawdown (DR), slope (S), elevation (E), and distance to mines (DTM). Groundwater Quality Index (GWQI) values of 216 locations within a buffer zone of 20 km centered on the coal mining area were assigned as the target variables in the CART model. The influences of these factors on groundwater quality were assessed using recursive partitioning combined with a pruned algorithm. Results showed that the significance of the factors decreased in the order S (34%) > DTM (23%) > WTD (16%) > DR (15%) > E (12%). The model predicted relatively higher GWQI values attributed to the available lower ground slope in the study area. Similarly, wells situated within a 3 km radius of the buffer zone had low groundwater quality, apparently due to the influence of mines. Higher GWQI values were observed in the wells having low WTD values (<7.9 m) and higher DR in the study area. The results suggested that anthropogenic activity is one of the major sources of groundwater contamination, whereas the impact of mines was only observed within a radius of 3 km from the center of the mining areas.
... Tarasova et al., 2018, Diederen et al., 2019, the latter focuses on the flood events only, for example by separating flood events for given flood peaks (e.g. Oppel and Mewes, 2020), which are characterised by a large increase of runoff (see German norm DIN 4049-1, 1992). Due to the nature of such events, in Central Europe they are expected to occur seldom within a year. ...
... However, event beginning and end are not provided directly. Instead, baseflow separation methods are often applied in addition, where it is assumed that a deviation from baseflow (the direct runoff) indicates a flood event (Oppel and Mewes, 2020). Though they were originally designed for the separation of runoff events, they are suitable in the context of floods when applied to a priori identified flood events (e.g. ...
... Recently, the application of machine-learning techniques offered a new perspective on flood event separation (e.g. Thiesen et al., 2019; Oppel and Mewes, 2020). These techniques use pattern recognition to separate flood events and hence overcome the drawbacks of POT-based methods, that is, the application of constant thresholds and the usage of baseflow separation. ...
Article
The classification of characteristics of flood events, like peak, volume, duration and baseflow components is essential for many hydrological applications such as multivariate flood statistics, the validation of rainfall-runoff models and comparative hydrology in general. The basis for estimations of these characteristics is formed by flood event separation. It requires an indicator for the time when a flood peak occurs as well as the definition of the beginning and end of a flood event and a subdivision of the total volume into direct and baseflow components. However, the variable nature of runoff and the multiple processes and impacts that determine rainfall-runoff relationships make a separation difficult, especially an automation of it. We propose a new statistics-based flood event separation that was developed to analyse long series of daily discharges automatically to obtain flood events for flood statistics. Moreover, the related flood-inducing precipitation is identified, allowing the estimation of the flood-inducing rainfall and the runoff coefficient. With an additional tool to manually check the separation results easily and quickly, expert knowledge can be included without much effort. The algorithm was applied to seven basins in Germany, covering alpine, mountainous and flatland catchments with different runoff processes. In a sensitivity analysis, the impact of chosen parameters was evaluated. The results show that the algorithm delivers reasonable results for all catchments and only needs manual adjustment for long timeslots with increasing or high baseflow. It reliably separates only flood events rather than all runoff events, and the estimated beginning and end of an event were shifted on average by less than one day compared to manual separation.
Article
The estimation of catchment response time (Tr) plays an important role in several hydrological and civil engineering design problems. The non-linear relationship between Tr and rainfall intensity necessitates the estimation of an event-based set of Tr values instead of a characteristic constant value. However, there is no generally accepted method to define individual rainfall-runoff events from time-series. Here we propose a new, automated method which results in the selection of rainfall-runoff events and the corresponding Tr values. The proposed method yields an event-based set of Tr values more efficiently than other existing methods and has only two parameters. The results of the new method were compared to those of a statistical and a semi-manual event selection approach. The latter calculates eight different Tr values, including the time of concentration, lag time, time to peak, and time to equilibrium. The median Tr value of the proposed method yields the strongest agreement with the median of the time elapsed between the maxima of the total rainfall and runoff with a root-mean-square error of 4.94 hours. It is also demonstrated that a median time of concentration value can be estimated as the maximum of the event based Tr values by the current method. A sensitivity analysis explores the robustness of the proposed method, and also yields the optima of its two parameters. Once calibrated, the present automated methodology dispenses with any event selection procedure.
Article
Understanding hydrological variability is of crucial importance for water resource management in sub-Saharan Africa (SSA). While existing studies typically focus on individual river basins, and suffer from incomplete records, this study provides a new perspective of trends and variability in hydrological flood and drought characteristics (frequency, duration, and intensity) across the entire SSA. This is achieved by: i) creating a 65-year long, complete daily streamflow dataset consisting of over 600 gauging stations; ii) quantifying changes in flood and drought characteristics between 1950 and 2014; iii) evaluating how decadal variability influences historical trends. Results of daily streamflow reconstructions using random forests provide satisfactory performance over most of SSA, except for parts of southern Africa. Using change-point and trend analyses, we identify three periods that characterise historical variations affecting hydrological extremes in western and central Africa, and some parts of southern Africa: i) the 1950s–60s and after the 1980s–90s, when floods (droughts) tend to be more (less) intense, more (less) frequent and more (less) persistent; and ii) the 1970s–80s, when floods (droughts) are less (more) intense, less (more) frequent and less (more) persistent. Finally, we reveal significant decadal variations in all flood and drought characteristics, which explain aperiodic increasing and decreasing trends. This stresses the importance of considering multiple time-periods when analysing recent trends, as previous assessments may have been unrepresentative of long-term changes.
Article
The prediction of hydrologic conditions in watersheds has manifold applications, ranging from flood disaster preparedness to water supply and environmental flow management. In watersheds with scarce or no flow data, it is difficult to make accurate hydrologic predictions. Past work has used similarity in single-valued properties of the terrain (for example, drainage area, mean slope) as the basis to relate flow conditions in gauged watersheds to the ungauged ones. The resulting predictions show modest accuracy and have a weak physical basis. In this study, we develop a physics-informed machine learning approach to extract features that represent the hydrologic dynamics: the width function and hypsometric curve. These two geomorphometric measures are computed using functional forms fitted to estimates derived from digital elevation data. Furthermore, dynamically-similar groups are identified based on results from unsupervised clustering and divergence measures. Our approach paves the way towards a flexible and scalable machine learning approach that can be used to assess hydrologic similarity and improve prediction, one informed by physics of surface flow generation and transport in watersheds. A case study involving 72 sub-watersheds in the Narmada River Basin (India) is used to illustrate the new methodology.
Article
Statistical learning methods offer a promising approach for low-flow regionalization. We examine seven statistical learning models (Lasso, linear, and nonlinear-model-based boosting, sparse partial least squares, principal component regression, random forest, and support vector regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. In order to produce sparse models, we adapt the recursive feature elimination for variable preselection and propose using three different variable ranking methods (conditional forest, Lasso, and linear model-based boosting) for each of the prediction models. Results are evaluated for the low-flow characteristic Q95 (Pr(Q>Q95)=0.95) standardized by catchment area using a repeated nested cross-validation scheme. We found a generally high prediction accuracy for winter (cross-validated R² of 0.66 to 0.7) and summer (cross-validated R² of 0.83 to 0.86). The models perform similarly to or slightly better than a top-kriging model that constitutes the current benchmark for the study area. The best-performing models are support vector regression (winter) and nonlinear model-based boosting (summer), but linear models exhibit similar prediction accuracy. The use of variable preselection can significantly reduce the complexity of all the models with only a small loss of performance. The so-obtained learning models are more parsimonious and thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and nonlinear models reveals that nonlinear processes can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add nonlinear effects. When performing low-flow regionalization in a seasonal climate, the temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to catchment grouping that is recommended otherwise.
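The low-flow characteristic Q95 in the abstract above is defined by Pr(Q>Q95)=0.95, which corresponds to the 5th percentile of the discharge series. A minimal sketch of how such a value might be computed and standardized by catchment area is given below. The function names, the linear-interpolation rule, the units (m³/s in, l/s/km² out), and the synthetic discharge series are assumptions for illustration and are not taken from the cited study.

```python
def q95(discharge):
    """Empirical Q95: the flow exceeded in roughly 95% of time steps.

    Computed as the 5th percentile of the series with linear interpolation
    between order statistics (an assumed convention; others exist).
    """
    values = sorted(discharge)
    if not values:
        raise ValueError("discharge series is empty")
    pos = 0.05 * (len(values) - 1)          # fractional index of the 5th percentile
    lower = int(pos)
    upper = min(lower + 1, len(values) - 1)
    frac = pos - lower
    return values[lower] + frac * (values[upper] - values[lower])


def specific_q95(discharge_m3s, area_km2):
    """Q95 standardized by catchment area, converted from m³/s to l/s/km²."""
    return q95(discharge_m3s) * 1000.0 / area_km2


# Synthetic discharge series (m³/s) and hypothetical catchment area (km²).
flows = [float(q) for q in range(1, 102)]
low_flow = q95(flows)
low_flow_specific = specific_q95(flows, area_km2=200.0)
print(low_flow, low_flow_specific)
```

Dividing by catchment area makes the low-flow values comparable across catchments of different size, which is what allows a single regional model to be fitted.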