Figure - available from: Water Resources Research

# The major task for the machine learning algorithms in this study is presented in the upper part of the figure: to estimate the unknown tracer concentration, C, by training a machine learning algorithm on the pattern formed by a subset of discharge and a measured pair of tracers. The structures of the chosen algorithms are shown in subplots a, b, c, and d.

Source publication
Article
Karstic groundwater systems are often investigated by a combination of environmental or artificial tracers. One of the major downsides of tracer‐based methods is the limited availability of tracer measurements, especially in data sparse regions. This study presents an approach to systematically evaluate the information content of the available data...

## Citations

... T. Yang et al., 2016), water quality prediction (K. Chen et al., 2020; Lu & Ma, 2020), data mining for sparse environmental data measurement (Mewes et al., 2020; Zhou, 2020), evapotranspiration estimation (Goyal et al., 2014), groundwater management (Naghibi & Pourghasemi, 2016; Podgorski & Berg, 2020), and so on. It has been confirmed that machine learning is an effective tool to explore implicit relationships in complex nonlinear systems (Goodfellow et al., 2016; Lecun et al., 2015). ...
Article
Rivers play an important role in water supply, irrigation, navigation, and ecological maintenance. Forecasting river hydrodynamic changes is critical for flood management under climate change and intensified human activities. However, efficient and accurate river modeling is challenging, especially with complex lake boundary conditions and uncontrolled downstream boundary conditions. Here, we propose a coupled framework that takes advantage of the interpretability of physical hydrodynamic modeling and the adaptability of machine learning. Specifically, we coupled the Gated Recurrent Unit (GRU) with a 1‐D HydroDynamic model (GRU‐HD) and applied it to the middle and lower reaches of the Yangtze River, the longest river in China. We show that the GRU‐HD model could quickly and accurately simulate the water levels, streamflow, and water exchange rates between the Yangtze River and two important lakes (Poyang and Dongting), with most of the Kling‐Gupta efficiency (KGE) coefficients above 0.90. Using water levels predicted by machine learning, instead of the rating curve approach, as the downstream boundary conditions could improve the accuracy of modeling the downstream water levels of the lake‐connected river system. The GRU‐HD model is dedicated to the synergy of physical modeling and machine learning, providing a powerful avenue for modeling rivers with complex boundary conditions.
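For readers unfamiliar with the Kling‐Gupta efficiency reported above, the following is a minimal sketch of how it is commonly computed. It assumes the widely used 2009 formulation, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α the ratio of standard deviations, and β the ratio of means. The abstract does not state which KGE variant the authors used, and the function name and the sample water levels are hypothetical.

```python
import math
from statistics import fmean, pstdev


def kling_gupta_efficiency(simulated, observed):
    """KGE in the assumed 2009 form: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    mu_s, mu_o = fmean(simulated), fmean(observed)
    sd_s, sd_o = pstdev(simulated), pstdev(observed)
    # Pearson correlation between the simulated and observed series.
    cov = fmean([(s - mu_s) * (o - mu_o) for s, o in zip(simulated, observed)])
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o  # variability ratio
    beta = mu_s / mu_o   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical water levels in metres, invented for illustration only.
observed = [10.2, 11.5, 12.1, 11.0, 10.4]
simulated = [10.0, 11.7, 12.0, 11.2, 10.5]
print(round(kling_gupta_efficiency(simulated, observed), 3))
```

A value of 1 indicates a perfect match. The sketch does not guard against constant series or a zero observed mean, either of which would cause a division by zero.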
... Machine learning models are increasingly used to make hydrological predictions [71,72], and the most accurate versions tend to utilize ensemble models that combine inputs from independent algorithms before making final decisions [73][74][75]. Machine learning models can also be used to explore complex, non-linear relationships between predictor and target variables. ...
Article
Human agriculture, wastewater, and use of fossil fuels have saturated ecosystems with nitrogen and phosphorus, threatening biodiversity and human water security at a global scale. Despite efforts to reduce nutrient pollution, carbon and nutrient concentrations have increased or remained high in many regions. Here, we applied a new ecohydrological framework to ~12,000 water samples collected by the U.S. Environmental Protection Agency from streams and lakes across the contiguous U.S. to identify spatial and temporal patterns in nutrient concentrations and leverage (an indicator of flux). For the contiguous U.S. and within ecoregions, we quantified trends for sites sampled repeatedly from 2000 to 2019, the persistence of spatial patterns over that period, and the patch size of nutrient sources and sinks. While we observed various temporal trends across ecoregions, the spatial patterns of nutrient and carbon concentrations in streams were persistent across and within ecoregions, potentially because of historical nutrient legacies, consistent nutrient sources, and inherent differences in nutrient removal capacity for various ecosystems. Watersheds showed strong critical source area dynamics in that 2–8% of the land area accounted for 75% of the estimated flux. Variability in nutrient contribution was greatest in catchments smaller than 250 km² for most parameters. An ensemble of four machine learning models confirmed previously observed relationships between nutrient concentrations and a combination of land use and land cover, demonstrating how human activity and inherent nutrient removal capacity interactively determine nutrient balance. These findings suggest that targeted nutrient interventions in a small portion of the landscape could substantially improve water quality at continental scales.
We recommend a dual approach of first prioritizing the reduction of nutrient inputs in catchments that exert disproportionate influence on downstream water chemistry, and second, enhancing nutrient removal capacity by restoring hydrological connectivity both laterally and vertically in stream networks.
... Considering the recent wide applicability of "shallow" neural networks in hydrology, oceanology and atmospheric sciences (Crawford et al., 2019; Bergen et al., 2019; Berkhahn et al., 2019; Kulp and Strauss, 2019; Sezen et al., 2019; Flombaum et al., 2020; Jia et al., 2020; Diez-Sierra and del Jesus, 2020; Nourani et al., 2020; Mewes et al., 2020; Snieder et al., 2020), we hope the proposed method may be beneficial not only for stream temperature modelling, but possibly also other hydrological applications. ...
Article
For about two decades, neural networks have been widely used for river temperature modelling. However, in recent years one has to distinguish between the “classical” shallow neural networks and deep learning networks. The applicability of rapidly developing deep learning networks to stream water temperature modelling may be limited, but some methods developed for deep learning, if properly reconsidered, may efficiently improve the performance of shallow networks. Dropout is widely considered the method that allows deep learning networks to avoid overfitting to training data, facilitating their application to a wide range of problems. Recently, the successful application of dropout to river temperature modelling by means of shallow multilayer perceptron neural networks has been demonstrated. In the present study we propose to use dropout solely for input neurons of product unit neural networks for the purpose of stream temperature modelling. We perform tests on data collected from six catchments located in temperate climate zones on two continents in various orographic conditions. We show that the average performance of product unit neural networks trained with input dropout is better than the average performance of product units without dropout, product units with dropout applied to every layer of the networks, multilayer perceptron neural networks with or without dropout, and the semi-physical air2stream model. The advantage of product unit neural networks with input dropout is statistically significant on hilly or mountainous catchments; the performance on flat ones is similar to that of the competing models.
... High-frequency conductivity measurements were effective predictors of all major ions derived from weathering of mountaintop removal mined watersheds (Ross et al., 2018). High-frequency sulphate time series were produced with discharge as an input variable for multiple machine learning algorithms (Mewes et al., 2020). Kisi and Parmar (2016) predicted monthly chemical oxygen demand in an Indian river with nutrient and other water quality information. ...
... We accepted default parameters for the RF model, including the number of trees required for the ensemble (n = 500) and the number of variables tried at each split in an individual tree (mtry = 2). We chose the SVM and RF models because both have been previously applied in hydrological contexts with strong results (e.g., Kim et al., 2020; Mewes et al., 2020). The main difference between the two is the RF uses discrete predictions, which can help identify non-linear patterns, and the SVM is a continuous function. ...
Article
Stream solute monitoring has produced many insights into ecosystem and Earth system functions. Although new sensors have provided novel information about the fine‐scale temporal variation of some stream water solutes, we lack adequate sensor technology to gain the same insights for many other solutes. We used two machine learning algorithms – Support Vector Machine and Random Forest – to predict concentrations at 15‐min resolution for 10 solutes, of which eight lack specific sensors. The algorithms were trained with data from intensive stream sensing and manual stream sampling (weekly) for four full years in a hydrologic reference stream within the Hubbard Brook Experimental Forest in New Hampshire, USA. The Random Forest algorithm was slightly better at predicting solute concentrations than the Support Vector Machine algorithm (Nash‐Sutcliffe efficiencies ranged from 0.35 to 0.78 for Random Forest compared to 0.29 to 0.79 for Support Vector Machine). Solute predictions were most sensitive to the removal of fluorescent dissolved organic matter, pH and specific conductance as independent variables for both algorithms, and least sensitive to dissolved oxygen and turbidity. The predicted concentrations of calcium and monomeric aluminium were used to estimate catchment solute yield, which changed most dramatically for aluminium because it concentrates with stream discharge. These results show great promise for using a combined approach of stream sensing and intensive stream discrete sampling to build information about the high‐frequency variation of solutes for which an appropriate sensor or proxy is not available.
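The Nash‐Sutcliffe efficiencies quoted above compare model error with the variance of the observations. As a minimal sketch, assuming the standard definition NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², the metric can be computed as follows. The function name and the sample concentrations are hypothetical and are not taken from the study.

```python
from statistics import fmean


def nash_sutcliffe_efficiency(simulated, observed):
    """NSE: 1 minus squared model error over squared deviation from the observed mean."""
    mean_obs = fmean(observed)
    residual = sum((o - s) ** 2 for s, o in zip(simulated, observed))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


# Hypothetical solute concentrations (mg/L), invented for illustration only.
observed = [0.80, 0.95, 1.10, 0.70, 0.85]
predicted = [0.82, 0.90, 1.00, 0.75, 0.88]
print(round(nash_sutcliffe_efficiency(predicted, observed), 3))
```

An NSE of 1 is a perfect fit, and an NSE of 0 means the model is no better than always predicting the observed mean. The reported ranges of 0.35 to 0.78 and 0.29 to 0.79 therefore sit between those two reference points.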
... highlighted usefulness of DDMs as support for conceptual models. For example, as a combination of model structures for predicting a target variable, e.g., Ratto et al. (2007), or as the use of ML for preparing and validating input data (Mewes, Oppel & Hartmann, 2019) for conceptual models. Independent of existing concepts, it is generally recommended to incorporate as much information about the target area as possible into the chosen model structure. ...
Book
In data scarce regions common hydrological models cannot be applied. Due to the missing data for calibration conceptual rainfall-runoff models cannot be parametrized and, hence, not be used for operational predictions or definition of design hydrographs. Geomorphological instantaneous unit hydrographs (GIUH) offer the unique possibility to adapt model structure to catchment structure and thereby increase model accuracy in data scarce regions. The parsimony as well as the incorporation of catchment structures is a valuable advantage for prediction in ungauged basins. The drawback of GIUH-models is the required parametrization for each individual event. Hence, applications of GIUH-models have been limited to scientific reanalysis of past rainfall-runoff events. In this study an ensemble of machine learning (ML) algorithms was applied for the estimation of the required parameters in ungauged basins. Indicators of meteorological forcing and initial catchments states were used as predictors for the estimation of drainage velocity and runoff coefficient. Eight algorithms were applied and their performance has been evaluated in a leave-one-out study in three major catchments in South-East Germany. Predictions provided by the algorithms were given to an improved GIUH-model to transform 2-dimensional precipitation data into an ensemble prediction of hydrographs in ungauged basins. The performance of the improved GIUH-model and the ML-Algorithms were evaluated separately. The GIUH-structure proved to be as flexible as demanded. In a synthetic case study it was able to incorporate different catchment shapes, flowpath distributions and characteristics into the shape of predicted hydrographs. A variation of drainage velocity by flowpath was implemented and improved simulation results. Moreover, a parametrization directly from rainfall-runoff event analysis seemed possible, yet calibrated parameters led to better performance. 
The setup of the ML module has been evaluated with respect to the predictors and data segmentation by model approach. In a subsequent regional application, data from all available gauges were used to train the algorithms. Withheld data were used to imitate a prediction in an ungauged basin. The models showed an average relative error of 20% for drainage velocity and 40% for runoff volume. The errors were subsequently reduced by selective data composition, considering only a limited number of similar catchments for model training. The combination of both model components was tested subsequently. The mean efficiencies considering hydrograph timing, volume and variance were close to the optimum value. Yet the model worked only in ensemble mode, because a single ML algorithm proved incapable of imitating the full range of hydrological complexity. A comparison to a regionalized HBV model showed superior results for the GIUH-ML model in ungauged catchments and equal results for gauged catchments. Finally, the possibility of deriving assumptions about hydrological processes from trained ML dependencies has been discussed. For the performed case studies an assumption about changing dependencies of driving factors and the resulting ratio of flood volume and peak was derived.
Article
Rainfall-runoff simulation is vital for planning and controlling flood control events. Hydrology modeling using Hydrological Engineering Center—Hydrologic Modeling System (HEC-HMS) is accepted globally for event-based or continuous simulation of the rainfall-runoff operation. Similarly, machine learning is a fast-growing discipline that offers numerous alternatives suitable for hydrology research’s high demands and limitations. Conventional and process-based models such as HEC-HMS are typically created at specific spatiotemporal scales and do not easily fit the diversified and complex input parameters. Therefore, in this research, the effectiveness of Random Forest, a machine learning model, was compared with HEC-HMS for the rainfall-runoff process. Furthermore, we also performed a hydraulic simulation in Hydrological Engineering Center—Geospatial River Analysis System (HEC-RAS) using the input discharge obtained from the Random Forest model. The reliability of the Random Forest model and the HEC-HMS model was evaluated using different statistical indexes. The coefficient of determination (R2), standard deviation ratio (RSR), and normalized root mean square error (NRMSE) were 0.94, 0.23, and 0.17 for the training data and 0.72, 0.56, and 0.26 for the testing data, respectively, for the Random Forest model. Similarly, the R2, RSR, and NRMSE were 0.99, 0.16, and 0.06 for the calibration period and 0.96, 0.35, and 0.10 for the validation period, respectively, for the HEC-HMS model. The Random Forest model slightly underestimated peak discharge values, whereas the HEC-HMS model slightly overestimated the peak discharge value. Statistical index values illustrated the good performance of the Random Forest and HEC-HMS models, which revealed the suitability of both models for hydrology analysis. In addition, the flood depth generated by HEC-RAS using the Random Forest predicted discharge underestimated the flood depth during the peak flooding event. 
This result proves that HEC-HMS could compensate Random Forest for the peak discharge and flood depth during extreme events. In conclusion, the integrated machine learning and physical-based model can provide more confidence in rainfall-runoff and flood depth prediction.
Article
Statistical learning methods offer a promising approach for low-flow regionalization. We examine seven statistical learning models (Lasso, linear, and nonlinear-model-based boosting, sparse partial least squares, principal component regression, random forest, and support vector regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. In order to produce sparse models, we adapt the recursive feature elimination for variable preselection and propose using three different variable ranking methods (conditional forest, Lasso, and linear model-based boosting) for each of the prediction models. Results are evaluated for the low-flow characteristic Q95 (Pr(Q>Q95)=0.95) standardized by catchment area using a repeated nested cross-validation scheme. We found a generally high prediction accuracy for winter ($R^2_{CV}$ of 0.66 to 0.7) and summer ($R^2_{CV}$ of 0.83 to 0.86). The models perform similarly to or slightly better than a top-kriging model that constitutes the current benchmark for the study area. The best-performing models are support vector regression (winter) and nonlinear model-based boosting (summer), but linear models exhibit similar prediction accuracy. The use of variable preselection can significantly reduce the complexity of all the models with only a small loss of performance. The so-obtained learning models are more parsimonious and thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and nonlinear models reveals that nonlinear processes can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add nonlinear effects. When performing low-flow regionalization in a seasonal climate, the temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to catchment grouping that is recommended otherwise.
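The target variable above, Q95 standardized by catchment area, is the flow that is exceeded 95% of the time, divided by the catchment area. The sketch below shows one way to compute it from a flow series. It assumes an empirical quantile with linear interpolation between order statistics, discharge in m³/s, and area in km². The abstract specifies none of these, and the function names are hypothetical.

```python
def exceedance_flow(flows, exceedance=0.95):
    """Flow exceeded for the given fraction of time (empirical, linear interpolation)."""
    ordered = sorted(flows)
    # Exceeded 95% of the time corresponds to the 5% non-exceedance quantile.
    position = (1.0 - exceedance) * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def specific_low_flow(flows, area_km2, exceedance=0.95):
    """Low-flow quantile standardized by catchment area (assumed units: m3/s per km2)."""
    return exceedance_flow(flows, exceedance) / area_km2


# Hypothetical daily discharges (m3/s) and catchment area (km2), for illustration only.
daily_flows = [3.1, 2.4, 1.9, 1.2, 0.9, 0.8, 1.5, 2.8, 4.0, 3.3]
print(specific_low_flow(daily_flows, area_km2=120.0))
```

Dividing by area makes catchments of different sizes comparable, which is what allows a single regional model to be trained across all 260 sites.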
Chapter
Hydrology is the science of studying the natural flow of water and the effect of human activity on the water. Hydrological modeling is essential for the management and conservation of water. In recent decades, machine learning (ML) has been applied efficiently in hydrology. In this study, the application of ML in four subfields of hydrology, including flood, precipitation estimation, water quality, and groundwater, is presented. This review shows that ML performs better in flood prediction than traditional data-driven and physical hydrology modeling, particularly in short-term flood forecasting. In addition, using the ML technique helps to estimate precipitation from satellite datasets. This study provides a review of the potential of ML in water quality and groundwater modeling. The study shows that using an optimization algorithm for parameter selection can improve the performance of ML. Moreover, modeling accuracy is often improved through ML hybridization. Finally, it is recommended that hydrologists use ML in their modeling owing to their low computational cost and high performance.
Preprint
Statistical learning methods offer a promising approach for low flow regionalization. We examine seven statistical learning models (lasso, linear and non-linear model based boosting, sparse partial least squares, principal component regression, random forest, and support vector machine regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. In order to produce sparse models we adapt the recursive feature elimination for variable preselection and propose to use three different variable ranking methods (conditional forest, lasso and linear model based boosting) for each of the prediction models. Results are evaluated for the low flow characteristic Q95 (Pr(Q>Q95) = 0.95) standardized by catchment area using a repeated nested cross validation scheme. We found a generally high prediction accuracy for winter ($R^2_{CV}$ of 0.66 to 0.7) and summer ($R^2_{CV}$ of 0.83 to 0.86). The models perform similarly to or slightly better than a Top-kriging model that constitutes the current benchmark for the study area. The best performing models are support vector machine regression (winter) and non-linear model based boosting (summer), but linear models exhibit similar prediction accuracy. The use of variable preselection can significantly reduce the complexity of all models with only a small loss of performance. The so obtained learning models are more parsimonious, thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and non-linear models reveals that non-linear relationships can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add non-linear effects.
When performing low flow regionalization in a seasonal climate, the temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to catchment grouping that is recommended otherwise.