
Information‐Based Machine Learning for Tracer Signature Prediction in Karstic Environments

Abstract

Karstic groundwater systems are often investigated by a combination of environmental or artificial tracers. One of the major downsides of tracer-based methods is the limited availability of tracer measurements, especially in data sparse regions. This study presents an approach to systematically evaluate the information content of the available data, to interpret predictions of tracer concentration from machine learning algorithms, and to compare different machine learning algorithms to obtain an objective assessment of their applicability for predicting environmental tracers. There is a large variety of machine learning approaches, but no clear rules exist on which of them to use for this specific problem. In this study, we formulated a framework to choose the appropriate algorithm for this purpose. We compared four different well-established machine learning algorithms (Support Vector Machines, Extreme Learning Machines, Decision Trees, and Artificial Neural Networks) in seven different karst springs in France for their capability to predict tracer concentrations, in this case SO₄²⁻ and NO₃⁻, from discharge. Our study reveals that the machine learning algorithms are able to predict some characteristics of the tracer concentration, but not the whole variance, which is caused by the limited information content in the discharge data. Nevertheless, discharge is often the only information available for a catchment. The ability to predict at least some characteristics of the tracer concentrations from discharge time series, for example, to fill gaps or to enlarge the database for subsequent analyses, is therefore a helpful application of machine learning in data sparse regions or for historic databases.
B. Mewes¹, H. Oppel¹,², V. Marx², and A. Hartmann²
¹Institute of Hydrology, Water Resources and Environmental Engineering, Ruhr-University Bochum, Bochum, Germany
²Chair of Hydrological Modeling and Water Resources, Albert-Ludwigs-University of Freiburg, Freiburg, Germany
1. Introduction
Tracer-based methods are often the only way to separate streamflow components and to determine the origin of water (Kirchner, 2003; Klaus & McDonnell, 2013; Mei & Anagnostou, 2015; Mewes & Oppel, 2019; Rimmer & Hartmann, 2014; Weiler et al., 2017). Especially in karstic environments, tracer investigations allow a deeper understanding of the underlying karstic system and the interdependencies of discharge and the current state of the subterraneous processes or storages (Aquilina et al., 2005; Gur et al., 2003; Lee & Krothe, 2001).
The joint analysis of tracer data and discharge measurements is a common tool to derive information about hydrological systems, for example, the identification of the origin of water within a catchment. Despite their advantages, these approaches demand long time series of tracer measurements covering a wide range of hydrological system dynamics (Garvelmann et al., 2017; Lee & Krothe, 2001). To describe catchments with hydrological models, the link between tracer signatures and the system's hydrological state is of interest to set up suitable calibration strategies. Although the dependency on tracer data in model studies is high, the information content of tracer measurements has rarely been analyzed. Furthermore, the information-to-noise ratio in the data has to be high to derive the desired information about the system (Kelleher et al., 2015). Another problem is the lack of available tracer databases, which hinders many applications, especially in data sparse regions. Here, machine learning could be useful because its core concept is to predict values that are difficult to measure from input data that are straightforward to measure. If the algorithms are able to predict tracer concentrations from discharge time series, data-driven interpolations of continuous tracer concentration time series can be obtained.
RESEARCH ARTICLE
10.1029/2018WR024558
Special Section: Big Data & Machine Learning in Water Sciences: Recent Progress and Their Use in Advancing Science
Key Points:
• Application of entropy and mutual information reveals the information content gap between discharge derived from joint tracer and discharge analyses
• Understanding the information content of hydrological data enhances the interpretation of machine learning prediction results
• Similarities in information could be used for regionalization of catchment characteristics of karst-affected catchments
Supporting Information: Supporting Information S1
Correspondence to: B. Mewes, benjamin.mewes@rub.de
Citation: Mewes, B., Oppel, H., Marx, V., & Hartmann, A. (2020). Information-based machine learning for tracer signature prediction in karstic environments. Water Resources Research, 56, e2018WR024558. https://doi.org/10.1029/2018WR024558
Received 4 DEC 2018
Accepted 9 JAN 2020
Accepted article online 11 JAN 2020
©2020. The Authors. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

With the rise of machine learning technologies and further improvements in information technology, the application of new approaches for data analysis and the interplay of data, information content, and results have increased (Goodfellow et al., 2016; Kelleher et al., 2015). Machine learning is the umbrella term for processes that extract patterns from data automatically (Goodfellow et al., 2016). Machine learning-based algorithms are used in many hydrological applications (Raghavendra & Deka, 2014), like rainfall-runoff modeling with artificial neural networks (Hu et al., 2011; Nourani et al., 2009), precipitation forecasting (Yu et al., 2017), evapotranspiration prediction (Tabari et al., 2012), baseflow separation (Corzo & Solomatine, 2007), measurement setup design (Chacon-Hurtado et al., 2017), streamflow forecasting (Shortridge et al., 2016; Shrestha & Solomatine, 2006; Taormina et al., 2015; Yaseen et al., 2016), the separation of flood events from time series of discharge (Mewes & Oppel, 2019), water resource management (Fotovatikhah et al., 2018), and many more. In these studies, machine learning algorithms were mostly used to replicate a system and transform a certain variable into the future. Machine learning was found to be a useful tool to manipulate data in complex systems, like catchments, where the rules leading from input to output are not completely describable. For example, using a Multi-Layer-Perceptron neural network, dispersion of a tracer was evaluated for a small river in a 1D profile (Piotrowski et al., 2007).
For machine learning algorithms the information content of training data is important (Han & Kamber, 2010; Kelleher et al., 2015; Vapnik, 2013). The Shannon entropy is a common concept in information theory to analyze the information content of given data (Shannon, 1948; see also Fernando et al., 2009). Until now, no study has tried to predict natural tracer concentrations in karstic environments from discharge dynamics by applying machine learning algorithms to fill gaps between point measurements of tracer concentrations. This strategy was chosen because discharge is often the only available data source with an appropriate temporal resolution for hydrological modeling at an event scale. In the database we used, some infrequent tracer concentration measurements were available as point measurements. A machine learning tool capable of filling these gaps would allow the application of databases of frequent discharge measurements and infrequently measured tracer concentrations. Additionally, an already trained algorithm could predict tracer concentrations for catchments in which only a limited number of discharge measurements is available. Furthermore, it would qualify historic data for application in modeling approaches that require a higher temporal resolution of tracer measurements. In karstic environments, the joint analysis of tracer data is often the key for a deeper understanding of system states and behavior (Mudarra et al., 2019). Therefore, we assume a high information content in the measured tracer data because they describe the complex interaction of subterraneous processes. Machine learning algorithms depend on information provided in the data. Consequently, the available data sets of discharge and tracer measurements have to be analyzed for their explanatory power, which has not been done before for a database of karst springs. Furthermore, an information content-based analysis of the interpolated tracer measurements can be conducted by comparing the prediction results with the information content of the input data.
In this study, we analyzed observed discharge and natural tracer data (sulfate, SO₄²⁻, and nitrate, NO₃⁻) from seven different karst springs in France regarding their information content. We took natural tracers because they exist in varying concentrations and are measurable without any induced injection. We chose nitrate and sulfate because they represent different residence times in the system. While nitrate represents shallow, fast-flowing water, sulfate represents the opposite origin: slow phreatic processes. We applied different machine learning algorithms, namely Support Vector Machines (SVM), Classification and Regression Trees (CART), Extreme Learning Machines (ELM), and Artificial Neural Networks (ANN), to estimate tracer concentrations from discharge dynamics. We selected those four machine learning approaches because they (a) are well established in hydrology, (b) are used for pattern recognition in structured data sets, and (c) deliver, to a certain degree, interpretable structures for the researcher. Furthermore, we compared different concepts of prediction, including the univariate prediction that separately estimates each tracer with a specialized machine and the multivariate estimation that tries to predict a set of tracers with a combined machine. We tested each of the chosen approaches on its prediction capability in seven different catchments and created a strategy to build a data-driven tool set for the interpolation of continuous time series of tracer measurements. Finally, we linked the prediction results with the observed information content in the data as well as with the mutual information between the chosen tracers.
2. Methods and Data
Sound results from machine learning approaches require data with a high information-to-noise ratio. Moreover, the choice of the appropriate machine learning algorithm for this task is difficult to justify without understanding the internal structure of the problem. Following the No-Free-Lunch Theorem, all available approaches should be equally suitable to solve this problem but with a different performance and different demands on the data in terms of amount and quality (Wolpert & Macready, 1997). Accordingly, without information for an a priori selection of the best machine learning approach to use, we chose four structurally different approaches to estimate tracer concentrations in seven catchments. To quantify the information content within the data set, we introduce the concepts of continuous entropy and mutual information. After defining these basic concepts, we explain the choice of machine learning algorithms in this study and the further scheme of this application.
2.1. Entropy and Mutual Information
Shannon's model of entropy allows quantifying the amount of information gained by adding new data to the analysis (Shannon, 1948). The entropy $H$ is defined by the chance of a sample $X_d$ to be of one of the given classes $\{x_1, \ldots, x_{N_d}\}$, with $P(x_n)$ as the probability that $X_d = x_n$, with a sample length $N$:

$$H(X_d) = -\sum_{n=1}^{N} P(x_n) \log_2 P(x_n) \tag{1}$$
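As a minimal sketch of equation (1), the snippet below computes the Shannon entropy in bits from the relative class frequencies of a sample. The equal-width binning helper `bin_values`, the number of bins, and the discharge values are illustrative assumptions and are not taken from the study.

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """Shannon entropy in bits: H = -sum P(x_n) * log2 P(x_n) over observed classes."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def bin_values(values, n_bins):
    """Hypothetical helper: map continuous values to n_bins equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Made-up discharge values (m3/s) for illustration only, not data from the study.
discharge = [0.4, 0.5, 0.5, 0.6, 1.2, 2.8, 1.9, 0.9, 0.6, 0.5]
classes = bin_values(discharge, n_bins=4)
h_bits = shannon_entropy(classes)
```

With four classes the entropy can be at most 2 bits, which is reached only when all classes are equally frequent.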
Because Shannon's entropy is only valid for discrete data, the concept was extended to the continuous entropy for a continuous variable $X_c$, which is in our case discharge:

$$h(X_c) = -\int_{\Omega} f(x) \log_2 f(x)\, dx \tag{2}$$
where $f(x)$ is the probability density function (PDF) of $X_c$ and $\Omega$ is the defined domain of $X_c$ (Gong et al., 2014). To determine the explanatory power of data concerning a variable, for example, how much of the information of NO₃⁻ is explained by discharge only, we further extend the concept of continuous entropy to conditional entropy (Thomas & Cover, 2006), where $y$ is the tracer concentration and $x$ is the discharge sequence:

$$H(Y \mid X) = \sum_{x \in X,\, y \in Y} P(x, y) \log_2 \frac{P(x)}{P(x, y)} \tag{3}$$
Conditional entropy quantifies the uncertainty that remains in variable $y$ once variable $x$ is known, and thus how much of $y$ can be explained by $x$. To describe the shared information between two data points given as $x$ and $y$, we apply the mutual information (Shannon, 1948; Sharma, 2000). In our case, we investigate the shared information between the two chosen tracers NO₃⁻ and SO₄²⁻. The mutual information between two measurements is defined as

$$MI = \iint f_{x,y}(x, y) \log_2 \left[ \frac{f_{x,y}(x, y)}{f_x(x)\, f_y(y)} \right] dx\, dy \tag{4}$$
where $f_x(x)$ and $f_y(y)$ are the marginal PDFs of $x$ and $y$, and $f_{x,y}(x, y)$ is the joint PDF of $x$ and $y$ (Sharma, 2000). Following Sharma (2000), the mutual information score from equation (4) can be approximated by

$$MI = \frac{1}{N} \sum_{i=1}^{N} \log_2 \left[ \frac{f_{x,y}(x_i, y_i)}{f_x(x_i)\, f_y(y_i)} \right] \tag{5}$$
In this approximation, $f_x(x_i)$, $f_y(y_i)$, and $f_{x,y}(x_i, y_i)$ are the marginal and joint densities at the same point of the same sample (Fernando et al., 2009; Sharma, 2000). To estimate the densities, we apply a kernel estimator (Fernando et al., 2009). Without the kernel estimator, a theoretical distribution has to be assumed for the MI, which adds more bias to the approach. As nearly all models rely on the interplay of input and output data, the shared information expressed through mutual information has to be weighted more strongly than the internal information represented through the continuous entropy.
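The sample approximation in equation (5) can be sketched in a few lines of standard-library Python. The study does not state which kernel or bandwidth was used, so the Gaussian product kernel and the rule-of-thumb bandwidth below are assumptions, and the data are synthetic random numbers rather than tracer measurements.

```python
import math
import random

def rule_of_thumb_bandwidth(values, dim):
    """Gaussian rule-of-thumb bandwidth (an assumed choice, not from the paper)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return std * (4.0 / ((dim + 2) * n)) ** (1.0 / (dim + 4))

def gauss(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def mutual_information(x, y):
    """Mean of log2[f_xy(x_i, y_i) / (f_x(x_i) * f_y(y_i))] over all sample points."""
    n = len(x)
    hx1, hy1 = rule_of_thumb_bandwidth(x, 1), rule_of_thumb_bandwidth(y, 1)  # marginals
    hx2, hy2 = rule_of_thumb_bandwidth(x, 2), rule_of_thumb_bandwidth(y, 2)  # joint
    total = 0.0
    for i in range(n):
        f_x = sum(gauss((x[i] - x[j]) / hx1) for j in range(n)) / (n * hx1)
        f_y = sum(gauss((y[i] - y[j]) / hy1) for j in range(n)) / (n * hy1)
        f_xy = sum(gauss((x[i] - x[j]) / hx2) * gauss((y[i] - y[j]) / hy2)
                   for j in range(n)) / (n * hx2 * hy2)
        total += math.log2(f_xy / (f_x * f_y))
    return total / n

# Synthetic example: one strongly dependent pair and one independent pair.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(150)]
y_dep = [xi + 0.3 * rng.gauss(0.0, 1.0) for xi in x]
y_ind = [rng.gauss(0.0, 1.0) for _ in range(150)]
mi_dep = mutual_information(x, y_dep)
mi_ind = mutual_information(x, y_ind)
```

Kernel smoothing biases the estimate, so the absolute values should be read with care; the ordering between the dependent and the independent pair is the informative part of this sketch.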
2.2. Machine Learning Algorithms
The main aim of the paper is to use discharge data as a predictor for tracer concentrations because discharge
in streams and rivers is more commonly measured than tracer concentrations, especially in regions where
access to the site is limited and research relies on public databases. Therefore, we train machine learning
algorithms using time series of runoff to predict time series of tracer concentrations.
The discharge dynamics are captured by a window of discharge data from the original time series that contains the tracer measurement at time $t$ and serves as input for the machine learning algorithms. The machine learning algorithms predict the tracer concentrations based on information from the discharge pattern (Figure 1). For training and validation, the predicted tracer concentrations are compared to the measured data (which are considered to represent reality). To reduce overfitting due to complex input data, an optimal length for the window of discharge data has to be identified, which is discussed in detail in section 2.3. Without defining a window, a Long Short-Term Memory network could be applied, which requires a continuous time series of input and training data. Due to a lack of continuous time series of tracer measurements, this approach was discarded.
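The windowing step can be illustrated with a short sketch that pairs each tracer measurement with the discharge values around it. The symmetric window (the same number of days before and after the measurement), the function names, and the example values are assumptions made for illustration; measurements whose window would run past the ends of the series are skipped here.

```python
def discharge_window(discharge, t, half_width):
    """Return discharge[t - half_width : t + half_width + 1], or None if the
    window would extend beyond either end of the daily series."""
    start, stop = t - half_width, t + half_width + 1
    if start < 0 or stop > len(discharge):
        return None
    return discharge[start:stop]

def build_samples(discharge, tracer_obs, half_width):
    """Pair every tracer measurement (day index -> concentration) with its window."""
    samples = []
    for t, conc in sorted(tracer_obs.items()):
        window = discharge_window(discharge, t, half_width)
        if window is not None:
            samples.append((window, conc))
    return samples

# Made-up daily discharge (m3/s) and sparse tracer measurements (mg/L).
discharge = [1.0, 1.2, 3.5, 2.0, 1.4, 1.1, 0.9, 0.8]
tracer_obs = {0: 10.0, 3: 7.5, 6: 9.0}
samples = build_samples(discharge, tracer_obs, half_width=1)
```

Each sample is then one input pattern (the window) with one target (the concentration), which is what turns the time series problem into a pattern recognition problem.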
Four structurally different machine learning algorithms are used in this study: SVM, CART, ELM, and Multi-Layer-Perceptron ANN. These algorithms were chosen because of their suitability for regression problems and their origin in two of the four main machine learning families: error-based learning and information-based learning (Kelleher et al., 2015). Moreover, they are commonly applied in hydrology and deliver, to a certain degree, structures that can be interpreted by the researcher. SVM and CART are not known to capture temporal patterns in time series data. By the reduction from a complete time series to a window with a variable length, temporal dependencies are reduced to dependencies of the relative position within the window. Thus, the problem is reduced to a pattern recognition problem (Nasrabadi, 2007).
An SVM is an error-based machine learning algorithm that tries to set up a regression to estimate the unknown tracer concentration from the input discharge sequence (solid line in Figure 2(a)). This regression is depicted through a hyperplane, for which the distance to the margin (dashed line in Figure 2(a)) and the most distant feature, the so-called support vector, is maximized (Cortes & Vapnik, 1995; Raghavendra & Deka, 2014). For a linear problem, this fitting of a regression can easily be done, but most machine learning problems, like the one presented here, are highly nonlinear. Therefore, we use a kernel function to transfer the existing problem to a higher dimension where it becomes linear (Chang et al., 2010; Kelleher et al., 2015). As the choice of the mapping kernel is highly problem specific, a selection of several kernel functions (radial basis function, linear, polynomial, and sigmoid) was tested and the best kernel in terms of numerical stability and computational demands was chosen, in our case the radial basis function kernel. For more information on the choice of the kernel, see Vapnik (2013). The created boundary layer is used to predict the unknown tracer concentration $C$ in the feature space through the input discharge dynamic, represented as a green dot (Figure 2). Accordingly, the SVM tries to solve the regression problem by transferring the discharge data into either a single tracer concentration or a set of tracer concentrations in the multivariate output. Hence, the hyperplane represents the regression function to estimate the respective tracer concentration from the discharge sequence.
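To make the role of the kernel more concrete, the following sketch evaluates the standard radial basis function kernel, exp(-gamma * ||a - b||^2), for pairs of discharge windows. It shows only the similarity measure on which a kernel SVM operates, not the SVM training itself; the value of `gamma` and the windows are arbitrary assumptions.

```python
import math

def rbf_kernel(a, b, gamma):
    """Radial basis function kernel between two equally long discharge windows."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(windows, gamma):
    """Pairwise kernel values; similar discharge patterns give values close to 1."""
    return [[rbf_kernel(a, b, gamma) for b in windows] for a in windows]

# Made-up normalized discharge windows: two recession-like patterns and one peak.
windows = [[1.0, 0.9, 0.8], [1.1, 0.9, 0.8], [0.8, 3.0, 1.5]]
k = kernel_matrix(windows, gamma=0.5)
```

In this example the two recession-like windows are far more similar to each other than either is to the peak window, which is the kind of structure the regression exploits in the transformed space.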
CART builds decision trees that are guidebooks to estimate the tracer concentration from the discharge values. The tree shows the ramifications of decisions leading to the final regression result (Breiman et al., 1984; Kelleher et al., 2015; Quinlan, 1986). To build the tree, all discharge values are analyzed in their ability to maximize the decrease of the residuum of the regression between observed and estimated tracer concentration at each branch. The branching occurs in descending order of error reduction. As a result, the structure of the decision tree can be used as a guidebook for unknown values in order to get the desired tracer concentration $C$ (Figure 2(b)). In the given example, the discharge value at position 0 has the highest influence on error reduction and results in the decision between the major branches, which are themselves diversified by certain discharge values, resulting in the final leaves with the target value $C$ represented as a green dot. The error reduction within the tree for each node is calculated with the root-mean-square error (RMSE) of the regression (see section 2.4). The regression tree analyzes the discharge values to find the values that have the highest influence on the regression problem to determine the predicted tracer concentration. The depth of the CART tree was limited to the number of input values from the time series of runoff in order to capture all details of the variability of discharge.
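The branching rule can be sketched as a search for the single split that reduces the regression error most. The sketch below scores each candidate by the sample-weighted RMSE of the two resulting leaves, which is one plausible reading of the description and not necessarily the exact criterion used in the study; a full tree would apply the same search recursively to each branch. All values are made up.

```python
import math

def rmse_about_mean(values):
    """RMSE obtained when every value in a leaf is predicted by the leaf mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def best_split(windows, targets):
    """Try every window position and threshold; return (position, threshold, error)
    for the split with the lowest sample-weighted RMSE of the two children."""
    n = len(targets)
    best = (None, None, rmse_about_mean(targets))
    for pos in range(len(windows[0])):
        levels = sorted(set(w[pos] for w in windows))
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2.0  # midpoint, so both children are non-empty
            left = [c for w, c in zip(windows, targets) if w[pos] <= thr]
            right = [c for w, c in zip(windows, targets) if w[pos] > thr]
            err = (len(left) * rmse_about_mean(left)
                   + len(right) * rmse_about_mean(right)) / n
            if err < best[2]:
                best = (pos, thr, err)
    return best

# Made-up normalized discharge windows (two values each) and tracer concentrations.
windows = [[0.5, 1.0], [0.6, 3.0], [2.0, 1.1], [2.2, 2.9]]
targets = [1.2, 1.1, 0.6, 0.5]
position, threshold, error = best_split(windows, targets)
```

Here the discharge value at position 0 separates high from low concentrations, so it is selected for the first branch.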
An ANN and an ELM (Figures 2(c) and 2(d)) are both variations of neural networks that try to solve the regression or classification problem by imitating the structure of the human brain and by guiding the training data through a network of hidden layers equipped with neurons (Haykin, 1999). Here, the input nodes are the discharge values from the window of discharge values for estimation of the desired tracer concentration. The hidden layers and nodes represent the underlying system, in this case the karst subsurface system. The connection between nodes and layers is trained by the optimization of weights in order to minimize the regression error. An ELM is a special case of an ANN: The nodes on the hidden layer receive their weights only once, and these weights remain constant over the process of network adaption. Only the weights from the hidden layer to the output node are updated, which is called a feedforward network due to the update direction of the weights (Huang et al., 2004). Here, the discharge values are sent through the network of nodes and hidden layers to identify the pattern and estimate the tracer amount. The network can be trained to estimate either a single tracer or a set of tracers. Generally, the network is restricted to a single hidden layer with half of the input window length as hidden nodes (and a minimum of three hidden nodes for stability reasons).
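A compact sketch of the ELM idea follows: the input-to-hidden weights are drawn once at random and kept fixed, and only the hidden-to-output weights are fitted by least squares. The sigmoid activation, the uniform weight range, the output bias term, and the small ridge term that keeps the linear system well conditioned are assumptions for illustration and are not details given in the text.

```python
import math
import random

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def hidden_output(window, weights, biases):
    """Sigmoid activations of the fixed, randomly weighted hidden layer."""
    return [1.0 / (1.0 + math.exp(-(sum(w * q for w, q in zip(ws, window)) + b)))
            for ws, b in zip(weights, biases)]

def train_elm(windows, targets, seed=0, ridge=1e-6):
    """Fit only the hidden-to-output weights; hidden weights stay as drawn."""
    rng = random.Random(seed)
    n_in = len(windows[0])
    n_hidden = max(3, n_in // 2)  # half the window length, at least three nodes
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_output(w, weights, biases) + [1.0] for w in windows]  # + output bias
    k = n_hidden + 1
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    hty = [sum(row[i] * y for row, y in zip(h, targets)) for i in range(k)]
    return weights, biases, solve_linear(hth, hty)

def predict_elm(model, window):
    """Predict one (normalized) tracer concentration from one discharge window."""
    weights, biases, beta = model
    feats = hidden_output(window, weights, biases) + [1.0]
    return sum(b * f for b, f in zip(beta, feats))
```

Because the hidden layer is never retrained, fitting reduces to one small linear solve, which is what makes an ELM cheap compared with an iteratively trained ANN.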
To avoid overfitting of the data, the number of input values was reduced to a maximum of half of the available runoff data in the window, with a minimum of three remaining runoff values as input data. Furthermore, the random selection of input values was shuffled 10 times and the mean prediction was taken to be representative for the specified window length.
Machine learning algorithms depend on the information content of the data (Goodfellow et al., 2016; Kelleher et al., 2015). Consequently, we assume a link between the performance of the algorithm and the information content of the data (defined in section 2.1). We train the algorithms in two different ways: (1) by a univariate strategy estimating each tracer individually and (2) by a multivariate strategy that trains one algorithm to estimate both tracers simultaneously. We expect that the multivariate strategy performs better than the univariate one, as the combination (i.e., interaction) of data should lead to more incorporated information than just the information content of a single data set. A globally trained algorithm to predict a set of natural tracers would lower the interpretability of the results. Thus, we discarded the idea of a universal machine for tracer concentration prediction and focused on the two mentioned natural tracers.
2.3. Training
The discharge data have to be reduced to a window with an unknown length. The optimal length is highly subjective, because it is unclear whether all information on the system's behavior is covered in the respective time span. The window to be selected contains the tracer measurements and the number of discharge values depicting the discharge dynamics. As we do not know whether the window length depends on the chosen approach or region, we varied it from 1 to 180 days in steps of [1, 3, 6, 30, 60, 90, 180] with equally sized borders to face the unknown optimal length. The window lengths chosen here represent natural breaks within the classification of time to describe a system. We chose these different lengths of the window to include short-, medium-, and long-term processes in the discharge data and to minimize the number of data sets analyzed. Therefore, we focused on time spans like a month, two months, and half a year. The discharge in the sequence is normalized by the catchment-specific average discharge to reduce the influence of the peak. The measured tracer concentrations are also normalized by the specific mean of this tracer for the catchment. The share of the training data is increased gradually to understand how simulation performance is influenced by the size of the training data. Therefore, we varied the amount of data used for training from 10% to 90% of the available time series for the catchment. Using the length of the covered time series instead would be insufficient because the input data include runoff sequences that might overlap. Hence, the number of tracer measurements is important.

Figure 1. Workflow of the analysis including the clipping of the window for the discharge data, the prediction of tracer signatures by the machine learning algorithms, and the following comparison with tracer measurements.

Figure 2. The major task for the machine learning algorithms in this study is presented in the upper part of the figure: to estimate the unknown tracer concentration, C, by training a machine learning algorithm on the pattern formed by a subset of discharge and a measured pair of tracers. The structures of the chosen algorithms for this study are shown in subplots a, b, c, and d.
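The training setup of this section (window lengths, normalization by the catchment mean, and a gradually increasing training share) could be organized along the following lines. The 10-percentage-point step between training shares and the random assignment of samples to training and validation are assumptions; the text only gives the range of 10% to 90%.

```python
import random

WINDOW_LENGTHS_DAYS = [1, 3, 6, 30, 60, 90, 180]    # window lengths named in the text
TRAINING_SHARES = [s / 10 for s in range(1, 10)]    # 0.1 ... 0.9; the step is assumed

def normalize_by_mean(values):
    """Divide a series by its mean, e.g. discharge by the catchment average."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def split_samples(samples, share, seed=0):
    """Randomly assign the given share of samples (one per tracer measurement)
    to training and the remainder to validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = max(1, round(share * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Made-up discharge values; after normalization the series has a mean of one.
normalized = normalize_by_mean([1.0, 2.0, 3.0])
train, valid = split_samples(list(range(10)), share=0.3)
```

Counting samples rather than days reflects the point made above: overlapping windows make the covered time span a poor measure of how much training data is available.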
We train the algorithms with both a univariate and a multivariate strategy. We compare the results from the two learning strategies to quantify the potential improvement of shared information and joint learning. Furthermore, we discuss the influence of the window length on prediction quality. This is relevant as the length of the input sequences can create a bias in the learning process. If we choose the length too short, we might not cover all relevant processes, whereas sequences that are too long might confuse the algorithms in finding a suitable system. In the last step, we elaborate on the transferability of the algorithms to be used as predictors at catchments for which they were not trained. That way, we can test whether machine learning tools and their results might reveal hidden similarities in catchment responses or, even more interesting, whether the application of machine learning is suitable for the prediction of missing tracer measurement data.
2.4. Evaluation and Error Measures
To compare the different machine learning approaches, training strategies and window lengths, quantitative
performance measures were used.
In order to show the general prediction performance, the RMSE was applied to observed and estimated tracer measurements; it becomes 0 for a perfect prediction. To calculate the RMSE for the tracer content, we differentiated between measured and predicted $c_T$, with $N$ being the number of samples in the validation:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} \left( c_{T,\mathrm{meas}} - c_{T,\mathrm{pred}} \right)^2}{N}} \tag{6}$$
We apply the RMSE to both tracers individually and calculate the mean of both as an indication of the combined error. Because of the variable window length, individual RMSEs are calculated for each approach and each region. Because the RMSE does not show the direction of the error (in contrast to the mean error, which is less robust against outliers), we also analyze the average concentration ratio $\bar{c}_T$, which provides information about the general strength and direction of the prediction error:

$$\bar{c}_T = \frac{1}{N} \sum_{i=1}^{N} \frac{c_{T,\mathrm{pred}}}{c_{T,\mathrm{meas}}} \tag{7}$$

$\bar{c}_T$ shows the direction and the strength of the error by the sign and the size of its difference from one, respectively. Again, because of the multitude of different window lengths, a range of $\bar{c}_T$ values is calculated for each region and approach.
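Equations (6) and (7) translate directly into code. The sketch below assumes two equally long lists of measured and predicted concentrations for one tracer; the function names and numbers are illustrative.

```python
import math

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted concentrations."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def mean_concentration_ratio(measured, predicted):
    """Average of predicted / measured; values above one indicate overestimation,
    values below one indicate underestimation."""
    n = len(measured)
    return sum(p / m for m, p in zip(measured, predicted)) / n

# Made-up normalized concentrations for illustration.
measured = [1.0, 2.0]
predicted = [2.0, 4.0]
error = rmse(measured, predicted)
ratio = mean_concentration_ratio(measured, predicted)
```

In this example every prediction is twice the measurement, so the ratio is 2 and flags a systematic overestimation that the RMSE alone would not reveal.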
As all the presented measures are merely measures of quantitative performance, the qualitative performance is measured with the accuracy of the internal ranking of the two tracer signatures. Therefore, we calculated the accuracy from an error matrix of true and false combinations of ranking. The derived measure of accuracy, $acc$, describes the qualitative information between the two tracers as the accuracy of their ranking (Han & Kamber, 2010):

$$acc = \frac{pos_{True}}{pos} \cdot \frac{pos}{(pos + neg)} + \frac{neg_{True}}{neg} \cdot \frac{neg}{(pos + neg)} = \frac{pos_{True}}{(pos + neg)} + \frac{neg_{True}}{(pos + neg)} \tag{8}$$

Here, $pos_{True}$ and $neg_{True}$ refer to the ranking of the pair of tracers in concentration. For example, $c_{T_A,\mathrm{obs}} > c_{T_B,\mathrm{obs}}$ but $c_{T_A,\mathrm{est}} < c_{T_B,\mathrm{est}}$ results in a $neg$ prediction, whereas $c_{T_A,\mathrm{obs}} > c_{T_B,\mathrm{obs}}$ and $c_{T_A,\mathrm{est}} > c_{T_B,\mathrm{est}}$ counts as $pos_{True}$. Accuracy shows the ability of the machine learning method to replicate the ranking of the tracer concentrations in order to replicate changing tracer dynamics.
The three measures considered here to judge the performance represent the key characteristics of the prediction results: the overall goodness of fit (RMSE), the deviation from the mean, and the ranking between both tracers. With a correct ranking, the qualitative information on which tracer concentration dominates is still captured, even if the variance of the prediction is not high enough.
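The ranking accuracy of equation (8) reduces to the share of samples for which the predicted order of the two tracers matches the observed order. In the sketch below a sample is treated as positive when tracer A exceeds tracer B; ties are counted as negative, which is an assumption not specified in the text.

```python
def ranking_accuracy(a_obs, b_obs, a_est, b_est):
    """Share of samples whose predicted A-versus-B ranking matches the observed one."""
    pos_true = neg_true = 0
    for ao, bo, ae, be in zip(a_obs, b_obs, a_est, b_est):
        observed_pos = ao > bo    # observed: tracer A dominates
        predicted_pos = ae > be   # predicted: tracer A dominates
        if observed_pos and predicted_pos:
            pos_true += 1
        elif not observed_pos and not predicted_pos:
            neg_true += 1
    return (pos_true + neg_true) / len(a_obs)

# Made-up normalized concentrations of two tracers A and B.
a_obs, b_obs = [3.0, 1.0, 2.0, 5.0], [1.0, 2.0, 3.0, 4.0]
a_est, b_est = [2.0, 1.0, 4.0, 3.0], [1.0, 3.0, 2.0, 4.0]
acc = ranking_accuracy(a_obs, b_obs, a_est, b_est)
```

Two of the four predicted rankings agree with the observations here, so the accuracy is 0.5 even though the absolute errors differ from sample to sample.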
2.5. Data
The target variables of the machine learning prediction are the concentrations of SO₄²⁻ and NO₃⁻, which act as a combined tracer signature. While NO₃⁻ is known as an indicator for fast water fluxes from the soil or epikarst, that is, the shallow subsurface (Hartmann et al., 2016; Mahler & Garner, 2009), SO₄²⁻ in karst systems is usually derived from geogenic processes that dissolve evaporites in the phreatic subsurface that sustains base flow (Hartmann et al., 2017; Mudarra & Andreo, 2011). We chose these two tracers as an example for any tracer combination. Due to their different origins, the shallow subsurface (NO₃⁻) and the phreatic zone (SO₄²⁻), we expect that their observations of dissolved evaporites include different information.
The data for our analyses originate from seven different karst springs in France (Table 1 and Figure 3). Tracer measurements were normalized by individual mean values, leading to seven different means (Eaufrance, 2018a). The tracers analyzed in this study are natural tracers; no human-induced injections were made. The tracer concentrations were measured repeatedly, but not at fixed intervals. There was a strong linear correlation between both tracers SO₄²⁻ and NO₃⁻ with r = 0.67. Measured discharge values were obtained from Banque Hydrologique and have a daily resolution (Eaufrance, 2018b). Banque Hydrologique publicly provides daily discharge data of continuously measured rivers and springs collected by French state agencies.
The two springs Baget and Fontestorbes are located in the Pyrénées Mountains (Ariège department) at a median altitude of 1,000 m. The recharge areas are 13 and 80 km² for the Baget and Fontestorbes spring, respectively. Mean daily discharge is 2.1 m³/s at the Fontestorbes spring, which is one of the largest intermittent karst springs in the world, and 0.5 m³/s at the Baget spring. Due to the similarity of the two mid-altitude basins (Labat et al., 2002), a mean annual precipitation of 1,178 mm (Bailly-Comte et al., 2018) can be assumed for both locations. The Durzon spring is located on the Larzac Plateau in the Grands Causses area in the Massif Central (Aveyron department). It is a perennial, vauclusian-type spring with a mean daily discharge of 1.5 m³/s. The recharge area has been determined to be >100 km² (Jacob et al., 2008). The Fontaine de Vaucluse spring is a well-described and famous karst spring and the largest karstic outlet in France (Vaucluse department). The mean daily discharge is over 20 m³/s and the low flow discharge is always higher than 4 m³/s. The recharge area is about 1,115 km² (Fleury et al., 2009). The Fontbelle spring is part of the Ouysse karst system (Lot department) (Kavouri et al., 2011). The Source de la Touvre is the second largest karst spring in France and the sole outlet of the Rochefoucault karst system (Charente department). The spring, fed by the losses of three large rivers, has a mean daily discharge of 13 m³/s and a recharge area of about 126 km². The water resources are used for the water supply of Angouleme city. The Source du Lez is the main perennial outlet of the Lez karst system (Montpellier department) with a mean daily discharge of 2 m³/s. Pumping for the water supply of Montpellier city puts the aquifer under high anthropogenic pressure (Bicalho et al., 2017).
More details about the springs are provided in Table 1 and Figure S1 (see supporting information) or at the
database webpage (hydro.eaufrance.fr).
3. Results
3.1. Entropy and Mutual Information of Available Data Sets
Following the principle of continuous entropy, the information content of discharge and the mutual
information of the joint data sets (tracer signatures) were calculated. We resampled the complete set of sequences
ten times and looked at the mean entropy of each individual data set and the mutual information of the two
different tracer signatures, SO₄²⁻ and NO₃⁻. Missing or erroneous results are labeled NA, which leads to the gaps
shown in the information contents of springs like Fontaine de Vaucluse (see supporting information).
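To make the quantities concrete, the following minimal sketch estimates entropy and mutual information from binned data and averages the entropy over repeated random subsamples. It is an illustration under stated assumptions, not the estimator used in the study: it uses discrete Shannon entropy on equal-width bins instead of a continuous entropy estimator, and the number of bins, the subsampling scheme, and all function names are our own choices.

```python
import math
import random
from collections import Counter

def discretize(values, n_bins=10):
    """Assign each value to one of n_bins equal-width bins (assumed binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for paired discrete labels."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def mean_entropy(values, fraction, n_repeats=10, n_bins=10, seed=0):
    """Mean binned entropy over n_repeats random subsamples of the data."""
    rng = random.Random(seed)
    k = max(1, int(round(fraction * len(values))))
    total = 0.0
    for _ in range(n_repeats):
        sample = rng.sample(values, k)
        total += entropy(discretize(sample, n_bins))
    return total / n_repeats
```

Calling `mean_entropy` for increasing values of `fraction` gives a curve of the kind described in the text, in which a plateau indicates that additional samples add little new information.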
The Baget example shows that the entropy of discharge decreases when more data are used for training
(Figure 4). The mutual information between the two tracers exceeds the continuous entropy of discharge
by far. The information content shared between those two tracers is 35 times higher than the continuous
entropy of the discharge. That means that we need a lot of information to fully describe the variability of
the interplay between those two tracers and we might not successfully describe this variability with the
discharge data alone. Using more than 60% of the available tracer data sets, the mutual information reaches a
plateau where no further information is needed to describe the dynamics. The behavior of MI is similar for
all other catchments: The information content is by far higher than the continuous entropy of discharge and
Table 1
Overview of Used Data

| Source | Lat | Lon | Department | Mean daily discharge (m³/s) | Recharge area (km²) | Köppen-Geiger climate | Mean annual rainfall (mm/a) | Geology | Length of daily discharge measurements | Tracer measurements SO₄²⁻ and NO₃⁻ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baget [a, d] | 42.9554 | 1.0304 | Ariège | 0.5 | 13 | Dfb/Dfc | 1,187 | Lower Cretaceous limestone | 1968–2015 | 24 |
| Source de Fontestorbes [b, d] | 42.8925 | 1.9271 | Ariège | 2.1 | 80 | Cfb | 1,187 | Cretaceous limestones and marls | 1965–2015 | 43 |
| Durzon [e] | 43.9909 | 3.2617 | Aveyron | 1.5 | 124 | Dfc/Cfb | 400 (k) | Middle to upper Jurassic limestones and dolomites | 1996–2016 | 154 |
| Fontaine de Vaucluse [a, f] | 43.9177 | 5.1327 | Vaucluse | 20 | 1,115 | Csb/Csa | 960 | Great, lower Cretaceous limestone series | 1966–2016 | 51 |
| Fontbelle [g] | 44.7956 | 1.5640 | Lot | 0.1 | | Cfb | 730 [h] | Middle-to-Late Jurassic tabular carbonate sequence | 2004–2015 | 194 |
| Source de la Touvre [i] | 45.6630 | 0.2546 | Charente | 13 | 126 | Cfb | 945 | Upper Jurassic limestones | 1980–2016 | 125 |
| Source du Lez [a, j] | 43.7182 | 3.8842 | Montpellier | 2 | | Csb | 942 | Upper Jurassic and early Cretaceous limestones | 1987–2016 | 300 |

[a] Jourde et al. (2018)
[b] Labat et al. (2002)
[c] Bailly-Comte et al. (2018)
[d] BDLisa (2019)
[e] Jacob et al. (2008)
[f] Fleury et al. (2009)
[g] Kavouri et al. (2011)
[h] Obtained from Hartmann et al. (2015)
[i] Le Moine et al. (2008)
[j] Bicalho et al. (2017)
Figure 3. Location of analyzed carbonate rock dominated sources in France.
a plateau is reached using at least 60% of data. Therefore, we assume that we need at least 60% of the
available tracer measurements to cover the variability of the system's dynamics in the training. For more
details, we refer to the supplement (Figure S2) where the entropy and the mutual information for all
catchments are shown in detail.
3.2. Validation of Prediction Accuracy
For the validation of the prediction accuracy, we compared two different learning strategies: the univariate
strategy, focusing on only one tracer at a time, and the multivariate strategy, considering both tracers in the
same learning phase. The results shown here represent all considered sizes of the discharge window. The
prediction results are presented as a boxplot to show the variability and the influence of the different window
lengths without going into detail on the specific influence of the window (Figure 5). The average tracer
concentration ratio cT indicates that the tracer signatures can be predicted better at some springs than at others.
Furthermore, the results show a preference toward certain prediction techniques with a cT value close to the
optimum value. For the Fontaine de Vaucluse, Fontbelle, Source de Fontestorbes, and Source du Lez, cT
converged to the optimal value 1.0. The differences between the machines were marginal, although ELM and
ANN results were less variable and thus less influenced by the amount of training data. For the Baget catchment,
we could not predict the concentrations with any machine as the variability is high for all applied
approaches and amounts of training data. For the catchments Durzon and Source de la Touvre, either
NO₃⁻ or SO₄²⁻ was overestimated or underestimated, although CART delivered acceptable results for the
Source de la Touvre.
The RMSE of the prediction from all investigated window lengths is presented as a boxplot in Figure 6. The
RMSE of the tracer concentrations shows results similar to those for cT. While for some catchments RMSEs were low
regardless of the chosen machine, for catchments like Baget the results are worse than for catchments like
Fontbelle and Source de la Touvre. If the cT of the catchment does not converge to 1.0 (like the SVM in
Source du Lez), the RMSE is higher than in regions like Fontaine de Vaucluse and Fontbelle where cT is
close to the optimum. The choice of the machine has only a small influence on the RMSE, apart from Source
du Lez where the SVM delivers worse results than any other method. Generally, an RMSE lower than 1.0 is an
acceptable value for the prediction of the normalized concentration. This limit is reached for all machines in
the catchments Fontaine de Vaucluse, Fontbelle, Source de Fontestorbes, and Source de la Touvre, while at
Baget, Durzon, and Source du Lez the RMSE remains highly variable. Whether a univariate or a multivariate
approach results in a lower mean RMSE cannot be stated with certainty from these results, but in most cases
the mean RMSE of the multivariate approach was lower than the mean RMSE of the respective univariate
approach.
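The two measures discussed here can be sketched as follows. The definition of cT is not restated in this section, so the sketch assumes it is the ratio of the mean predicted to the mean observed concentration, which is consistent with an optimum of 1.0; the function names and the example values are hypothetical.

```python
import math

def concentration_ratio(predicted, observed):
    """Assumed form of cT: mean predicted over mean observed (optimum 1.0)."""
    mean_pred = sum(predicted) / len(predicted)
    mean_obs = sum(observed) / len(observed)
    return mean_pred / mean_obs

def rmse(predicted, observed):
    """Root-mean-square error between paired predictions and observations."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical normalized concentrations: the prediction matches the mean
# of the observations but reproduces none of their variability.
observed = [0.8, 1.0, 1.2]
predicted = [1.0, 1.0, 1.0]

ct = concentration_ratio(predicted, observed)
error = rmse(predicted, observed)
```

In this toy case cT is at its optimum while the RMSE is nonzero, which mirrors the observation that the mean concentration can be captured even when the variance is not.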
Figure 4. Mean continuous entropy and mutual information between NO₃⁻ and SO₄²⁻ at Baget spring, showing the
shared information between both tracers and discharge and the singular information through the continuous entropy
of the isolated data sets.
Figure 5. cT of SVM, CART, ELM, and ANN. The variability within each boxplot expresses the performance according to the applied type of training data. While the
results are good for most catchments, some concentrations, like SO₄²⁻ in certain catchments such as Source de la Touvre, are overestimated when using an ELM or an ANN
algorithm.
Figure 6. RMSE of the normalized tracer concentrations of SVM, CART, ELM, and ANN for univariate and multivariate algorithms. The variability shows the
influence of the learning threshold on the development of RMSE in the catchment. The RMSE results are similar to the results from cT and show that the error
relates to the average tracer concentration and that for some catchments problems in the prediction occur, like catchment Baget. The choice of the machine has only
a small influence on the error and depends on the region.
The Acc value describing the correct ranking of tracer concentrations shows for all catchments that at least
40% of the rankings are estimated correctly (Figure 7). None of the machines reached mean Acc values >70%.
Here, the choice of machines has an influence on the dynamics of the tracer concentrations. The Acc values
were highest for catchment Baget compared to all other catchments, although this catchment shows the highest
variability of cT. The multivariate prediction does not automatically improve the results in terms of Acc at all
catchments, and the improvement or deterioration varies among the applied approaches (e.g., SVM and ELM in Durzon).
The reason behind this might be found in the interplay of information content, regional aspects of the
catchment, and the quality of the input data. Therefore, it is out of the scope of this paper to check the causality
of the preferred choice. Nevertheless, in most catchments, the multivariate machines improve Acc. Again, the
choice of the machine has less impact on results and it is merely a catchment-specific question.
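A ranking-based accuracy of this kind could be computed as in the sketch below. The exact definition of Acc is given elsewhere in the paper and is not reproduced here; the sketch assumes that a ranking counts as correct when the predicted ordering of the two normalized tracer concentrations matches the observed ordering at the same sampling time. The function names and example values are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def ranking_accuracy(obs_a, obs_b, pred_a, pred_b):
    """Share of samples where the predicted order of tracers a and b
    (which one is higher) matches the observed order (assumed form of Acc)."""
    hits = sum(
        sign(oa - ob) == sign(pa - pb)
        for oa, ob, pa, pb in zip(obs_a, obs_b, pred_a, pred_b)
    )
    return hits / len(obs_a)

# Hypothetical normalized concentrations at four sampling times.
obs_so4 = [1.2, 0.8, 1.0, 0.9]
obs_no3 = [1.0, 1.0, 1.1, 0.7]
pred_so4 = [1.1, 1.1, 0.9, 1.0]
pred_no3 = [1.0, 1.0, 1.0, 0.8]

acc = ranking_accuracy(obs_so4, obs_no3, pred_so4, pred_no3)
```

A measure of this form rewards getting the relative behavior of the tracers right and is insensitive to errors in their absolute levels, which is why it can be high at a site where cT is highly variable.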
The influence of the chosen window length on the prediction capability of NO₃⁻ and SO₄²⁻ is exemplified by
the cT values of all four (univariate) machines in catchment Source de Fontestorbes (Figure 8). Generally,
either very short windows (1–4 days) or long windows (>60 days) lead to good results, while window lengths
in between worsen the results for SVM, CART, and ELM. For further information on the window dependency
of the other catchments, which is very similar to what we derived from our example,
we refer to the supporting information.
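The role of the window length can be illustrated by the way input sequences are cut from the daily discharge record. The sketch below assumes that each tracer sample is paired with the daily discharge values of the preceding days up to and including the sampling day; whether the study aligned the windows in exactly this way is not stated in this section, and the function name and data are hypothetical.

```python
def discharge_windows(discharge, sample_days, window):
    """Pair each tracer sampling day with the `window` daily discharge
    values ending on that day. Days without a full window are skipped."""
    kept_days = []
    rows = []
    for day in sample_days:
        start = day - window + 1
        if start < 0:
            continue  # not enough discharge history before this sample
        kept_days.append(day)
        rows.append(discharge[start:day + 1])
    return kept_days, rows

# Hypothetical daily discharge (m3/s) and indices of tracer sampling days.
discharge = [float(q) for q in range(10)]
sample_days = [1, 4, 9]

days_short, x_short = discharge_windows(discharge, sample_days, window=3)
days_long, x_long = discharge_windows(discharge, sample_days, window=8)
```

Longer windows carry more of the discharge history but also reduce the number of usable tracer samples at the start of the record, which matters when only a few dozen measurements are available.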
As a good example for choosing an approach with the required number of training data for a catchment, we
elaborate the case of Fontbelle (Figure 9). Here, ANN and SVM obtain cT values close to the optimum of 1.0,
but the ANN results in lower RMSE values than the SVM. Therefore, we chose the ANN to predict tracer
concentrations in this catchment. The resulting time series (predicted by an ANN trained with 70% of the
available measurements) reveals that the measured tracer concentrations and the predicted time series show
an acceptable agreement, with the mean value of concentration captured as well as the general ability to
predict concentrations at all levels measured.
Taking a closer look at the prediction capability for SO₄²⁻, we can see that the multivariate approach
interpolates concentration in the same range, even close to a concentration of 0.0 mg/L (red marked area in
Figure 9). The multivariate approach is able to cover the peaks, while the univariate approach predicts values
close to the mean concentration. Interestingly, the mean tracer concentration rises over time using the
univariate approach. However, the behavior of NO₃⁻ is different: The univariate prediction shows a variability that
reflects the measured tracer concentrations better, while the multivariate machine predictions show too low
variability around the observed mean concentration. As shown by the red marked area of Figure 9, the
univariate approach allows interpolating NO₃⁻ concentrations from Day 2,000 to Day 3,200. The following
decreasing trend cannot be interpolated, and thus, the approach lacks a significant performance here from
Day 3,200 until the end.
4. Discussion
Missing tracer measurements in terms of gaps or irregular measurement campaigns are the major downside
in using these data to develop models for system characterization. In many cases, it is not possible to repeat
the measurements for the desired tracers, for instance, when data are obtained from online databases like
those of the U.S. Geological Survey. Furthermore, only limited knowledge is available on the information content
of the data used in tracer-aided modeling (Hartmann et al., 2017; Kelleher et al., 2019). Our results indicate
that machine learning algorithms represent a valuable technique to predict some characteristics of tracer
concentrations in karstic environments. Even though none of the machine learning methods were able
to describe the complete dynamics between the two tracers with high precision, our comparative approach
of using different machine learning methods allows us to choose the most appropriate method describing a
specific characteristic at a specific site. Hence, we are able to predict key characteristics like the mean
concentration and the relative ranking of tracers in a joint tracer analysis. The reason that tracer concentration
dynamics could not entirely be predicted by discharge alone is the low information content of discharge
compared to the shared information of the tracers. The use of ancillary data or more sophisticated approaches
to improve our prediction is hampered by data limitations or uncertain measurement quality.
Consequently, the prediction capability of the algorithms is lowered by the limitation to discharge
data and the low temporal resolution of concentration measurements. Thus, results have to be interpreted
carefully and with special regard to the information content of the underlying data.
Figure 7. Accuracy Acc of the applied machines with both the univariate and the multivariate approach. The variability of the plots shows the influence of the amount
of training data used for prediction.
Figure 8. Dependency of cT on the chosen window length at Source de Fontestorbes for all applied machines. Good results are achieved either by short sequences
(1–4 days) or long sequences (>60 days).
As for other machine learning applications in hydrology, the choice of the most promising algorithm has to
be found through trial and error (He et al., 2014; Raghavendra & Deka, 2014). Hence, we adapted the research
design to the No-Free-Lunch Theorem (Wolpert & Macready, 1997) and compared four different algorithms
from two of the main machine learning families (Kelleher et al., 2015). We assumed that discharge data are
able to provide enough information to describe the interplay between tracer measurements and to predict the
concentrations. However, the continuous entropy of discharge and the mutual information between NO₃⁻ and
SO₄²⁻ emphasized that the information needed to describe the interplay between this pair of tracers is far
higher than the continuous entropy of the discharge data alone. Although the algorithms were able to predict
certain aspects like the mean concentration and peaks quite well, the complete variability could not be
predicted. In contrast to concentration-discharge relations that require distinct knowledge on the measured data
and the catchment, our study shows that machine learning algorithms can be trained from databases with
few discontinuous measurements to provide continuous reconstructions of tracer concentrations.
Even with knowledge of the required information content and the delivered information content, we were not
able to distinguish properly among the different approaches, and a further choice would depend strongly
on the focus of the task: Would we like to predict the tracer concentration, or is the ranking of tracer
methods for the dynamic description more important? This lack of a clear preference among the chosen machine
learning methods can also be observed in other comparative machine learning studies in hydrology, for
example, in flood event separation (Mewes & Oppel, 2019) and the simulation of streamflow (Shortridge
et al., 2016). Similar to their results, there might not be a single machine for all purposes that works with
our data set, but a set of machines that work together to deliver the desired results, which was shown to be
useful for hydrological modeling in general (Clark et al., 2008). We assume that the interplay of the
information content of the tracers and discharge determines the choice of the best working algorithm. This assumed
link between information content of data, prediction performance, and method preference might be a way to
regionalize karst catchments by a data-driven approach (Abdollahi et al., 2017).
Figure 9. Interpolated time series of SO₄²⁻ and NO₃⁻ predicted for catchment Fontbelle with a univariate and a multivariate ANN and 80% of data used for
training. The black line indicates the observed discharge dynamics.
Consequently, our comparative analysis of algorithms and learning approaches allowed setting up a strategy
to use the aforementioned algorithms to predict tracer signatures. Interestingly, the preferred lengths of the
input sequence of discharge fall into two groups: one that prefers short windows and one that prefers
long windows. This might be related to different processes that relate to the transit time of the karst
spring, which means that we use the information of the time spent by the water in the karst system
(Hartmann et al., 2016). While SO₄²⁻ requires long times to dissolve from the karstic rock to the water,
NO₃⁻ dissolves faster. This is the reason that the two tracers are investigated: to separate slow from fast
water. Here, SO₄²⁻ could be predicted better by long windows of input data, while NO₃⁻ had higher
performances with short input windows.
Apparently, the information that we use right now is sufficient for peak concentrations and the mean values,
but concentrations of SO₄²⁻ close to 0.0 mg/L lead to errors (Figure 9). Hence, processes that lead to
low SO₄²⁻ concentrations in the discharge are not yet covered by the discharge data and should be included
with ancillary data. Such multi-input machine learning applications are widely used in remote sensing and
other applications but underrepresented in hydrology because knowledge on the information content of the
input data is crucial for their application and that remains unknown in many hydrological studies
(Mountrakis et al., 2011; Piotrowski et al., 2007; Zheng et al., 2015).
Overall, our investigations show that we cannot state a clear preference toward a single approach. However,
the introduction of a comparative framework helps to identify the most appropriate solution to predict tracer
concentrations for a specific catchment. In the following parts of the discussion, we adapt our concept of
entropy and present a preliminary framework that could be used to predict tracer concentrations.
4.1. Improvements for Concept of Entropy
Due to the mixed results of the multivariate approach, we analyzed the results of both approaches,
univariate and multivariate, as an example and learned that we need one tracer to predict the other. As we can see
from the interpolated time series of catchment Fontbelle, the multivariate approach performed better for
SO₄²⁻ than for NO₃⁻. Therefore, the additional information from NO₃⁻ helped the algorithm to find the
pattern in SO₄²⁻. Hence, a framework should consist of a univariate ANN to predict NO₃⁻, which acts as
additional information to predict SO₄²⁻.
To reveal the relationship of explanatory power between predictors and variables, we transfer the concept of
mutual information to conditional, or relative, entropy (Chacon-Hurtado et al., 2017; Corzo & Solomantine,
2007; Keum & Coulibaly, 2017). The conditional entropy shows that NO₃⁻ has a higher conditional
explanatory power than SO₄²⁻ to be predicted by discharge (Figure 10). This means that a univariate approach is
Figure 10. Conditional or relative entropy of NO₃⁻ and SO₄²⁻ of catchment Fontbelle. Relative entropy shows the
explanatory power, that is, how much of the entropy of discharge can be used to predict a single tracer.
more beneficial to predict NO₃⁻ than it is for predicting SO₄²⁻. Consequently, we can use the concept of
conditional entropy to decide whether a univariate or a multivariate approach should be preferred and which
tracer measurement can be used as ancillary data for the prediction of other tracer concentrations.
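As a minimal illustration of the concept, conditional entropy can be computed from binned data with the chain rule H(Y|X) = H(X,Y) − H(X). The sketch below uses discrete Shannon entropy on already discretized labels, which is a simplification and not the estimator behind Figure 10; the function names and the toy labels are our own.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def conditional_entropy(target, given):
    """H(target | given) = H(given, target) - H(given)."""
    return entropy(list(zip(given, target))) - entropy(given)

# Toy binned series: discharge class and two hypothetical tracer classes.
discharge_bin = [0, 0, 1, 1]
tracer_a_bin = [0, 0, 1, 1]  # fully determined by the discharge class
tracer_b_bin = [0, 1, 0, 1]  # independent of the discharge class

h_a = conditional_entropy(tracer_a_bin, discharge_bin)
h_b = conditional_entropy(tracer_b_bin, discharge_bin)
```

A low conditional entropy means that little uncertainty about the tracer remains once discharge is known, so a tracer with the lower value is the better candidate for a univariate prediction from discharge.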
4.2. Application of Machine Learning in Interpolation of Tracer Time Series
Discharge separation by tracers relies on tracer observations which are often limited in availability (Birkel &
Soulsby, 2015; Klaus & McDonnell, 2013). We assumed that machine learning is a tool to interpolate time
series of tracers by discharge observations. Keeping the aforementioned downsides of machine learning in
mind, the shown interpolation capability of the algorithms is a valuable addition to discharge separation
applications (Garvelmann et al., 2017; Klaus & McDonnell, 2013).
As the explanatory power of discharge alone is too low to describe the interplay between the tracers in all its
variations, the question toward the filling of the gaps by machine learning tools has to be precise. In our
framework, an extensive preanalysis was conducted to show the general applicability in terms of RMSE and cT
for all considered algorithms and amounts of available training data. The length of the input sequence again
is a source of uncertainty in our approach, but we were able to link good prediction results with the
geochemical residence time of the tracer in the system. So, for hypothesis testing on transit times, the machine
learning approach can be utilized. To describe the uncertainty of the prediction, both lengths of input
sequences should be used: a short window length of discharge to catch short residence time processes and
a long window of discharge to catch slow processes. Nevertheless, the definition of short and long windows
is catchment specific and has to be determined either by a data-driven preanalysis or detailed knowledge of
the respective catchment, which would be identical to the calibration of a hydrological model (Hartmann
et al., 2014; Wu & Chau, 2011).
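One simple way to use both window lengths together is to treat the two predictions as a small ensemble and report their envelope. The sketch below is only an assumption about how such a combination could look; the combination rule, the function name, and the example values are hypothetical and are not taken from the study.

```python
def ensemble_envelope(pred_short, pred_long):
    """Combine predictions from a short-window and a long-window model
    into a lower bound, a mean, and an upper bound per time step."""
    lower = [min(s, l) for s, l in zip(pred_short, pred_long)]
    upper = [max(s, l) for s, l in zip(pred_short, pred_long)]
    mean = [(s + l) / 2 for s, l in zip(pred_short, pred_long)]
    return lower, mean, upper

# Hypothetical normalized tracer predictions for three days.
pred_short = [0.9, 1.1, 1.0]
pred_long = [1.0, 0.8, 1.0]

lower, mean, upper = ensemble_envelope(pred_short, pred_long)
```

The width of the envelope then gives a rough indication of how strongly the prediction depends on the chosen input sequence length at each time step.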
5. Conclusions
Our initial study focus explored the use of machine learning algorithms for the prediction of tracer
measurements. Since time series of tracer measurements are often too sparse for modeling, machine learning tools
can potentially be useful for researchers with limited access to environmental tracer data or limited resources
to obtain additional measurements. We could show that our selected machine learning tools were able to
identify some characteristics of the observed tracer concentrations like average concentrations or the
appropriate constellations of tracer concentrations at the selected test sites. Our analysis also revealed that the
information content of discharge alone is not sufficient to predict tracer concentrations in their entire
variability, as the mutual information between the pairs of tracers is higher than the continuous entropy of
the discharge data. For that reason, the prediction capability of the machine learning algorithms is lowered
substantially. The interpretation of the predicted time series has to be done with care, because the predicted
time series lack extreme concentrations that are abundant in the observations.
Moreover, we were able to build a preliminary framework that creates an ensemble of predictions addressing
the uncertainty of a machine learning-based approach by eliminating the bias of the chosen input sequence
length and the learning approach of the algorithms. All methods considered in this paper deliver acceptable
results in comparison, but the choice of the most suitable algorithm remains catchment specific and should
be based on site-specific knowledge (e.g., residence time estimations) or extensive data-driven preanalysis.
We found that the amount of required training data is high, as the mutual information between the pair
of tracers requires at least 60% of the available data to reach a plateau. Hence, the training of the machines
is not likely to be successful in data-poor regions.
We conclude from our investigations that the setup of a framework to predict tracer concentrations with
machine learning tools remains challenging. Nevertheless, we show that the process of setting up the
machine learning-based ensemble framework can be facilitated by information-based analyses like the
concept of entropy, conditional entropy, and mutual information. Knowledge on the information content of the
data helps to justify the nonobvious choice of methods facing black-box machine learning approaches.
Moreover, they could be the basis for future regionalization of catchments and the transfer of trained
machines to data-poor regions, in case the machine learning approaches were trained in information-rich
environments. By the training with information-rich training data, linkages between processes that are
hidden in data, like discharge data, become transferable and quantifiable. Hydrological models, on the other
hand, require the same amount of data regardless of their information content. So measurements too few for
traditional hydrological models may still contain sufficient information to improve machine learning
models. Overall, we are just at the doorstep of using data-driven approaches in hydrology, especially in complex
environments like karst. Disregarding the problems that we still have to face in the future, advanced
data-driven machine learning approaches may allow further improvements of data analysis, model calibration,
and model development.
Although there is no silver bullet in predicting tracer concentrations, we could show by the input
window analysis that the characteristics of the assumed transit time of tracers become visible in the most
suitable input window lengths for the prediction. Moreover, by analyzing how a machine learns
data patterns and investigating the results of the prediction, our study highlights the importance of an
information content analysis. This opens the field of further entropy-based approaches of data mining in
hydrological contexts, especially in often data-sparse applications like karst hydrology.
Author Contributions
The analysis and writing were conducted by B. M. and supported and advised by H. O. and A. H. V. M.
conducted the literature review on the investigated karst regions, the site selection, and data preparation.
References
Abdollahi, S., Raeisi, J., Khalilianpour, M., Ahmadi, F., & Kisi, O. (2017). Daily mean streamflow prediction in perennial and non-perennial
rivers using four data driven techniques. Water Resources Management, 31(15), 4855–4874.
Aquilina, L., Ladouche, B., & Dörfliger, N. (2005). Recharge processes in karstic systems investigated through the correlation of chemical
and isotopic composition of rain and spring-waters. Applied Geochemistry, 20(12), 2189–2206.
Bailly-Comte, V., Ladouche, B., Allanic, C., Bitri, A., Moiroux, F., Monod, B., et al. (2018). Evaluation des ressources en eaux souterraines
du Plateau de Sault Amélioration des connaissances sur les potentialités de la ressource et cartographie de la vulnérabilité. Rapport final.
BRGM/PR67528FR.
BDLisa. (2019). Retrieved from https://bdlisa.eaufrance.fr
Bicalho, C. C., Batiot-Guilhe, C., Taupin, J. D., Patris, N., Van Exter, S., & Jourde, H. (2017). A Conceptual Model for Groundwater
Circulation Using Isotopes and Geochemical Tracers Coupled with Hydrodynamics: A Case Study of the Lez Karst System. Chemical
Geology, (February 2016), 01: France. https://doi.org/10.1016/j.chemgeo.2017.08.014
Birkel, C., & Soulsby, C. (2015). Advancing tracer-aided rainfall-runoff modelling: A review of progress, problems and unrealised potential.
Hydrological Processes, 29(25), 5227–5240. https://doi.org/10.1002/hyp.10594
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees. Taylor & Francis.
Chacon-Hurtado, J. C., Alfonso, L., & Solomatine, D. P. (2017). Rainfall and streamflow sensor network design: A review of applications,
classification, and a proposed framework. Hydrology and Earth System Sciences, 21(6), 3071–3091. https://doi.org/10.5194/hess-21-3071-2017
Chang, Y.-W., Hsieh, C.-J., Chang, K.-W., Ringgaard, M., & Lin, C.-J. (2010). Training and testing low-degree polynomial data mappings via
linear SVM. Journal of Machine Learning Research, 11, 1471–1490.
Clark, M. P., Slater, A. G., Rupp, D. E., Woods, R. A., Vrugt, J. A., Gupta, H. V., et al. (2008). Framework for understanding structural errors
(FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44(12).
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
Corzo, G., & Solomantine, D. (2007). Baseflow separation techniques for modular artificial neural network modelling in flow forecasting.
Hydrological Sciences Journal, 52(3), 491–507. https://doi.org/10.1623/hysj.52.3.491
Eaufrance (2018a). ADES: Portail nationale d'Accès aux Données sur les Eaux Souterraines, http://www.ades.eaufrance.fr/
Eaufrance (2018b). Banque Hydro, http://hydro.eaufrance.fr/
Fernando, T. M. K. G., Maier, H., & Dandy, G. (2009). Selection of input variables for data driven models: An average shifted histogram
partial mutual information estimator approach. Journal of Hydrology, 367. https://doi.org/10.1016/j.jhydrol.2008.10.019
Fleury, P., Ladouche, B., Conroux, Y., Jourde, H., & Dörfliger, N. (2009). Modelling the hydrologic functions of a karst aquifer under active
water management – The Lez spring. Journal of Hydrology, 365(3–4), 235–243. https://doi.org/10.1016/j.jhydrol.2008.11.037
Fotovatikhah, F., Herrera, M., Shamshirband, S., Chau, K.-W., Ardabili, S. F., & Piran, M. J. (2018). Survey of computational intelligence as
basis to big flood management: Challenges, research directions and future work. Engineering Applications of Computational Fluid
Mechanics, 12(1), 411–437. https://doi.org/10.1080/19942060.2018.1448896
Garvelmann, J., Warscher, M., Leonhardt, G., Franz, H., Lotz, A., & Kunstmann, H. (2017). Quantification and characterization of the
dynamics of spring and stream water systems in the Berchtesgaden Alps with a long-term stable isotope dataset. Environmental Earth
Sciences, 76(22), 1–17. https://doi.org/10.1007/s12665-017-7107-6
Gong, W., Yang, D., Gupta, H. V., & Nearing, G. (2014). Estimating information entropy for hydrological data: One-dimensional case.
Water Resources Research, 50(6), 5003–5018. https://doi.org/10.1002/2014WR015874
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge: MIT Press.
Gur, D., Bar-Matthews, M., & Sass, E. (2003). Hydrochemistry of the main Jordan River sources: Dan, Banias, and Kezinim springs, north
Hula Valley, Israel. Israel Journal of Earth Sciences, 52. https://doi.org/10.1560/RRMW-9WXD-31VU-MWHN
Han, J., & Kamber, M. (2010). Data Mining: Concepts and Techniques, the Morgan Kaufmann Series in Data Management Systems.
Amsterdam: Elsevier.
Hartmann, A., Barberá, J. A., & Andreo, B. (2017). On the value of water quality data and informative flow states in karst modelling.
Hydrology and Earth System Sciences, 21(12), 5971–5985. https://doi.org/10.5194/hess-21-5971-2017
Acknowledgments
Support to Andreas Hartmann was provided by the Emmy Noether Programme of the German Research
Foundation (DFG; Grant HA 8113/1-1; project "Global Assessment of Water Stress in Karst Regions in a Changing
World").
Hartmann, A., Gleeson, T., Rosolem, R., Pianosi, F., Wada, Y., & Wagener, T. (2015). A large-scale simulation model to assess karstic
groundwater recharge over Europe and the Mediterranean. Geoscientific Model Development, 8(6), 1729–1746. https://doi.org/10.5194/gmd-8-1729-2015
Hartmann, A., Goldscheider, N., Wagener, T., Lange, J., & Weiler, M. (2014). Karst water resources in a changing world: Review of
hydrological modeling approaches. Reviews of Geophysics, 52(3), 218–242. https://doi.org/10.1002/2013RG000443
Hartmann, A., Kobler, J., Kralik, M., Dirnböck, T., Humer, F., & Weiler, M. (2016). Model-aided quantification of dissolved carbon and
nitrogen release after windthrow disturbance in an Austrian karst system. Biogeosciences, 13(1), 159–174. https://doi.org/10.5194/bg-13-159-2016
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Upper Saddle River: Prentice Hall.
He, Z., Wen, X., Liu, H., & Du, J. (2014). A comparative study of artificial neural network, adaptive neuro fuzzy inference system and
support vector machine for forecasting river flow in the semiarid mountain region. Journal of Hydrology, 509, 379–386. https://doi.org/10.1016/j.jhydrol.2013.11.054
Hu, C., J.-j. Wang, Z.-n. Wu, & Lina Liu (Eds.) (2011). Application of the support vector machine on precipitation-runoff modelling in
Fenhe River, 2011 International Symposium on Water Resource and Environmental Protection, 1099–1103, vol. 2.
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2004). Extreme learning machine: A new learning scheme of feedforward neural networks. Neural
Networks, 2, 985–990.
Jacob, T., Bayer, R., Chery, J., Jourde, H., Le Moigne, N., Boy, J.-P., et al. (2008). Absolute gravity monitoring of water storage variation in a
karst aquifer on the Larzac plateau (southern France). Journal of Hydrology, 359(1–2), 105–117. https://doi.org/10.1016/j.jhydrol.2008.06.020
Jourde, H., Massei, N., Mazzili, N., & Binet, S. (2018). SNO KARST: A French network of observatories for the multidisciplinary study of
critical zone processes in karst watersheds and aquifers. Vadose Zone Journal, 17(1).
Kavouri, K., Plagnes, V., Tremoulet, J., Dörfliger, N., Rejiba, F., & Marchet, P. (2011). PaPRIKa: A method for estimating karst resource and
source vulnerability – Application to the Ouysse karst system (Southwest France). Hydrogeology Journal, 19(2), 339–353. https://doi.org/10.1007/s10040-010-0688-8
Kelleher, C., Ward, A., Knapp, J. L. A., Blaen, P. J., Kurz, M. J., Drummond, J. D., et al. (2019). Exploring tracer information and model
framework tradeoffs to improve estimation of stream transient storage processes. Water Resources Research, 55, 3481–3501. https://doi.org/10.1029/2018WR023585
Kelleher, J. D., Mac Namee, B., & D'Arcy, A. (2015). Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked
Examples, and Case Studies. Cambridge: MIT Press.
Keum, J., & Coulibaly, P. (2017). Information theory-based decision support system for integrated design of multivariable hydrometric
networks. Water Resources Research, 53, 6239–6259. https://doi.org/10.1002/2016WR019981
Kirchner, J. W. (2003). A double paradox in catchment hydrology and geochemistry. Hydrological Processes, 17(4), 871–874.
Klaus, J., & McDonnell, J. J. (2013). Hydrograph separation using stable isotopes: Review and evaluation. Journal of Hydrology, 505, 47–64.
Labat, D., Mangin, A., & Ababou, R. (2002). Rainfall-runoff relations for karstic springs: Multifractal analyses. Journal of Hydrology, 256,
176–195. https://doi.org/10.1016/S0022-1694(01)00535-2
Le Moine, N., Andréassian, V., & Mathevet, T. (2008). Confronting surface- and groundwater balances on the La Rochefoucauld-Touvre
karstic system (Charente, France). Water Resources Research, 44, W03403. https://doi.org/10.1029/2007WR005984
Lee, E. S., & Krothe, N. C. (2001). A four-component mixing model for water in a karst terrain in south-central Indiana, USA. Using solute
concentration and stable isotopes as tracers. Chemical Geology, 179(1), 129–143.
Mahler, B. J., & Garner, B. D. (2009). Using nitrate to quantify quick flow in a karst aquifer. Ground Water, 47(3), 350–360. https://doi.org/10.1111/j.1745-6584.2008.00499.x
Mei, Y., & Anagnostou, E. N. (2015). A hydrograph separation method based on information from rainfall and runoff records. Journal of
Hydrology, 523, 636–649. https://doi.org/10.1016/j.jhydrol.2015.01.083
Mewes, B., & Oppel, H. (2019). A comparative analysis of machine learning tools for hydrograph separation. Frontiers in Water Complexity.
Submitted.
Mountrakis, G., Im, J., & Ogole, C. (2011). Support vector machines in remote sensing: A review. ISPRS Journal of Photogrammetry and
Remote Sensing, 66(3), 247–259. https://doi.org/10.1016/j.isprsjprs.2010.11.001
Mudarra, M., & Andreo, B. (2011). Relative importance of the saturated and the unsaturated zones in the hydrogeological functioning
of karst aquifers: The case of Alta Cadena (southern Spain). Journal of Hydrology, 397(3–4). https://doi.org/10.1016/j.jhydrol.2010.12.005
Mudarra, M., Hartmann, A., & Andreo, B. (2019). Combining experimental methods and modeling to quantify the complex recharge
behavior of karst aquifers. Water Resources Research, 55, 1384–1404. https://doi.org/10.1029/2017WR021819
Nasrabadi, N. M. (2007). Pattern recognition and machine learning. Journal of Electronic Imaging, 16(4), 049901.
Nourani, V., Komasi, M., & Mano, A. (2009). A multivariate ANN-wavelet approach for rainfall-runoff modeling. Water Resources
Management, 23(14), 2877. https://doi.org/10.1007/s11269-009-9414-5
Piotrowski, A., Wallis, S. G., Napiórkowski, J. J., & Rowiński, P. M. (2007). Evaluation of 1-D tracer concentration profile in a small river by
means of multi-layer perceptron neural networks. Hydrology and Earth System Sciences, 11(6), 1883–1896. https://doi.org/10.5194/hess-11-1883-2007
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251
Raghavendra, N. S., & Deka, P. C. (2014). Support vector machine applications in the field of hydrology: A review. Applied Soft Computing,
19, 372–386. https://doi.org/10.1016/j.asoc.2014.02.002
Rimmer, A., & Hartmann, A. (2014). Optimal hydrograph separation filter to evaluate transport routines of hydrological models. Journal of
Hydrology, 514, 249–257. https://doi.org/10.1016/j.jhydrol.2014.04.033
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Sharma, A. (2000). Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1 – A strategy for
system predictor identification. Journal of Hydrology, 239(1), 232–239.
Shortridge, J. E., Guikema, S. D., & Zaitchik, B. F. (2016). Machine learning methods for empirical streamflow simulation: A comparison of
model accuracy, interpretability, and uncertainty in seasonal watersheds. Hydrology and Earth System Sciences, 20(7), 2611–2628.
https://doi.org/10.5194/hess-20-2611-2016
Shrestha, D. L., & Solomatine, D. P. (2006). Machine learning approaches for estimation of prediction interval for the model output. Neural
Networks, 19(2), 225–235. https://doi.org/10.1016/j.neunet.2006.01.012
Tabari, H., Kisi, O., Ezani, A., & Hosseinzadeh Talaee, P. (2012). SVM, ANFIS, regression and climate based models for reference
evapotranspiration modeling using limited climatic data in a semiarid highland environment. Journal of Hydrology, 444, 78–89. https://doi.org/10.1016/j.jhydrol.2012.04.007
Taormina, R., Chau, K.-W., & Sivakumar, B. (2015). Neural network river forecasting through baseflow separation and binary-coded
swarm optimization. Journal of Hydrology, 529, 1788–1797.
Thomas, J. A., & Cover, T. M. (2006). Elements of Information Theory. New York: Wiley.
Vapnik, V. (2013). The Nature of Statistical Learning Theory. New York: Springer Science & Business Media.
Weiler, M., Seibert, J., & Stahl, K. (2017). Magic components – Why quantifying rain, snow- and icemelt in river discharge isn't easy.
Hydrological Processes, 32(1), 160–166. https://doi.org/10.1002/hyp.11361
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation,
1(1), 67–82.
Wu, C. L., & Chau, K. W. (2011). Rainfall-runoff modeling using artificial neural network coupled with singular spectrum analysis. Journal
of Hydrology, 399(3–4), 394–409.
Yaseen, Z. M., Jaafar, O., Deo, R. C., Kisi, O., Adamowski, J., Quilty, J., & El-Shafie, A. (2016). Streamflow forecasting using extreme
learning machines: A case study in a semiarid region in Iraq. Journal of Hydrology, 542, 603–614. https://doi.org/10.1016/j.jhydrol.2016.09.035
Yu, P.-S., Yang, T.-C., Chen, S.-Y., Kuo, C.-M., & Tseng, H.-W. (2017). Comparison of random forests and support vector machine for real-time
radar-derived rainfall forecasting. Journal of Hydrology, 552, 92–104. https://doi.org/10.1016/j.jhydrol.2017.06.020
Zheng, B., Myint, S. W., Thenkabail, P. S., & Aggarwal, R. M. (2015). A support vector machine to identify irrigated crop types using time
series Landsat NDVI data. International Journal of Applied Earth Observation and Geoinformation, 34, 103–112. https://doi.org/10.1016/j.jag.2014.07.002
... High-frequency conductivity measurements were effective predictors of all major ions derived from weathering of mountaintop removal mined watersheds (Ross et al., 2018). High-frequency sulphate time series were produced with discharge as an input variable for multiple machine learning algorithms (Mewes et al., 2020). Kisi and Parmar (2016) predicted monthly chemical oxygen demand in an Indian river with nutrient and other water quality information. ...
... We accepted default parameters for the RF model, including the number of trees required for the ensemble (n = 500) and the number of variables tried at each split in an individual tree (mtry = 2). We chose the SVM and RF models because both have been previously applied in hydrological contexts with strong results (e.g., Kim et al., 2020; Mewes et al., 2020). The main difference between the two is that the RF uses discrete predictions, which can help identify non-linear patterns, whereas the SVM is a continuous function. ...
Article
Stream solute monitoring has produced many insights into ecosystem and Earth system functions. Although new sensors have provided novel information about the fine‐scale temporal variation of some stream water solutes, we lack adequate sensor technology to gain the same insights for many other solutes. We used two machine learning algorithms – Support Vector Machine and Random Forest – to predict concentrations at 15‐min resolution for 10 solutes, of which eight lack specific sensors. The algorithms were trained with data from intensive stream sensing and manual stream sampling (weekly) for four full years in a hydrologic reference stream within the Hubbard Brook Experimental Forest in New Hampshire, USA. The Random Forest algorithm was slightly better at predicting solute concentrations than the Support Vector Machine algorithm (Nash‐Sutcliffe efficiencies ranged from 0.35 to 0.78 for Random Forest compared to 0.29 to 0.79 for Support Vector Machine). Solute predictions were most sensitive to the removal of fluorescent dissolved organic matter, pH and specific conductance as independent variables for both algorithms, and least sensitive to dissolved oxygen and turbidity. The predicted concentrations of calcium and monomeric aluminium were used to estimate catchment solute yield, which changed most dramatically for aluminium because it concentrates with stream discharge. These results show great promise for using a combined approach of stream sensing and intensive stream discrete sampling to build information about the high‐frequency variation of solutes for which an appropriate sensor or proxy is not available.
... T. Yang et al., 2016), water quality prediction (K. Chen et al., 2020;Lu & Ma, 2020;, data mining for sparse environmental data measurement (Mewes et al., 2020;Zhou, 2020), evapotranspiration estimation (Goyal et al., 2014;, groundwater management (Naghibi & Pourghasemi, 2016;Podgorski & Berg, 2020), and so on. It has been confirmed that machine learning is an effective tool to explore implicit relationships in complex nonlinear systems (Goodfellow et al., 2016;Lecun et al., 2015). ...
Article
Rivers play an important role in water supply, irrigation, navigation, and ecological maintenance. Forecasting the river hydrodynamic changes is critical for flood management under climate change and intensified human activities. However, efficient and accurate river modeling is challenging, especially with complex lake boundary conditions and uncontrolled downstream boundary conditions. Here, we proposed a coupled framework by taking the advantages of interpretability of physical hydrodynamic modeling and the adaptability of machine learning. Specifically, we coupled the Gated Recurrent Unit (GRU) with a 1‐D HydroDynamic model (GRU‐HD) and applied it to the middle and lower reaches of the Yangtze River, the longest river in China. We show that the GRU‐HD model could quickly and accurately simulate the water levels, streamflow, and water exchange rates between the Yangtze River and two important lakes (Poyang and Dongting), with most Kling‐Gupta efficiency (KGE) coefficients above 0.90. Using machine learning‐based predicted water levels, instead of the rating curve approach, as the downstream boundary conditions could improve the accuracy of modeling the downstream water levels of the lake‐connected river system. The GRU‐HD model is dedicated to the synergy of physical modeling and machine learning, providing a powerful avenue for modeling rivers with complex boundary conditions.
... Machine learning models are increasingly used to make hydrological predictions [71,72], and the most accurate versions tend to utilize ensemble models that combine inputs from independent algorithms before making final decisions [73][74][75]. Machine learning models can also be used to explore complex, non-linear relationships between predictor and target variables. ...
Article
Human agriculture, wastewater, and use of fossil fuels have saturated ecosystems with nitrogen and phosphorus, threatening biodiversity and human water security at a global scale. Despite efforts to reduce nutrient pollution, carbon and nutrient concentrations have increased or remained high in many regions. Here, we applied a new ecohydrological framework to ~12,000 water samples collected by the U.S. Environmental Protection Agency from streams and lakes across the contiguous U.S. to identify spatial and temporal patterns in nutrient concentrations and leverage (an indicator of flux). For the contiguous U.S. and within ecoregions, we quantified trends for sites sampled repeatedly from 2000 to 2019, the persistence of spatial patterns over that period, and the patch size of nutrient sources and sinks. While we observed various temporal trends across ecoregions, the spatial patterns of nutrient and carbon concentrations in streams were persistent across and within ecoregions, potentially because of historical nutrient legacies, consistent nutrient sources, and inherent differences in nutrient removal capacity for various ecosystems. Watersheds showed strong critical source area dynamics in that 2–8% of the land area accounted for 75% of the estimated flux. Variability in nutrient contribution was greatest in catchments smaller than 250 km² for most parameters. An ensemble of four machine learning models confirmed previously observed relationships between nutrient concentrations and a combination of land use and land cover, demonstrating how human activity and inherent nutrient removal capacity interactively determine nutrient balance. These findings suggest that targeted nutrient interventions in a small portion of the landscape could substantially improve water quality at continental scales.
We recommend a dual approach of first prioritizing the reduction of nutrient inputs in catchments that exert disproportionate influence on downstream water chemistry, and second, enhancing nutrient removal capacity by restoring hydrological connectivity both laterally and vertically in stream networks.
... Considering the recent wide applicability of "shallow" neural networks in hydrology, oceanology and atmospheric sciences (Crawford et al., 2019; Bergen et al., 2019; Berkhahn et al., 2019; Kulp and Strauss, 2019; Sezen et al., 2019; Flombaum et al., 2020; Jia et al., 2020; Diez-Sierra and del Jesus, 2020; Nourani et al., 2020; Mewes et al., 2020; Snieder et al., 2020), we hope the proposed method may be beneficial not only for stream temperature modelling, but possibly also other hydrological applications. ...
Article
For about two decades neural networks are widely used for river temperature modelling. However, in recent years one has to distinguish between the “classical” shallow neural networks, and deep learning networks. The applicability of rapidly developing deep learning networks to stream water temperature modelling may be limited, but some methods developed for deep learning, if properly re-considered, may efficiently improve performance of shallow networks. Dropout is widely considered the method that allows deep learning networks to avoid overfitting to training data, facilitating its implementations to versatile problems. Recently the successful applicability of dropout for river temperature modelling by means of shallow multilayer perceptron neural networks has been introduced. In the present study we propose to use dropout solely for input neurons of product unit neural networks for the purpose of stream temperature modelling. We perform tests on data collected from six catchments located in temperate climate zones on two continents in various orographic conditions. We show that the average performance of product unit neural networks trained with input dropout is better than the average performance of product units without dropout, product units with dropout applied to every layer of the networks, multilayer perceptron neural networks with or without dropout, and the semi-physical air2stream model. The advantage of product unit neural networks with input dropout is statistically significant on hilly or mountainous catchments; the performance on flat ones is similar to the performances obtained from competitive models.
... highlighted usefulness of DDMs as support for conceptual models, for example as a combination of model structures for predicting a target variable, e.g., Ratto et al. (2007), or as the use of ML for preparing and validating input data (Mewes, Oppel & Hartmann, 2019; for conceptual models. Independently of existing concepts, it is generally recommended to incorporate as much information about the target area as possible into the chosen model structure. ...
Book
In data scarce regions common hydrological models cannot be applied. Due to the missing data for calibration conceptual rainfall-runoff models cannot be parametrized and, hence, not be used for operational predictions or definition of design hydrographs. Geomorphological instantaneous unit hydrographs (GIUH) offer the unique possibility to adapt model structure to catchment structure and thereby increase model accuracy in data scarce regions. The parsimony as well as the incorporation of catchment structures is a valuable advantage for prediction in ungauged basins. The drawback of GIUH-models is the required parametrization for each individual event. Hence, applications of GIUH-models have been limited to scientific reanalysis of past rainfall-runoff events. In this study an ensemble of machine learning (ML) algorithms was applied for the estimation of the required parameters in ungauged basins. Indicators of meteorological forcing and initial catchments states were used as predictors for the estimation of drainage velocity and runoff coefficient. Eight algorithms were applied and their performance has been evaluated in a leave-one-out study in three major catchments in South-East Germany. Predictions provided by the algorithms were given to an improved GIUH-model to transform 2-dimensional precipitation data into an ensemble prediction of hydrographs in ungauged basins. The performance of the improved GIUH-model and the ML-Algorithms were evaluated separately. The GIUH-structure proved to be as flexible as demanded. In a synthetic case study it was able to incorporate different catchment shapes, flowpath distributions and characteristics into the shape of predicted hydrographs. A variation of drainage velocity by flowpath was implemented and improved simulation results. Moreover, a parametrization directly from rainfall-runoff event analysis seemed possible, yet calibrated parameters led to better performance. 
The setup of the ML-module has been evaluated with respect to the predictors and data segmentation by model approach. In a subsequent regional application, data from all available gauges were used to train the algorithms. Withheld data was used to imitate a prediction in ungauged basin. The models showed an average relative error for drainage velocity of 20% and 40% for runoff volume. The error were lower afterwards by selective data composition, considering only a limited number of similar catchments for model training. The combination of both model components were tested subsequently. The mean efficiencies considering hydrograph timing, volume and variance were close to optimum value. Yet the model worked only in ensemble mode, because a single ML-algorithms proved not to be capable of imitating the full range of hydrological complexity. A comparison to a regionalized HBV-model showed superior results for the GIUH-ML model in ungauged catchments and equal results for gauged catchments. Finally, the possibility of deriving assumptions about hydrological processes from trained ML-dependencies has been discussed. For the performed case studies an assumption about changing dependencies of driving factors and the resulting ratio of flood volume and peak was derived.
Article
Rainfall-runoff simulation is vital for planning and controlling flood control events. Hydrology modeling using Hydrological Engineering Center—Hydrologic Modeling System (HEC-HMS) is accepted globally for event-based or continuous simulation of the rainfall-runoff operation. Similarly, machine learning is a fast-growing discipline that offers numerous alternatives suitable for hydrology research’s high demands and limitations. Conventional and process-based models such as HEC-HMS are typically created at specific spatiotemporal scales and do not easily fit the diversified and complex input parameters. Therefore, in this research, the effectiveness of Random Forest, a machine learning model, was compared with HEC-HMS for the rainfall-runoff process. Furthermore, we also performed a hydraulic simulation in Hydrological Engineering Center—Geospatial River Analysis System (HEC-RAS) using the input discharge obtained from the Random Forest model. The reliability of the Random Forest model and the HEC-HMS model was evaluated using different statistical indexes. The coefficient of determination (R2), standard deviation ratio (RSR), and normalized root mean square error (NRMSE) were 0.94, 0.23, and 0.17 for the training data and 0.72, 0.56, and 0.26 for the testing data, respectively, for the Random Forest model. Similarly, the R2, RSR, and NRMSE were 0.99, 0.16, and 0.06 for the calibration period and 0.96, 0.35, and 0.10 for the validation period, respectively, for the HEC-HMS model. The Random Forest model slightly underestimated peak discharge values, whereas the HEC-HMS model slightly overestimated the peak discharge value. Statistical index values illustrated the good performance of the Random Forest and HEC-HMS models, which revealed the suitability of both models for hydrology analysis. In addition, the flood depth generated by HEC-RAS using the Random Forest predicted discharge underestimated the flood depth during the peak flooding event. 
This result proves that HEC-HMS could compensate Random Forest for the peak discharge and flood depth during extreme events. In conclusion, the integrated machine learning and physical-based model can provide more confidence in rainfall-runoff and flood depth prediction.
Article
Statistical learning methods offer a promising approach for low-flow regionalization. We examine seven statistical learning models (Lasso, linear, and nonlinear-model-based boosting, sparse partial least squares, principal component regression, random forest, and support vector regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. In order to produce sparse models, we adapt the recursive feature elimination for variable preselection and propose using three different variable ranking methods (conditional forest, Lasso, and linear model-based boosting) for each of the prediction models. Results are evaluated for the low-flow characteristic Q95 (Pr(Q>Q95)=0.95) standardized by catchment area using a repeated nested cross-validation scheme. We found a generally high prediction accuracy for winter (RCV2 of 0.66 to 0.7) and summer (RCV2 of 0.83 to 0.86). The models perform similarly to or slightly better than a top-kriging model that constitutes the current benchmark for the study area. The best-performing models are support vector regression (winter) and nonlinear model-based boosting (summer), but linear models exhibit similar prediction accuracy. The use of variable preselection can significantly reduce the complexity of all the models with only a small loss of performance. The so-obtained learning models are more parsimonious and thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and nonlinear models reveals that nonlinear processes can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add nonlinear effects. When performing low-flow regionalization in a seasonal climate, the temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to catchment grouping that is recommended otherwise.
Chapter
Hydrology is the science of studying the natural flow of water and the effect of human activity on the water. Hydrological modeling is essential for the management and conservation of water. In recent decades, machine learning (ML) has been applied efficiently in hydrology. In this study, the application of ML in four subfields of hydrology, including flood, precipitation estimation, water quality, and groundwater, is presented. This review shows that ML performs better in flood prediction than traditional data-driven and physical hydrology modeling, particularly in short-term flood forecasting. In addition, using the ML technique helps to estimate precipitation from satellite datasets. This study provides a review of the potential of ML in water quality and groundwater modeling. The study shows that using an optimization algorithm for parameter selection can improve the performance of ML. Moreover, modeling accuracy is often improved through ML hybridization. Finally, it is recommended that hydrologists use ML in their modeling owing to their low computational cost and high performance.
Preprint
Statistical learning methods offer a promising approach for low flow regionalization. We examine seven statistical learning models (lasso, linear and non-linear model based boosting, sparse partial least squares, principal component regression, random forest, and support vector machine regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. In order to produce sparse models we adapt the recursive feature elimination for variable preselection and propose to use three different variable ranking methods (conditional forest, lasso and linear model based boosting) for each of the prediction models. Results are evaluated for the low flow characteristic Q95 (Pr(Q>Q95) = 0.95) standardized by catchment area using a repeated nested cross validation scheme. We found a generally high prediction accuracy for winter (R2CV of 0.66 to 0.7) and summer (R2CV of 0.83 to 0.86). The models perform similarly to or slightly better than a Top-kriging model that constitutes the current benchmark for the study area. The best performing models are support vector machine regression (winter) and non-linear model based boosting (summer), but linear models exhibit similar prediction accuracy. The use of variable preselection can significantly reduce the complexity of all models with only a small loss of performance. The so obtained learning models are more parsimonious, thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and non-linear models reveals that non-linear relationships can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add non-linear effects.
When performing low flow regionalization in a seasonal climate, the temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to catchment grouping that is recommended otherwise.
Article
Novel observation techniques (e.g., “smart” tracers) for characterizing coupled hydrological and biogeochemical processes are improving understanding of stream network transport and transformation dynamics. In turn, these observations are thought to enable increasingly sophisticated representations within transient storage models (TSM). However, TSM parameter estimation is prone to issues with insensitivity and equifinality, which grow as parameters are added to model formulations. Currently, it is unclear whether (or not) observations from different tracers may lead to greater process inference and reduced parameter uncertainty in the context of TSM. Herein, we aim to unravel the role of in‐stream processes alongside metabolically active (MATS) and inactive storage zones (MITS) using variable TSM formulations. Models with one (1SZ) and two storage zones (2SZ) and with and without reactivity were applied to simulate conservative and “smart” tracer observations obtained experimentally for two reaches with differing morphologies. As we show, “smart” tracers are unsurprisingly superior to conservative tracers when it comes to partitioning MITS and MATS. However, when transient storage is lumped within a 1SZ formulation, little improvement in parameter uncertainty is gained by using a “smart” tracer, suggesting the addition of observations should scale with model complexity. Importantly, our work identifies several inconsistencies and open questions related to reconciling timescales of tracer observation with conceptual processes (“parameters”) estimated within TSM. Approaching TSM with multiple models and tracer observations may be key to gaining improved insight into transient storage simulation as well as advancing feedback loops between models and observations within hydrologic science.
Article
Integration of the abundant information derived from different sources, characterizing techniques and modeling methodologies, is crucial for extending our knowledge of karst aquifers and their available water resources. In this work, a numerically based approach derived from an improved version of the lumped VarKarst model is proposed, which jointly considers spring discharge and dye test results in calibration routine, to assess independently the contribution of the allogenic and autogenic components to the total recharge of a complex karst system with proved duality in its recharge mechanisms. A newly developed parameter estimation procedure based on rather soft performance rules is employed to confine the uncertainty of the water budget previously obtained with two other independent methods (Soil Water Balance and APLIS). Unlike other methodologies that lead to semiquantitative estimations of input sources, results from our approach display reliable ranges of calibrated values for recharge rate, recharge area, and, to a lesser extent, for water runoff infiltration coming from the streamflow. The integration of all these quantitative results with data (qualitative) previously derived from other experimental methodologies has meant a significant advance in understanding the behavior of the pilot system, allowing a more realistic and robust conceptual model to be developed. We conclude by emphasizing that a continuous transfer of improvements from conceptual to numerical modeling approaches, and vice versa, is necessary to enhance knowledge of carbonate aquifer functioning and ultimately achieve better evaluation and management of water resources. During this process, frequent mutual evaluation between the modeling approaches must be performed.
Article
Karst aquifers and watersheds represent a major source of drinking water around the world. They are also known as complex and often highly vulnerable hydrosystems due to strong surface–groundwater interactions. Improving the understanding of karst functioning is thus a major issue for the efficient management of karst groundwater resources. A comprehensive understanding of the various processes can be achieved only by studying karst systems across a wide range of spatiotemporal scales under different geological, geomorphological, climatic, and soil cover settings. The objective of the French Karst National Observatory Service (SNO KARST) is to supply the international scientific community with appropriate data and tools, with the ambition of (i) facilitating the collection of long-term observations of hydrogeochemical variables in karst, and (ii) promoting knowledge sharing and developing cross-disciplinary research on karst. This paper provides an overview of the monitoring sites and collective achievements, such as the KarstMod modular modeling platform and the PaPRIKa toolbox, of SNO KARST. It also presents the research questions addressed within the framework of this network, along with major research results regarding (i) the hydrological response of karst to climate and anthropogenic changes, (ii) the influence of karst on geochemical balance of watersheds in the critical zone, and (iii) the relationships between the structure and hydrological functioning of karst aquifers and watersheds.
Article
Flooding produces debris and waste including liquids, dead animal bodies and hazardous materials such as hospital waste. Debris causes serious threats to people's health and can even block the roads used to give emergency aid, worsening the situation. To cope with these issues, flood management systems (FMSs) are adopted for the decision-making process of critical situations. Nowadays, conventional artificial intelligence and computational intelligence (CI) methods are applied to early flood event detection, having a low false alarm rate. City authorities can then provide quick and efficient response in post-disaster scenarios. This paper aims to present a comprehensive survey about the application of CI-based methods in FMSs. CI approaches are categorized as single and hybrid methods. The paper also identifies and introduces the most promising approaches nowadays with respect to the accuracy and error rate for flood debris forecasting and management. Ensemble CI approaches are shown to be highly efficient for flood prediction.
Article
If properly applied, karst hydrological models are a valuable tool for karst water resource management. If they are able to reproduce the relevant flow and storage processes of a karst system, they can be used for prediction of water resource availability when climate or land use are expected to change. A common challenge in applying karst simulation models is the limited availability of observations to identify their model parameters. In this study, we quantify the value of information when water quality data (NO3− and SO42−) are used in addition to discharge observations to estimate the parameters of a process-based karst simulation model at a test site in southern Spain. We use a three-step procedure to (1) confine an initial sample of 500 000 model parameter sets by discharge and water quality observations, (2) identify alterations of model parameter distributions through the confinement, and (3) quantify the strength of the confinement for the model parameters. We repeat this procedure for flow states, for which the system discharge is controlled by the unsaturated zone, the saturated zone, and the entire time period including times when the spring is influenced by a nearby river. Our results indicate that NO3− provides the most information to identify the model parameters controlling soil and epikarst dynamics during the unsaturated flow state. During the saturated flow state, SO42− and discharge observations provide the best information to identify the model parameters related to groundwater processes. We found reduced parameter identifiability when the entire time period is used, as the river influence disturbs parameter estimation. We finally show that the most reliable simulations are obtained when a combination of discharge and water quality data is used for the combined unsaturated and saturated flow states.
Article
The understanding of alpine groundwater dynamics and the interactions with surface stream water is crucial for water resources research and management in mountain regions. In order to characterize local spring and stream water systems, samples at 8 springs, 5 stream gauges and bulk samples of precipitation at 4 sites were regularly collected between January 2012 and January 2016 in the Berchtesgaden Alps for stable water isotope analysis. The sampled hydro-systems are characterized by very different dynamics of the stable isotope signatures. To quantify those differences, we analyzed the stable isotope time series and calculated mean transit times (MTT) and young water fractions (YWF) of the sampled systems. Based on the data analysis, two groups of spring systems could be identified: one group with relatively short MTT (and high YWF) and another group with long MTT (and low YWF). The MTT and the YWF of the sampled streams were intermediate. The reaction of the sampled spring and stream systems to precipitation input was studied by lag time analysis. The average lag times revealed the influence of snow and ice melt on the hydrology of the study region. It was not possible to determine the recharge elevation of the spring and stream systems due to a lack of altitude effect in the precipitation data. For two catchments, the influence of the spring water stable isotopic composition on the streamflow was shown, highlighting the importance of the spring water for the river network in the study area.
Article
This study examines and compares the performance of four artificial intelligence techniques, namely artificial neural network (ANN), hybrid wavelet-artificial neural network (WANN), genetic expression programming (GEP), and hybrid wavelet-genetic expression programming (WGEP), for daily mean streamflow prediction of perennial and non-perennial rivers located in the semi-arid region of the Zagros Mountains in Iran. For this purpose, daily mean streamflow data of the Behesht-Abad (perennial) and Joneghan (non-perennial) rivers as well as precipitation data from 17 meteorological stations for the period 1999–2008 were used. The coefficient of determination (R²) and root mean square error (RMSE) were used to evaluate the applicability of the developed models. This study showed that, although the GEP model was the most accurate in predicting peak flows, WANN had the best overall performance among the four models for both the perennial and the non-perennial river. Among the input patterns, flow-based and coupled precipitation-flow-based patterns performed best, with a negligible difference between them. The study also confirmed that combining the wavelet method with ANN and GEP, yielding the WANN and WGEP methods, improves the performance of the ANN and GEP models.
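The two evaluation metrics named in this abstract can be illustrated with a minimal sketch. It assumes that R² is computed as one minus the ratio of residual to total sum of squares; some hydrological studies instead report the squared Pearson correlation, and the abstract does not say which definition was used. The function names `rmse` and `r_squared` and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, simulated):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R² is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up daily streamflow values, for illustration only.
    q_obs = [1.0, 2.0, 3.0]
    q_sim = [2.0, 2.0, 2.0]
    print("RMSE:", rmse(q_obs, q_sim))
    print("R2:", r_squared(q_obs, q_sim))
```

Under this definition, a model that always predicts the observed mean scores R² = 0, and negative values indicate a model that performs worse than the mean, which is why the choice of definition matters when comparing reported scores across studies.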
Book
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Article
Quantifying the components of rain, snowmelt, and glacier ice melt in river discharge is an important but difficult task in hydrology. Although it forms the basis of many climate impact assessments, many published modelling results do not clearly describe how they derived the discharge components. Consequently, reported components such as absolute amounts or relative percentages of snow and ice melt from different studies are rarely comparable. This commentary revisits the differences in the terminology used, the modelling approaches, and the possible conclusions for effects at different time scales. We argue that for questions related to changes in discharge, not particle tracking, for which methodology is widely available, but instead, an “effect tracking” of the input contributions is important, that is, the representation of the signals of rainfall, snowmelt, and glacier ice melt in the discharge at the catchment outlet. We introduce and briefly describe a method for effect tracking and discuss the differences and advantages compared to other methods. This comparison supports our call to the modelling community for more precise descriptions of how the generated input contributions into a catchment from rainfall, snowmelt, and glacier ice melt are tracked through the catchments' multiple stores to finally compose the presented hydrographs.