Information‐Based Machine Learning for Tracer Signature Prediction in Karstic Environments
B. Mewes, H. Oppel, V. Marx, and A. Hartmann
Institute of Hydrology, Water Resources and Environmental Engineering, Ruhr‐University Bochum, Bochum, Germany
Chair of Hydrological Modeling and Water Resources, Albert‐Ludwigs‐University of Freiburg, Freiburg, Germany
Abstract Karstic groundwater systems are often investigated by a combination of environmental or artificial tracers. One of the major downsides of tracer‐based methods is the limited availability of tracer measurements, especially in data sparse regions. This study presents an approach to systematically evaluate the information content of the available data, to interpret predictions of tracer concentration from machine learning algorithms, and to compare different machine learning algorithms to obtain an objective assessment of their applicability for predicting environmental tracers. There is a large variety of machine learning approaches, but no clear rules exist on which of them to use for this specific problem. In this study, we formulated a framework to choose the appropriate algorithm for this purpose. We compared four different well‐established machine learning algorithms (Support Vector Machines, Extreme Learning Machines, Decision Trees, and Artificial Neural Networks) in seven different karst springs in France for their capability to predict tracer concentrations, in this case SO₄²⁻ and NO₃⁻, from discharge. Our study reveals that the machine learning algorithms are able to predict some characteristics of the tracer concentration, but not the whole variance, which is caused by the limited information content in the discharge data. Nevertheless, discharge is often the only information available for a catchment, so the ability to predict at least some characteristics of the tracer concentrations from discharge time series, for example, to fill gaps or to enlarge the database for subsequent analyses, is a helpful application of machine learning in data sparse regions or for historic databases.
1. Introduction
Tracer‐based methods are often the only way to separate stream flow components and to determine the origin of water (Kirchner, 2003; Klaus & McDonnell, 2013; Mei & Anagnostou, 2015; Mewes & Oppel, 2019; Rimmer & Hartmann, 2014; Weiler et al., 2017). Especially in karstic environments, tracer investigations allow a deeper understanding of the underlying karstic system and the interdependencies of discharge and the current state of the subterraneous processes or storages (Aquilina et al., 2005; Gur et al., 2003; Lee & Krothe, 2001).
The joint analysis of tracer data and discharge measurements is a common tool to derive information about hydrological systems, for example, the identification of the origin of water within a catchment. Despite their advantages, these approaches demand long time series of tracer measurements covering a wide range of hydrological system dynamics (Garvelmann et al., 2017; Lee & Krothe, 2001). To describe catchments with hydrological models, the link between tracer signatures and the system's hydrological state is of interest to set up suitable calibration strategies. Although the dependency on tracer data in model studies is high, the information content of tracer measurements has rarely been analyzed. Furthermore, the information‐to‐noise ratio in the data has to be high to derive the desired information about the system (Kelleher et al., 2015). Another problem is the lack of available tracer databases, which hinders many applications, especially in data sparse regions. Here, machine learning could be useful because its core concept is to predict values that are difficult to measure from input data that are straightforward to measure. If the algorithms are able to predict tracer concentrations from discharge time series, data‐driven interpolations of continuous tracer concentration time series can be obtained.
With the rise of machine learning technologies and further improvements in information technology, the application of new approaches for data analysis and the interplay of data, information content, and results have increased (Goodfellow et al., 2016; Kelleher et al., 2015). Machine learning is the umbrella term for processes that extract patterns from data automatically (Goodfellow et al., 2016). Machine learning‐based algorithms are used in many hydrological applications (Raghavendra & Deka, 2014), such as rainfall‐runoff modeling with artificial neural networks (Hu et al., 2011; Nourani et al., 2009), precipitation forecasting (Yu et al., 2017), evapotranspiration prediction (Tabari et al., 2012), baseflow separation (Corzo & Solomatine, 2007), measurement setup design (Chacon‐Hurtado et al., 2017), streamflow forecasting (Shortridge et al., 2016; Shrestha & Solomatine, 2006; Taormina et al., 2015; Yaseen et al., 2016), the separation of flood events from time series of discharge (Mewes & Oppel, 2019), water resource management (Fotovatikhah et al., 2018), and many more. In these studies, machine learning algorithms were mostly used to replicate a system and to project a certain variable into the future. Machine learning was found to be a useful tool for handling data in complex systems, like catchments, where the rules leading from input to output are not completely describable. For example, the dispersion of a tracer in a small river was evaluated in a 1‐D profile using a Multi‐Layer‐Perceptron neural network (Piotrowski et al., 2007).

©2020. The Authors. This is an open access article under the terms of the Creative Commons license, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
Big Data & Machine Learning in Water Sciences: Recent Progress and Their Use in
• Application of entropy and mutual information reveals the information content gap between discharge derived from joint tracer and
• Understanding the information content of hydrological data enhances the interpretation of machine learning prediction results
• Similarities in information could be used for regionalization of catchment characteristics of
• Supporting Information S1
Mewes, B., Oppel, H., Marx, V., & Hartmann, A. (2020). Information‐based machine learning for tracer signature prediction in karstic environments. Water Resources Research, 56, e2018WR024558. https://
Received 4 DEC 2018
Accepted 9 JAN 2020
Accepted article online 11 JAN 2020
For machine learning algorithms, the information content of the training data is important (Han & Kamber, 2010; Kelleher et al., 2015; Vapnik, 2013). The Shannon entropy is a common concept in information theory to analyze the information content of given data (Shannon, 1948; see also Fernando et al., 2009). Until now, no study has tried to predict natural tracer concentrations in karstic environments from discharge dynamics by applying machine learning algorithms to fill gaps between point measurements of tracer concentrations. This strategy was chosen because discharge is often the only available data source with an appropriate temporal resolution for hydrological modeling at an event scale. In the database we used, some infrequent tracer concentration measurements were available as point measurements. A machine learning tool capable of filling these gaps would allow the application of databases of frequent discharge measurements and infrequently measured tracer concentrations. Additionally, an already trained algorithm could predict tracer concentrations for catchments in which only a limited number of discharge measurements is available. Furthermore, it would qualify historic data for application in modeling approaches that require a higher temporal resolution of tracer measurements. In karstic environments, the joint analysis of tracer data is often the key for a deeper understanding of system states and behavior (Mudarra et al., 2019). Therefore, we assume a high information content in the measured tracer data because they describe the complex interaction of subterraneous processes. Machine learning algorithms depend on information provided in the data. Consequently, the available data sets of discharge and tracer measurements have to be analyzed for their explanatory power, which has not been done before for a database of karst springs. Furthermore, an information content‐based analysis of the interpolated tracer measurements can be conducted by comparing the prediction results with the information content of the input data.
In this study, we analyzed observed discharge and natural tracer data (sulfate, SO₄²⁻, and nitrate, NO₃⁻) from seven different karst springs in France regarding their information content. We took natural tracers because they exist in varying concentrations and are measurable without any induced injection. We chose nitrate and sulfate because they represent different residence times in the system. While nitrate represents shallow, fast flowing water, sulfate represents the opposite origin: slow phreatic processes. We applied different machine learning algorithms, namely, Support Vector Machines (SVM), Classification and Regression Trees (CART), Extreme Learning Machines (ELM), and Artificial Neural Networks (ANN), to estimate tracer concentrations from discharge dynamics. We selected these four machine learning approaches because they (a) are well established in hydrology, (b) are used for pattern recognition in structured data sets, and (c) deliver, to a certain degree, interpretable structures for the researcher. Furthermore, we compared different concepts of prediction, including the univariate prediction that separately estimates each tracer with a specialized machine and the multivariate estimation that tries to predict a set of tracers with a combined machine. We tested each of the chosen approaches for its prediction capability in seven different catchments and created a strategy to build a data‐driven tool set for the interpolation of continuous time series of tracer measurements. Finally, we linked the prediction results with the observed information content in the data as well as with the mutual information between the chosen tracers.
2. Methods and Data
Sound results from machine learning approaches require data with a high information‐to‐noise ratio. Moreover, the choice of the appropriate machine learning algorithm for this task is difficult to justify without understanding the internal structure of the problem. Following the No‐Free‐Lunch‐Theorem, all available approaches should be equally suitable to solve this problem but with a different performance and different demands on the data in terms of amount and quality (Wolpert & Macready, 1997). Accordingly, without information for an a priori selection of the best machine learning approach to use, we chose four structurally different approaches to estimate tracer concentrations in seven catchments. To quantify the information content within the data set, we introduce the concepts of continuous entropy and mutual information. After defining these basic concepts, we explain the choice of machine learning algorithms in this study and the further scheme of this application.
2.1. Entropy and Mutual Information
Shannon's model of entropy makes it possible to quantify the amount of information gained by adding new data to the analysis (Shannon, 1948). The entropy $H$ is defined by the chance of a sample $X$ to be one of the given values $x_i$, with $p(x_i)$ as the probability that $X = x_i$, for a sample length $N$:

$$H(X) = -\sum_{i=1}^{N} p(x_i) \log p(x_i) \tag{1}$$

Because Shannon's entropy is only valid for discrete data, the concept was extended to the continuous entropy for a continuous variable $X$, which is in our case discharge:

$$h(X) = -\int_{\Omega} f(x) \log f(x)\, dx \tag{2}$$

where $f(x)$ is the probability density function (PDF) of $X$ and $\Omega$ is the defined domain of $X$ (Gong et al., 2014). To determine the explanatory power of data concerning a variable, for example, how much of the information of NO₃⁻ is explained by discharge only, we further extend the concept of continuous entropy to conditional entropy (Thomas & Cover, 2006), where $y$ is the tracer concentration and $x$ is the discharge:

$$h(y \mid x) = -\iint f_{x,y}(x,y) \log \frac{f_{x,y}(x,y)}{f_x(x)}\, dx\, dy \tag{3}$$

Conditional entropy describes how much uncertainty about variable $y$ remains once variable $x$ is known, and thus how much of $y$ can be explained by $x$. To describe the shared information between two data points given as $x$ and $y$, we apply the mutual information (Shannon, 1948; Sharma, 2000). In our case, we investigate the shared information between the two chosen tracers NO₃⁻ and SO₄²⁻. The mutual information between two measurements is defined as

$$MI = \iint f_{x,y}(x,y) \log \frac{f_{x,y}(x,y)}{f_x(x)\, f_y(y)}\, dx\, dy \tag{4}$$

where $f_x(x)$ and $f_y(y)$ are the marginal PDFs of $x$ and $y$, and $f_{x,y}(x,y)$ is the joint PDF of $x$ and $y$ (Sharma, 2000). Following Sharma (2000), the mutual information score from equation (4) can be approximated by

$$MI \approx \frac{1}{N} \sum_{i=1}^{N} \log \frac{f_{x,y}(x_i,y_i)}{f_x(x_i)\, f_y(y_i)} \tag{5}$$

In this approximation, $f_x(x_i)$, $f_y(y_i)$, and $f_{x,y}(x_i,y_i)$ are the marginal and joint densities at the same point $i$ of the same sample (Fernando et al., 2009; Sharma, 2000). To estimate the densities, we apply a kernel estimator (Fernando et al., 2009). Without the kernel estimator, a theoretical distribution of the MI has to be assumed, which adds more bias to the approach. Because nearly all models rely on the interplay of input and output data, the shared information expressed by the mutual information has to be weighted more strongly than the internal information represented by the continuous entropy.
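The sample‐based approximation of the mutual information with a kernel density estimator can be illustrated with a minimal sketch. The Gaussian product kernel, the normal‐reference bandwidth rule, and all function names below are assumptions made for illustration; the kernel and bandwidth actually used in this study are not specified in the text.

```python
import math


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * std * n^(-1/5) (assumed, not from the study)."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def kde_1d(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def kde_2d(xs, ys, hx, hy):
    """Return a joint density estimate built from a Gaussian product kernel."""
    norm = 1.0 / (len(xs) * 2.0 * math.pi * hx * hy)

    def density(x, y):
        return norm * sum(
            math.exp(-0.5 * (((x - a) / hx) ** 2 + ((y - b) / hy) ** 2))
            for a, b in zip(xs, ys)
        )

    return density


def mutual_information(xs, ys):
    """Average log ratio of joint to marginal densities at the sample points (in nats)."""
    hx, hy = rule_of_thumb_bandwidth(xs), rule_of_thumb_bandwidth(ys)
    fx, fy = kde_1d(xs, hx), kde_1d(ys, hy)
    fxy = kde_2d(xs, ys, hx, hy)
    total = sum(math.log(fxy(x, y) / (fx(x) * fy(y))) for x, y in zip(xs, ys))
    return total / len(xs)
```

In such a sketch, two strongly dependent series yield a clearly larger score than two independent ones, which is the property used when comparing the shared information of the two tracers. The estimate depends on the bandwidth, so absolute values from this sketch are not comparable to those reported in the study.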
2.2. Machine Learning Algorithms
The main aim of the paper is to use discharge data as a predictor for tracer concentrations because discharge
in streams and rivers is more commonly measured than tracer concentrations, especially in regions where
access to the site is limited and research relies on public databases. Therefore, we train machine learning
algorithms using time series of runoff to predict time series of tracer concentrations.
The discharge dynamics are captured by a window of discharge data from the original time series that contains the tracer measurement at time t′ and serves as input for the machine learning algorithms. The machine learning algorithms predict the tracer concentrations based on information from the discharge pattern (Figure 1). For training and validation, the predicted tracer concentrations are compared to the measured data (which are considered to represent reality). To reduce overfitting due to complex input data, an optimal length for the window of discharge data has to be identified, which is discussed in detail in section 2.3. Without defining a window, a Long Short‐Term Memory (LSTM) network could be applied, which requires continuous time series of input and training data. Due to a lack of continuous time series of tracer measurements, this approach was discarded.
Four structurally different machine learning algorithms are used in this study: SVM, CART, ELM, and Multi‐Layer‐Perceptron ANN. These algorithms were chosen because of their suitability for regression problems and their origin in two of the four main machine learning families: error‐based learning and information‐based learning (Kelleher et al., 2015). Moreover, they are commonly applied in hydrology and deliver, to a certain degree, structures that can be interpreted by the researcher. SVM and CART are not known to capture temporal patterns in time series data. By the reduction from a complete time series to a window with a variable length, temporal dependencies are reduced to dependencies of the relative position within the window. Thus, the problem is reduced to a pattern recognition problem (Nasrabadi, 2007).
An SVM is an error‐based machine learning algorithm that tries to set up a regression to estimate the unknown tracer concentration from the input discharge sequence (solid line in Figure 2(a)). This regression is depicted through a hyperplane, for which the margin (dashed line in Figure 2(a)), that is, the distance to the nearest features, the so‐called support vectors, is maximized (Cortes & Vapnik, 1995; Raghavendra & Deka, 2014). For a linear problem, this fitting of a regression can easily be done, but most machine learning problems, like the one presented here, are highly nonlinear. Therefore, we use a kernel function to transfer the existing problem to a higher dimension where the problem becomes linear (Chang et al., 2010; Kelleher et al., 2015). As the choice of the mapping kernel is highly problem specific, a selection of several kernel functions (radial basis function, linear, polynomial, and sigmoid) was tested and the best kernel in terms of numerical stability and computational demands was chosen, in our case the radial basis function kernel. For more information on the choice of the kernel, see Vapnik (2013). The created boundary layer is used to predict the unknown tracer concentration C in the feature space from the input discharge dynamic, represented as a green dot (Figure 2). Accordingly, the SVM tries to solve the regression problem by transferring the discharge data into either a single tracer concentration or, for the multivariate output, a set of tracer concentrations. Hence, the hyperplane represents the regression function to estimate the respective tracer concentration from the discharge sequence.
CART builds decision trees that are guidebooks to estimate the tracer concentration from the discharge values. The tree shows the ramifications of decisions leading to the final regression result (Breiman et al., 1984; Kelleher et al., 2015; Quinlan, 1986). To build the tree, all discharge values are analyzed for their ability to maximize the decrease of the residual of the regression between observed and estimated tracer concentration at each branch. The branching occurs in descending order of error reduction. As a result, the structure of the decision tree can be used as a guidebook for unknown values in order to get the desired tracer concentration C (Figure 2(b)). In the given example, the discharge value at position 0 has the highest influence on error reduction and results in the decision between the major branches, which are themselves subdivided by further discharge values, resulting in the final leaves with the target value C represented as a green dot. The error reduction within the tree for each node is calculated with the root‐mean‐square error (RMSE) of the regression (see error metrics section). The regression tree analyzes the discharge values to find the values that have the highest influence on the regression problem to determine the predicted tracer concentration. The depth of the CART tree was limited to the number of input values from the time series of runoff in order to capture all details of the variability of discharge.
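The branching step described above, in which the window position with the largest error reduction is chosen first, can be sketched for a single split. This is a minimal illustration under stated assumptions: the weighting of the two child RMSE values by their sample counts and the names `best_split`, `windows`, and `targets` are hypothetical and not taken from the study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def rmse(values, prediction):
    """RMSE of predicting every value in `values` with one constant."""
    return math.sqrt(sum((v - prediction) ** 2 for v in values) / len(values))


def best_split(windows, targets):
    """Find the window position and threshold with the lowest child RMSE.

    windows: list of equally long discharge windows (lists of floats)
    targets: tracer concentration observed for each window
    Returns (position, threshold, weighted_rmse) or None if no split exists.
    """
    best = None
    n = len(targets)
    for pos in range(len(windows[0])):
        candidates = sorted({w[pos] for w in windows})
        # Thresholds are placed halfway between neighbouring discharge values.
        for low, high in zip(candidates, candidates[1:]):
            threshold = 0.5 * (low + high)
            left = [t for w, t in zip(windows, targets) if w[pos] <= threshold]
            right = [t for w, t in zip(windows, targets) if w[pos] > threshold]
            # Assumed scoring: child RMSEs weighted by their share of samples.
            error = (len(left) * rmse(left, mean(left))
                     + len(right) * rmse(right, mean(right))) / n
            if best is None or error < best[2]:
                best = (pos, threshold, error)
    return best
```

Applying `best_split` recursively to the two resulting subsets would grow the full tree; the position returned for the first split corresponds to the discharge value with the highest influence on error reduction.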
An ANN and an ELM (Figures 2(c) and 2(d)) are both variations of neural networks that try to solve the regression or classification problem by imitating the structure of the human brain and by guiding the training data through a network of hidden layers equipped with neurons (Haykin, 1999). Here, the input nodes are the discharge values from the window of discharge values used to estimate the desired tracer concentration. The hidden layers and nodes represent the underlying system, in this case the karst subsurface system. The connection between nodes and layers is trained by the optimization of weights in order to minimize the regression error. An ELM is a special case of an ANN: The nodes on the hidden layer receive their weights only once. Afterwards, they remain constant over the process of network adaption. Only the weights from the hidden layer to the output node are updated, which is called a feedforward network due to the update direction of the weights (Huang et al., 2004). Here, the discharge values are sent through the network of nodes and hidden layers to identify the pattern and estimate the tracer amount. The network can be trained to estimate either a single tracer or a set of tracers. Generally, the network is restricted to a single hidden layer with half of the input window length as hidden nodes (and a minimum of three hidden nodes for stability reasons).
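The ELM principle, in which the hidden‐layer weights are drawn once and only the output weights are fitted, can be sketched as follows. This is an illustration under assumptions: the tanh activation, the uniform weight initialization, the small ridge term, and the least squares solution of the output weights are common ELM choices but are not confirmed by the text, and the names `ExtremeLearningMachine` and `hidden_nodes` are hypothetical.

```python
import math
import random


def hidden_nodes(window_length):
    """Half of the input window length, with a minimum of three hidden nodes."""
    return max(3, window_length // 2)


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


class ExtremeLearningMachine:
    def __init__(self, n_inputs, n_hidden, seed=0, ridge=1e-6):
        rng = random.Random(seed)
        # Hidden-layer weights and biases are drawn once and never updated.
        self.w = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.ridge = ridge  # assumed small regularization for numerical stability
        self.beta = None

    def _hidden(self, x):
        # Hidden activations plus a constant 1.0 that acts as output bias.
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)] + [1.0]

    def fit(self, windows, targets):
        h = [self._hidden(x) for x in windows]
        k = len(h[0])
        # Normal equations (H^T H + ridge * I) beta = H^T y for the output weights.
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * t for row, t in zip(h, targets)) for i in range(k)]
        self.beta = solve_linear(hth, hty)
        return self

    def predict(self, x):
        return sum(b * h for b, h in zip(self.beta, self._hidden(x)))
```

Because only `beta` changes in `fit`, training reduces to one linear solve, which is what distinguishes this sketch from an iteratively trained Multi‐Layer‐Perceptron.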
To avoid overfitting of the data, the number of input data was reduced to a maximum of half of the available runoff data in the window with a minimum of three remaining runoff values as input data. Furthermore, the random selection of input values was shuffled 10 times and the mean prediction was taken to be representative for the specified window length.
Machine learning algorithms depend on the information content of the data (Goodfellow et al., 2016; Kelleher et al., 2015). Consequently, we assume a link between the performance of the algorithm and the information content of the data (defined in section 2.1). We train the algorithms in two different ways: (1) by a univariate strategy estimating each tracer individually and (2) by a multivariate strategy that trains one algorithm to estimate both tracers simultaneously. We expect that the multivariate strategy performs better than the univariate strategy, as the combination (i.e., interaction) of data should lead to more incorporated information than just the information content of a single data set. A globally trained algorithm to predict a set of natural tracers would lower the interpretability of the results. Thus, we discarded the idea of a universal machine for tracer concentration prediction and focused on the two mentioned natural tracers.
The discharge data have to be reduced to a window with an unknown length. The optimal length is subjective, as it depends on whether all information on the system's behavior is covered in the respective time span. The window to be selected contains the tracer measurements and the number of discharge values depicting the discharge dynamics. As we do not know whether the window length depends on the chosen approach or region, we varied it from 1 to 180 days in steps of [1, 3, 6, 30, 60, 90, 180] with equally sized borders to face the unknown optimal length. The window lengths chosen here represent natural breaks within the classification of time to describe a system. We chose these different lengths of the window to include short‐, medium‐, and long‐term processes in the discharge data and to minimize the number of data sets analyzed. Therefore, we focused on time spans like a month, two months, and half a year. The discharge in the sequence is normalized by the catchment‐specific average discharge to reduce the influence of the peak. The measured tracer concentrations are also normalized by the specific mean of this tracer for the catchment. The share of the training data is increased gradually to understand how simulation performance is influenced by the size of the training data. Therefore, we varied the amount of data used for training from 10% to 90% of the available time series for the catchment. Using the length of the covered time series instead would be insufficient because the input data include runoff sequences that might overlap. Hence, the number of tracer measurements is important.

Figure 1. Workflow of the analysis including the clipping of the window for the discharge data, the prediction of tracer signatures by the machine learning algorithms, and the following comparison with tracer measurements.

Figure 2. The major task for the machine learning algorithms in this study is presented in the upper part of the figure: to estimate the unknown tracer concentration, C, by training a machine learning algorithm to the pattern formed by a subset of discharge and a measured pair of tracers. The structures of the chosen algorithms for this study are shown in subplots a, b, c, and d.
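The clipping of discharge windows, the normalization by catchment means, and the variation of the training share described in this section can be sketched as follows. How the window is positioned relative to the tracer measurement is an assumption here and is therefore left as two parameters (`days_before`, `days_after`); the random selection of the training share and all function names are likewise hypothetical.

```python
import random


def discharge_window(discharge, t_index, days_before, days_after):
    """Clip the daily discharge values around a tracer measurement at t_index.

    Returns None when the window would reach beyond the available series.
    """
    start = t_index - days_before
    stop = t_index + days_after + 1
    if start < 0 or stop > len(discharge):
        return None
    return discharge[start:stop]


def build_samples(discharge, tracer_obs, days_before, days_after):
    """Pair each tracer measurement with its normalized discharge window.

    discharge:  list of daily discharge values
    tracer_obs: dict mapping day index to measured tracer concentration
    """
    mean_q = sum(discharge) / len(discharge)             # catchment mean discharge
    mean_c = sum(tracer_obs.values()) / len(tracer_obs)  # catchment mean concentration
    samples = []
    for t_index in sorted(tracer_obs):
        window = discharge_window(discharge, t_index, days_before, days_after)
        if window is not None:
            samples.append(([q / mean_q for q in window],
                            tracer_obs[t_index] / mean_c))
    return samples


def split_training(samples, train_share, seed=0):
    """Select a share of the tracer measurements for training (assumed random)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = max(1, round(train_share * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]
```

Looping `train_share` over values from 0.1 to 0.9 and the window parameters over the tested lengths would reproduce the kind of experiment grid described above. Note that the split counts tracer measurements, not days, since the discharge windows may overlap.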
We train the algorithms with both a univariate and a multivariate strategy. We compare the results from the two learning strategies to quantify the potential improvement from shared information and joint learning. Furthermore, we discuss the influence of the window length on prediction quality. This is relevant as the length of the input sequences can create a bias in the learning process. If we choose a length that is too short, we might not cover all relevant processes, whereas sequences that are too long might confuse the algorithms in finding a suitable system. In the last step, we elaborate on the transferability of the algorithms to be used as predictors at catchments for which they were not trained. That way, we can test whether machine learning tools and their results might reveal hidden similarities in catchment responses or, even more interestingly, whether the application of machine learning is suitable for the prediction of missing tracer measurement data.
2.4. Evaluation and Error Measures
To compare the different machine learning approaches, training strategies and window lengths, quantitative
performance measures were used.
To show the general prediction performance, the RMSE between observed and estimated tracer measurements was applied, which becomes 0 for a perfect prediction. To calculate the RMSE for the tracer content, we differentiated between measured $c_{obs}$ and predicted $c_{est}$, with $N$ being the number of samples in the validation:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(c_{obs,i} - c_{est,i}\right)^2} \tag{6}$$

We apply the RMSE to both tracers individually and calculate the mean of both as an indication of the combined error. Because of the variable window length, individual RMSEs are calculated for each approach and each region. Because the RMSE does not show the direction of the error, in contrast to the mean error, which is less robust against outliers, we also analyze the average concentration ratio $c_T$, which provides information about the general strength and direction of the prediction error:

$c_T$ is able to show the direction and the strength of the error by its sign and its difference from one, respectively. Again, because of the multitude of different window lengths, a range of $c_T$ values is calculated for each region and approach.
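The two quantitative measures can be illustrated with a short sketch. The RMSE follows its standard definition. For the average concentration ratio, the exact formula is not reproduced in the text above, so the version below, the mean of estimated over observed concentration, is an assumption chosen only because it equals one for a perfect prediction; the function names are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root-mean-square error between observed and estimated concentrations."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def combined_rmse(rmse_tracer_a, rmse_tracer_b):
    """Mean of the two individual tracer RMSEs as an indication of combined error."""
    return 0.5 * (rmse_tracer_a + rmse_tracer_b)


def average_concentration_ratio(observed, estimated):
    """ASSUMED form: mean of estimated / observed, equal to 1.0 for a perfect fit.

    Values above one indicate overestimation, values below one underestimation.
    """
    return sum(e / o for o, e in zip(observed, estimated)) / len(observed)
```

With normalized concentrations, both measures are dimensionless, which makes them comparable across springs with different mean concentrations.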
As all the presented measures are merely measures of quantitative performance, the qualitative performance is measured with the accuracy of the internal ranking of the two tracer signatures. We therefore calculated the accuracy from an error matrix of true and false combinations of ranking. The derived measure of accuracy, acc, describes the qualitative information between the two tracers as the accuracy of their ranking (Han & Kamber, 2010):

$$acc = \frac{pos}{pos + neg} \tag{8}$$

with pos and neg as the numbers of correctly and incorrectly predicted rankings of the pair of tracers in concentration; for example, $c_{TA,obs} > c_{TB,obs}$ but $c_{TA,est} < c_{TB,est}$ results in a neg prediction, whereas $c_{TA,obs} > c_{TB,obs}$ and $c_{TA,est} > c_{TB,est}$ counts as pos. acc shows the ability of the machine learning method to replicate the ranking of the tracer concentrations in order to replicate changing tracer dynamics.
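The ranking accuracy can be sketched in a few lines. The function name is hypothetical, and the treatment of ties (equal concentrations count as "A not larger than B") is an assumption, since ties are not discussed in the text.

```python
def ranking_accuracy(obs_a, obs_b, est_a, est_b):
    """Share of samples in which the predicted ranking of tracers A and B
    matches the observed ranking: pos / (pos + neg)."""
    pos = neg = 0
    for oa, ob, ea, eb in zip(obs_a, obs_b, est_a, est_b):
        # A sample is positive when observation and prediction agree on
        # whether tracer A exceeds tracer B.
        if (oa > ob) == (ea > eb):
            pos += 1
        else:
            neg += 1
    return pos / (pos + neg)
```

For example, if the prediction reproduces the observed ranking in three of four validation samples, the function returns 0.75, regardless of how far the predicted concentrations are from the observed ones.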
The three measures considered here to judge the performance represent the key characteristics of the prediction results: the overall goodness represented by the RMSE, the deviation from the mean, and the ranking between both tracers. With a correct ranking, the qualitative information on which tracer concentration dominates is still captured, even if the variance of the prediction is not high enough.
The target variables of the machine learning prediction are the concentrations of SO₄²⁻ and NO₃⁻ that act as a combined tracer signature. While NO₃⁻ is known as an indicator for fast water fluxes from the soil or epikarst, that is, the shallow subsurface (Hartmann et al., 2016; Mahler & Garner, 2009), SO₄²⁻ in karst systems is usually derived from geogenic processes that dissolve evaporites in the phreatic subsurface that sustains base flow (Hartmann et al., 2017; Mudarra & Andreo, 2011). We chose these two tracers as an example for any tracer combination. Due to their different origins, the shallow subsurface (NO₃⁻) and the dissolved evaporites of the phreatic zone (SO₄²⁻), we expect that their observations include different information.
The data for our analyses originate from seven different karst springs in France (Table 1 and Figure 3). Tracer measurements were normalized by individual mean values, leading to seven different means (Eaufrance, 2018a). The tracers analyzed in this study are natural tracers; no human‐induced injections were made. The tracer concentrations were measured repeatedly, but not at fixed intervals. There was a strong linear correlation between both tracers, SO₄²⁻ and NO₃⁻, with r = 0.67. Measured discharge values were obtained from Banque Hydrologique and have a daily resolution (Eaufrance, 2018b). Banque Hydrologique publicly provides daily discharge data of continuously measured rivers and springs collected by French state agencies.
The two springs Baget and Fontestorbes are located in the Pyrénées Mountains (Ariège department) at a median altitude of 1,000 m. The recharge areas are 13 and 80 km² for the Baget and Fontestorbes springs, respectively. The mean daily discharge of the Fontestorbes spring, which is one of the largest intermittent karst springs in the world, is 2.1 m³/s, and that of the Baget spring is 0.5 m³/s. Due to the similarity of the two midaltitude basins (Labat et al., 2002), a mean annual precipitation of 1,178 mm (Bailly‐Comte et al., 2018) can be assumed for both locations. The Durzon spring is located on the Larzac Plateau in the Grands Causses area in the Massif Central (Aveyron department). It is a perennial, vauclusian‐type spring with a mean daily discharge of 1.5 m³/s. The recharge area has been determined to be >100 km² (Jacob et al., 2008). The Fontaine de Vaucluse spring is a well‐described and famous karst spring and the largest karstic outlet in France (Vaucluse department). The mean daily discharge is over 20 m³/s, and the low flow discharge is always higher than 4 m³/s. The recharge area is about 1,115 km² (Fleury et al., 2009). The Fontbelle spring is part of the Ouysse karst system (Lot department) (Kavouri et al., 2011). The Source de la Touvre is the second largest karst spring in France and the sole outlet of the Rochefoucault karst system (Charente department). The spring, fed by the losses of three large rivers, has a mean daily discharge of 13 m³/s and a recharge area of about 126 km². The water resources are used for the water supply of Angouleme city. The Source du Lez is the main perennial outlet of the Lez karst system (Montpellier department) with a mean daily discharge of 2 m³/s (Table 1). Pumping for the water supply of Montpellier city puts the aquifer under high anthropogenic pressure (Bicalho et al., 2017).
More details about the springs are provided in Table 1 and Figure S1 (see supporting information) or on the database webpage (hydro.eaufrance.fr).
3.1. Entropy and Mutual Information of Available Data Sets
Following the principle of continuous entropy, the information content of discharge and the mutual information of the joint data sets (tracer signatures) were calculated. We resampled the complete set of sequences ten times and looked at the mean entropy of each individual data set and the mutual information of the two different tracer signatures, SO₄²⁻ and NO₃⁻. Missing or erroneous results are labeled NA, which leads to gaps in the information contents shown for springs like Fontaine de Vaucluse (see supporting information).
Water Resources Research
MEWES ET AL. 8of20
Overview of Used Data
Source Lat Lon Department
Length of daily
42.9554 1.0304 Ariège 0.5 13 Dfb/Dfc 1,187 Lower Cretaceous
42.8925 1.9271 Ariège 2.1 80 Cfb 1,187 Cretaceous
43.9909 3.2617 Aveyron 1.5 124 Dfc/Cfb 400
Middle to upper
43.9177 5.1327 Vaucluse 20 1,115 Csb/Csa 960 Great, lower
44.7956 1.5640 Lot 0.1 Cfb 730
Source de la
45.6630 0.2546 Charente 13 126 Cfb 945 Upper Jurassic
43.7182 3.8842 Montpellier 2 Csb 942 Upper Jurassic and
Jourde et al. (2018)
Labat et al. (2002)
Bailly‐Comte et al. (2018)
Jacob et al. (2008)
Fleury et al. (2009)
Kavouri et al. (2011)
Obtained from Hartmann et al. (2015)
Le Moine et al. (2008)
Bicalho et al. (2017)
Figure 3. Location of analyzed carbonate rock dominated sources in France.
The Baget example shows that the entropy of discharge decreases when more data are used for training
(Figure 4). The mutual information between the two tracers exceeds the continuous entropy of discharge
by far: the information content shared between the two tracers is 35 times higher than the continuous
entropy of the discharge. This means that a large amount of information is needed to fully describe the
variability of the interplay between the two tracers, and discharge data alone might not suffice to describe
this variability. When more than 60% of the available tracer data sets are used, the mutual information reaches a
plateau where no further information is needed to describe the dynamics. The behavior of MI is similar for
all other catchments: the information content is far higher than the continuous entropy of discharge, and
a plateau is reached using at least 60% of the data. Therefore, we assume that we need at least 60% of the
available tracer measurements to cover the variability of the system's dynamics in the training. For more
details, we refer to the supplement (Figure S2), where the entropy and the mutual information for all
catchments are shown in detail.
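As a rough illustration of the analysis in this section, the following Python sketch estimates continuous entropy and mutual information with a simple histogram (plug-in) estimator and averages the mutual information over repeated random subsamples of a given fraction of the data. The estimator, the number of bins, and all function names are assumptions made for illustration; the estimator actually used in the study is not specified in this section.

```python
import numpy as np

def hist_entropy(x, bins=10):
    """Histogram estimate of continuous (differential) entropy in nats."""
    counts, edges = np.histogram(np.asarray(x, dtype=float), bins=bins)
    p = counts / counts.sum()
    width = edges[1] - edges[0]          # equal-width bins
    nz = p > 0                           # skip empty bins (0 * log 0 = 0)
    return float(-np.sum(p[nz] * np.log(p[nz])) + np.log(width))

def hist_mutual_information(x, y, bins=10):
    """Plug-in estimate of the mutual information I(X;Y) in nats."""
    counts, _, _ = np.histogram2d(np.asarray(x, dtype=float),
                                  np.asarray(y, dtype=float), bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mean_mi_for_fraction(x, y, fraction, n_resamples=10, seed=0):
    """Mean mutual information over random subsamples of the paired data."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = max(2, int(round(fraction * len(x))))
    values = []
    for _ in range(n_resamples):
        idx = rng.choice(len(x), size=n, replace=False)
        values.append(hist_mutual_information(x[idx], y[idx], bins=10))
    return float(np.mean(values))
```

Evaluating `mean_mi_for_fraction` for fractions from, say, 0.1 to 0.9 would give a curve on which a plateau like the one described above could be looked for.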
3.2. Validation of Prediction Accuracy
For the validation of the prediction accuracy, we compared two different learning strategies: the univariate
strategy, focusing on only one tracer at a time, and the multivariate strategy, considering both tracers in the
same learning phase. The results shown here represent all considered sizes of the discharge window. The
prediction results are presented as boxplots to show the variability and the influence of the different window
lengths without going into detail on the specific influence of the window (Figure 5). The average tracer
concentration ratio cT indicates that the tracer signatures can be predicted better at some springs than at others.
Furthermore, the results show a preference toward certain prediction techniques with a cT value close to the
optimum value. For Fontaine de Vaucluse, Fontbelle, Source de Fontestorbes, and Source du Lez, cT
converged to the optimal value of 1.0. The differences between the machines were marginal, although ELM and
ANN results were less variable and thus less influenced by the amount of training data. For the Baget
catchment, we could not predict the concentrations with any machine, as the variability is high for all applied
approaches and amounts of training data. For the catchments Durzon and Source de la Touvre, either SO₄²⁻ or NO₃⁻
was overestimated or underestimated, although CART delivered acceptable results for the
Source de la Touvre.
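The two scores used in this section, cT and the RMSE of the normalized concentration, can be sketched in Python as follows. The exact definitions are given elsewhere in the paper, so this sketch assumes that cT is the ratio of the mean predicted to the mean observed concentration and that concentrations are normalized with the mean and standard deviation of the observations; both assumptions and the function names are illustrative only.

```python
import numpy as np

def concentration_ratio(pred, obs):
    """Assumed form of cT: mean predicted over mean observed concentration."""
    return float(np.mean(pred) / np.mean(obs))

def normalized_rmse(pred, obs):
    """RMSE after scaling both series with the observed mean and std (assumption)."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    mu, sigma = obs.mean(), obs.std()
    z_pred, z_obs = (pred - mu) / sigma, (obs - mu) / sigma
    return float(np.sqrt(np.mean((z_pred - z_obs) ** 2)))
```

Under these assumptions, a model that always predicts the observed mean reaches cT = 1.0 but a normalized RMSE of exactly 1.0, which is one way to read an RMSE threshold of 1.0: values below it indicate a prediction that does better than the mean alone.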
The RMSE of the prediction from all investigated window lengths is presented as boxplots in Figure 6. The
RMSE of the tracer concentrations shows results similar to those for cT. While for some catchments RMSEs were low
regardless of the chosen machine, for catchments like Baget the results are worse than for catchments like
Fontbelle and Source de la Touvre. If the cT of the catchment does not converge to 1.0 (like the SVM in
Source du Lez), the RMSE is higher than in regions like Fontaine de Vaucluse and Fontbelle, where cT is also
close to the optimum. The choice of the machine has only a small influence on the RMSE, apart from Source
du Lez, where the SVM delivers worse results than any other method. Generally, an RMSE lower than 1.0 is an
acceptable value for the prediction of the normalized concentration. This limit is reached for all machines in
the catchments Fontaine de Vaucluse, Fontbelle, Source de Fontestorbes, and Source de la Touvre, while at
Baget, Durzon, and Source du Lez the RMSE remains highly variable. Whether a univariate or a multivariate
approach results in a lower mean RMSE cannot be stated with certainty from these results, but in most cases
the mean RMSE of the multivariate approach was lower than the mean RMSE of the respective univariate
approach.
Figure 4. Mean continuous entropy and mutual information between NO₃⁻ and SO₄²⁻ at Baget spring, showing the
shared information between both tracers and discharge and the singular information through the continuous entropy
of the isolated data sets.
Figure 5. cT of SVM, CART, ELM, and ANN. The variability within each boxplot expresses the performance according to the applied type of training data. While the
results are good for most catchments, some concentrations, like SO₄²⁻ in certain catchments such as Source de la Touvre, are overestimated when using an ELM or an ANN.
Figure 6. RMSE of the normalized tracer concentrations of SVM, CART, ELM, and ANN for univariate and multivariate algorithms. The variability shows the
influence of the learning threshold on the development of RMSE in the catchment. The RMSE results are similar to the results from cT and show that the error
relates to the average tracer concentration and that for some catchments, like Baget, problems in the prediction occur. The choice of the machine has only
a small influence on the error and depends on the region.
The Acc value, which describes the correct ranking of tracer concentrations, shows for all catchments that at least
40% of the rankings are estimated correctly (Figure 7). None of the machines reached mean Acc values >70%.
Here, the choice of machine has an influence on the dynamics of the tracer concentrations. The Acc values
were highest for catchment Baget compared to all other catchments, although Baget showed the highest variability of
cT. The multivariate prediction does not automatically improve the results in terms of Acc at all catchments,
and the improvement or deterioration varies among the applied approaches (e.g., SVM and ELM in Durzon).
The reason for this might be found in the interplay of information content, regional aspects of the
catchment, and the quality of the input data. Checking the causality of the preferred choice is out of the
scope of this paper. Nevertheless, in most catchments, the multivariate machines improve Acc. Again, the
choice of the machine has less impact on the results and is merely a catchment-specific question.
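One possible reading of Acc is sketched below in Python: the share of time steps at which the predicted ordering of the two (normalized) tracer concentrations matches the observed ordering. This interpretation of "correct ranking", the function name, and the argument names are assumptions for illustration; the formal definition of Acc is given elsewhere in the paper.

```python
import numpy as np

def ranking_accuracy(pred_a, pred_b, obs_a, obs_b):
    """Fraction of time steps where the sign of (tracer A - tracer B) is predicted correctly."""
    pred_rank = np.sign(np.asarray(pred_a, dtype=float) - np.asarray(pred_b, dtype=float))
    obs_rank = np.sign(np.asarray(obs_a, dtype=float) - np.asarray(obs_b, dtype=float))
    return float(np.mean(pred_rank == obs_rank))
```

Such a score is insensitive to the magnitude of the concentrations, which is consistent with a catchment scoring well on Acc while showing a highly variable cT.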
The influence of the chosen window length on the tracer prediction capability is exemplified by
the cT values of all four (univariate) machines in catchment Source de Fontestorbes (Figure 8). Generally,
either very short windows (1–4 days) or long windows (>60 days) lead to good results, while window lengths
in between worsen the results for SVM, CART, and ELM. For further information on the window
dependency of the other catchments, which is very similar to what we derived from our example,
we refer to the supporting information.
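To make the notion of a discharge window concrete, the following Python sketch builds one input row per tracer sampling day, holding the discharge of that day and the preceding days. It assumes a gap-free daily discharge series indexed from day 0; the function and argument names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def discharge_windows(discharge, sample_days, window):
    """Return (rows, days): each row holds the `window` daily discharge values
    ending on the corresponding tracer sampling day."""
    q = np.asarray(discharge, dtype=float)
    days = np.asarray(sample_days, dtype=int)
    keep = days >= window - 1                 # drop samples without enough history
    windows = sliding_window_view(q, window)  # row i covers days i .. i + window - 1
    return windows[days[keep] - (window - 1)], days[keep]
```

Calling this with a short window (for example 3 days) and a long one (for example 90 days) yields the two kinds of input sequences compared in Figure 8.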
As an example of choosing an approach with the required amount of training data for a catchment, we
elaborate the case of Fontbelle (Figure 9). Here, ANN and SVM obtain cT values close to the optimum of 1.0,
but the ANN results in lower RMSE values than the SVM. Therefore, we chose the ANN to predict tracer
concentrations in this catchment. The resulting time series (predicted by an ANN trained with 70% of the
available measurements) reveals that the measured tracer concentrations and the predicted time series show
an acceptable agreement: the mean concentration is captured, as is the general ability to
predict concentrations at all measured levels.
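The selection logic described for Fontbelle can be written as a small rule: keep the machines whose cT is close to 1.0 and, among those, take the one with the lowest RMSE. The Python sketch below is one way to express this; the tolerance value, the fallback when no machine is within tolerance, and the scores used in the usage example are illustrative assumptions, not results from the study.

```python
def choose_machine(scores, ct_tolerance=0.1):
    """Pick a machine from a dict mapping name -> (cT, RMSE).

    Machines with |cT - 1| within the tolerance are preferred; the lowest RMSE
    decides among them, with closeness of cT to 1.0 as tie-breaker.
    """
    candidates = {m: s for m, s in scores.items() if abs(s[0] - 1.0) <= ct_tolerance}
    if not candidates:           # fall back to all machines if none is close to 1.0
        candidates = dict(scores)
    return min(candidates, key=lambda m: (candidates[m][1], abs(candidates[m][0] - 1.0)))

# Hypothetical scores for one catchment, for illustration only
example_scores = {"SVM": (1.02, 0.9), "ANN": (0.98, 0.6), "CART": (1.30, 0.5), "ELM": (0.80, 0.7)}
best = choose_machine(example_scores)
```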
Taking a closer look at the prediction capability for SO₄²⁻, we can see that the multivariate approach
interpolates concentrations in the same range, even close to a concentration of 0.0 mg/L (red marked area in
Figure 9). The multivariate approach is able to cover the peaks, while the univariate approach predicts values
close to the mean concentration. Interestingly, the mean tracer concentration rises over time when using the
univariate approach. However, the behavior of NO₃⁻ is different: the univariate prediction shows a variability that
reflects the measured tracer concentrations better, while the multivariate machine predictions show too little
variability around the observed mean concentration. As shown by the red marked area of Figure 9, the
univariate approach allows interpolating NO₃⁻ concentrations from Day 2,000 to Day 3,200. The following
decreasing trend cannot be interpolated, and thus the approach does not perform well from
Day 3,200 until the end.
Missing tracer measurements, in the form of gaps or irregular measurement campaigns, are the major downside
of using these data to develop models for system characterization. In many cases, it is not possible to repeat
the measurements for the desired tracers, for instance, when data are obtained from online databases like
that of the U.S. Geological Survey. Furthermore, only limited knowledge is available on the information content
of the data used in tracer‐aided modeling (Hartmann et al., 2017; Kelleher et al., 2019). Our results indicate
that machine learning algorithms represent a valuable technique to predict some characteristics of tracer
concentrations in karstic environments. Even though none of the machine learning methods was able
to describe the complete dynamics between the two tracers with high precision, our comparative approach
of using different machine learning methods allows us to choose the most appropriate method for describing a
specific characteristic at a specific site. Hence, we are able to predict key characteristics like the mean
concentration and the relative ranking of tracers in a joint tracer analysis. The reason that tracer concentration
dynamics could not be predicted entirely from discharge alone is the low information content of discharge
compared to the shared information of the tracers. The use of ancillary data or more sophisticated approaches
to improve our prediction is hampered by data limitations or uncertain measurement
quality. Consequently, the prediction capability of the algorithms is lowered by the limitation to discharge
data and the low temporal resolution of concentration measurements. Thus, results have to be interpreted
carefully and with special regard to the information content of the underlying data.
Figure 7. Accuracy Acc of the applied machines with both the univariate and the multivariate approach. The variability of the plots shows the influence of the amount
of training data used for prediction.
Figure 8. Dependency of cT on the chosen window length at Source de Fontestorbes for all applied machines. Good results are achieved either by short sequences
(1–4 days) or long sequences (>60 days).
As for other machine learning applications in hydrology, the most promising algorithm has to
be found through trial and error (He et al., 2014; Raghavendra & Deka, 2014). Hence, we adapted the research
design to the No Free Lunch theorem (Wolpert & Macready, 1997) and compared four different algorithms
from two of the main machine learning families (Kelleher et al., 2015). We assumed that discharge data are
able to provide enough information to describe the interplay between tracer measurements and to predict the
concentrations. However, the continuous entropy of discharge and the mutual information between NO₃⁻ and SO₄²⁻
emphasized that the information needed to describe the interplay between this pair of tracers is far
higher than the continuous entropy of the discharge data alone. Although the algorithms were able to predict
certain aspects like the mean concentration and peaks quite well, the complete variability could not be
predicted. In contrast to concentration‐discharge relations that require distinct knowledge on the measured data
and the catchment, our study shows that machine learning algorithms can be trained from databases with
few discontinuous measurements to provide continuous reconstructions of tracer concentrations.
Even with knowledge of the required information content and the delivered information content, we were not
able to distinguish properly among the different approaches, and a further choice would depend strongly
on the focus of the task: Would we like to predict the tracer concentration, or is the ranking of tracer
methods for the dynamic description more important? This lack of a clear preference among the chosen machine
learning methods can also be observed in other comparative machine learning studies in hydrology, for
example, in flood event separation (Mewes & Oppel, 2019) and the simulation of streamflow (Shortridge
et al., 2016). Similar to their results, there might not be a single machine for all purposes that works with
our data set, but a set of machines that work together to deliver the desired results, which was shown to be
useful for hydrological modeling in general (Clark et al., 2008). We assume that the interplay of the
information content of the tracers and discharge determines the choice of the best working algorithm. This assumed
link between information content of data, prediction performance, and method preference might be a way to
regionalize karst catchments by a data‐driven approach (Abdollahi et al., 2017).
Figure 9. Interpolated time series of SO₄²⁻ and NO₃⁻ predicted for catchment Fontbelle with a univariate and a multivariate ANN and 80% of data used for
training. The black line indicates the observed discharge dynamics.
Consequently, our comparative analysis of algorithms and learning approaches allowed setting up a strategy
to use the aforementioned algorithms to predict tracer signatures. Interestingly, the preferred lengths of the input
sequence of discharge fall into two groups: a group that prefers short windows and a group that prefers
long windows. This might be related to different processes that relate to the transit time of the karst
spring, which means that we use the information on the time spent by the water in the karst system
(Hartmann et al., 2016). While SO₄²⁻ requires long times to dissolve from the karstic rock into the water,
NO₃⁻ dissolves faster. This is the reason that the two tracers are investigated: to separate slow from fast
water. Here, SO₄²⁻ could be predicted better with long windows of input data, while NO₃⁻ performed
better with short input windows.
Apparently, the information that we use right now is sufficient for peak concentrations and the mean values,
but concentrations of SO₄²⁻ close to 0.0 mg/L lead to errors (Figure 9). Hence, the processes that lead to
these concentrations in the discharge are not yet covered by the discharge data and should be included
with ancillary data. Such multi‐input machine learning applications are widely used in remote sensing and
other applications but are underrepresented in hydrology because knowledge of the information content of the
input data is crucial for their application, and that remains unknown in many hydrological studies
(Mountrakis et al., 2011; Piotrowski et al., 2007; Zheng et al., 2015).
Overall, our investigations show that we cannot state a clear preference toward a single approach. However,
the introduction of a comparative framework helps to identify the most appropriate solution to predict tracer
concentrations for a specific catchment. In the following parts of the discussion, we adapt our concept of
entropy and present a preliminary framework that could be used to predict tracer concentrations.
4.1. Improvements for Concept of Entropy
Due to the mixed results of the multivariate approach, we analyzed the results of both approaches,
univariate and multivariate, as an example and learned that we need one tracer to predict the other. As we can see
from the interpolated time series of catchment Fontbelle, the multivariate approach performed better for
SO₄²⁻ than for NO₃⁻. Therefore, the additional information from NO₃⁻ helped the algorithm to find the
pattern in SO₄²⁻. Hence, a framework should consist of a univariate ANN to predict NO₃⁻, which acts as
additional information to predict SO₄²⁻.
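The chained setup suggested here (first predict one tracer from discharge, then use it as additional input for the other tracer) can be sketched as follows. To keep the example self-contained, an ordinary least-squares model stands in for the ANN, and the second stage is trained on measured values of the first tracer but applied with predicted ones. These are simplifying assumptions of this sketch, not the configuration used in the study, and all names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; a simple stand-in for an ANN."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def fit_chain(Q, no3, so4):
    """Stage 1: discharge windows -> NO3. Stage 2: (discharge windows, NO3) -> SO4."""
    coef_no3 = fit_linear(Q, no3)
    coef_so4 = fit_linear(np.column_stack([Q, no3]), so4)  # trained on measured NO3
    return coef_no3, coef_so4

def predict_chain(coefs, Q):
    """Predict NO3 from discharge, then feed the prediction into the SO4 model."""
    coef_no3, coef_so4 = coefs
    no3_hat = predict_linear(coef_no3, Q)
    so4_hat = predict_linear(coef_so4, np.column_stack([Q, no3_hat]))
    return no3_hat, so4_hat
```

Here `Q` is a matrix with one discharge window per row; any regressor with a fit and predict step could replace the linear stand-in.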
To reveal the relationship of explanatory power between predictors and variables, we transfer the concept of
mutual information to conditional, or relative, entropy (Chacon‐Hurtado et al., 2017; Corzo & Solomatine,
2007; Keum & Coulibaly, 2017). The conditional entropy shows that discharge has a higher
explanatory power for NO₃⁻ than for SO₄²⁻ (Figure 10). This means that a univariate approach is
Figure 10. Conditional or relative entropy of NO₃⁻ and SO₄²⁻ for catchment Fontbelle. Relative entropy expresses the
explanatory power, that is, how much of the entropy of discharge can be used to predict a single tracer.
more beneficial for predicting NO₃⁻ than for predicting SO₄²⁻. Consequently, we can use the concept of
conditional entropy to decide whether a univariate or a multivariate approach should be preferred and which
tracer measurement can be used as ancillary data for the prediction of other tracer concentrations.
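For orientation, the standard identities that link conditional entropy, joint entropy, and mutual information are given below, with T denoting a tracer concentration and Q the discharge (symbols chosen here for illustration). They are the textbook definitions and may differ in detail from the exact quantity plotted in Figure 10.

```latex
\begin{align}
H(T \mid Q) &= H(T, Q) - H(Q), \\
I(T; Q)     &= H(T) - H(T \mid Q).
\end{align}
```

A small H(T | Q) relative to H(T) means that little of the tracer's variability remains unexplained once discharge is known, which would favor a univariate, discharge-only prediction of that tracer.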
4.2. Application of Machine Learning in Interpolation of Tracer Time Series
Discharge separation by tracers relies on tracer observations, which are often limited in availability (Birkel &
Soulsby, 2015; Klaus & McDonnell, 2013). We assumed that machine learning is a tool to interpolate time
series of tracers from discharge observations. Keeping the aforementioned downsides of machine learning in
mind, the demonstrated interpolation capability of the algorithms is a valuable addition to discharge separation
applications (Garvelmann et al., 2017; Klaus & McDonnell, 2013).
As the explanatory power of discharge alone is too low to describe the interplay between the tracers in all its
variations, the question to be answered by filling the gaps with machine learning tools has to be precise. In our
framework, an extensive preanalysis was conducted to show the general applicability in terms of RMSE and cT
for all considered algorithms and amounts of available training data. The length of the input sequence again
is a source of uncertainty in our approach, but we were able to link good prediction results with the
geochemical residence time of the tracer in the system. Thus, the machine learning approach can be utilized
for hypothesis testing on transit times. To describe the uncertainty of the prediction, both lengths of input
sequences should be used: a short window of discharge to capture short residence time processes and
a long window of discharge to capture slow processes. Nevertheless, the definition of short and long windows
is catchment specific and has to be determined either by a data‐driven preanalysis or by detailed knowledge of
the respective catchment, which would be equivalent to the calibration of a hydrological model (Hartmann
et al., 2014; Wu & Chau, 2011).
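The idea of using both a short-window and a long-window prediction to describe uncertainty can be sketched as a small ensemble summary. The sketch assumes that each ensemble member is a predicted concentration series on a common time axis; the function name and the choice of mean, minimum, and maximum as summary statistics are illustrative assumptions.

```python
import numpy as np

def ensemble_summary(predictions):
    """Summarize an ensemble given as a dict of label -> predicted series."""
    stack = np.vstack([np.asarray(p, dtype=float) for p in predictions.values()])
    return {
        "mean": stack.mean(axis=0),   # central estimate per time step
        "lower": stack.min(axis=0),   # lower envelope of the members
        "upper": stack.max(axis=0),   # upper envelope of the members
    }
```

The spread between `lower` and `upper` then gives a simple indication of how strongly the prediction depends on the chosen window length.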
Our study initially focused on the use of machine learning algorithms for the prediction of tracer
measurements. Since time series of tracer measurements are often too sparse for modeling, machine learning tools
can potentially be useful for researchers with limited access to environmental tracer data or limited resources
to obtain additional measurements. We could show that our selected machine learning tools were able to
identify some characteristics of the observed tracer concentrations, like average concentrations or the
appropriate constellations of tracer concentrations, at the selected test sites. Our analysis also revealed that the
information content of discharge alone is not sufficient to predict tracer concentrations in their entire
variability, as the mutual information between the pairs of tracers is higher than the continuous entropy of
the discharge data. For that reason, the prediction capability of the machine learning algorithms is lowered
substantially. The predicted time series have to be interpreted with care, because they
lack extreme concentrations that are abundant in the observations.
Moreover, we were able to build a preliminary framework that creates an ensemble of predictions addressing
the uncertainty of a machine learning‐based approach by eliminating the bias of the chosen input sequence
length and the learning approach of the algorithms. All methods considered in this paper deliver acceptable
results in comparison, but the choice of the most suitable algorithm remains catchment specific and should
be based on site‐specific knowledge (e.g., residence time estimations) or extensive data‐driven preanalysis.
We found that the amount of required training data is high, as the mutual information between the pair
of tracers requires at least 60% of the available data to reach a plateau. Hence, the training of the machines
is not likely to be successful in data‐poor regions.
We conclude from our investigations that the setup of a framework to predict tracer concentrations with
machine learning tools remains challenging. Nevertheless, we show that the process of setting up the
machine learning‐based ensemble framework can be facilitated by information‐based analyses like the
concepts of entropy, conditional entropy, and mutual information. Knowledge of the information content of the
data helps to justify the nonobvious choice of methods when facing “black‐box” machine learning approaches.
Moreover, such analyses could be the basis for future regionalization of catchments and the transfer of trained
machines to data‐poor regions, provided the machine learning approaches were trained in information‐rich
environments. Through training with information‐rich data, linkages between processes that are
hidden in data, like discharge data, become transferable and quantifiable. Hydrological models, on the other
hand, require the same amount of data regardless of their information content. So measurements too few for
traditional hydrological models may still contain sufficient information to improve machine learning
models. Overall, we are just at the doorstep of using data‐driven approaches in hydrology, especially in complex
environments like karst. Despite the problems that we still have to face in the future, advanced
data‐driven machine learning approaches may allow further improvements of data analysis, model calibration,
and model development.
Although there is no silver bullet for predicting tracer concentrations, we could show by the input
window analysis that the characteristics of the assumed transit time of tracers become visible in the most
suitable input window lengths for the prediction. Moreover, by analyzing how a machine learns
data patterns and investigating the results of the prediction, our study highlights the importance of an
information content analysis. This opens the field to further entropy‐based approaches of data mining in
hydrological contexts, especially in often data‐sparse applications like karst hydrology.
The analysis and writing were conducted by B. M. and supported and advised by H. O. and A. H. V. M.
conducted the literature review on the investigated karst regions, the site selection, and data preparation.
Abdollahi, S., Raeisi, J., Khalilianpour, M., Ahmadi, F., & Kisi, O. (2017). Daily mean streamflow prediction in perennial and non‐perennial
rivers using four data driven techniques. Water Resources Management, 31(15), 4855–4874.
Aquilina, L., Ladouche, B., & Dörfliger, N. (2005). Recharge processes in karstic systems investigated through the correlation of chemical
and isotopic composition of rain and spring‐waters. Applied Geochemistry, 20(12), 2189–2206.
Bailly‐Comte, V., Ladouche, B., Allanic, C., Bitri, A., Moiroux, F., Monod, B., et al. (2018). Evaluation des ressources en eaux souterraines
du Plateau de Sault ‐ Amélioration des connaissances sur les potentialités de la ressource et cartographie de la vulnérabilité. Rapport final.
BDLisa. (2019). Retrieved from https://bdlisa.eaufrance.fr
Bicalho, C. C., Batiot‐Guilhe, C., Taupin, J. D., Patris, N., Van Exter, S., & Jourde, H. (2017). A Conceptual Model for Groundwater
Circulation Using Isotopes and Geochemical Tracers Coupled with Hydrodynamics: A Case Study of the Lez Karst System, France. Chemical
Geology, (February 2016), 0–1. https://doi.org/10.1016/j.chemgeo.2017.08.014
Birkel, C., & Soulsby, C. (2015). Advancing tracer‐aided rainfall–runoff modelling: A review of progress, problems and unrealised potential.
Hydrological Processes, 29(25), 5227–5240. https://doi.org/10.1002/hyp.10594
Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees. Taylor & Francis.
Chacon‐Hurtado, J. C., Alfonso, L., & Solomatine, D. P. (2017). Rainfall and streamflow sensor network design: A review of applications,
classification, and a proposed framework. Hydrology and Earth System Sciences, 21(6), 3071–3091. https://doi.org/10.5194/hess‐21‐3071‐
Chang, Y.‐W., Hsieh, C.‐J., Chang, K.‐W., Ringgaard, M., & Lin, C.‐J. (2010). Training and testing low‐degree polynomial data mappings via
linear SVM. Journal of Machine Learning Research, 11, 1471–1490.
Clark, M. P., Slater, A. G., Rupp, D. E., Woods, R. A., Vrugt, J. A., Gupta, H. V., et al. (2008). Framework for understanding structural errors
(FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research, 44(12).
Cortes, C., & Vapnik, V. (1995). Support‐vector networks. Machine Learning, 20(3), 273–297.
Corzo, G., & Solomatine, D. (2007). Baseflow separation techniques for modular artificial neural network modelling in flow forecasting.
Hydrological Sciences Journal, 52(3), 491–507. https://doi.org/10.1623/hysj.52.3.491
Eaufrance (2018a). ADES: Portail nationale d'Accès aux Données sur les Eaux Souterraines, http://www.ades.eaufrance.fr/
Eaufrance (2018b). Banque Hydro, http://hydro.eaufrance.fr/
Fernando, T. M. K. G., Maier, H., & Dandy, G. (2009). Selection of input variables for data driven models: An average shifted histogram
partial mutual information estimator approach. Journal of Hydrology, 367. https://doi.org/10.1016/j.jhydrol.2008.10.019
Fleury, P., Ladouche, B., Conroux, Y., Jourde, H., & Dörfliger, N. (2009). Modelling the hydrologic functions of a karst aquifer under active
water management: The Lez spring. Journal of Hydrology, 365(3–4), 235–243. https://doi.org/10.1016/j.jhydrol.2008.11.037
Fotovatikhah, F., Herrera, M., Shamshirband, S., Chau, K.‐W., Ardabili, S. F., & Piran, M. J. (2018). Survey of computational intelligence as
basis to big flood management: Challenges, research directions and future work. Engineering Applications of Computational Fluid
Mechanics, 12(1), 411–437. https://doi.org/10.1080/19942060.2018.1448896
Garvelmann, J., Warscher, M., Leonhardt, G., Franz, H., Lotz, A., & Kunstmann, H. (2017). Quantification and characterization of the
dynamics of spring‐ and stream water systems in the Berchtesgaden Alps with a long‐term stable isotope dataset. Environmental Earth
Sciences, 76(22), 1–17. https://doi.org/10.1007/s12665‐017‐7107‐6
Gong, W., Yang, D., Gupta, H. V., & Nearing, G. (2014). Estimating information entropy for hydrological data: One‐dimensional case.
Water Resources Research, 50(6), 5003–5018. https://doi.org/10.1002/2014WR015874
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge: MIT Press.
Gur, D., Bar‐Matthews, M., & Sass, E. (2003). Hydrochemistry of the main Jordan River sources: Dan, Banias, and Kezinim springs, north
Hula Valley, Israel. Israel Journal of Earth Sciences, 52. https://doi.org/10.1560/RRMW‐9WXD‐31VU‐MWHN
Han, J., & Kamber, M. (2010). Data Mining: Concepts and Techniques, the Morgan Kaufmann Series in Data Management Systems.
Hartmann, A., Barberá, J. A., & Andreo, B. (2017). On the value of water quality data and informative flow states in karst modelling.
Hydrology and Earth System Sciences, 21(12), 5971–5985. https://doi.org/10.5194/hess‐21‐5971‐2017
Support to Andreas Hartmann was
provided by the Emmy Noether‐
Programme of the German Research
Foundation (DFG; Grant HA 8113/1‐1;
project “Global Assessment of Water
Stress in Karst Regions in a Changing
Hartmann, A., Gleeson, T., Rosolem, R., Pianosi, F., Wada, Y., & Wagener, T. (2015). A large‐scale simulation model to assess karstic
groundwater recharge over Europe and the Mediterranean. Geoscientific Model Development, 8(6), 1729–1746. https://doi.org/10.5194/
Hartmann, A., Goldscheider, N., Wagener, T., Lange, J., & Weiler, M. (2014). Karst water resources in a changing world: Review of
hydrological modeling approaches. Reviews of Geophysics, 52(3), 218–242. https://doi.org/10.1002/2013RG000443
Hartmann, A., Kobler, J., Kralik, M., Dirnböck, T., Humer, F., & Weiler, M. (2016). Model‐aided quantification of dissolved carbon and
nitrogen release after windthrow disturbance in an Austrian karst system. Biogeosciences, 13(1), 159–174. https://doi.org/10.5194/bg‐13‐
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Upper Saddle River: Prentice Hall.
He, Z., Wen, X., Liu, H., & Du, J. (2014). A comparative study of artificial neural network, adaptive neuro fuzzy inference system and
support vector machine for forecasting river flow in the semiarid mountain region. Journal of Hydrology, 509, 379–386. https://doi.org/
Hu, C, J.‐j. Wang, Z.‐n. Wu, and Lina‐Liu (Eds.) (2011). Application of the support vector machine on precipitation‐runoff modelling in
Fenhe River, 2011 International Symposium on Water Resource and Environmental Protection, 1099–1103, vol. 2.
Huang, G.‐B., Zhu, Q.‐Y., & Siew, C.‐K. (2004). Extreme learning machine: A new learning scheme of feedforward neural networks. Neural
Jacob, T., Bayer, R., Chery, J., Jourde, H., Le Moigne, N., Boy, J.‐P., et al. (2008). Absolute gravity monitoring of water storage variation in a
karst aquifer on the Larzac plateau (southern France). Journal of Hydrology, 359(1–2), 105–117. https://doi.org/10.1016/j.
Jourde, H., Massei, N., Mazzili, N., & Binet, S. (2018). SNO KARST: A French network of observatories for the multidisciplinary study of
critical zone processes in karst watersheds and aquifers. Vadose Zone Journal, 17(1).
Kavouri, K., Plagnes, V., Tremoulet, J., Dörfliger, N., Rejiba, F., & Marchet, P. (2011). PaPRIKa: A method for estimating karst resource and
source vulnerability: Application to the Ouysse karst system (Southwest France). Hydrogeology Journal, 19(2), 339–353. https://doi.org/
Kelleher, C., Ward, A., Knapp, J. L. A., Blaen, P. J., Kurz, M. J., Drummond, J. D., et al. (2019). Exploring tracer information and model
framework trade‐offs to improve estimation of stream transient storage processes. Water Resources Research, 55, 3481–3501. https://doi.
Kelleher, J. D., Mac Namee, B., & D'Arcy, A. (2015). Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked
Examples, and Case Studies. Cambridge: MIT Press.
Keum, J., & Coulibaly, P. (2017). Information theory‐based decision support system for integrated design of multivariable hydrometric
networks. Water Resources Research, 53, 6239–6259. https://doi.org/10.1002/2016WR019981
Kirchner, J. W. (2003). A double paradox in catchment hydrology and geochemistry. Hydrological Processes, 17(4), 871–874.
Klaus, J., & McDonnell, J. J. (2013). Hydrograph separation using stable isotopes: Review and evaluation. Journal of Hydrology, 505, 47–64.
Labat, D., Mangin, A., & Ababou, R. (2002). Rainfall‐runoff relations for karstic springs: Multifractal analyses. Journal of Hydrology,256,
Le Moine, N., Andréassian, V., & Mathevet, T. (2008). Confronting surface‐and groundwater balances on the La Rochefoucauld‐Touvre
karstic system (Charente, France). Water Resources Research,44, W03403. https://doi.org/10.1029/2007WR005984
Lee, E. S., & Krothe, N. C. (2001). A four‐component mixing model for water in a karst terrain in south‐Central India na, USA. Using solute
concentration and stable isotopes as tracers. Chemical Geology,179(1), 129–143.
Mahler, B. J., & Garner, B. D. (2009). Using nitrate to quantify quick ﬂow in a karst aquifer. Ground Water,47(3), 350–360. https://doi.org/
Mei, Y., & Anagnostou, E. N. (2015). A hydrograph separation method based on information from rainfall and runoff records. Journal of
Hydrology, 523, 636–649. https://doi.org/10.1016/j.jhydrol.2015.01.083
Mewes, B., & Oppel, H. (2019). A comparative analysis of machine learning tools for hydrograph separation. Frontiers in Water Complexity.
Mountrakis, G., Im, J., & Ogole, C. (2011). Support vector machines in remote sensing: A review. ISPRS Journal of Photogrammetry and
Remote Sensing, 66(3), 247–259. https://doi.org/10.1016/j.isprsjprs.2010.11.001
Mudarra, M., & Andreo, B. (2011). Relative importance of the saturated and the unsaturated zones in the hydrogeological functioning
of karst aquifers: The case of Alta Cadena (southern Spain). Journal of Hydrology,
Mudarra, M., Hartmann, A., & Andreo, B. (2019). Combining experimental methods and modeling to quantify the complex recharge
behavior of karst aquifers. Water Resources Research, 55, 1384–1404. https://doi.org/10.1029/2017WR021819
Nasrabadi, N. M. (2007). Pattern recognition and machine learning. Journal of Electronic Imaging, 16(4), 49,901.
Nourani, V., Komasi, M., & Mano, A. (2009). A multivariate ANN‐wavelet approach for rainfall–runoff modeling. Water Resources
Management, 23(14), 2877. https://doi.org/10.1007/s11269-009-9414-5
Piotrowski, A., Wallis, S. G., Napiórkowski, J. J., & Rowiński, P. M. (2007). Evaluation of 1‐D tracer concentration profile in a small river by
means of multi‐layer perceptron neural networks. Hydrology and Earth System Sciences, 11(6), 1883–1896. https://doi.org/10.5194/hess‐
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251
Raghavendra, N. S., & Deka, P. C. (2014). Support vector machine applications in the field of hydrology: A review. Applied Soft Computing,
19, 372–386. https://doi.org/10.1016/j.asoc.2014.02.002
Rimmer, A., & Hartmann, A. (2014). Optimal hydrograph separation filter to evaluate transport routines of hydrological models. Journal of
Hydrology, 514, 249–257. https://doi.org/10.1016/j.jhydrol.2014.04.033
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423. https://doi.org/
Sharma, A. (2000). Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1—A strategy for
system predictor identification. Journal of Hydrology, 239(1), 232–239.
Shortridge, J. E., Guikema, S. D., & Zaitchik, B. F. (2016). Machine learning methods for empirical streamflow simulation: A comparison of
model accuracy, interpretability, and uncertainty in seasonal watersheds. Hydrology and Earth System Sciences, 20(7), 2611–2628.
Shrestha, D. L., & Solomatine, D. P. (2006). Machine learning approaches for estimation of prediction interval for the model output. Neural
Networks, 19(2), 225–235. https://doi.org/10.1016/j.neunet.2006.01.012
Tabari, H., Kisi, O., Ezani, A., & Hosseinzadeh Talaee, P. (2012). SVM, ANFIS, regression and climate based models for reference
evapotranspiration modeling using limited climatic data in a semi‐arid highland environment. Journal of Hydrology, 444, 78–89. https://doi.
Taormina, R., Chau, K.‐W., & Sivakumar, B. (2015). Neural network river forecasting through baseflow separation and binary‐coded
swarm optimization. Journal of Hydrology, 529, 1788–1797.
Thomas, J. A., & Cover, T. M. (2006). Elements of Information Theory. New York, NY, USA: Wiley.
Vapnik, V. (2013). The Nature of Statistical Learning Theory. New York: Springer Science & Business Media.
Weiler, M., Seibert, J., & Stahl, K. (2017). Magic components—Why quantifying rain, snow‐ and icemelt in river discharge isn't easy.
Hydrological Processes, 32(1), 160–166. https://doi.org/10.1002/hyp.11361
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation,
Wu, C. L., & Chau, K. W. (2011). Rainfall–runoff modeling using artificial neural network coupled with singular spectrum analysis. Journal
of Hydrology, 399(3–4), 394–409.
Yaseen, Z. M., Jaafar, O., Deo, R. C., Kisi, O., Adamowski, J., Quilty, J., & El‐Shafie, A. (2016). Stream‐flow forecasting using extreme
learning machines: A case study in a semi‐arid region in Iraq. Journal of Hydrology, 542, 603–614. https://doi.org/10.1016/j.
Yu, P.‐S., Yang, T.‐C., Chen, S.‐Y., Kuo, C.‐M., & Tseng, H.‐W. (2017). Comparison of random forests and support vector machine for real‐
time radar‐derived rainfall forecasting. Journal of Hydrology, 552, 92–104. https://doi.org/10.1016/j.jhydrol.2017.06.020
Zheng, B., Myint, S. W., Thenkabail, P. S., & Aggarwal, R. M. (2015). A support vector machine to identify irrigated crop types using time‐
series Landsat NDVI data. International Journal of Applied Earth Observation and Geoinformation, 34, 103–112. https://doi.org/10.1016/