Hydrological Sciences Journal, 2018, Vol. 63, Nos. 13-14, 1913-1926
https://doi.org/10.1080/02626667.2018.1546389
© 2018 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives Licence (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Model averaging versus model selection: estimating design floods with
uncertain river flow data
Kenechukwu Okoli (a,b), Korbinian Breinl (a,b), Luigia Brandimarte (c), Anna Botto (d), Elena Volpi (e) and Giuliano Di Baldassarre (a,b)

(a) Department of Earth Sciences, Uppsala University, Uppsala, Sweden; (b) Centre of Natural Hazards and Disaster Science (CNDS), Uppsala, Sweden; (c) Department of Sustainable Development, Environmental Science and Engineering, Royal Institute of Technology, Stockholm, Sweden; (d) Department of Civil, Environmental and Architectural Engineering, University of Padova, Padova, Italy; (e) Department of "Scienze dell'Ingegneria Civile", University of "Roma Tre", Rome, Italy

CONTACT: Kenechukwu Okoli, kenechukwu.okoli@geo.uu.se
ABSTRACT
This study compares model averaging and model selection methods to estimate design floods,
while accounting for the observation error that is typically associated with annual maximum flow
data. Model selection refers to methods where a single distribution function is chosen based on
prior knowledge or by means of selection criteria. Model averaging refers to methods where the
results of multiple distribution functions are combined. Numerical experiments were carried out
by generating synthetic data using the Wakeby distribution function as the parent distribution.
For this study, comparisons were made in terms of relative error and root mean square error
(RMSE) referring to the 1-in-100 year flood. The experiments show that model averaging and
model selection methods lead to similar results, especially when short samples are drawn from
a highly asymmetric parent. Also, taking an arithmetic average of all design flood estimates gives
estimated variances similar to those obtained with more complex weighted model averaging.
ARTICLE HISTORY Received 9 March 2018; Accepted 13 September 2018
EDITOR A. Castellarin; ASSOCIATE EDITOR S. Vorogushyn
KEYWORDS model averaging; model selection; design flood; Akaike information criterion
1 Introduction
A common task in applied hydrology is the estimation of the design flood, i.e. a value of river discharge corresponding to a given exceedance probability that is often expressed as a return period in years. Flood risk assessment, floodplain mapping and the design of hydraulic structures are a few examples of applications where estimates of design floods are required. Two common approaches for estimating a design flood are either rainfall-runoff modelling (e.g. Moretti and Montanari 2008, Beven 2012, Breinl 2016) or the fitting of a probability distribution function to a record of annual maximum or peak-over-threshold flows (Viglione et al. 2013, Yan and Moradkhani 2016). The latter approach, which is the focus of this paper, has been referred to in the literature as the "standard approach" to the frequency analysis of floods (Klemeš 1993). The standard approach is affected by various sources of uncertainty, including: the choice of the sampling technique (peak-over-threshold or annual maximum flows), a limited sample size, the selection of a suitable probability distribution function, the method of parameter estimation for the chosen distribution function, and errors in the observed annual peak flows derived from a rating curve (Sonuga 1972; Laio et al. 2009; Di Baldassarre et al. 2012).
It is common practice in any form of modelling or statistical analysis (including flood frequency analysis) to consider a range of models as possible representations of the observed reality. A single model is usually selected based on different criteria, such as (a) goodness-of-fit statistics, e.g. by using the chi-squared (χ²) test; (b) prior selection of a distribution function as a result of what Chamberlain (1965) referred to as "parental affection" towards a given model; or (c) standardization, such as the log-Pearson Type III distribution used for flood frequency analysis in the USA (US Water Resources Council 1982). In the field of flood frequency analysis, the selection of a single best distribution function represents an implicit assumption that the selected model can adequately describe the frequency of observed and future floods, including the extreme ones.
This assumption departs from the understanding that low and ordinary floods (which usually make up the annual peak flow record) are dominated by different processes compared to extreme floods, which are often the main focus in flood risk management. Therefore, the selection of a single distribution model that is assumed valid for the whole range of flows may lead to uncertainty in the design flood estimates. Also, smaller floods are known to influence the smoothing and extrapolation of the largest discharges in the record and in turn may lead to uncertain estimates of the design flood (Klemeš 1986).
Experience from evaluating probability plots of discharge records shows that different distribution functions commonly used in flood frequency analysis give similar fits to data. The reason for this is that a majority of the parametric models used in flood frequency analysis have two or three parameters and are built to preserve the mean and variance of the calibration data (Koutsoyiannis 2004). Hence, there is always a model choice uncertainty when a particular distribution function is selected for estimation purposes. Within the hydrological modelling community, the phenomenon where different models give a similar fit to data has been referred to as "equifinality" (Beven 1993, 2006). In that context, Beven and Binley (1992) developed the generalized likelihood uncertainty estimation (GLUE) to support ensemble predictions of a model output variable. Just like GLUE, other techniques on how to combine estimates from different model structures (and parameter sets) using weights were developed and are generally referred to as model averaging (Hoeting et al. 1999, Burnham and Anderson 2002). Bayesian model averaging (BMA), for example, is used extensively in hydrogeology (Tsai and Li 2008, Ye et al. 2010, Foglia et al. 2013) to quantify predictive uncertainty when diverse conceptual models are used for recharge and/or hydraulic conductivity estimates. The reader is referred to Schöniger et al. (2014) and Volpi et al. (2017) for a detailed discussion of Bayesian model evidence (BME) for hydrological applications, especially when the problem of model selection is addressed using BMA.
Uncertainties present in the record of annual maximum flows are often neglected. For example, flood discharges that are considerably larger than the directly measured discharges, and are therefore derived by extrapolating the rating curve, are subject to major errors, which may in turn impact the estimate of sample statistics such as the skewness (Potter and Walker 1985). Kuczera (1992, 1996) showed that significant uncertainty in the design flood estimate is often caused by errors in discharge data derived from a rating curve. Other studies made use of numerical approaches based on hydraulic modelling or Monte Carlo sampling to quantify the uncertainty in flow data due to rating curve errors (Di Baldassarre and Montanari 2009, Westerberg and McMillan 2015). According to their findings, the uncertainty present in derived discharges may add up to 30% or more.
Given this background, in this study we account for two sources of uncertainty that can significantly affect the design flood estimate: errors in the river flow data, i.e. annual maximum flows derived from a rating curve, and the choice of distribution function.

We compare model selection (denoted here as MS) with two different types of model averaging: arithmetic model averaging (denoted as MM) and weighted model averaging (MA). Model selection refers to a case where a single best distribution function is selected based on a selection criterion; MM describes averaging by applying the arithmetic average of all estimated design floods; and MA refers to the application of a weighted average of design flood estimates from different probability functions (with weights based on a selection criterion). We used the Akaike information criterion (AIC) as a selection criterion for both MS and MA. The study was conducted in a simulation framework using the Wakeby distribution as the parent model for generating synthetic annual maximum flows of different sample sizes.
The aims of our study are as follows: (a) to simulate the systemic uncertainty of the real-world scenario, that is, that in the real world the parent distribution is unknown and likely more complex than the simpler distribution functions used for fitting and estimation purposes; (b) to make a systematic assessment and comparison of the performance of alternative methods for estimating design floods (MS vs MA vs MM); and (c) to analyse the effect of flood data errors.
The comparison is based on the relative errors across the three techniques and the respective candidate distribution functions. The 1-in-100 year flood, i.e. the discharge value corresponding to a return period of 100 years (hereafter 100-year flood), is selected as the design flood of interest due to its wide use as a design standard in flood risk management (Brandimarte and Di Baldassarre 2012). For example, the current policy in the USA for flood defence design refers to the 100-year flood (Commission on Geosciences Environment and Resources). The analyses presented in this study are built on the assumption of stationarity, which has been widely discussed in hydrology (e.g. Milly et al. 2008, Montanari and Koutsoyiannis 2014, Serinaldi and Kilsby 2015, Luke et al. 2017) and is not further discussed here.
2 Methods
The problem of the MS and MA methods is formulated as follows: a record of a random variable X is available and sampled from an unknown parent distribution g(x). The samples are arranged in ascending order x1 ≤ x2 ≤ … ≤ xN. A set of probability distribution functions, whose general mathematical form can be written as f(x_i | θ) with θ as model parameter, are specified as potential candidates for design flood estimation. To implement the MS and MA techniques, we used the Akaike selection criterion, which is a commonly used method for model comparison in hydrology (e.g. Mutua 1994, Strupczewski et al. 2001). MS techniques based on information theory require the estimation of a measure of discrepancy, or amount of information loss, when a model is used to approximate the full reality (Linhart and Zucchini 1986). Akaike (1973) formulated the AIC as an estimator of information loss or gain when a model is fitted to data. The AIC index (I) is expressed as:

$I = -2L(\hat{\theta}) + 2K$    (1)
where K is the number of parameters, L(θ̂) is the numerical value of the log-likelihood at its maximum point for the selected model, and θ̂ is the maximum likelihood estimator of the model parameters. For a detailed mathematical description, the reader is referred to Linhart and Zucchini (1986) and Burnham and Anderson (2002). A heuristic interpretation of Equation (1) is that a decrease in the first term (a better fit) comes at the cost of an increase in the second term (more parameters). This shows a distinct property of the AIC in finding a trade-off between bias and variance of an estimator. The AIC is relative and, since the "truth" is not known, it is the relationship between the AIC values of the respective models that indicates the model of choice, not the AIC values per se (Burnham and Anderson 2002).
An extension of the AIC, denoted AICc, was proposed by Sugiura (1978) to correct for bias due to a short sample size n. The AICc index (Ic) is expressed as:

$I_c = -2L(\hat{\theta}) + 2K + \frac{2K(K+1)}{n - K - 1}$    (2)
Burnham and Anderson (2002) suggested using AICc when the ratio n/K is small (e.g. < 40), and the original formulation when the ratio is sufficiently large. We considered both AIC and AICc in this study. The AICc was used for the short samples, which in this application means a sample size of 30 years, and the AIC was used for the larger sample sizes generated in our numerical experiments, as detailed in the following sections. In principle, the model with the minimum AIC (or AICc) value was considered the most suitable model.
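As a concrete illustration of Equations (1) and (2), both indices can be computed from the maximized log-likelihood of a fitted distribution. The short Python sketch below is not from the original study; it assumes scipy's gumbel_r as a stand-in for the EV1 model of Table 1 and counts K as the number of fitted parameters.

```python
import numpy as np
from scipy import stats

def aic(log_likelihood, k):
    # Equation (1): I = -2 L(theta_hat) + 2K
    return -2.0 * log_likelihood + 2.0 * k

def aicc(log_likelihood, k, n):
    # Equation (2): small-sample correction of Sugiura (1978)
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)

# Illustrative use: fit an EV1 (Gumbel) model to a short synthetic record by
# maximum likelihood and score it; with n = 30 and K = 2, n/K < 40, so AICc applies.
sample = stats.gumbel_r.rvs(loc=100.0, scale=30.0, size=30, random_state=1)
params = stats.gumbel_r.fit(sample)                      # maximum likelihood fit
loglik = np.sum(stats.gumbel_r.logpdf(sample, *params))  # L(theta_hat)
print(aic(loglik, k=2), aicc(loglik, k=2, n=len(sample)))
```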
2.1 Model selection
The aim of MS is to identify an optimal model from a set of possible candidates using a selection criterion (such as the aforementioned AIC). The MS technique can also be seen as a special case of model averaging (see Section 2.2 for details), where a weight of 1 is given to one distribution function and a weight of 0 is assigned to all other models considered. The efficiency of selecting the right parent model using various model selection techniques, and their effect on design flood estimation, has been discussed in detail in the hydrological literature (e.g. Turkman 1985; Di Baldassarre et al. 2009; Laio et al. 2009).
2.2 Model averaging (MA and MM)
Both model averaging methods (MA and MM) address the issue of uncertainty in the choice of probability distribution functions by combining all model estimates of the design flood. Several studies have demonstrated the use of MA in dealing with model structure uncertainty (Bodo and Unny 1976, Tung and Mays 1981a, 1981b, Laio et al. 2011, Najafi et al. 2011, Najafi and Moradkhani 2015, Yan and Moradkhani 2016). Model averaging is similar to the concept of multiple working hypotheses (Chamberlain 1965), which is thought to cope better with the unavoidable bias of using a single model.
The weighted MA technique assigns different weights to the distribution functions considered for estimation. In order to compute these weights, models are first ranked based on their estimated AIC values, followed by the computation of weights for all the distribution functions. The distribution with the minimum AIC is assigned the highest weight. These weights are referred to as Akaike weights (w_i) (Burnham and Anderson 2002):

$w_i = \frac{\exp(-\tfrac{1}{2}\Delta_i)}{\sum_{r=1}^{R} \exp(-\tfrac{1}{2}\Delta_r)}, \quad i = 1, 2, \ldots, R$    (3)

where R is the number of models considered and Δ_i is called the Akaike difference, which represents the discrepancy between the best model, with the minimum AIC, and the ith model, and is expressed as:

$\Delta_i = \mathrm{AIC}_i - \min_{r=1,\ldots,R} \mathrm{AIC}_r, \quad i = 1, 2, \ldots, R$    (4)
A zero value of the Akaike difference (i.e. Δ_i = 0) points to the best distribution function to be used to fit the data. The arbitrariness in the use of Akaike weights is recognized in this work since, in practice, the "true" design flood is not known and the weighting only gives information about the adequacy of a model to fit the observations, not about the accuracy of the estimated discharge.
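A minimal sketch of Equations (3) and (4), assuming the AIC (or AICc) values of the R candidate distributions have already been computed; the numerical values in the example are hypothetical.

```python
import numpy as np

def akaike_weights(aic_values):
    """Akaike weights (Equations (3) and (4)) from the AIC values of R candidate models."""
    aic_values = np.asarray(aic_values, dtype=float)
    delta = aic_values - aic_values.min()   # Akaike differences, Equation (4)
    w = np.exp(-0.5 * delta)                # numerator of Equation (3)
    return w / w.sum()                      # weights sum to 1

# Hypothetical AIC values for the four candidates [EV1, GEV, P3, LN]
print(akaike_weights([312.4, 308.9, 309.5, 310.2]))
```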
Let us consider R competing probability distribution functions, denoted f_i. A posterior predictive distribution of a quantity of interest φ (e.g. a design flood) given the vector of observed data X can be expressed as:

$p(\varphi \mid X) = \sum_{i=1}^{R} p(\varphi \mid X, f_i)\, p(f_i \mid X)$    (5)

where p(· | X) represents the conditional probability distribution function and p(f_i | X) is the posterior probability of a given model. Equation (5) was adapted from Hoeting et al. (1999) and provides a way of averaging the posterior distributions of the design flood under each of the models considered, weighted by the posterior model probability p(f_i | X). The posterior model probability represents the degree of fit between a particular distribution function and the data, and can be assigned by expert judgement (Merz and Thieken 2005) or estimated using Bayesian or Akaike techniques, the latter already described above as the Akaike weights w_i.
Uncertainty in the parameters of the individual pdfs, and its effect on the accuracy of the estimated design flood, is not considered in this study. As the focus is on evaluating point estimates rather than the posterior probability distribution of the design flood, a simplification of Equation (5) is used to implement MA, expressed as:

$\hat{Q}_T = \sum_{i=1}^{R} w_i \hat{Q}_{T,i}$    (6)

where Q̂_T is the estimated design flood for a given return period T. The estimated model weights w_i are assigned to the candidate models, with the model that fits the data best having the highest weight.
As for MM, a simple arithmetic average is applied over the design flood estimates of all models, i.e. all models have equal weights. Similar to Graefe et al. (2015), we use it as a benchmark to assess the skill of MS.
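A sketch of how the point estimates are combined: Equation (6) for the weighted average (MA) and a plain arithmetic mean for MM. The quantile values and weights below are hypothetical placeholders, not results from the study.

```python
import numpy as np

def ma_estimate(quantiles, weights):
    # Weighted model averaging, Equation (6): sum_i w_i * Q_T,i
    return float(np.dot(weights, quantiles))

def mm_estimate(quantiles):
    # Arithmetic model averaging (MM): equal weights for all candidates
    return float(np.mean(quantiles))

# Hypothetical 100-year flood estimates from [EV1, GEV, P3, LN] and their Akaike weights
q100 = [5.9, 7.4, 6.6, 6.8]
w = [0.12, 0.18, 0.50, 0.20]
print(ma_estimate(q100, w), mm_estimate(q100))
```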
3 Numerical experiments
3.1 Choice of parent distribution
The Wakeby distribution function was used as the parent distribution to generate synthetic annual maximum flows of different sample sizes. The synthetic samples were used for the systematic assessment of the MS, MA and MM techniques. Various distribution functions (see Table 1) were then used to fit these synthetic time series and estimate the 100-year flood. The Wakeby distribution function is a five-parameter distribution and was defined by Houghton (1977, 1978). The use of the Wakeby distribution first came about as a result of findings by Matalas et al. (1975), who showed that many commonly used distribution functions are not capable of reproducing the instability observed in sample estimates of skewness derived from flow records. In other words, the standard deviation of sample estimates of skewness derived from real-world flow data is higher than that derived from synthetic flow data. Matalas et al. (1975) called this behaviour the "separation effect", a contradiction similar to the Hurst effect. The Wakeby quantile function is described as follows:
$x = a\left[1 - (1-F)^{b}\right] - c\left[1 - (1-F)^{-d}\right] + m$    (7)

where F ≡ F(x) = P(X ≤ x). The density function f ≡ f(x) is defined as:

$f = \frac{\mathrm{d}F}{\mathrm{d}x} = \left[ab(1-F)^{b-1} + cd(1-F)^{-d-1}\right]^{-1}$    (8)

The distribution can be thought of in two parts: a left-hand tail, a[1 − (1 − F)^b] (small flows), and a right-hand tail, −c[1 − (1 − F)^(−d)] + m (large flows). The letters a, b, c, d and m represent the distribution parameters.
Table 1. Probability distribution functions used in this study as operative models.

Gumbel or EV1; parameters (θ1, θ2); cdf: $F(x; \theta) = \exp\{-\exp[-(x - \theta_1)/\theta_2]\}$
Generalized extreme value (GEV); parameters (θ1, θ2, θ3); cdf: $F(x; \theta) = \exp\{-[1 - \theta_3 (x - \theta_1)/\theta_2]^{1/\theta_3}\}$
Pearson Type III (P3); parameters (θ1, θ2, θ3); pdf: $f(x; \theta) = \frac{1}{|\theta_2|\,\Gamma(\theta_3 + 1)} \left[\frac{x - \theta_1}{\theta_2}\right]^{\theta_3} \exp\left[-\frac{x - \theta_1}{\theta_2}\right]$
Lognormal (LN); parameters (θ1, θ2); pdf: $f(x; \theta) = \frac{1}{x\,\theta_2 \sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[\frac{\log x - \theta_1}{\theta_2}\right]^2\right\}$
In Equation (7), x is the flood quantile (or design flood) for a given return period T, and F is the non-exceedance probability, i.e. F = 1 − 1/T. If F = 0, then x = m and f = 1/(ab + cd). Note that, since f ≥ 0 for all x, (ab + cd) ≥ 0; for F = 1, the values of x and f depend upon the values of the parameters of the distribution, the upper bound on x being +∞ or (m + a − c). Not all parameterizations of the Wakeby distribution are capable of accounting for the conditions of separation mentioned above. However, in an extensive Monte Carlo experiment, Landwehr and Wallis (1978) found that when b > 1 and d > 0 (i.e. long, stretched upper tails) the Wakeby distribution accounts for conditions of separation. The parameter combinations used in this study (i.e. fixed values) to define a Wakeby parent are listed in Table 2 and were taken from Landwehr and Matalas (1979). A detailed presentation of parameter limits and valid parameter combinations for the Wakeby distribution is provided by Landwehr and Wallis (1978).
We chose the Wakeby distribution for the following reasons. First, we want to simulate the epistemic uncertainty that affects any design flood estimation exercise, i.e. the understanding that the flood generation processes are complex (and not completely known), while simpler models are commonly used for fitting and estimation purposes. The Wakeby distribution has a higher level of complexity, in the form of more parameters, than the other distribution functions commonly used for estimation purposes. Second, it mimics the upper tail structures typical of flood distributions, which are essential to capture in any synthetic data, i.e. the occasional presence of an outlier (in this case an extreme flood peak), which is not expected, but probable. Third, its quantile function is expressed explicitly in terms of the unknown variable, making the generation of synthetic data straightforward (Hosking and Wallis 1997).
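Because the quantile function in Equation (7) is explicit, synthetic annual maxima can be drawn by inverse-transform sampling: draw F uniformly on (0, 1) and map it through Equation (7). The sketch below is only an illustration of this sampling step (not the authors' original code) and uses the Wakeby-1 parameters from Table 2.

```python
import numpy as np

def wakeby_quantile(F, m, a, b, c, d):
    # Equation (7): x = a[1 - (1-F)^b] - c[1 - (1-F)^(-d)] + m
    F = np.asarray(F, dtype=float)
    return a * (1.0 - (1.0 - F) ** b) - c * (1.0 - (1.0 - F) ** (-d)) + m

def wakeby_sample(n, m, a, b, c, d, rng=None):
    # Inverse-transform sampling of n annual maximum flows from the Wakeby parent
    rng = np.random.default_rng(rng)
    return wakeby_quantile(rng.uniform(size=n), m, a, b, c, d)

# Wakeby-1 parent (Table 2); its "true" 100-year flood is the F = 1 - 1/100 quantile,
# which evaluates to about 7.05 (consistent with Table 3).
wakeby1 = dict(m=0.0, a=1.0, b=16.0, c=4.0, d=0.20)
q100_true = wakeby_quantile(1.0 - 1.0 / 100.0, **wakeby1)
annual_maxima = wakeby_sample(50, **wakeby1, rng=1)
```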
It should be noted that previous numerical studies on flood frequency analysis used more common distribution functions, e.g. the lognormal (Matalas et al. 1975, Slack et al. 1975, Matalas and Wallis 1978), as the parent model. However, our choice was based on the need to simulate the fact that, in the real world, the parent distribution is unknown and likely more complex than the simpler distribution functions.
3.2 Choice of probability distribution functions
Four commonly used distribution functions were selected as operational models (i.e. R = 4) to fit the synthetic flows and to estimate the 100-year event. The distribution functions considered are (i) the EV1 (Gumbel) distribution, (ii) the generalized extreme value (GEV) distribution, (iii) the generalized gamma or Pearson Type III (P3) distribution, and (iv) the lognormal (LN) distribution. Table 1 provides their cumulative distribution functions (cdf), F(x; θ), and probability density functions (pdf), f(x; θ); the latter are shown for those distribution functions whose cdf is not invertible.
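For reproduction in a scripting environment, the four candidates can be represented, for example, by their scipy.stats counterparts. This is only one possible mapping, not the one used by the authors: scipy's location/scale/shape conventions differ from those in Table 1, and fixing the lognormal location at zero is an assumption made here to mimic the two-parameter LN.

```python
from scipy import stats

# Candidate distributions of Table 1 mapped to scipy.stats objects (an assumed mapping):
# each entry is (scipy distribution, fixed fit arguments, number of free parameters K).
CANDIDATES = {
    "EV1": (stats.gumbel_r, {}, 2),           # Gumbel: location, scale
    "GEV": (stats.genextreme, {}, 3),         # GEV: shape, location, scale
    "P3":  (stats.pearson3, {}, 3),           # Pearson Type III: skew, location, scale
    "LN":  (stats.lognorm, {"floc": 0.0}, 2), # 2-parameter lognormal (location fixed at 0)
}
```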
3.3 Simulation framework
We set up a Monte Carlo simulation framework consisting of the following steps, in which the procedure is repeated for each of the Wakeby parent distributions fully determined by the five sets of parameters reported in Table 2. We also let the sample size n vary by assuming values of 30, 50, 100 and 200 years.
(1) One of the Wakeby pdfs, with a fixed set of parameters (Table 2), is selected as the parent distribution g(x). As the parameters are fixed, the "true" design flood value Q100 is the quantile corresponding to a return period of 100 years, which is computed using Equation (7) with F = 1 − 1/100.
(2) The Wakeby quantile function described by Equation (7) is used to generate a sample of synthetic annual maximum flows Q of fixed length; these values are considered true discharges. Introducing observation error, corrupted discharges Q* are generated using the error model for uncorrelated observation error (Kuczera 1992), as follows (see also the code sketch after this list):

$Q^{*} = Q + \beta Q \varepsilon$    (9)

where ε denotes a standard Gaussian random variable (i.e. zero mean and standard deviation of 1), Q is the true discharge, and β is a positive-valued coefficient denoting the magnitude of the observation error. Values for β of 0.00, 0.15 and 0.30 (i.e. 0%, 15% and 30%) are the magnitudes of observation error considered in this study, taken from Di Baldassarre et al. (2012). A β value of 0% represents the scenario in which the observed discharge equals the true discharge; thus there is no observation error.
Table 2. Wakeby distribution functions; μ, σ, Cv, γ and λ denote the mean, standard deviation, coefficient of variation, skewness and kurtosis, respectively.

Distribution   m   a   b     c    d      μ     σ     Cv    γ     λ
Wakeby-1       0   1   16.0  4    0.20   1.94  1.34  0.69  4.14  63.74
Wakeby-2       0   1   7.5   5    0.12   1.56  0.90  0.58  2.01  14.08
Wakeby-3       0   1   1.0   5    0.12   1.18  1.03  0.87  1.91  10.73
Wakeby-4       0   1   16.0  10   0.04   1.36  0.51  0.38  1.10  7.69
Wakeby-5       0   1   1.0   10   0.04   0.92  0.70  0.76  1.11  4.73

Source: Landwehr and Matalas (1979).
(3) Using the corrupted discharges Q*, the parameters of the four pdfs (Table 1) are estimated using the method of maximum likelihood. For the P3 and GEV distributions, maximum likelihood estimators are either not available or not asymptotically efficient in a few non-regular cases; for this reason, Smith's estimators (Smith 1985) were used instead of standard maximum likelihood estimators.
(4) The four pdfs are used to estimate the design
flood as the quantile corresponding to a return
period of 100 years.
(5) The AIC is applied for both MS and MA:
(i) MS: the AIC or AICc (depending on the sample size generated, see Section 2) is applied by using Equation (1) or (2), respectively, for the four distribution functions, and the optimal distribution is used to estimate the design flood as the flood quantile corresponding to a return period of T years.
(ii) MA: using Equation (4), the Akaike differences Δi (i = 1, 2, 3, …, R) are evaluated and used to compute the model weights using Equation (3); the estimated design floods from each of the candidate distribution functions are then combined by applying Equation (6).
(6) The arithmetic average (MM) of design floods
estimated using the candidate distribution
functions (Step 4) is implemented.
(7) A percentage relative error is computed in order to compare the true design flood (derived in Step 1) with the design floods estimated by: each of the four candidate models (as in Step 4), model selection (MS, Step 5(i)), weighted model averaging (MA, Step 5(ii)), and arithmetic model averaging (MM, Step 6). Thus, we obtain seven relative error estimates (four candidate distribution functions, MS, MA and MM).
Steps 2-7 are repeated 1000 times, generating 1000 synthetic flow samples from a given parent Wakeby distribution and of a fixed sample size. Generated sample sizes of 30 and 50 years reflect the typical length of historical observations, while samples of length 100 and 200 years represent an optimistic case in hydrology.
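The sketch below strings the earlier snippets together into a single replicate of Steps 1-7 for one parent, one sample size and one error level. It is a simplified reconstruction under stated assumptions, not the authors' implementation: scipy's generic maximum likelihood fit is used for all four candidates (rather than Smith's estimators for P3 and GEV), the AICc correction is applied via the n/K < 40 rule of thumb rather than only for the 30-year samples, corrupted discharges are clipped to a small positive value so that the lognormal fit does not fail, and the CANDIDATES mapping and Wakeby helpers are those sketched above.

```python
import numpy as np

def one_replicate(parent_pars, n, beta, T=100, rng=None):
    """One pass through Steps 1-7 (a sketch): returns relative errors (%) of the
    T-year flood for each candidate distribution plus MS, MA and MM."""
    rng = np.random.default_rng(rng)
    q_true = wakeby_quantile(1.0 - 1.0 / T, **parent_pars)     # Step 1: "true" design flood
    q = wakeby_sample(n, **parent_pars, rng=rng)               # Step 2: synthetic annual maxima
    q_obs = q * (1.0 + beta * rng.standard_normal(n))          # Equation (9): Q* = Q + beta*Q*eps
    q_obs = np.clip(q_obs, 1e-6, None)                         # assumption: keep flows positive

    names, ics, qhats = [], [], []
    for name, (dist, fit_kwargs, k) in CANDIDATES.items():     # Steps 3-4
        pars = dist.fit(q_obs, **fit_kwargs)                   # maximum likelihood fit
        loglik = np.sum(dist.logpdf(q_obs, *pars))
        ic = -2.0 * loglik + 2.0 * k                           # Equation (1)
        if n / k < 40:
            ic += 2.0 * k * (k + 1) / (n - k - 1)              # Equation (2), AICc
        names.append(name)
        ics.append(ic)
        qhats.append(dist.ppf(1.0 - 1.0 / T, *pars))           # T-year quantile

    ics, qhats = np.array(ics), np.array(qhats)
    w = np.exp(-0.5 * (ics - ics.min()))                       # Step 5: Akaike weights
    w /= w.sum()
    estimates = dict(zip(names, qhats))
    estimates["MS"] = qhats[np.argmin(ics)]                    # model selection
    estimates["MA"] = float(w @ qhats)                         # weighted averaging, Equation (6)
    estimates["MM"] = float(qhats.mean())                      # Step 6: arithmetic averaging
    return {key: 100.0 * (val - q_true) / q_true               # Step 7: relative error (%)
            for key, val in estimates.items()}

# e.g. 1000 replicates for the Wakeby-1 parent, n = 50 and 15% observation error
errors = [one_replicate(wakeby1, n=50, beta=0.15, rng=i) for i in range(1000)]
```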
4 Results
Box plots are used to support the comparison between the different techniques (four candidate distribution functions, MS, MA and MM). Figures 1 to 5 show the results of the numerical experiments and summarize the performance of MA, MM and MS, and also of the candidate distributions, in estimating the 100-year flood, for different statistical characteristics of the underlying parent distribution, different record lengths and levels of observation uncertainty.
Figure 1 shows box plots of percentage relative estimation errors when Wakeby-1 is used as the parent distribution. Observation errors for a given sample size increase from the left to the right panels, while the sample size for a given observation error increases from the top to the bottom panels. In general, a tendency towards underestimation is observed for all techniques, namely MS, MA and MM, and for the individual distribution functions, when the parent is highly skewed, as shown in Figures 1 and 2, respectively. For instance, considering Wakeby-1 as the parent model, an error magnitude of 15% and a sample size of 50 years, on average MA underestimates the true design flood by 19.6%, while MS and MM give an equal underestimation of 22.3%. Major deviations across all techniques and distribution functions appear to be reasonable, as the underlying population was based on a complex parent distribution with five parameters, while the fitting is conducted using distribution functions with only two or three parameters.
Figures 3-5 show the box plots obtained by using Wakeby-3 to Wakeby-5 as parent distributions, and in that order refer to the reduction in skewness of the parent distributions (see Table 2 for details about the value of skewness for each Wakeby parent). These diagrams show that, in general, all three techniques (MS, MA and MM) tend towards overestimation. For instance, considering Wakeby-3 as the parent model, with an error magnitude of 15% and a sample size of 50 years, on average MS, MA and MM overestimate the true value by 1.3, 3.6 and 6.88%, respectively. Looking at the panels of Figures 3-5 from left to right, this overestimation is influenced by increasing observation errors. This is due to the fact that these errors tend to increase the variance of the sample (see Equation (9)), which in turn leads to increased variance of the design flood estimates (Di Baldassarre et al. 2012).
Another set of box plots was produced to help understand the influence of the Akaike weights used in MA on the overall accuracy and variance of the design flood estimates. For example, if one considers the centre panel of the first row in Figure 6 (i.e. the case of β = 15% and sample size 30), the interpretation is as follows: on average, the best model among the candidates is P3, and it accounts for approximately 50% of the weighted average; LN and GEV account for 20 and 18%, respectively; while EV1 accounts for 12% of the weighted average. Thus, P3 is clearly the best distribution function in terms of Akaike weights when the parent is Wakeby-1. Figure 1 shows that, for the same 15% error, P3 has a highly biased estimate (with less variance) compared to the GEV, which has less bias but increased variance. The selection of P3 as the best distribution that fits the data in Figure 6 is fairly consistent as the sample size increases from 30 to 100 for an error of 15%. Note that this behaviour changes when the sample size increases to 200, with GEV as the best distribution function followed by P3. If Wakeby-2 is selected as the parent, Figure 7 suggests that LN is the distribution function that fits the data best for almost all sample sizes and error magnitudes considered. Comparing Figures 2 and 7, it is observed that all models have almost the same accuracy and variance, except for EV1, which is slightly more biased. In summary, it is observed that, on the one hand, the comparison between P3 and GEV (comparing Figs 1 and 6) presents a scenario where the distribution function with the highest weight has less accuracy but small variance, and, on the other, comparing LN with all other distribution functions (Figs 2 and 7) presents a scenario where distribution functions with a smaller weight have almost equal variance to the distribution function having the highest weight. That is, a higher (or lower) Akaike weight for a given distribution function does not necessarily translate to better (or worse) estimates of the design flood. The reason is that the Akaike weight, or any other model selection criterion, refers only to how well the model fits the data and not to how good the estimation is. Box plots of Akaike weights for Wakeby-3 to Wakeby-5 can be found in the Supplementary material.

Figure 1. Box plots of percentage relative error for MS, MA, MM and all candidate models, with Wakeby-1 as parent model. The red line represents the median (50th percentile) and the lower and upper ends of the blue box represent the 25th and 75th percentiles, respectively. Outliers are represented by red crosses.
Figure 2. Box plots of percentage relative error for MS, MA, MM and all candidate models, with Wakeby-2 as parent model. Symbols as in Figure 1.
Figure 3. Box plots of percentage relative error for MS, MA, MM and all candidate models, with Wakeby-3 as parent model. Symbols as in Figure 1.
Figure 4. Box plots of percentage relative error for MS, MA, MM and all candidate models, with Wakeby-4 as parent model. Symbols as in Figure 1.
Figure 5. Box plots of percentage relative error for MS, MA, MM and all candidate models, with Wakeby-5 as parent model. Symbols as in Figure 1.
Figure 6. Box plots of Akaike weights for all candidate distribution functions with Wakeby-1 as parent model. Symbols as in Figure 1.
Figure 7. Box plots of Akaike weights for all candidate distribution functions with Wakeby-2 as parent model. Symbols as in Figure 1.

Tables 3 and 4 show the root mean square error (RMSE) and average percentage relative error (RE%), respectively, for the three methods and the four candidate distribution functions. Table 3 shows that for Wakeby-1, which has a true design flood of 7.05 m³/s, MA has a slightly better accuracy (±1.71 m³/s) when compared to MS (±1.87 m³/s) and MM (±1.77 m³/s). The same pattern was observed (Table 4) for the three techniques in terms of RE%. As skewness reduces, i.e. from Wakeby-2 to Wakeby-5, we see that the three techniques have similar performance in terms of RMSE and average RE%. For instance, for Wakeby-3, with a true design flood of 4.68 m³/s, MS has an accuracy of ±0.68 m³/s when compared to MA and MM, with RMSE values of ±0.77 and ±1.04 m³/s, respectively. However, all three models (MS, MA and MM) have an average RE% of 2.8%. Table 4 shows that, overall, the AIC techniques always have a smaller average RE%, except for the case of Wakeby-1, where GEV has the lowest value. This may be seen as a positive outcome of model selection methods in selecting distribution functions for estimation purposes.
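For completeness, a short sketch of how the two summary scores of Tables 3 and 4 can be obtained from the Monte Carlo output; here 'estimates' is an assumed variable name holding the 1000 design flood estimates produced by one technique for a given parent.

```python
import numpy as np

def rmse_and_average_re(estimates, q_true):
    # RMSE (Table 3) and average percentage relative error (Table 4)
    estimates = np.asarray(estimates, dtype=float)
    rmse = np.sqrt(np.mean((estimates - q_true) ** 2))
    avg_re = np.mean(100.0 * (estimates - q_true) / q_true)
    return rmse, avg_re
```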
5 Discussion and conclusions
When it comes to flood frequency analysis, the true distribution of floods (which includes the true design flood corresponding to a given return period) is not known a priori. Therefore, the task for model selection (leading to a single best distribution function) and model averaging methods is driven towards better estimation, rather than the search for the true distribution that generated the data.
In this study, the MM approach assigns equal weights to all candidate distribution functions without taking into account how well these distribution functions fit the data. The MA approach is different from MM in the sense that the former takes into account the individual performance of all distribution functions in fitting the data. The MA approach assigns higher weights to distribution functions that give better fits to the data.
There are certainly situations in which the distribution functions are all similar, i.e. have almost the same AIC values and Akaike weights, which will lead to estimates similar to those of MM. It seems this behaviour, where candidate distribution functions have similar AIC values, may be the norm rather than the exception, as seen in the studies by Mutua (1994) and Strupczewski et al. (2001). This might be the reason why, in our study, MA has about the same level of performance as the MM approach. However, MA can only surpass MM in terms of accuracy of estimates if one or more distributions have sufficient weight and their estimates are close to the true value of the design flood.
The MM approach is usually neglected as a sort of outcast because of its obvious simplicity when compared to Bayesian and Akaike approaches for model averaging. However, studies in the social sciences have demonstrated that MM can perform well compared to ensemble Bayesian model averaging (e.g. Graefe et al. 2015), and similar conclusions were drawn from studies focusing on operational and financial forecasts (Clark and McCracken 2010, Graefe et al. 2014). By assigning equal weights to all candidate distribution functions when implementing the MM method, one ignores the relative adequacy of fit of the individual distributions, thereby deliberately introducing bias by taking into account distribution functions with an inadequate fit. The MA approach, however, tends to assign higher weights to models with large degrees of freedom, even though the AIC is formulated to take overfitting into account. The effect of overfitting due to the MA approach may lead to improved accuracy, but at the expense of increased variance of the estimated design floods. Conversely, introducing bias by implementing the MM approach may lead to less overfitting and reduced variance.
The trade-off between accuracy and variance observed for some candidate distribution functions might be the reason for the similar performance between MA and MM. To illustrate that trade-off, let us consider the top right corner of Figure 1: the GEV model provides good accuracy, but high variance when compared to the two averaging approaches MA and MM. The same figure shows that LN has less variance but also less accuracy when compared to GEV.
Table 3. Root mean square error (RMSE) for all techniques and distribution functions for a sample size of 50 and a magnitude error of 15%.

Distribution   True 1-in-100 (m³/s)   MS     MA     MM     LN     GEV    P3     EV1
Wakeby-1       7.05                   1.87   1.71   1.77   2.05   2.19   1.88   2.50
Wakeby-2       4.69                   0.79   0.81   0.82   0.78   0.90   0.87   0.98
Wakeby-3       4.68                   0.68   0.77   1.04   1.69   2.78   0.62   0.99
Wakeby-4       3.02                   0.37   0.36   0.36   0.41   0.46   0.38   0.35
Wakeby-5       3.01                   0.59   0.67   0.83   1.50   1.73   0.59   0.31

Table 4. Average percentage relative error (RE%) for all techniques and distribution functions for a sample size of 50 and a magnitude error of 15%.

Distribution   True 1-in-100 (m³/s)   MS     MA     MM     LN       GEV     P3       EV1
Wakeby-1       7.05                   5.74   5.83   5.91   -26.95   2.72    -24.42   -34.63
Wakeby-2       4.69                   4.09   4.06   4.06   -12.72   -8.19   -14.96   -19.39
Wakeby-3       4.68                   4.70   4.92   5.16   29.36    32.09   -0.75    -19.09
Wakeby-4       3.02                   2.82   2.82   2.82   -10.80   -1.88   -7.52    -9.42
Wakeby-5       3.01                   3.39   3.48   3.58   44.45    24.67   12.47    -4.31
However, a comparison of the distribution of Akaike weights (see Fig. 6, top right corner) shows that GEV has less weight compared to LN. This shows that there is no clear relationship between the calculated weights and the accuracy and variability of the estimates, and therefore one must give some thought to the estimation problem before compiling a list of distribution functions suitable for reliable design flood estimates. Also, one can speculate that the similar performance provided by the MA and MM approaches relates to the fact that none of the candidate distributions deviates too much from the parent distribution.
For water management, selecting a distribution function with a high variance of the estimated design flood will complicate the design of an infrastructure, i.e. there is potential for substantial over-design if the upper limit of the confidence interval is considered. An unbiased estimate in flood estimation is desirable but, given the numerous sources of uncertainty, engineers and planners usually do not mind sacrificing accuracy in exchange for reduced variability in estimations (Slack et al. 1975). However, this ethos of preferring a distribution function with reduced variability can be problematic, since the true design flood is not known in advance; it may lead to an increased risk of over- or under-design of water-related infrastructure if the true design flood is outside the calculated confidence intervals. Furthermore, the decision on the design flood for a given infrastructure does not only depend on estimates based on distribution functions, but also on risk perception and economic feasibility.
Our study likewise shows that, when facing short sample sizes (30-50 years), which are common in hydrology and water resources engineering applications, model averaging (MA and MM) and model selection (MS) lead to better results than arbitrarily selecting a single distribution function. Moreover, for very large sample sizes (100-200 years), which are rare in real-world applications, our study shows that MS, MA and MM have similar variance also when observation uncertainty is introduced. This is related to the fact that the sample sizes are large enough for a better estimation of parameters (even for highly parameterized distribution functions such as GEV), but may not lead to reduced variance due to over-fitting.
It is important to note that our work is focused purely on the estimation of design floods using statistical techniques. Several limitations, such as the distribution functions considered, have unavoidably influenced our results. Future studies on design flood estimation could be extended to consider the physical processes behind flood generation.
Acknowledgements
This research was carried out within the CNDS (Centre of Natural Hazards and Disaster Science) research school, www.cnds.se. We thank Francesco Laio, two anonymous reviewers and the editor for providing critical comments that helped to improve an earlier version of this paper.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Kenechukwu Okoli http://orcid.org/0000-0002-5880-607X
References
Akaike, H., 1973. Information theory and an extension of the
maximum likelihood principle. In: B.N. Petrov and F.
Csáki, eds. 2nd International Symposium on Information
Theory, Tsahkadsor, Armenia, USSR, September 2–8,
1971, Budapest: Akadémiai Kiadó, 267–281.
Beven, K., 1993. Reality and uncertainty in distributed
hydrological modelling. Advances in Water Resources, 16,
41–51.
Beven, K., 2006. A manifesto for the equifinality thesis.
Journal of Hydrology, 320 (1–2), 18–36. doi:10.1016/j.
jhydrol.2005.07.007
Beven, K., 2012. Rainfall-runoff modelling: the primer. 2nd ed. West Sussex: John Wiley & Sons.
Beven, K. and Binley, A., 1992. The future of distributed
models: model calibration and uncertainty prediction.
Hydrological Processes, 6, 279–298.
Bodo, B. and Unny, T.E., 1976. Model uncertainty in flood
frequency analysis and frequency-based design. Water
Resources Research, 12 (6), 1109–1117.
Brandimarte, L. and Di Baldassarre, G., 2012. Uncertainty in
design flood profiles derived by hydraulic modelling.
Hydrology Research, 43 (6), 753. doi:10.2166/nh.2011.086
Breinl, K., 2016. Driving a lumped hydrological model with
precipitation output from weather generators of different
complexity. Hydrological Sciences Journal, 61 (8),
1395–1414.
Burnham, K.P. and Anderson, D.R., 2002. Model selection and multimodel inference. 2nd ed. New York: Springer.
Chamberlain, T.C., 1965. The method of multiple working
hypotheses [reprint of 1890 science article]. Science, 148,
754–759.
Clark, T.E. and McCracken, M.W., 2010. Averaging forecasts from VARs with uncertain instabilities. Journal of Applied Econometrics, 25 (1), 5-29. doi:10.1002/jae.1127
Di Baldassarre, G., Laio, F., and Montanari, A., 2009. Design flood estimation using model selection criteria. Physics and Chemistry of the Earth, Parts A/B/C, 34 (10-12), 606-611. doi:10.1016/j.pce.2008.10.066
Di Baldassarre, G., Laio, F., and Montanari, A., 2012. Effect
of observation errors on the uncertainty of design floods.
Physics and Chemistry of the Earth,42–44, 85–90.
Di Baldassarre, G. and Montanari, A., 2009. Uncertainty in
river discharge observations: a quantitative analysis.
Hydrology and Earth System Sciences Discussions, 6 (1),
39–61. doi:10.5194/hessd-6-39-2009
Foglia, L., et al., 2013. Evaluating model structure adequacy:
the case of the Maggia Valley groundwater system, south-
ern Switzerland. Water Resources Research, 49 (1),
260–282. doi:10.1029/2011WR011779
Graefe, A., et al., 2014. Combining forecasts: an application
to elections. International Journal of Forecasting, 30 (1),
43–54. doi:10.1016/j.ijforecast.2013.02.005
Graefe, A., et al., 2015. Limitations of ensemble Bayesian
model averaging for forecasting social science problems.
International Journal of Forecasting, 31 (3), 943–951.
doi:10.1016/j.ijforecast.2014.12.001
Hoeting, J.A., et al., 1999. Bayesian model averaging: a tutorial. Statistical Science, 14 (4), 382-417. doi:10.2307/2676803
Hosking, J.R.M. and Wallis, J.R., 1997. Regional frequency analysis: an approach based on L-moments. Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511529443
Houghton, J.C., 1977. Robust estimation of the frequency of extreme events in a flood frequency context. PhD dissertation. Harvard University, Cambridge, MA.
Houghton, J.C., 1978. Birth of a parent: the Wakeby distri-
bution for modeling flood flows. Water Resources
Research, 14, 6. doi:10.1029/WR015i005p01288
Klemeš, V., 1986. Dilettantism in hydrology: transition or
destiny? Water Resources Research, 22 (9S), 177S–188S.
doi:10.1029/WR022i09Sp0177S
Klemeš, V., 1993. Probability of extreme hydrometeorological events - a different approach. In: Extreme Hydrological Events: Precipitation, Floods and Droughts (Proceedings of the Yokohama Symposium), Yokohama, Japan, IAHS Publ. 213. Wallingford, UK: IAHS Press, Centre for Ecology and Hydrology, 167-176.
Koutsoyiannis, D., 2004. Statistics of extremes and estimation of extreme rainfall: I. Theoretical investigation. Hydrological Sciences Journal, 49 (4).
Kuczera, G., 1992. Uncorrelated measurement error in flood
frequency inference. Water Resources Research, 28 (1),
183–188.
Kuczera, G., 1996. Correlated rating curve error in flood
frequency inference. Water Resources Research, 32 (7),
2119–2127.
Laio, F., et al., 2011. Spatially smooth regional estimation of the
flood frequency curve (with uncertainty). Journal of Hydrology,
408 (1–2), 67–77. doi:10.1016/j.jhydrol.2011.07.022
Laio, F., Di Baldassarre, G., and Montanari, A., 2009. Model
selection techniques for the frequency analysis of hydro-
logical extremes. Water Resources Research, 45 (7),
W07416. doi:10.1029/2007WR006666
Landwehr, J.M. and Matalas, N.C., 1979. Estimation of para-
meters and quantiles of Wakeby distributions 1. Known
lower bounds. Water Resources Research, 15 (6), 1361–1372.
Landwehr, J.M. and Wallis, J.R., 1978. Some comparisons of
flood statistics in real and log space. Water Resources
Research, 14 (5), 902–920.
Linhart, H. and Zucchini, W., 1986. Model selection. Hoboken, NJ: John Wiley.
Luke, A., et al., 2017. Predicting nonstationary flood frequencies: evidence supports an updated stationarity thesis in the United States. Water Resources Research, 53 (7), 5469-5494.
Matalas, N.C., Slack, J.R., and Wallis, J.R., 1975. Regional skew
in search of a parent. Water Resources Research, 11 (6),
815–826.
Matalas, N.C. and Wallis, J.R., 1978. Some comparisons of
flood statistics in real and log space. Water Resources
Research, 14 (5), 902–920.
Merz, B. and Thieken, A.H., 2005. Separating natural and
epistemic uncertainty in flood frequency analysis. Journal
of Hydrology, 309 (1–4), 114–132. doi:10.1016/j.
jhydrol.2004.11.015
Milly, P.C.D., et al., 2008. Climate change - stationarity is
dead: whither water management? Science, 319 (5863),
573–574.
Montanari, A. and Koutsoyiannis, D., 2014. Modeling and
mitigating natural hazards: stationarity is immortal! Water
Resources Research, 50 (12), 9748–9756.
Moretti, G. and Montanari, A., 2008. Inferring the flood
frequency distribution for an ungauged basin using
a spatially distributed rainfall–runoff model. Hydrology
and Earth System Sciences Discussions, 5 (1), 1–26.
doi:10.5194/hessd-5-1-2008
Mutua, F.M., 1994. The use of the Akaike Information
Criterion in the identification of an optimum flood fre-
quency model. Hydrological Sciences Journal, 39 (3),
235–244. doi:10.1080/02626669409492740
Najafi, M.R. and Moradkhani, H., 2015. Multi-model ensemble analysis of runoff extremes for climate change impact assessment. Journal of Hydrology, 525, 352-361.
Najafi, M.R., Moradkhani, H., and Jung, I.W., 2011. Assessing the uncertainties of hydrologic model selection in climate change impact studies. Hydrological Processes, 25. doi:10.1002/hyp.8043
Potter, K.W. and Walker, J.F., 1985. An empirical study of
flood measurement error. Water Resources Research, 21 (3),
403–406. doi:10.1029/WR021i003p00403
Schöniger, A., et al., 2014. Model selection on solid ground:
rigorous comparison of nine ways to evaluate Bayesian
model evidence. Water Resources Research, 50,
5342–5350. doi:10.1002/2012WR013085
Serinaldi, F. and Kilsby, C.G., 2015. Stationarity is undead:
uncertainty dominates the distribution of extremes.
Advances in Water Resources, 77, 17–36.
Slack, J.R., Wallis, J.R., and Matalas, N.C., 1975. On the value
of information to flood frequency analysis. Water
Resources Research, 11 (5), 629–647. doi:10.1029/
WR011i005p00629
Smith, R.L., 1985. Maximum likelihood estimation in a class
of non-regular cases. Biometrika, 72 (1), 67–90.
Sonuga, J.O., 1972. Principle of maximum entropy in hydrologic frequency analysis. Journal of Hydrology, 17, 177-191.
Strupczewski, W.G., Singh, V.P., and Feluch, W., 2001. Non-stationary approach to at-site flood frequency modelling I. Maximum likelihood estimation. Journal of Hydrology, 248, 123-142.
Sugiura, N., 1978. Further analysts of the data by Akaike's information criterion and the finite corrections. Communications in Statistics - Theory and Methods, 7 (1), 13-26.
Tsai, F.T.-C. and Li, X., 2008. Inverse groundwater modeling for hydraulic conductivity estimation using Bayesian model averaging and variance window. Water Resources Research, 44 (9). doi:10.1029/2007WR006576
Tung, Y. and Mays, L.W., 1981a. Optimal risk-based design of
flood levee systems. Water Resources Research, 17 (4), 843–852.
Tung, Y.K. and Mays, L.W., 1981b. Risk models for flood
levee design. Water Resources Research, 17 (4), 833–841.
doi:10.1029/WR017i004p00833
Turkman, R.F., 1985. The choice of extremal models by Akaike’s
information criterion. Journal of Hydrology, 82, 307–315.
US Water Resources Council, 1982. Guidelines for determining flood flow frequency: Bulletin 17B. Hydrology Subcommittee, Office of Water Data Coordination, US Geological Survey, Reston, Virginia. Washington, DC: US Government Printing Office.
Viglione, A., et al., 2013. Flood frequency hydrology: 3.
A Bayesian analysis. Water Resources Research, 49 (2),
675–692. doi:10.1029/2011WR010782
Volpi, E., et al., 2017. Sworn testimony of the model evidence: Gaussian Mixture Importance (GAME) sampling. Water Resources Research, 53 (7), 6133-6158. doi:10.1002/2016WR020167
Westerberg, I.K. and McMillan, H.K., 2015. Uncertainty in hydrological signatures. Hydrology and Earth System Sciences, 19, 3951-3968. doi:10.5194/hess-19-3951-2015
Yan, H. and Moradkhani, H., 2016. Towards more robust extreme flood prediction by Bayesian hierarchical and multimodeling. Natural Hazards, 81, 203-225. doi:10.1007/s11069-015-2070-6
Ye, M., et al., 2010. A model-averaging method for
assessing groundwater conceptual model uncertainty.
Ground Water, 48 (5), 716–728. doi:10.1111/j.1745-
6584.2009.00633.x