INTERNATIONAL JOURNAL OF CLIMATOLOGY
Int. J. Climatol. (2017)
Published online in Wiley Online Library
(wileyonlinelibrary.com) DOI: 10.1002/joc.5222
The VALUE perfect predictor experiment: evaluation of
temporal variability
Douglas Maraun,a,* Radan Huth,b,c José M. Gutiérrez,d Daniel San Martín,e
Martin Dubrovsky,c Andreas Fischer,f Elke Hertig,g Pedro M. M. Soares,h Judit Bartholy,i
Rita Pongrácz,i Martin Widmann,j Maria J. Casado,k Petra Ramos,l and Joaquín Bedia e
a Wegener Center for Climate and Global Change, University of Graz, Austria
b Department of Physical Geography and Geoecology, Faculty of Science, Charles University, Prague, Czech Republic
c Institute of Atmospheric Physics, Czech Academy of Sciences, Prague, Czech Republic
d Institute of Physics of Cantabria (IFCA), University of Cantabria, Santander, Spain
e Predictia Intelligent Data Solutions SL, Santander, Spain
f Federal Office of Meteorology and Climatology MeteoSwiss, Zurich, Switzerland
g Institute of Geography, Augsburg University, Germany
h Instituto Dom Luiz, Faculdade de Ciencias, Universidade de Lisboa, Portugal
i Department of Meteorology, Eotvos Lorand University, Hungary
j School of Geography, Earth and Environmental Sciences, University of Birmingham, UK
k Agencia Estatal de Meteorologia (AEMET), Madrid, Spain
l Delegacion Territorial de AEMET en Andalucía, Ceuta y Melilla, Sevilla, Spain
* Correspondence to: D. Maraun, Wegener Center for Climate and Global Change, University of Graz, Brandhofgasse 5, 8010 Graz, Austria. E-mail: douglas.maraun@uni-graz.at
ABSTRACT: Temporal variability is an important feature of climate, comprising systematic variations such as the annual
cycle, as well as residual temporal variations such as short-term variations, spells and variability from interannual to long-term
trends. The EU-COST Action VALUE developed a comprehensive framework to evaluate downscaling methods. Here we
present the evaluation of the perfect predictor experiment for temporal variability. Overall, the behaviour of the different
approaches turned out to be as expected from their structure and implementation. The chosen regional climate model adds
value to reanalysis data for most considered aspects, for all seasons and for both temperature and precipitation. Bias correction
methods do not directly modify temporal variability apart from the annual cycle. However, wet day corrections substantially
improve transition probabilities and spell length distributions, whereas interannual variability is in some cases deteriorated
by quantile mapping. The performance of perfect prognosis (PP) statistical downscaling methods varies strongly from aspect
to aspect and method to method, and depends strongly on the predictor choice. Unconditional weather generators tend to
perform well for the aspects they have been calibrated for, but underrepresent long spells and interannual variability. Long-term
temperature trends of the driving model are essentially unchanged by bias correction methods. If precipitation trends are not
well simulated by the driving model, bias correction further deteriorates these trends. The performance of PP methods to
simulate trends depends strongly on the chosen predictors.
KEY WORDS regional climate; downscaling; evaluation; validation; temporal variability; spells; interannual variability;
long-term trends
Received 28 September 2016; Revised 23 June 2017; Accepted 3 July 2017
1. Introduction
Downscaling is a common – often necessary – step in
assessing regional climate change and its impacts: the
resolution of global coupled atmosphere–ocean general
circulation models (GCMs) is typically too coarse to
represent many regional- or local-scale climate phenom-
ena. Therefore the output of GCMs is downscaled to
provide high-resolution simulations over a limited target
area. The EU Cooperation in Science and Technology
(COST) Action ES1102 VALUE was established to comprehensively evaluate different downscaling methods (Maraun et al., 2015). Three experiments have been
defined: a so-called perfect predictor experiment to isolate
downscaling skill in present climate; a GCM predic-
tor experiment to evaluate the overall skill to simulate
present-day regional climate; and a pseudo reality exper-
iment to evaluate the skill of downscaling methods to
represent future climates.
In a community effort, researchers from 16 European
institutions participated in the perfect predictor experi-
ment, and more than 50 different statistical downscaling
methods have been evaluated at 86 stations across Europe.
The evaluation comprises the representation of marginal
aspects (such as the mean or variance; J. M. Gutiérrez
et al., 2017; personal communication), temporal aspects
(such as spell length distributions; this contribution), spa-
tial aspects (such as spatial decorrelation lengths; M. Wid-
mann et al., 2017; personal communication), and multi-
variable aspects (such as the relationship between temper-
ature and precipitation). Extreme events as well as an eval-
uation conditional on relevant synoptic and regional phe-
nomena have been, owing to their importance, considered
separately by E. Hertig et al. (2017; personal communica-
tion) and P. Soares et al. (2017; personal communication).
Here we present the evaluation of temporal aspects.
To illustrate different aspects of temporal variability,
Figure 1 shows a selected year of precipitation at the
participating rain gauge in Graz, Austria. On 18 July
(orange spike), several districts were flooded. The city's streams burst their banks following the heavy rainfall prior to the event, but a major contributor was the long wet spell at the end of June (red shading). Southeast of Graz,
the overall event caused several thousand landslides. Total
rainfall in June exceeded the climatological mean by more
than 60%. Also annual rainfall was about 47% higher than
normal (Klein Tank et al., 2002), indicating substantial
interannual variability. A pronounced seasonality of all
aspects of precipitation is directly apparent. In late winter
and early spring, precipitation amounts are low compared
to summer. The probability of consecutive wet days is also low, resulting in long dry spells (grey shading). Most dry–wet and wet–wet transitions occur in late spring and early summer; the highest rainfall amounts are observed in late summer (blue shading).
In general, temporal variability involves a wide range
of time scales, from the diurnal cycle through day-to-day
variations, spells (dry, wet, warm, cold, etc.), and interan-
nual variations to long-term trends. The variability can be
broadly separated into systematic variations – the diurnal
and annual cycle as well as forced long-term trends – and
residual temporal variations, whose characteristics are
determined by the large-scale driving processes and by
local memory. For instance, temporal dependence in
precipitation may stem directly from memory caused by
soil-moisture feedbacks, or indirectly from the duration of
passing cyclones and anti-cyclones. Temporal aspects of
local climate are often essential for impact studies in var-
ious sectors such as water (e.g. preconditions of flooding, Froidevaux et al., 2015; dry spells, Stoll et al., 2011), agriculture (e.g. dry spells, Calanca, 2007; seasonality, Rosenzweig et al., 2001), health (e.g. heat waves, Semenza et al., 1996) and energy (e.g. seasonality, Rosenzweig et al., 2011).
In VALUE we evaluate the performance of different
downscaling methods to represent temporal variability.
Apart from dynamical downscaling with regional climate
models (RCMs, Rummukainen, 2010), different statistical
approaches exist (Fowler et al., 2007; Maraun et al., 2010;
Wilks, 2010; Maraun, 2016): perfect prognosis (PP) sta-
tistical downscaling methods, which are calibrated purely
on observations and typically take their predictors from
large-scale fields of the free atmosphere; model output
statistics (MOS) methods, which are calibrated between
model data and observations (in climate science, these
are typically bias correction methods); and unconditional
weather generators (WGs), which are calibrated on local
data and do not include any meteorological predictors.
The basic driver of the residual, regional-scale tem-
poral variability is the propagation of planetary and
synoptic waves, which is essentially prescribed by
GCMs. This continental-scale variability is modulated
by regional-scale dynamical processes, influences of
the orography, and feedback mechanisms such as
soil-moisture-temperature, soil-moisture-precipitation
feedbacks and snow-albedo feedbacks (Schär et al., 1999;
Seneviratne et al., 2006; Fischer et al., 2007; Hall et al.,
2008). As a result, regional-scale temporal variability
simulated by RCMs may diverge from the prescribed
large-scale variability (Alexandru et al., 2007). Local
temporal variability is often – in particular for precipi-
tation and wind – not fully determined by larger-scale
variability, but exhibits additional – essentially random – fluctuations. PP statistical downscaling inherits the
variability of the large-scale predictors and typically does
not add any local short-term variations. Some methods,
however, explicitly model local variability by random-
ization (von Storch, 1999; Chandler and Wheater, 2002;
Volosciuk et al., 2017). Such stochastic models might
simply generate white noise, but may also include weather
generators (see below) to model short-term temporal
dependence by Markov-chain-type components (Maraun
et al., 2010). Also bias correction typically does not
explicitly add local temporal variability to the driving
model, but only subtly modulates temporal variability
via its effect on the marginal distribution. For instance
wet day frequencies are adjusted, which indirectly affects
the representation of spells (Rajczak et al., 2016). Some
bias correction methods also attempt to explicitly adjust
the temporal structure (e.g. Vrac and Friederichs, 2015;
Cannon, 2016) but at the cost of destroying the tem-
poral consistency with the driving dynamical model.
Unconditional weather generators (i.e. weather generators
that do not use meteorological predictors) do not pro-
vide sequences which are synchronized with the driving
models. Instead, the only temporal structure they repre-
sent is explicitly modelled, typically by Markov chains
(Maraun et al., 2010). Most statistical models – PP and
MOS – have an explicit description of the annual cycle,
e.g. by being calibrated to each calendar day, month or
season individually, or (in case of PP) by including the
day-of-the year as predictor.
Of the temporal aspects studied in this paper, perhaps the
annual cycle has been the most frequent target of valida-
tion: many RCM studies as well as studies of both kinds of
statistical downscaling (PP and MOS) and of WGs include
a validation of the annual cycle, although it usually is
not their main topic (e.g. Frei et al., 2003; Moberg and
Jones, 2004; Kilsby et al., 2007; Schindler et al., 2007;
Turco et al., 2011; Soares et al., 2012; Kalognomou et al.,
2013; Martynov et al., 2013; Warrach-Sagi et al., 2013;
Keller et al., 2015; Favre et al., 2016).
Figure 1. Daily precipitation totals in Graz, 2009. Shading: see text. [Colour figure can be viewed at wileyonlinelibrary.com].
Also studies evaluating precipitation (dry/wet) spells and precipitation transition probabilities (wet/wet, dry/wet) as well as interannual
variability have been relatively numerous (e.g. Semenov
et al., 1998; Charles et al., 1999; Giorgi et al., 2004; Jacob
et al., 2007; Kilsby et al., 2007; Schmidli et al., 2007;
Frost et al., 2011; Turco et al., 2011; Bürger et al., 2012;
Hu et al., 2013; Gutmann et al., 2014; Keller et al., 2015;
Rajczak et al., 2016). Much less attention has, on the other
hand, been paid to validation of temperature spells and
day-to-day temperature changes; only a few studies have
been published that focus on these characteristics (Huth
et al., 2001; Bürger et al., 2012; Vautard et al., 2013; Huth
et al., 2015; Lhotka and Kyselý, 2015).
The vast majority of validation studies that also address temporal issues focus on a single downscaling approach or, at best, provide a comparison of models from one family (e.g. Kotlarski et al., 2014; Gutmann et al., 2014). Exceptions are Wilby et al. (1998), who were the first
to systematically evaluate temporal aspects in PP meth-
ods and unconditional weather generators; the STARDEX
project, which assessed temporal aspects of extreme events
in PP and a simple MOS method (Gawley et al., 2006;
Goodess et al., 2010); the study by Frost et al. (2011), who
compared the representation of spell lengths and interan-
nual variability in an RCM, a bias correction method, a PP
method and two weather generators; the study by Hu et al.
(2013), who carried out a similar intercomparison for a PP
method and two weather generators; the study by Bürger
et al. (2012), who compared extreme spells in several PP
and MOS methods; and the recent study by Huth et al.
(2015), which investigated temporal aspects in both sta-
tistical and dynamical downscaling methods. But all these
studies still include only a rather limited range of methods.
Even though extremely important for climate change
studies (Pielke and Wilby, 2012), evaluation studies of
trends in downscaled data are scarce (Benestad and Hau-
gen, 2007; Lorenz and Jacob, 2010; Bukovsky, 2012;
Ceppi et al., 2012; Huth et al., 2015). These studies
broadly indicate a rather limited ability of downscaling
methods to reproduce trends.
In brief, a substantial research gap exists. The perfor-
mance of many downscaling and bias correction methods
to represent temporal aspects – both individually and
relative to each other – is largely unknown. This study
takes a first step towards closing this gap. In a perfect predictor experiment, we analysed the performance of one raw RCM and 48 statistical methods to represent day-to-day
variability, spells, seasonality, interannual and long-term
variability including trends. Aspects of temporal vari-
ability specifically addressing extreme events, such as
long heatwaves or meteorological drought, are addressed
in the companion paper on extreme events (E. Hertig
et al., 2017; personal communication, in this issue). The
considered experiment was conducted for daily values,
hence we cannot evaluate sub-daily variations.
VALUE is a community effort; participation in this experiment (and its evaluation) was unpaid. The partic-
ipating methods thus form an ensemble of opportunity.
In particular no systematic set of predictor variables or
domains has been prescribed. Thus statements about opti-
mal predictor choice are limited to a few comparisons of
similar (or identical) methods with different predictors. A
detailed set of metadata has, however, been collected for all
participating methods. These metadata describe structural
aspects of all methods and often allow for quite detailed
interpretations of the individual performance. In the paper,
we will discuss selected examples in more detail, and addi-
tionally give a broad overview of the different model fam-
ilies. The metadata and complete results for individual
methods are available from the VALUE portal (www.value-cost.eu/validationportal) for further investigation.
The aim of the perfect predictor experiment is to eval-
uate the isolated skill of the raw RCM and the statistical
models. Consequently, this study cannot give a conclusive
assessment of the skill to simulate regional future climates.
The skill of a full regional modelling system, compris-
ing the full modelling chain from GCM to RCM and/or
statistical model, as well as the downscaling performance
in future climates will be considered in additional experi-
ments (Maraun et al., 2015).
In the following section, we will briefly review the
experimental setup, the considered diagnostics and the
participating methods. In Section 3 we will present
the results for different diagnostics and methods. An
overall discussion of the results will follow in the final
section.
2. Experiment, diagnostics and methods
The experimental design follows the VALUE perfect
predictor experiment with station data as target. As
(approximately) perfect predictors and perfect boundary
conditions, we use ERA-Interim data from 1 January
1979 to 31 December 2008 (Dee et al., 2011). The MOS
methods use ERA-Interim data at their native resolution
of 0.75° as input, the PP methods ERA-Interim predictors at 2°, which resembles a typical GCM resolution.
Furthermore, most MOS methods also use ERA-Interim,
downscaled with the RCM RACMO (van Meijgaard et al.,
2008), as input to represent a typical RCM bias correction
situation. Apart from the resolution, some important
differences between these two MOS settings exist: in
the first case, internal variability at the grid-box scale
is closely tied to real-world internal variability, whereas
the RCM develops its own internal variability within the
RCM domain. Furthermore, observed temperatures have
been assimilated into the ERA-Interim reanalysis; the
resulting predictors are thus essentially bias free at the
grid-box scale and differences with station observations
mainly result from the scale gap. RCM temperatures
inside the domain, however, are only mildly constrained
by the boundaries and are thus typically affected by
biases. Precipitation is in both cases calculated by model
parameterizations, without any reference to observed
precipitation. It is thus affected by both the scale gap and biases.
As predictand data, time series from 86 stations from
the publicly available ECA data base were used (Klein
Tank et al., 2002). These stations were selected to cover the different European climates: Mediterranean, maritime, continental, alpine and subpolar. For
details refer to J. M. Gutiérrez et al. (2017; personal com-
munication) and the File S1, Supporting information.
In this article, we consider daily maximum and mini-
mum temperature and daily precipitation only. A dedicated
analysis of other variables will be carried out separately
for a set of stations in Germany. For the statistical meth-
ods a fivefold cross-validation with non-overlapping 6-year
blocks is carried out. Further details about the protocol can
be found in Maraun et al. (2015), J. M. Gutiérrez et al.
(2017; personal communication) and on www.value-cost
.eu/validation#Experiment_1a.
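To make the cross-validation setup concrete, the following is a minimal sketch (not the VALUE reference code, which is provided in R on the validation portal) of how five non-overlapping 6-year blocks can be constructed from the 1979–2008 period; all function and variable names are illustrative.

```python
import numpy as np
import pandas as pd

def sixyear_folds(start=1979, end=2008, n_folds=5):
    """Split the years start..end into n_folds non-overlapping consecutive blocks
    (here: five 6-year blocks for 1979-2008)."""
    return np.array_split(np.arange(start, end + 1), n_folds)

# Example: for each fold, calibrate on four blocks and validate on the held-out block.
dates = pd.date_range("1979-01-01", "2008-12-31", freq="D")
for k, test_years in enumerate(sixyear_folds()):
    test_mask = np.isin(dates.year, test_years)
    train_dates, test_dates = dates[~test_mask], dates[test_mask]
    print(f"fold {k + 1}: validate on {test_years[0]}-{test_years[-1]} "
          f"({test_mask.sum()} days withheld)")
```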
Table 1 lists the diagnostics we considered: the indices to
measure a specific aspect of temporal variability, the corre-
sponding performance measure to quantify the mismatch
with observations and the temporal resolution (seasonal,
annual) at which the evaluation has been carried out. In
two cases, we assessed correlations between observed and
downscaled local time series, namely at the interannual
and 7-year time scales. In this case, the diagnostic consists
of a performance measure – the correlation – only.
Detailed descriptions of these diagnostics can be
found in the File S1. The code used to calculate these
diagnostics is available from http://www.value-cost.eu/
validationportal/app#!indices (registration required).
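As an illustration of how such diagnostics combine an index with a performance measure, the following Python sketch computes two of the indices listed in Table 1 (WWprob and DrySpellMean, with the 1-mm wet-day threshold) and the corresponding bias. It is not the VALUE R code, and the random series merely stand in for observed and downscaled data.

```python
import numpy as np

def ww_prob(precip, wet=1.0):
    """WWprob: probability that a wet day (>= 1 mm) is followed by a wet day."""
    w = np.asarray(precip) >= wet
    return (w[:-1] & w[1:]).sum() / max(w[:-1].sum(), 1)

def dry_spell_mean(precip, wet=1.0):
    """DrySpellMean: mean length of runs of consecutive dry days (< 1 mm)."""
    d = np.asarray(precip) < wet
    # run-length encode the dry/wet sequence and keep the dry runs only
    change = np.flatnonzero(np.diff(d.astype(int)) != 0) + 1
    runs = np.split(d, change)
    lengths = [len(r) for r in runs if r[0]]
    return np.mean(lengths) if lengths else np.nan

def bias(model_index, obs_index):
    """Performance measure used for most indices in Table 1."""
    return model_index - obs_index

# Illustrative use with random data in place of station and downscaled series
rng = np.random.default_rng(0)
obs = rng.gamma(0.4, 5.0, size=3600)   # stand-in for observed daily precipitation
sim = rng.gamma(0.5, 4.0, size=3600)   # stand-in for a downscaled simulation
print(bias(ww_prob(sim), ww_prob(obs)), bias(dry_spell_mean(sim), dry_spell_mean(obs)))
```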
In this analysis, we compare methods from the PP,
MOS and unconditional weather generator approaches
with raw ERA-Interim output, and dynamically down-
scaled ERA-Interim. Tables 2 and 3 list the methods par-
ticipating in the experiment (many methods are identical
for the different variables, but in several cases differences
exist in the implementation for different variables. There-
fore, we decided not to list the methods in a single table).
The MOS methods are listed prior to the PP methods to
ease comparison with the raw RCM and ERA-Interim data.
PP methods are calibrated purely on observed predictors
and predictands. The statistical model is then applied to
climate model predictors. In a climate change context, the
approach is based on three major assumptions (Maraun and Widmann, 2018): first, that the GCM predictors are per-
fectly simulated (hence the name) in present and future
climate. As a consequence, predictors are typically taken
from large-scale fields of the free atmosphere. Second, the
predictors should be informative of local variability and
climate change. Third, the model structure should well
describe local variability, and allow for at least moder-
ate extrapolations under climate change. Our evaluation
experiment employs perfect predictors to isolate down-
scaling skill in present climate. It can therefore be used
to assess whether the chosen predictors are informative of
local variability and observed changes, and whether the
model structure well describes observed local variability
and changes. The PP assumption and performance under
future climate change, however, cannot be assessed.
The participating PP methods broadly represent widely
used approaches – analogue, regression and weather-type
methods. Some of the regression methods apply variance inflation (MLR-ASI, MLR-AAI, GLM-P), some are stochastic (see Tables 2 and 3). The ESD methods downscale at the monthly
scale, thus no diagnostics are considered that involve daily
values. The ESD-EOF implementation differs from the standard ESD version in that the predictand values are filtered by PCA (Benestad et al., 2015b).
All stochastic methods use, conditionally on the predic-
tors, independent noise, i.e. they do not have an explicit
Markov component implemented to simulate short-term
persistence. For precipitation, some of the participating
PP methods have been included for illustrative purposes
only (MLR-RAN, MLR-RSN, MLR-ASW, MLR-ASI). In
fact, it is well known that simple multiple linear regres-
sion methods are not suitable to model daily precipitation.
Yet they do participate in the intercomparison to high-
light the problems associated with them (marked in grey in
Table 3). Two of the stochastic methods (GLM and SWG)
are based on generalized linear models, with a logistic
regression for the occurrence process, and a generalized
linear regression on the gamma distribution parameters for
the amounts process. GLM-WT and WT-WG condition
the distribution parameters for occurrence and amounts on
weather types.
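As a sketch of this two-part structure (not the code of the participating GLM, GLM-WT or SWG implementations), the following example fits a logistic regression for the occurrence process and a gamma regression with log link for wet-day amounts, and then simulates stochastically; all predictors and parameters are synthetic placeholders.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Placeholder large-scale predictors (e.g. standardized SLP, T850, Q850 at a grid box)
X = sm.add_constant(rng.standard_normal((3000, 3)))

# Synthetic local precipitation: occurrence and amounts depend weakly on the predictors
p_wet = 1 / (1 + np.exp(-(-0.5 + X[:, 1] - 0.5 * X[:, 2])))
wet = rng.random(3000) < p_wet
amount = np.where(wet, rng.gamma(2.0, np.exp(0.5 + 0.3 * X[:, 1])), 0.0)

# Part 1: logistic regression for the occurrence process
occ_model = sm.GLM(wet.astype(float), X, family=sm.families.Binomial()).fit()

# Part 2: gamma regression (log link) for wet-day amounts
amt_model = sm.GLM(amount[wet], X[wet],
                   family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# Stochastic simulation: draw occurrence, then draw amounts from the fitted gamma
p_sim = occ_model.predict(X)
mu_sim = amt_model.predict(X)
shape = 1.0 / amt_model.scale            # scale is the estimated dispersion
sim = np.where(rng.random(3000) < p_sim,
               rng.gamma(shape, mu_sim / shape), 0.0)
```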
MOS methods are calibrated between model simula-
tions and observations. The approach can thus in principle
adjust biases (in fact, in climate science, these are almost
exclusively bias correction methods, i.e. predictor and
predictand have the same physical dimension), but has
to be calibrated individually to the chosen model. MOS
Table 1. Diagnostics considered. Diagnostics only shown in File S1 are plotted in grey. For details see http://www.value-cost.eu/validationportal/app#!indices and click on 'details' for the underlying R-code (note that registration is required).

Index | Variables | Performance measure | Resolution | Description

Short-term variability
ACF1 | Tmax, Tmin | Bias | Seasonal | Lag-1 autocorrelation
ACF2 | Tmax, Tmin | Bias | Seasonal | Lag-2 autocorrelation
WWprob | Precipitation | Bias | Seasonal | Probability of wet-wet transition
WDprob | Precipitation | Bias | Seasonal | Probability of wet-dry transition

Spells (a)
WarmSpellMean | Tmax | Bias | Seasonal | Mean of the warm (>90th percentile) spell length distribution
ColdSpellMean | Tmin | Bias | Seasonal | Mean of the cold (<10th percentile) spell length distribution
WetSpellMean | Precipitation | Bias | Seasonal | Mean of the wet (≥1 mm) spell length distribution
DrySpellMean | Precipitation | Bias | Seasonal | Mean of the dry (<1 mm) spell length distribution

Interannual to long-term variability
VarY | Tmax, Tmin, precipitation | rel. error | Seasonal | Variance of seasonally/annually averaged data
Cor.1Y | Tmax, Tmin, precipitation | Correlation | Seasonal | Correlation with observations of seasonally/annually averaged data
Cor.7Y | Tmax, Tmin, precipitation | Correlation | Seasonal | Correlation with observations of seasonally/annually averaged and filtered data
Trend | Tmax, Tmin, precipitation | Trends themselves | Seasonal | Long-term (relative) trend of seasonally/annually averaged data

Annual cycle
AnnualCycleAmp | Tmax, Tmin | Bias | Annual | Amplitude of the annual cycle
AnnualCycleRelAmp | Precipitation | rel. error | Annual | Relative amplitude of the annual cycle
AnnualCyclePhase | Tmax, Tmin | Circular bias | Annual | Phase of the highest peak (b)

(a) Note that, different to Maraun et al. (2015), we consider the mean, not the median, of the spell length distribution. The reason is that both statistics are typically small numbers of order one; the median of counts is an integer and eliminates much information, so that even a strongly biased distribution would show up as essentially unbiased. (b) In the model, of the two highest peaks, the one closest to the observed peak is considered.
is based on three major assumptions (which make up the
so-called stationarity assumption), similar to those of the
PP approach (Maraun and Widmann, 2018): first, the pre-
dictors have to be credibly (but not necessarily bias free)
simulated. Second, the predictors need to be representative
of the local variable. And third, as in PP, the structure of
the transfer function needs to be suitable. Again, the first
assumption cannot be tested with perfect predictors, only
the second and third, and only for present-day climate.
The participating MOS methods comprehensively span
the range of widely used methods, and also cover some
more experimental recent developments such as stochastic
bias correction (VGLMGAMMA; Wong et al., 2014). None of the participating MOS methods modifies residual
temporal dependence directly, but only indirectly via
changes in the marginal distribution. The CDFt method
calibrates a statistical distribution also in the validation
period. As this is only 6 years in our experiment (in a cli-
mate change experiment, one would typically use a 30-year
time slice), we expect a broad spread for the resulting
performance measures due to sampling variability.
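Since most participating MOS methods are variants of quantile mapping, a minimal sketch of empirical quantile mapping may help to fix ideas. It is a generic illustration, not a reimplementation of EQM or any other participating method, and the synthetic temperature series only demonstrate the calibration/validation logic.

```python
import numpy as np

def eqm_fit(model_cal, obs_cal, n_quantiles=99):
    """Estimate an empirical transfer function mapping model quantiles
    to observed quantiles over the calibration period."""
    probs = np.linspace(0.01, 0.99, n_quantiles)
    return np.quantile(model_cal, probs), np.quantile(obs_cal, probs)

def eqm_apply(model_val, model_q, obs_q):
    """Correct values in the validation period by interpolating between the
    calibrated quantile pairs (tail extrapolation is omitted in this sketch)."""
    return np.interp(model_val, model_q, obs_q)

# Illustrative use with synthetic daily temperature (model warm-biased by 2 K)
rng = np.random.default_rng(2)
obs_cal = rng.normal(10.0, 4.0, 8760)
mod_cal = rng.normal(12.0, 5.0, 8760)
mod_val = rng.normal(12.0, 5.0, 2190)
mq, oq = eqm_fit(mod_cal, obs_cal)
corrected = eqm_apply(mod_val, mq, oq)
print(round(mod_val.mean() - obs_cal.mean(), 2), round(corrected.mean() - obs_cal.mean(), 2))
```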
Unconditional weather generators are not conditioned
on meteorological predictors, but stochastically simulate
marginal and temporal aspects, sometimes also spatial.
They are calibrated to observed weather statistics. Under
climate change, the model parameters (or the observed
weather statistics) are adjusted by so-called change factors
derived from climate models. The underlying assumptions
are thus similar to those for MOS (Maraun and Widmann, 2018): first, the change factors have to be credibly simulated, and all relevant change factors have to be included; second, the simulated change factors have to be representative of local changes; and third, the model structure
has to be suitable. In the chosen experiment, no change
factors are applied between calibration and validation
period; thus only the suitability of the model structure can
be evaluated. Some climatic statistics may have changed between calibration and validation periods, but the resulting systematic biases cancel out under cross-validation.
The SS-WG and MARFI unconditional weather genera-
tors are of the Richardson type (Richardson, 1981), i.e. they
use a Markov chain to simulate precipitation occurrence,
and an autoregressive model to simulate temperature. A
major difference between the two is the wet-day thresh-
old: the SS-WG uses 1 mm while the MARFI models use
0.5 mm (note that the evaluation indices are in any case
based on a 1-mm threshold). The GOMEZ weather gener-
ators are based on resampling.
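The following is a minimal sketch of a Richardson-type generator (first-order Markov chain for precipitation occurrence, gamma-distributed wet-day amounts, AR(1) temperature anomalies). It is deliberately simplified, with no seasonality and no conditioning of temperature on the wet/dry state, and does not reproduce SS-WG or MARFI; all parameter values are invented for illustration.

```python
import numpy as np

def richardson_wg(n_days, p_wd, p_ww, gamma_shape, gamma_scale,
                  t_mean, t_ar1, t_sd, seed=0):
    """Minimal Richardson-type generator: a first-order Markov chain for
    precipitation occurrence, gamma-distributed wet-day amounts, and an
    AR(1) process for daily temperature anomalies."""
    rng = np.random.default_rng(seed)
    wet = np.zeros(n_days, dtype=bool)
    precip = np.zeros(n_days)
    temp = np.zeros(n_days)
    for t in range(1, n_days):
        p = p_ww if wet[t - 1] else p_wd          # dry->wet or wet->wet probability
        wet[t] = rng.random() < p
        if wet[t]:
            precip[t] = rng.gamma(gamma_shape, gamma_scale)
        temp[t] = t_ar1 * temp[t - 1] + np.sqrt(1 - t_ar1**2) * t_sd * rng.standard_normal()
    return precip, temp + t_mean

# Illustrative parameters (not calibrated to any station)
precip, temp = richardson_wg(3600, p_wd=0.3, p_ww=0.6,
                             gamma_shape=0.8, gamma_scale=6.0,
                             t_mean=10.0, t_ar1=0.8, t_sd=3.0)
```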
Table 2. Participating methods for temperature.
Code Tech ST AC SE Predictors Domain References
MOS
RaiRat-M6 S No No Yes Temperature Gridbox Räisänen and Räty (2013)
RaiRat-M7 S No No Yes Temperature Gridbox Räisänen and Räty (2013)
RaiRat-M8 S No No Yes Temperature Gridbox Räisänen and Räty (2013)
SB S No No Yes Temperature Gridbox
ISI-MIP S/PM No No Yes Temperature Gridbox Hempel et al. (2013)
DBS PM No No Yes Temperature Gridbox Yang et al. (2010, 2015)
GPQM PM No No No Temperature Gridbox Bedia et al. (2016)
EQM QM No No No Temperature Gridbox Bedia et al. (2016)
EQMs QM No No Yes Temperature Gridbox Bedia et al. (2016)
EQM-WT QM/WT No No No Temperature Gridbox Bedia et al. (2016)
QMm QM No No Yes Temperature Gridbox Li et al. (2010)
QMBC-BJ-PR QM No No Yes Temperature Gridbox Pongrácz et al. (2014)
Bartholy et al. (2015)
CDFt QM No No Yes Temperature Gridbox Vrac et al. (2012)
QM-DAP QM No No Yes Temperature Gridbox Štěpánek et al. (2016)
EQM-WIC658 QM No No Yes Temperature Gridbox Wilcke et al. (2013)
RaiRat-M9 QM No No Yes Temperature Gridbox Räisänen and Räty (2013)
DBBC QM No No Yes Temperature Gridbox
DBD QM No No Yes Temperature Gridbox
MOS-REG TF Yes No No Temperature 4 Gridboxes S. Herrera et al. (2017; personal
communication)
FIC02T PM/A/TF No No Yes Temperature Gridbox
PP
FIC01T A/TF No No Yes Z1000+500 Nat. >Gridb.
ANALOG-ANOM A No No Yes SLP/TD/T2/U +V+Z850 Continental Vaittinada Ayar et al. (2016)
ANALOG A No No No SLP/T2/T850
+700 +500/Q850
+500/Z500
National Gutiérrez et al. (2013)
San-Martín et al. (2017)
ANALOG-MP A No No Yes Z1000 +500 >U
+V600/T850
Nat. >Gridb. Obled et al. (2002)
Raynaud et al. (2017)
ANALOG-SP A No No Yes Z1000 +500 >T2/ T2-TD Nat. >Gridb. Obled et al. (2002)
Raynaud et al. (2017)
MO-GP TF No No No Full standard set Gridbox Zerenner et al. (2016)
MLR-T TF No No No T2/SLP/U +V10 m/T
+Q+U+V850
+700 +500
Gridbox
MLR-RAN TF No No No Z500/T850 Gridbox Huth (2002); Huth et al. (2015)
MLR-RSN TF No No Yes Z500/T850 Gridbox Huth (2002); Huth et al. (2015)
MLR-ASW TF Yes No Yes Z500/T850 Gridbox Huth (2002); Huth et al. (2015)
MLR-ASI TF No No Yes Z500/T850 Gridbox Huth (2002); Huth et al. (2015)
MLR-AAN TF No No Yes Z500/T850 Gridbox Huth (2002); Huth et al. (2015)
MLR-AAI TF No No Yes Z500/T850 Gridbox Huth (2002); Huth et al. (2015)
MLR-AAW TF Yes No Yes Z500/T850 Gridbox Huth (2002); Huth et al. (2015)
MLR-PCA-ZTR TF No No Yes Z850/T850/R850 Continental Hertig and Jacobeit (2008)
ESD-EOFSLP TF/WT No No Yes SLP Continental Benestad et al. (2015a)
ESD-EOFT2 TF/WT No No Yes T2 Continental Benestad et al. (2015a)
ESD-SLP TF/WT No No Yes SLP Continental Benestad et al. (2015a)
ESD-T2 TF/WT No No Yes T2 Continental Benestad et al. (2015a)
MLR TF No No No SLP/T2/T850
+700 +500/Q850
+500/Z500
National Gutiérrez et al. (2013)
MLR-WT TF/WT Yes No Yes SLP/T2/T850 +700 +500/
Q850 +500/Z500
National Gutiérrez et al. (2013)
WT-WG WT/WG Yes No No SLP National Gutiérrez et al. (2013)
SWG TF/WG Yes No Yes SLP/T2/TD/U +V+Z850 Continental Vaittinada Ayar et al. (2016)
WG
SS-WG WG Yes Yes Yes NA NA Keller et al. (2015, 2017)
MARFI-BASIC WG Yes Yes Yes NA NA
MARFI-TAD WG Yes Yes Yes NA NA
MARFI-M3 WG Yes Yes Yes NA NA
GOMEZ-BASIC WG Yes Yes Yes NA NA
GOMEZ-TAD WG Yes Yes Yes NA NA
Techniques: S, additive correction; PM, parametric quantile mapping; QM, empirical quantile mapping; A, analog method; TF, regression-like transfer function; WT, weather typing; WG, weather generator. Explicitly modelled: ST, stochastic noise; AC, autocorrelation; SE, seasonality. SLP, sea level pressure; T2, 2-m temperature; T, temperature; TD, dew point temperature; Z, geopotential height; Q, specific humidity; R, relative humidity; U, V, Z, velocities. A > indicates a two-step method. For the full VALUE standard set of predictors and further details on the methods see J. M. Gutiérrez et al. (2017; personal communication) or http://www.value-cost.eu/validationportal/app#!downscalingmethod.
Table 3. Participating methods for precipitation.
Code Tech ST AC SE Predictors Domain References
MOS
Ratyetal-M6 S No No Yes Precipitation Gridbox Räty et al. (2014)
Ratyetal-M7 S No No Yes Precipitation Gridbox Räty et al. (2014)
ISI-MIP S/PM No No Yes Precipitation Gridbox Hempel et al. (2013)
DBS PM No No Yes Precipitation Gridbox Yang et al. (2005, 2015)
Ratyetal-M9 PM No No Yes Precipitation Gridbox Räty et al. (2014)
BC PM No No Yes Precipitation Gridbox Monjo et al. (2014)
GQM PM No No No Precipitation Gridbox Bedia et al. (2016)
GPQM PM No No No Precipitation Gridbox Bedia et al. (2016)
EQM QM No No No Precipitation Gridbox Bedia et al. (2016)
EQMs QM No No Yes Precipitation Gridbox Bedia et al. (2016)
EQM-WT QM/WT No No No Precipitation Gridbox Bedia et al. (2016)
QMm QM No No Yes Precipitation Gridbox Li et al. (2010)
QMBC-BJ-PR QM No No Yes Precipitation Gridbox Pongrácz et al. (2014)
Bartholy et al. (2015)
CDFt QM No No Yes Precipitation Gridbox Vrac et al. (2012)
QM-DAP QM No No Yes Precipitation Gridbox Štěpánek et al. (2016)
EQM-WIC658 QM No No Yes Precipitation Gridbox Wilcke et al. (2013)
Ratyetal-M8 QM No No Yes Precipitation Gridbox Räty et al. (2014)
MOS-AN A No No No Precipitation Gridbox Turco et al. (2011, 2017)
MOS-GLM TF Yes No No Precipitation 4 Gridboxes S. Herrera et al. (2017;
personal communication)
VGLMGAMMA TF/WG Yes No Yes Precipitation Gridbox Wong et al. (2014)
FIC02P PM/A/TF No No Yes Precipitation Gridbox
FIC04P PM/A/TF No No Yes Precipitation Gridbox
PP
FIC01P A/TF No No Yes Z1000+500 Nat. >Gridb.
FIC03P A/TF No No Yes U +V10 m/U +V500/
R850 +700
Nat. >Gridb.
>R850/Q700
ANALOG-ANOM A No No Yes SLP/TD/T2/U +V+Z850 Continental Vaittinada Ayar et al. (2016)
ANALOG A No No No SLP/T2/T850 +700 +500/
Q850 +500/Z500
National Gutiérrez et al. (2013)
San-Martín et al. (2017)
ANALOG-MP A No No Yes Z1000 +500 >U+V600/T850 Nat. >Gridb. Obled et al. (2002)
Raynaud et al. (2017)
ANALOG-SP A No No Yes Z1000 +500 >T2/ T2-TD Nat. >Gridb. Obled et al. (2002)
Raynaud et al. (2017)
MO-GP TF No No No full standard set Gridbox Zerenner et al. (2016)
GLM-P TF Yes(a) No No SLP/U+V10 m/T+Q+U+
V850+700+500
Gridbox
MLR-RAN TF No No No Z500/T850 Gridbox
MLR-RSN TF No No Yes Z500/T850 Gridbox
MLR-ASW TF Yes No Yes Z500/T850 Gridbox
MLR-ASI TF No No Yes Z500/T850 Gridbox
GLM-det TF No No No SLP/T2/T850+700 +500/
Q850 +500/Z500
National San-Martín et al. (2017)
GLM TF Yes No No SLP/T2/T850 +700
+500/Q850 +500/Z500
National San-Martín et al. (2017)
GLM-WT TF/WT Yes No Yes SLP/T2/T850 +700
+500/Q850 +500/Z500
National San-Martín et al. (2017)
(WT: only SLP)
WT-WG WT/WG Yes No No SLP National San-Martín et al. (2017)
SWG TF/WG Yes No Yes SLP/T2/TD/U +V+Z850 Continental Vaittinada Ayar et al. (2016)
WG
SS-WG WG Yes Yes Yes NA NA Keller et al. (2015, 2017)
MARFI-BASIC WG Yes Yes Yes NA NA
MARFI-TAD WG Yes Yes Yes NA NA
MARFI-M3 WG Yes Yes Yes NA NA
GOMEZ-BASIC WG Yes Yes Yes NA NA
GOMEZ-TAD WG Yes Yes Yes NA NA
Techniques: S, scaling; PM, parametric quantile mapping; QM, empirical quantile mapping; A, analog method; TF, regression-like transfer function; WT, weather typing; WG, weather generator. Explicitly modelled: ST, stochastic noise; AC, autocorrelation; SE, seasonality. SLP, sea level pressure; T2, 2-m temperature; T, temperature; TD, dew point temperature; Z, geopotential height; Q, specific humidity; R, relative humidity; U, V, Z, velocities. A > indicates a two-step method. Methods included for illustrative purposes are marked in grey. For the full VALUE standard set of predictors and further details on the methods see J. M. Gutiérrez et al. (2017; personal communication) or http://www.value-cost.eu/validationportal/app#!downscalingmethod. (a) Only occurrence is randomized; amounts are based on inflated regression (in this case, the results are based on a single realization).
Figure 2. Illustration of selected aspects for daily precipitation, Graz, Austria. Top: dry spell length distribution (number of spells vs length in days). Bottom: annual cycle (precipitation in mm/day vs day of the year). Top, vertical dashed lines: mean spell length; bottom, vertical dashed lines: phase of annual cycle maximum; bottom, horizontal lines: minimum and maximum of annual cycle. Methods shown: observations, EQM, GLM, MLR-ASI, Ratyetal-M6, SS-WG. [Colour figure can be viewed at wileyonlinelibrary.com].
Diagnostics have been calculated for each method
and each station. They can be downloaded from the
VALUE portal (www.value-cost.eu/validationportal/
app#!validation). For stochastic methods, an ensemble of
100 realizations has been uploaded. The performance
measures have been derived for each realization and then
averaged across the ensemble.
When interpreting the evaluation results, it has to be considered whether a specific index is calibrated or emerges from the model. For instance, a good represen-
tation of the annual cycle could result from including
meteorological predictors that describe the annual cycle,
or trivially from fitting a statistical model separately to
each month. In particular, weather generators by construction reproduce many marginal and temporal aspects. In this
study, only spell lengths and interannual variability are
not calibrated. In Tables 2 and 3 we therefore also list
whether short-term dependence (AC) and seasonality (SE)
are calibrated or not. For further details on the contributing
methods see J. M. Gutiérrez et al. (2017; personal com-
munication) or the VALUE portal (www.value-cost.eu/
validationportal/app#!downscalingmethod).
3. Results
Figure 2 illustrates selected temporal aspects for precipi-
tation in Graz, Austria, and how corresponding model per-
formance has been quantied in this study. The top panel
shows the dry spell length distribution. Observations are
shown in bold solid black, the results for five different sta-
tistical methods are shown in colour. Methods in red and
orange are MOS, in blue PP, and the method shown in
magenta is an unconditional weather generator. One index
that can be derived from the distribution is the mean spell
length (which is quantied in this study for all the partici-
pating methods and all selected weather stations). Dashed
vertical lines show this index for observations and statis-
tical models. The performance of a model is given by the
difference between the modelled and observed mean, i.e.
the mean spell length bias. Similarly, the bottom panel
shows the annual cycle of daily mean precipitation. Here,
two indices are considered: first, the relative amplitude (for temperature the absolute amplitude), defined as the differ-
ence between maximum and minimum value (horizontal
dashed lines), relative to the mean of these two values. Sec-
ond, the phase of the annual cycle, dened as the day of
the annual cycle maximum* (vertical dashed lines). The performance for the first is measured as the relative error
between modelled and observed relative amplitude, for the
second as the circular bias between modelled and observed
phase (circular in the sense that the difference between,
say, 31 December and 1 January is −1 day, not 364 days).
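The two annual-cycle measures can be written down compactly; the following sketch (with illustrative day-of-year inputs) shows the relative amplitude and the circular phase bias as just described.

```python
import numpy as np

def rel_amplitude(cycle):
    """Relative amplitude of the annual cycle: (max - min) relative to the
    mean of max and min (for temperature the absolute amplitude max - min is used)."""
    return (np.max(cycle) - np.min(cycle)) / (0.5 * (np.max(cycle) + np.min(cycle)))

def circular_phase_bias(doy_model, doy_obs, period=365):
    """Circular bias of the annual cycle phase, in days."""
    d = (doy_model - doy_obs) % period
    return d - period if d > period / 2 else d

# The circular convention from the text: 31 December vs 1 January is -1 day, not 364
print(circular_phase_bias(365, 1))      # -> -1
print(circular_phase_bias(200, 185))    # -> 15 (peak 15 days too late)
```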
In the following, we present the results, separately for
temperature and precipitation. To keep the number of
figures at a reasonable level, we selected a suite of rele-
vant diagnostics for short-term variability, spells, monthly
to interannual variability, and the annual cycle. Often, only
one season is shown, in case of temperature, only either
daily minimum or maximum temperature. A more com-
prehensive catalogue of plots can be found in the File
S1. The figures for all diagnostics are organized similarly,
see Figure 3 as an example. In this example, one diagnos-
tic is shown for daily maximum and minimum tempera-
ture. In the top row, the observed indices are shown – here
auto-correlation of daily maximum (left) and minimum
(right) temperatures. Note that correlations on interannual
and 7-year time scales have no corresponding observed
indices, consequently no maps are drawn. The two panels
below show the performance measures for these indices
(top: maximum temperature, bottom: minimum tempera-
ture). Each box-whisker-plot represents one method: the
raw driving data (ERA-Interim at the 2° resolution used as predictor for PP methods, at the native 0.75° resolution and
the RACMO2 RCM), the MOS methods, the PP methods
and the unconditional weather generators. The individual
box-whisker-plots summarize the results for all 86 sta-
tions: the boxes give the 25–75% range, the whiskers the
maximum value within 1.5 times the interquartile range;
values outside that range are plotted individually. The thick
coloured horizontal bars show the medians for the indi-
vidual PRUDENCE regions (Christensen and Christensen,
2007). Note that the number of stations entering these cal-
culations differs from region to region (ranging from 3
in France to 21 in Scandinavia, typically around 10). A
red asterisk indicates that values lie outside the plotted
range. Results for individual stations are – depending on
the index – substantially affected by noise, but the median
over all considered stations in general provides a robust
estimate of the overall performance of a given method. Furthermore, the correlation diagnostics are solely defined between observations and simulations; thus no observed indices exist.
For a given index, all methods are shown for which
the index may sensibly be calculated. That is, methods
producing only monthly output are not shown for any
indices based on daily values. Otherwise, all indices are
presented, even though a method might not be designed to
reproduce them. Such results are not intended to denounce
specific methods, but rather to highlight the consequences
of using a method in such a context. These situations will
be made explicit to avoid misinterpretation of the results.
As mentioned in the Introduction, the methods partici-
pating in the experiment form an ensemble of opportunity.
*In some cases, the annual cycle of precipitation has two maxima. We will discuss below how the phase is defined in this case.
Also we have a list of candidate predictors for each
method, but the actually selected set of predictors might be
much smaller for individual stations. To fully attribute differ-
ences in model performance to the approach, the particular
implementation and the choice of predictors, dedicated
sensitivity studies would be required. In many cases,
conclusions may be drawn for groups of methods. For
instance, all analog methods often behave similarly, independently of the different predictors and implementations.
Thus, conclusions about analog-type methods as a whole
can often be drawn. A discussion of differences within
this type, however, would be very speculative, because
the individual methods often differ both in the implemen-
tation and choice of predictors. The level of detail in our
interpretation will thus differ from case to case. In some
cases, any discussion would be too speculative – we then
restrict ourselves to a description of the findings.
3.1. Temperature
3.1.1. Short-term variability
Figure 3 shows the results for lag-1 autocorrelation of
summer daily maximum and minimum temperature as a
measure of short-term persistence. The top row shows
observations for daily maximum (left) and minimum
(right) temperature. The corresponding plots for winter
can be found in File S1. For Tmax, summer persistence is
relatively evenly distributed across Europe; for Tmin, persistence is notably lower over many regions. The bottom
panels show the performance of the individual models.
The spatial averaging of ERA-Interim results in a
moderate overestimation of summer persistence of Tmax
(upper panel); these biases are reduced by the RCM.
Almost all MOS methods inherit the skill of the predictor
data set, in particular the added value of the RCM. The
regression based MOS method (MOS-REG) includes
averaging across several grid boxes and thus overesti-
mates persistence. All analog methods underestimate
persistence of temperature. The reason might be twofold:
first, the spatial predictor variability might be strongest
for circulation-based predictors. Thus, analogs may be
selected that best constrain circulation (and in turn precip-
itation, see Section 3.2). And second, large-scale analogs
might be sufficiently dissimilar at local scales to deteri-
orate day-to-day variations. Understanding this problem
requires further detailed analysis. The ANALOG-ANOM
method uses predictors defined at a continental scale,
which likely explains the low performance.
As expected, all deterministic regression models overes-
timate persistence, as not all local variability is explained
by large-scale predictors. This problem cannot be mit-
igated by inflated regression (MLR-ASI, MLR-AAI).
All stochastic regression models randomize with white
noise (MLR-ASW, MLR-AAW; though conditional on
the predictors) and thus underestimate persistence. The
low performance of the SWG method may partly be
explained by the use of continental-scale predictors in
combination with a stochastic white-noise randomization.
The WT-WG method performs worst, as it is stochastic
and additionally uses only sea level pressure as predictor.
Figure 3. AC1 for summer Tmax (left/top) and Tmin (right/bottom). Top row: observed relationships for summer. Bottom rows: bias of the individual methods. For each method, box-whisker-plots summarize the information for all considered stations. Boxes span the 25–75% range, the whiskers the maximum value within 1.5 times the interquartile range; values outside that range are plotted individually. A red asterisk indicates that values lie outside the plotted range. Coloured horizontal bars: regional medians for the PRUDENCE regions (British Isles, Iberian Peninsula, France, Central Europe, Scandinavia, Alps, Mediterranean, Eastern Europe). The suffixes in the names of the MOS methods indicate whether a method has been driven with ERA-Interim (-E) or the RCM (-R). [Colour figure can be viewed at wileyonlinelibrary.com].
For the Iberian Peninsula and the UK, ERA-Interim over-
estimates summer persistence of Tmax, the RCM reduces
the bias. Conversely, for Eastern Europe ERA-Interim
is almost bias free, but the RCM reduces persistence.
This performance is again inherited by many statistical
methods.
For Tmin (lower panel), the performance is consistently
worse for all approaches, with a strong tendency to
overestimate summer persistence. The RCM, however,
performs slightly worse than ERA-Interim.
Figure 4. As Figure 3, but for summer WarmSpellMean (days) of Tmax (top/left) and summer ColdSpellMean (days) of Tmin (bottom/right). [Colour figure can be viewed at wileyonlinelibrary.com].
The relative
performance across most other methods is similar to that
for Tmax. The ISIMIP method, driven with ERA-Interim,
is a notable exception – it has the lowest bias of all MOS
methods. Most MOS methods leave the persistence bias essentially unchanged: the methods driven with reanalysis data have a lower bias, those driven with the RCM a higher one. Interestingly, however, some QM-based bias
correction methods moderately improve the representation
of persistence indirectly by adjusting marginal distribu-
tions. The persistence of summer Tmin is overestimated
in the British Isles. But in contrast to the overall
behaviour, this bias is reduced by the RCM (and
again, this reduction is inherited by the MOS meth-
ods). The performance for most methods is best in
the Alps.
Figure 5. As Figure 3, but for the amplitude (K) (left/top) and phase (days) (right/bottom) of the annual cycle for Tmax. [Colour figure can be viewed at wileyonlinelibrary.com].
3.1.2. Spells
Overall, the performance to simulate spells is similar to the
performance to simulate short-term variability. The results
for summer temperature spells are shown in Figure 4,
measured in terms of the mean spell length. Recall that
temperature-related spells are not defined by exceedances of absolute thresholds (e.g., 30 °C), but by the 90th
percentile of daily maximum temperature, which varies
from station to station and will be much lower in Scan-
dinavia than in the Mediterranean (Table 1). The longest
summer warm spells occur in Scandinavia, the shortest
in the western Mediterranean. Summer cold spells are
generally much shorter, shortest in Northern Europe, and longest in the Mediterranean.
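For illustration, a percentile-based warm spell index of this kind can be computed from a daily Tmax series as sketched below; the threshold handling is simplified (a single percentile over the whole input series rather than the seasonal, station-specific definition used in the evaluation), and the synthetic data are only placeholders.

```python
import numpy as np

def warm_spell_mean(tmax, q=90):
    """Mean length of warm spells: runs of consecutive days with Tmax above
    the q-th percentile (here computed from the same series for simplicity)."""
    hot = np.asarray(tmax) > np.percentile(tmax, q)
    change = np.flatnonzero(np.diff(hot.astype(int)) != 0) + 1
    runs = np.split(hot, change)
    lengths = [len(r) for r in runs if r[0]]
    return np.mean(lengths) if lengths else np.nan

# Illustrative use with synthetic summer Tmax for one station
rng = np.random.default_rng(3)
tmax = 25 + 4 * rng.standard_normal(920)   # stand-in for the JJA days of ~10 summers
print(warm_spell_mean(tmax))
```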
ERA-Interim simulates slightly too long warm spells
of Tmax (upper panel), in particular for the area aver-
aged version. The RCM, again, adds value. MOS inher-
its the predictor performance (by construction, as the
percentile-based spells are invariant to bias correction).
Owing to the predictor averaging, the regression-based
MOS (MOS-REG) again performs considerably worse.
Also the behaviour of the PP methods is broadly consistent
with that for short-term persistence: analog methods and
stochastic white noise methods (MLR-ASW, MLR-AAW,
WT-WG, SWG) simulate too short spells. This holds in particular for WT-WG, driven only with sea level pres-
sure. Weather generators slightly underestimate mean spell
lengths, in particular those that underestimate short-term
persistence. Persistence of summer warm spells of Tmax is
consistently overestimated over the Mediterranean, a bias
which is much improved by the RCM.
The persistence for summer cold spells of Tmin (lower
panel), consistent with the results for short-term persis-
tence, is generally too high. The RCM deteriorates the
performance of ERA-Interim. This performance is, again
trivially, unchanged by the MOS methods. The PP meth-
ods perform similarly to warm spells, though with a ten-
dency towards higher persistence. All weather generators
perform well, consistent with the results for short-term per-
sistence. Cold spells of summer Tmin are too long for the
British Isles and (but to a lesser extent) the Mediterranean.
Performance is best for the Alps.
3.1.3. Seasonality
The amplitude of the annual cycle of Tmax (Figure 5) is
small towards the Atlantic and the Mediterranean, and
large in the continental climates of eastern Scandinavia and
Eastern Europe. It peaks in July in continental central and
Eastern Europe, and slightly later in August towards the
Atlantic. ERA-Interim slightly underestimates the ampli-
tude of the seasonal cycle (upper panel) – likely linked to
its resolution, as the further averaging increases the bias.
The RCM in general adds value, but also increases the spread
across stations. Being seasonally trained, most MOS meth-
ods trivially capture the annual cycle well. Note, however,
that also the quantile mapping methods without an explicitly modelled annual cycle perform well (GPQM, EQM, EQM-WT)
for most stations. The authors do not understand the strong
drop in performance of the MOS-REG method when
driven with the RCM instead of ERA-Interim. Most PP
methods perform reasonably well, even those without sea-
sonal training, because the physical link between the pre-
dictors (including temperature) and the predictand is close.
Only the WT-WG method sticks out: it is not seasonally
trained and uses only sea level pressure as predictor. Thus,
seasonality in circulation patterns is captured, but not the
changes in temperature within these patterns. The weather
generators perform well by construction.
The phase of the seasonal cycle (lower panel) is cap-
tured by most methods. ERA-Interim peaks a day too
late; the RCM increases the spread across stations. MOS
methods perform well, even those without an explicit
model of the seasonal cycle (GPQM, EQM, EQM-WT)
are within ±2 days (apart from the MOS-REG method,
when driven with the RCM). The analog methods per-
form reasonably well, although the version without sea-
sonal training (ANALOG) has a comparably broad spread
across seasons. For regression models, no seasonal train-
ing is required if the predictors are standardized (e.g.,
MLR-AAN, MLR-AAI compared to MLR-RAN). Biases
in the ESD methods are caused by the monthly resolu-
tion of the data. Again, weather generators perform well
by construction.
3.1.4. Interannual variability and long-term trends
Interannual variability of summer daily maximum temper-
ature, measured by the variance of summer mean values, is
lowest in the Mediterranean and Scotland, and consistently
higher in Central and Eastern Europe and Scandinavia
(Figure 6). ERA-Interim slightly underestimates interan-
nual variability, again likely linked to the area averaging.
The performance varies widely across stations. The RCM
adds moderate value (high in the Mediterranean), but
also spread. Simple additive MOS (RaiRat-M6) leaves
interannual variability unchanged. Variances of the daily
distribution are underestimated by ERA-Interim (see
J. M. Gutiérrez et al., 2017; personal communication).
The resulting correction by quantile mapping inflates
interannual variability, in particular for the Mediterranean,
where it is overestimated by around 50%. MOS-REG
underestimates interannual variability, in particular when
driven with ERA-Interim, because it uses predictors
averaged over several grid-boxes.
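To illustrate this inflation mechanism, consider the following minimal sketch of empirical quantile mapping; it is not any of the specific MOS implementations evaluated here, the synthetic data are purely illustrative, and the helper name eqm is ours. Because the transfer function stretches the daily distribution, it also stretches the year-to-year variations of seasonal means.

```python
# Minimal empirical quantile-mapping sketch (illustrative only, not one of the
# participating MOS implementations).
import numpy as np

def eqm(model_cal, obs_cal, model_out, n_quantiles=99):
    """Map model values onto the observed distribution via empirical quantiles."""
    probs = np.linspace(0.01, 0.99, n_quantiles)
    mq = np.quantile(model_cal, probs)   # model quantiles, calibration period
    oq = np.quantile(obs_cal, probs)     # observed quantiles, calibration period
    # Piecewise-linear transfer function; values beyond the calibration range are
    # clamped to the outermost observed quantile by np.interp.
    return np.interp(model_out, mq, oq)

# Toy example: the 'model' underestimates daily variance, so the transfer
# function has a slope > 1 and inflates the variance of summer means as well.
rng = np.random.default_rng(0)
obs = rng.normal(20.0, 4.0, size=30 * 92)   # 30 summers of daily Tmax, observed
mod = rng.normal(18.0, 2.5, size=30 * 92)   # same, but too cold and too smooth
corr = eqm(mod, obs, mod)

def summer_means(x):
    return x.reshape(30, 92).mean(axis=1)

print(np.var(summer_means(mod)), np.var(summer_means(corr)))
```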
All analog methods underestimate interannual vari-
ability, consistent with the results for short-term
persistence. The ANALOG-ANOM method searches
for continental-scale analogs within a one-month window
around the calendar day of interest – this likely restricts
the number of analogs and in turn also the represented
variability. Interestingly, most regression methods dra-
matically underestimate interannual variability. The worst
performing methods are those without a seasonal cycle
and non-standardized predictors (MLR-RAN), those
without temperature predictors (ESD-EOFSLP, ESD-SLP,
WT-WG) and those with white noise randomisation
(MLR-ASW, MLR-AAW, WT-WG, SWG). Note also
that both the ESD methods and the SWG method are
defined on continental-scale predictors, which may not be
suitable to capture local variations. Inflated regression by
construction slightly increases the variance at interannual
scales. WGs do not model long-term variations and thus
underestimate interannual variability.
In addition to considering the variance at the interannual
scale, we also investigate the correlation between the
downscaled time series and observations at the interan-
nual scale. Prior to calculating correlations, the time series
are linearly detrended. This analysis provides additional
insight into the predictors required to explain longer-term
variations. These correlations can only be calculated when
simulated and observed time series are in synchrony.
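A minimal sketch of how these two measures could be computed is given below; the exact VALUE implementation may differ in detail, and the helper names are ours.

```python
# Sketch of the interannual measures: VarY (variance of seasonal means) and
# Cor.1Y (correlation of linearly detrended seasonal means). Assumes the daily
# series contains complete seasons of equal length stacked year after year.
import numpy as np

def seasonal_means(daily, days_per_season=92):
    n_years = daily.size // days_per_season
    return daily[: n_years * days_per_season].reshape(n_years, days_per_season).mean(axis=1)

def detrend(y):
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

def var_y(daily_sim):
    return np.var(seasonal_means(daily_sim), ddof=1)

def cor_1y(daily_sim, daily_obs):
    s = detrend(seasonal_means(daily_sim))
    o = detrend(seasonal_means(daily_obs))
    return np.corrcoef(s, o)[0, 1]
```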
Figure 6. As Figure 3, but for summer VarY (K²) (map/top) and Cor.1Y (no map/bottom) of Tmax. [Colour figure can be viewed at wileyonlinelibrary.com].
The RCM develops its own internal variability and thus
reduces synchronicity. Therefore we have not shown
results for the RCM and RCM-driven MOS. Equivalently,
the unconditional weather generators are not in synchrony
with observations and hence not shown. Correlations
for ERA-Interim and essentially all deterministic MOS
methods are high. It is not clear to the authors why
CDFt and EQM-WIC658 are so weakly synchronized – they
deterministically transform the ERA-Interim predictors
and should thus only marginally affect the temporal
sequence.
PP methods also perform well in general. Exceptions
are the ANALOG-ANOM method, the ESD meth-
ods, the WT-WG and the SWG method. Recall that
Figure 7. As Figure 3, but for Cor.7Y and Tmax. Top: DJF; bottom: JJA. [Colour figure can be viewed at wileyonlinelibrary.com].
ANALOG-ANOM takes analogs from a 30-day window
around the calendar day of interest – the identified
analogs might therefore have a rather strong mismatch
at the local scale and thus destroy synchronicity. Also,
analogs of this method are defined over the whole
European domain, which might result in additional dis-
crepancies at the local scales. The ESD methods, which
use either 2 m temperature or sea level pressure as
predictor, perform worse than other regression
models; again, the ESD methods use predictors
defined over the whole of Europe. The WT-WG and
SWG methods perform rather badly, likely because they
are based on white noise randomisation. The WT-WG
additionally uses only sea level pressure as predictor,
and the SWG predictors are defined at the continental
scale.
To characterize decadal-scale variations, we considered
correlations between simulated and observed time series
at the 7-year scale. The seasonally aggregated time series
are filtered with a 7-year Hamming filter. Correlations are
calculated on the filtered time series without any further
detrending. The choice of 7 years is a compromise between
the desired information about long time scales, and the
limited length of the time series. The effective number of
data points is thus low for each series (of the order of
5 per series), but still a coherent picture emerges when
investigating larger regions.
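A sketch of this filtering step is shown below; the treatment of the series ends and the kernel normalisation are our assumptions and not necessarily those used in the VALUE analysis.

```python
# 7-year Hamming low-pass filter applied to seasonally aggregated (annual)
# series, followed by a plain correlation of the filtered series.
import numpy as np

def hamming_lowpass(annual_series, width=7):
    w = np.hamming(width)
    w /= w.sum()
    # mode="same" keeps the series length; the first and last few values are
    # based on a truncated window and should be interpreted with care.
    return np.convolve(annual_series, w, mode="same")

def cor_7y(sim_annual, obs_annual):
    return np.corrcoef(hamming_lowpass(sim_annual), hamming_lowpass(obs_annual))[0, 1]
```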
Figure 7 presents the results for summer (top panel)
and winter (bottom panel) daily maximum temperature.
The results are overall similar to those for interan-
nual variability. Correlations are in general slightly
lower during summer, in particular for ESD-SLP and
WT-WG (driven by sea level pressure only) for which
correlations are consistently negative. Correlations are
lower on the Iberian Peninsula and, in winter, for the whole
Mediterranean.
Finally, we investigate the representation of long-term
temperature trends by the different methods. Figure 8 dis-
plays the results for winter daily maximum temperatures in
selected regions. Of course, no results for weather gener-
ators are shown, as these do not include any predictors or
change factors to represent long-term changes. Note that
in this experiment it is not relevant whether the trends are
statistically significant, because long-term variations are
imprinted by the ERA-Interim predictors – the right pre-
dictor choice should therefore capture large-scale forced
trends. It is, however, relevant whether the simulated trends
are statistically distinguishable from the observed trends.
Thus, we calculated 95% confidence intervals of the trend
estimates, marked as grey shading in the panels. As trends
differ very much across Europe, we calculated average
trends across the PRUDENCE regions. The variation of
trends within a region is indicated by whiskers; these
denote 1.96 times the variance of all trend estimates across
the region.
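A sketch of how such a regional trend summary could be assembled is given below; the confidence interval assumes independent residuals and is an illustrative simplification, and the whisker follows the definition stated above.

```python
# Station-wise OLS trends, an approximate 95% confidence half-width per station,
# and a simple regional aggregation (illustrative simplification).
import numpy as np

def linear_trend_with_ci(y, z=1.96):
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    se = np.sqrt(resid.var(ddof=2) / ((t - t.mean()) ** 2).sum())  # i.i.d. residuals assumed
    return slope, z * se

def regional_summary(station_series):
    """station_series: list of 1-D arrays of winter-mean Tmax, one per station."""
    trends, half_widths = zip(*(linear_trend_with_ci(y) for y in station_series))
    trends = np.asarray(trends)
    return {
        "region_mean_trend": trends.mean(),
        "whisker": 1.96 * trends.var(ddof=1),          # as defined in the text
        "mean_ci_half_width": float(np.mean(half_widths)),
    }
```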
Observed winter trends are highest in Scandinavia and
lowest in the Mediterranean, which is consistent with
polar amplification. ERA-Interim performs mostly fine,
but overestimates trends in Central Europe, the Alps and
the Mediterranean (but note that the underlying ECA&D
data are not homogenized, so a definite answer as to
which trends are more realistic is impossible). The RCM
underestimates trends in particular in Scandinavia, but
also in the Alps and the Mediterranean. These trends are
inherited by additive bias correction (RaiRat-M6), but
notably modified by many quantile mapping methods due
to inflation of daily variances. Note that even the ISI-MIP
method, which is designed to preserve mean trends,
modifies trends in some regions. These trend variations
are substantial, but within the range of uncertainty of
the observed trend estimates. The performance of PP
methods again depends mainly on the predictor choice.
Methods using only sea level pressure or temperature (but
not both; ESD-EOFSLP, ESD-SLP, ESD-T2, WT-WG)
tend to perform badly, although filtering of stations by
PCA appears to strongly increase the link with the tem-
perature predictor on decadal scales (ESD-EOFT2). The
ANALOG-ANOM, again, uses rather narrowly defined
analogs (continental scale, within 1 month), the SWG
method combines a white-noise stochastic approach
with continental-scale predictors. The best performing
methods (ANALOG-MP, ANALOG-SP, MO-GP, MLR,
MLR-WT) all include circulation predictors and 2 m
temperature. Note, however, that 2 m temperature is
likely not well simulated by GCMs (see the discussion in
Section 4).
Summer trends of daily maximum temperatures (see
File S1) are highest in Eastern Europe and the Alps.
ERA-Interim in general captures these trends, but under-
estimates them in the Alps and overestimates them in the
Mediterranean. The RCM underestimates summer trends
everywhere, in particular in the Alps where the simulated
trend is not consistent with the observations. The per-
formance of the statistical methods is similar to that for
winter.
3.2. Precipitation
3.2.1. Short-term variability
As a measure of persistence in precipitation, we consider
wet-wet and dry-wet transition probabilities (Figure 9).
Short-term persistence in precipitation amounts has not
been investigated. Winter wet-wet transition probabilities
(top left panel) are low in southern Europe and high
along the Atlantic coasts as well as in high mountains.
Winter dry-wet transition probabilities (top right panel)
are generally lower than wet-wet probabilities, with low
values in southern Europe.
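For clarity, a minimal sketch of the two transition probabilities follows; a wet day is assumed to be a day with at least 1 mm of precipitation, and the exact VALUE index definitions may differ in detail.

```python
# Wet-wet (WWProb) and dry-wet (DWProb) transition probabilities.
import numpy as np

def transition_probs(precip, wet_threshold=1.0):
    wet = np.asarray(precip) >= wet_threshold
    today, tomorrow = wet[:-1], wet[1:]
    ww = tomorrow[today].mean() if today.any() else np.nan      # P(wet tomorrow | wet today)
    dw = tomorrow[~today].mean() if (~today).any() else np.nan  # P(wet tomorrow | dry today)
    return ww, dw
```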
Because it represents area average precipitation,
ERA-Interim overestimates wet-wet probabilities, in
particular when further averaged. Here the RCM adds
substantial value. MOS methods perform consistently
well. Interestingly, the simple rescaling by the method
RaiRat-M6 appears to perform on par with explicit wet day
corrections by quantile mapping (note that the BC method
only treats zero precipitation as dry). MOS-AN defines
analogs based on simulated large-scale precipitation
fields – these may not discriminate well between local
dry and wet days. MOS-GLM and VGLMGAMMA are
both stochastic methods with white noise randomisation
and consequently simulate too weak wet persistence. The
four-grid-box averaging of the MOS-GLM input appears
to considerably improve the performance, though. Yet dif-
ficulties in regression-based MOS techniques are evident
from the low performance of MOS-GLM when driven
with RCM data: the RCM strongly perturbs the local
day-to-day correspondence between observations and
simulation, which is required for a successful calibration.
The analog methods perform well for wet-wet
transitions, whereas most deterministic regression mod-
els fail. In fact, simple linear regression models
(MLR-RAN/RSN/ASW/ASI) are by construction not
capable of simulating daily precipitation variability – still
the corresponding results are included for illustration
and comparison. Only the deterministic generalized
linear model (GLM) performs reasonably well. Most
stochastic methods with white noise randomisation
(GLM-WT, WT-WG, SWG) slightly underestimate
wet-day persistence, in particular WT-WG, which uses
only sea level pressure, but no humidity predictors. The
stochastic GLM with predictors of the circulation as well
as temperature and specic humidity at cloud base is the
best-performing PP method. Interestingly, the GLM-P
method, which is structurally similar (at least for the
occurrence process) and uses similar predictors, performs
substantially worse. One reason might be that the former
defines predictors at the synoptic scale, the latter at the grid-box scale.
For wet-day occurrence, vertical velocities are important;
these can be inferred from horizontal convergence
or divergence. Grid-box pressure or velocities, however,
do not carry such information. Still, further analyses
comparing different predictor choices are required to fully
understand the performance of specic predictors.
Dry-wet transition probabilities are well represented
by ERA-Interim. The RCM has a slightly positive bias.
Figure 8. Trend [K] in DJF mean Tmax. Horizontal marks: region-averaged trend. Whiskers: 1.96 times the variance of all station trends within a region. Grey shading: 95% confidence interval of the observed station-trend estimate, averaged across the selected region. [Colour figure can be viewed at wileyonlinelibrary.com].
Surprisingly, however, MOS appears to reduce dry-wet
transitions (by wet day adjustments). Thereby it induces
a negative bias for ERA-Interim, but removes the pos-
itive RCM bias. Only for the UK is the positive RCM
bias even increased by many methods. Stochastic
MOS methods (MOS-GLM, VGLMGAMMA) simulate too many
dry-wet transitions, but the averaging of simulated precip-
itation across grid boxes seems to substantially alleviate
the problem (MOS-GLM-E vs VGLMGAMMA-E). The
performance of the different PP methods depends strongly
Figure 9. As Figure 3, but for winter WWProb (left/top) and DWProb (right/bottom). [Colour figure can be viewed at wileyonlinelibrary.com].
on both their structure and the chosen predictors. The
authors do not fully understand the differences in per-
formance of different implementations. The two best
performing methods are ANALOG-ANOM and GLM.
Both methods include circulation-based predictors (which
should indirectly give information about lifting) and, at
least indirectly, measures of relative humidity (dew point
temperature depression; specific humidity in combination
with temperature). Other methods, however, include
similar predictors, but perform worse. Recall, however,
that we only know the candidate predictors used for
calibration, not the finally selected predictors at the given
stations. The SS-WG and GOMEZ weather generators
slightly overestimate dry-wet transitions, even though
this aspect is explicitly calibrated. Recall that the MARFI
weather generator uses a wet-day threshold of 0.5 mm,
resulting in a strong overestimation of dry-wet transitions
when evaluated against a 1-mm threshold.
3.2.2. Spells
The behaviour of mean spell lengths – as well as the
corresponding method performance – is closely tied to
that of transition probabilities (Figure 10). Mean winter
wet-spell lengths (top left) are long along the Atlantic
west coasts and in mountain ranges, and short in Eastern
Europe and the Mediterranean. Summer dry spells (top
right) are short in Central and Northern Europe, and long
in the Mediterranean.
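A sketch of how mean spell lengths could be derived from a daily series is given below (again assuming a 1-mm wet-day threshold; spells truncated at the series boundaries are counted as ordinary spells here, which the VALUE implementation may handle differently).

```python
# Mean wet- and dry-spell lengths: spells are maximal runs of consecutive wet
# or dry days in a daily precipitation series.
import numpy as np

def mean_spell_lengths(precip, wet_threshold=1.0):
    wet = np.asarray(precip) >= wet_threshold
    spells = {True: [], False: []}
    run_state, run_length = wet[0], 1
    for state in wet[1:]:
        if state == run_state:
            run_length += 1
        else:
            spells[bool(run_state)].append(run_length)
            run_state, run_length = state, 1
    spells[bool(run_state)].append(run_length)
    wet_mean = np.mean(spells[True]) if spells[True] else np.nan
    dry_mean = np.mean(spells[False]) if spells[False] else np.nan
    return wet_mean, dry_mean
```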
The representation of winter wet spells is very similar
to that of wet-wet probabilities (Figure 9). ERA-Interim
overestimates winter wet spells because of spatial aver-
aging (upper panel). The RCM adds substantial value
by reducing the ERA-Interim bias of too many wet-days
(J. M. Gutiérrez et al., 2017; personal communication) as
well as the bias in too high a wet-wet transition prob-
ability (see above). Almost all bias correction methods
perform very well, with a marginal improvement when
driven with an RCM. MOS-AN, MOS-GLM and
VGLMGAMMA perform very similarly to their behaviour for
short-term persistence; the averaging of predictors across
four grid boxes in the stochastic methods (MOS-GLM-E
vs VGLMGAMMA-E) further increases skill. The per-
formance of the PP methods scatters widely, as it did
for short-term persistence. The analog methods and the
advanced generalized linear models (GLM, GLM-WT,
WT-WG, SWG) perform well, but in particular all deter-
ministic regression models perform badly. The SS-WG
and GOMEZ weather generators only slightly underesti-
mate wet spell lengths.
Regarding summer dry spells, ERA-Interim simulates
too short spells. The RCM adds substantial value, likely
due to a reduction of the area-average-related drizzle
effect. MOS appears to increase the length of dry spells as a
consequence of the wet day correction. For ERA-Interim
this leads to unbiased results, whereas the RCM perfor-
mance is deteriorated towards too long dry spells. This
problem occurs in particular for quantile mapping meth-
ods, which are not seasonally trained (GQM, GPQM,
EQMs, EQM-WT†). Analog methods perform slightly bet-
ter for dry than for wet spells; the GLM performs worse
than for wet spells, but still reasonably well. Weather gen-
erators perform slightly better for dry than for wet spells.
Owing to the different wet-day threshold, the MARFI
weather generator is slightly more biased and has a much
higher spread across stations. In general, the length of dry
spells is overestimated in the Mediterranean and France.
3.2.3. Seasonality
Seasonality of precipitation is measured by the relative
amplitude (defined as the difference between precipitation in the maximum and minimum of the seasonal cycle, relative to the annual mean) and phase (defined as the position of the maximum of the seasonal cycle).
†For winter dry spells (not shown), conditioning on weather types (EQM-WT) has the same effect as an explicit seasonal training (EQMs), indicating that biases are circulation dependent and translate into seasonally dependent biases, because the frequency of weather types changes throughout the year.
Although the
calculation is identical to that of the seasonal cycle of tem-
perature, some details will be relevant in particular for pre-
cipitation. In fact, the seasonal cycle of precipitation has
two peaks in many regions, sometimes even shoulders or
peaks that may be artefacts of sampling variability. Fol-
lowing Favre et al. (2016), we therefore lter the seasonal
cycle by four harmonics – this model is exible enough to
capture smooth – likely physical – variations, but at the
same time lters out residual noise (see Figure 2). The
amplitude of the seasonal cycle is simply dened as the
difference between maximum and minimum. For the phase
denition, further steps have been carried out. They are a
compromise between being simple and transparent, but at
the same time capturing the complex seasonal behaviour.
First, secondary peaks with an amplitude (defined as the
difference between the closest local minimum and the peak
itself) of less than 10% of the total amplitude have been
removed. Second, neighbouring peaks whose intervening
minimum lies less than 10% of the total amplitude below
the mean height of the two peaks are replaced by a single
peak by averaging their heights as well as their phases.
The first step removes all minor peaks, the second step
removes dips in an overall broad maximum; both are likely artefacts of
sampling variability. Visual inspection of observed sea-
sonality for all 86 stations corroborates that this defini-
tion conforms with expert judgement. We then record the
phase of the remaining highest and second highest peak
for observations and all simulations. The observed phase
is then defined as that of the highest peak. The simulated
phase is defined as the phase of whichever of the two highest
peaks is closest to the observed one. The latter definition
avoids an artificially large phase bias when the highest and
second highest peaks have similar height but are swapped
in the simulation. Apart from this phase definition we
considered other measures for characterizing the timing of
the seasonal cycle, but rejected all other possibilities. We
considered, e.g., correlations between simulated and observed
seasonal cycles, but this measure is difficult to interpret
in terms of an actual mismatch in timing. Additionally,
we also considered calculating phases of secondary peaks,
but concluded that a plain and transparent presentation of
performance across Europe would be difficult.
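A simplified sketch of the harmonic filtering and the amplitude and phase measures follows; the peak-pruning steps described above are omitted for brevity, so it covers only the single-peak case, and a 365-day climatology with FFT-based filtering is our assumption.

```python
# Smooth the mean annual cycle with the first four harmonics and derive the
# relative amplitude and the phase (day of the maximum).
import numpy as np

def smoothed_annual_cycle(clim, n_harmonics=4):
    """clim: mean annual cycle, one value per calendar day (length 365)."""
    coeffs = np.fft.rfft(clim)
    coeffs[n_harmonics + 1:] = 0.0            # keep the mean and four harmonics
    return np.fft.irfft(coeffs, clim.size)

def relative_amplitude_and_phase(clim):
    smooth = smoothed_annual_cycle(clim)
    rel_amplitude = (smooth.max() - smooth.min()) / clim.mean()
    phase = int(smooth.argmax())              # calendar day of the (single) maximum
    return rel_amplitude, phase
```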
Seasonality of precipitation (Figure 11) has a strong
north–south gradient, ranging from less than 50% of
annual mean precipitation in central-west Europe to
more than 200% in southern Spain and southern Greece.
The annual cycle peaks in winter along the Atlantic
and the Mediterranean, and in summer in Central and
eastern Europe and eastern Scandinavia. Reanalysis and
RCM underestimate the amplitude of the annual cycle,
although the RCM adds considerable value. MOS gen-
erally performs well, although methods without seasonal
training (GQM, GPQM, EQM, EQM-WT) overestimate
the relative amplitude by about 20%. Note, however, that
conditioning the correction on weather types (EQM-WT)
Figure 10. As Figure 3, but for winter WetSpellMean (days) (left/top) and summer DrySpellMean (days) (right/bottom). [Colour figure can be viewed at wileyonlinelibrary.com].
substantially reduces this bias. PP performance again
depends on the method type, the treatment of seasonality,
and the choice of predictors. The analog methods perform
reasonably well, whereas linear regression models all underrepre-
sent the relative amplitude (MLR-RAN/RSN/ASW/ASI).
The good performance of the GLM method indicates that a
sensible model structure and predictor choice (circulation
and humidity) may make it possible to capture the seasonal
cycle without an explicit model. The phase of the seasonal
cycle is well captured by most methods. The poor performance
of WT-WG indicates that sea level pressure alone does
not determine the seasonal cycle.
Figure 11. As Figure 3, but for the relative amplitude (left/top) and phase (days) (right/bottom) of the annual cycle of precipitation. [Colour figure can be viewed at wileyonlinelibrary.com].
3.2.4. Interannual variability and long-term trends
Interannual variability of precipitation varies unsys-
tematically in space (Figure 12). Values, however, tend
to be higher at higher elevations. As for temperature,
reanalysis data underrepresent interannual variabil-
ity, especially at low resolution. But in contrast to
temperature, the RCM succeeds in reducing the overall
bias, in particular over the Mediterranean. Deterministic
MOS methods suffer strongly from variance infla-
tion, which in some cases doubles the interannual variance.
Regression-based MOS by contrast tends to underesti-
mate interannual variability, consistent with the driving
model. The performance of PP methods, again, varies
considerably. Note, however, that all well-performing
methods include not only circulation-based predictors,
but also measures of humidity (ANALOG-ANOM,
ANALOG, ANALOG-SP, GLM-det, GLM, GLM-WT).
Weather generators, as expected, underestimate inter-
annual variability – even more so for the MARFI
weather generator because of the different wet-day
threshold.
Interannual correlations are, as expected, lower for
precipitation than for temperature: only about 50% of the
local variability (∼0.7²) seems to be explained by the area
average; the rest is due to local variations. Deterministic
MOS methods do not modify this correlation (again,
we cannot explain the performance of EQM-WIC658).
For the stochastic MOS methods, the value of averaging
simulated precipitation across neighbouring grid boxes is
evident (compare MOS-GLM-E and VGLMGAMMA-E).
All PP methods explain substantially less of the
interannual variability than the grid-box ERA-Interim.
The worst performing methods are ANALOG-ANOM
(analogs searched within a 30-day window only, continental-
scale predictors and analogs), MLR-ASW (Gaussian
white noise randomisation), WT-WG (stochastic, only
sea level pressure as predictors) and SWG (stochas-
tic, continental scale predictors). Note the substantial
difference between the – structurally similar – GLM and
SWG models. GLM defines predictors on a national scale,
SWG on a continental scale.
Seven-year correlations between simulations and
observations are similar to interannual correlations;
they are much higher though in winter than in summer
(see File S1).
Finally, we investigate the performance in representing
relative trends in seasonal mean precipitation. Figure 13
presents the results for summer and selected regions. All
observed trends are essentially zero and insignificant, with
moderately positive values in Central Europe. We never-
theless show the results to demonstrate the behaviour of
the different methods. ERA-Interim captures the observed
trends in some regions, but simulates a zero trend for
Central Europe, and a negative trend for the Alps. The
RCM simulates positive trends for the British Isles, Cen-
tral Europe, Scandinavia and the Alps, although all these
are within the range of sampling uncertainty. The MOS
methods tend to inflate the erroneous RCM trends, as well as
the erroneous negative ERA-Interim trends in the Alps. Many
PP methods capture observed trends quite well, although
the performance changes substantially – and not for obvi-
ous reasons – from region to region. Identifying necessary
predictors appears to be much less straightforward than in
the case of temperature trends.
4. Discussion and conclusions
We have systematically evaluated how different types of
downscaling and bias correction approaches represent
temporal aspects. These aspects comprise systematic
seasonal variations and residual temporal dependence
such as short-term persistence, spell length distributions
and interannual to long-term variability. Additionally, we
considered long-term trends, which are a superposition of
long-term internal climate variability and forced trends.
Our results complement, corroborate and extend earlier
findings, in particular by Frost et al. (2011), Hu et al.
(2013), Benestad and Haugen (2007) and Huth et al.
(2015).
Overall, the behaviour of the different approaches turned
out to be as expected from their structure and imple-
mentation. For the interpretation of the results, it has
to be taken into account whether a particular aspect of a
model is explicitly calibrated – a good performance is then
more or less trivial – or emerges from the model, e.g. by
well-chosen meteorological predictors.
A summary of the results (apart from correlations
and long-term trends) can be found in Figure 14. The
raw ERA-Interim data are typically biased compared to
observed station data, more strongly so for the spatially aggre-
gated 2° version. Note, however, that these discrepancies
are not necessarily biases in the sense of model errors, but
simply reflect the scale gap between area averages and
point values (Volosciuk et al., 2015). The chosen RCM
adds value to reanalysis data for most considered aspects,
for all seasons and for both temperature and precipitation.
Note, however, that we included just one RCM in our val-
idation study. One should be careful in generalizing these
results because RCMs may differ considerably in their
ability to reproduce temporal characteristics (Kotlarski
et al., 2014; Huth et al., 2015).
The MOS methods considered in this intercomparison
do not explicitly change the residual temporal dependence
(and it is questionable whether they should explicitly do
so, as such changes would destroy the temporal consis-
tency with the driving model). However, quantile mapping
approaches modifying the marginal distribution (including
wet day probabilities) do indirectly improve temporal vari-
ability. For temperature, some implementations slightly
improve short-term persistence, but in particular for pre-
cipitation, the representation of transition probabilities
as well as wet and dry spells is substantially improved.
Interestingly, dry-wet transitions and dry-spell lengths
are much better for the bias-corrected RCM than for
bias-corrected reanalyses, even though the added value of
the RCM for these indices was only marginal. Interannual
and long-term variability is typically inflated by MOS,
moderately for temperature but substantially for precipi-
tation. These findings corroborate earlier results of adverse
inflation effects by quantile mapping (Maraun, 2013).
Long-term trends are inherited from the driving model,
but may be substantially deteriorated by further variance
inflation. The annual cycle is improved by almost all
MOS methods – but recall that most methods are season-
ally trained. Conditioning on weather types (EQM-WT)
seems to be a successful – and physically more defensi-
ble – variant to better represent the annual cycle. In any
case, our results clearly show that – for many but not all
temporal aspects – dynamical downscaling prior to the
Figure 12. As Figure 3, but for summer VarY (mm²) (map/top) and Cor.1Y (no map/bottom) of precipitation. [Colour figure can be viewed at wileyonlinelibrary.com].
bias correction substantially improves the results compared to a direct bias correction from the global model‡. The reason of course is that the bias correction does not improve the representation of meso-scale processes. Thus, depending on the context, dynamical downscaling may be advisable or even essential.
‡Note in this context that ERA-Interim is an 'ideal' GCM in the sense that it is forced to closely follow the observed large-scale weather.
The performance of the participating PP methods varies
strongly from aspect to aspect and method to method. Ana-
log methods show difficulties representing temperature
Figure 13. As Figure 8, but for the relative trend in JJA mean precipitation. [Colour figure can be viewed at wileyonlinelibrary.com].
variability, but perform quite well for precipitation vari-
ability. Two reasons may contribute to the low performance
for temperature: first, predictors describing circulation and
humidity have much stronger spatio-temporal variability
than temperature fields and therefore dominate the defi-
nition of the analogs. Second, predictors and analogs are
often defined on large scales. Locally, differences between
actual weather and analogs may be substantial. Thus, even
if analogs may describe a smooth temperature evolution
at large scales, the resulting local sequence might be too
noisy.
Deterministic linear regression models perform fairly
well for temperature, but overestimate short-term per-
sistence and spell lengths. White noise randomisation
Figure 14. Performance summary. Left: Tmin, right: precipitation. For each index either the performance for all four seasons is shown, or additionally the performance for the whole year (separated by a dashed line), or – in case of the seasonal cycle – one for the whole year. Grey squares indicate that no values have been calculated. For the scales used for normalization, see Appendix. [Colour figure can be viewed at wileyonlinelibrary.com].
deteriorates the representation of these aspects. Linear
regression models, in any variant, are far too simplistic
for precipitation downscaling. They strongly overestimate
wet-wet transitions and the length of wet spells, while
stochastic methods underestimate these aspects. Biases
for dry-wet transitions and dry-spell lengths tends to be
opposite to those for wet-wet transitions and wet-spell
lengths, but they are substantial for almost all PP meth-
ods. Only a stochastic generalized linear model with
suitable predictors has shown to perform well (GLM). A
structurally similar model (SWG) – with similar predictor
variables, but dened on the continental scale – performs
notably bad. The representation of the annual cycle
depends strongly on the individual method; whether or not
a method is seasonally trained plays a minor role – the
choice of reasonable predictors seems to be a key fac-
tor. For temperature, temperature-related predictors are
required; for precipitation, circulation and humidity based
predictors. There is evidence that biases in interannual
variability of temperature mainly depend on the method
type (again, analog methods and white noise randomiza-
tion underestimate internal variability), on the predictor
variables (all well-performing methods combine circula-
tion and temperature predictors) and the domain size (all
methods using continental-size predictor domains perform
badly). For precipitation, the inclusion of predictors that
represent both circulation and humidity appears to be
crucial. Long-term trends in temperature are captured
by models with surface temperature predictors (see the
critical discussion below); for precipitation, no conclu-
sions can be drawn based on the available ensemble and
the rather low signal-to-noise ratio. Overall, white-noise
randomization with continental-scale predictors turned
out to perform weakly. Apparently, the variance explained
by predictors at such large scales is rather low, such that
the residual white noise is too strong to retain the overall
temporal dependence.
Unconditional weather generators tend to perform well
for the aspects they have been calibrated for: they only
slightly underestimate short-term temperature persistence
and wet-wet transitions, but slightly overestimate dry-wet
transitions. Nevertheless, many non-calibrated aspects
are also fairly well represented. Temperature spell lengths are
slightly underestimated, in particular for winter cold spells
and summer warm spells. Wet spell lengths are well repre-
sented, dry spell lengths underestimated. Only interannual
variability is substantially underrepresented. These effects
are well-known issues (Wilks and Wilby, 1999) and are
relevant also for decadal variability. Seasonality is, by con-
struction, well simulated.
Overall, the performance is similar in different sea-
sons – but recall that in particular most MOS methods
and all weather generators are calibrated season by season. Such
explicit seasonal models, however, may be questioned when
applied to a future climate: seasonally varying biases
indicate that seasonal biases may also change differently
on long time scales.
Our findings highlight a series of open research ques-
tions, and the need for a range of improvements. MOS
methods perform overall very well. Some key issues,
however, remain to be addressed: the inflation (or poten-
tially deflation) of interannual and long-term variability
and trends is of course directly tied to the simplicity of
quantile mapping compared to MOS methods in weather
forecasting and the PP methods presented here: whereas
the latter express physical relationships between large and
local scales at least rudimentarily as regression models and
thereby can distinguish between forced and local inter-
nal variability, quantile mapping adjusts only long-term
distributions of daily values without any physical basis.
This calibration is especially problematic when a scale
gap between predictand and predictor is to be bridged
(Maraun, 2013). The reason for this calibration approach, of course,
is that regression models cannot easily be calibrated against
a free-running climate model, which is not in synchrony
with observations (Maraun et al., 2010). More research is
needed to understand the link between biases in short- and
long-term variability. Some methods have been developed
to separate variability on different scales, and to adjust
them independently; other methods have been developed
to preserve climate model trends to various degrees (Li
et al., 2010; Haerter et al., 2011; Hempel et al., 2013;
Pierce et al., 2015). The physical assumptions underlying
these different methods need to be better understood. In
any case, our results show that any bias correction relies
on climate models that simulate realistic trends. In case
of downscaling to a ner resolution, it might be useful
to separate the bias correction from the downscaling, i.e.
apply a correction against gridded observational data, and
then implement a stochastic downscaling model against
point data (Volosciuk et al., 2017). Regression-based
MOS methods have been presented as further alternatives
(MOS-REG/GLM, VGLMGAMMA), but these cannot
be calibrated to standard climate model simulations. The
results show that even typical RCM hindcast simulations
(where the RCM is driven with a reanalysis, MOS-REG-R
and MOS-GLM-R) are not sufficiently synchronous to
ensure a successful calibration. A way out might be
to condition the bias correction on weather types, as
demonstrated by EQM-WT.
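A sketch of that idea is given below; it is purely illustrative and not the participating EQM-WT implementation, and the helper names and the integer weather-type labels are our assumptions.

```python
# Weather-type-conditioned quantile mapping: calibrate and apply a separate
# empirical transfer function for each weather type.
import numpy as np

def eqm_factory(model_cal, obs_cal, n_quantiles=99):
    probs = np.linspace(0.01, 0.99, n_quantiles)
    mq, oq = np.quantile(model_cal, probs), np.quantile(obs_cal, probs)
    return lambda x: np.interp(x, mq, oq)

def eqm_by_weather_type(model_cal, obs_cal, wt_cal, model_out, wt_out):
    """wt_cal / wt_out: integer weather-type label for each day."""
    # Fall back to an unconditional correction for types unseen in calibration.
    corrected = eqm_factory(model_cal, obs_cal)(model_out)
    for wt in np.unique(wt_cal):
        transfer = eqm_factory(model_cal[wt_cal == wt], obs_cal[wt_cal == wt])
        mask = wt_out == wt
        corrected[mask] = transfer(model_out[mask])
    return corrected
```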
Various research strands are possible and necessary to
better understand and to improve PP methods. For analog
methods, in particular in the case of temperature, a way
forward could be based on defining the analogs not on a
single day, but rather on a sequence of days (e.g. Beersma
and Buishand, 2003). Such approaches, however, require
long time series. Note also that analog methods
cannot represent substantial climatic changes, for which no
analogs might be available to sample from (Gutiérrez et al.,
2013). An obvious improvement of regression models is
a better representation of residual variability – in linear
models for temperature, and generalized linear models
for precipitation. Here, conditional weather generators
that extend the white noise randomization (both for
temperature and precipitation) by a Markov component
are promising. For instance, one may include not only mete-
orological predictors, but also simulated predictand values
from previous days as predictors (Chandler and Wheater,
2002; Yang et al., 2005).
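A sketch of this idea is given below; it is purely illustrative, uses scikit-learn for brevity, and the predictor setup is hypothetical rather than taken from any of the participating methods. The occurrence model receives the previous day's simulated state as an additional predictor and is then iterated forward in time.

```python
# Logistic occurrence model with a first-order Markov component: large-scale
# predictors X plus the previous day's (simulated) occurrence.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_markov_occurrence(X, wet):
    """X: (n_days, n_predictors) predictors; wet: boolean occurrence series."""
    X_lag = np.column_stack([X[1:], wet[:-1].astype(float)])
    model = LogisticRegression()
    model.fit(X_lag, wet[1:])
    return model

def simulate_occurrence(model, X, rng, first_day_wet=False):
    sim = np.empty(X.shape[0], dtype=bool)
    sim[0] = first_day_wet
    for t in range(1, X.shape[0]):
        x_t = np.column_stack([X[t:t + 1], [[float(sim[t - 1])]]])
        p_wet = model.predict_proba(x_t)[0, 1]   # probability of a wet day
        sim[t] = rng.random() < p_wet
    return sim
```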
The crucial questions regarding the PP approach concern,
however, not improvements in model structure, but a
better understanding of the predictor choice. Unfortunately,
the available model ensemble did not allow for a strin-
gent identification of suitable predictors. Nevertheless,
the results highlight a couple of issues. Note that these
are questions of physics more than of statistics. First,
what is a suitable domain size? The GLM-P and GLM
methods include a structurally similar rainfall occurrence
process and a – at first sight – similar set of predictors.
But the GLM method performs far better than GLM-P
in simulating all occurrence-related aspects. A major
difference between the two implementations is that GLM
uses synoptic scale predictors, whereas GLM-P relies on
grid-box predictors. Precipitation occurrence is controlled
by relative humidity and vertical velocity. The latter
is typically represented by predictors of the horizontal
circulation. The underlying reasoning is that horizontal
divergence and convergence determine vertical descent
and ascent. Convergence and divergence, in turn, may
be implicit in large-scale pressure fields, but they are
not represented by grid-box pressure values. Thus, the
choice of predictor variables depends on the domain size.
Many methods with limited performance, in particular for
temperature, were based on continental-scale predictors.
Thus, there is evidence that such predictor domains are
simply too large to successfully represent local variability.
Here one has to trade off between downscaling across
large areas and precision at local scales. In fact, we see
the main strength of PP methods not in competing with
RCMs across whole continents, but rather in providing
tailored region-specic projections.
Second, which predictors are required for representing
long-term trends? We demonstrated that model perfor-
mance for the same set of predictors differed substan-
tially for short-term persistence and long-term changes.
The reason of course is that downscaling methods are cal-
ibrated to day-to-day variability, but are intended to work
on long-term variability (Huth et al., 2015). For temper-
ature, a combination of temperature and circulation pre-
dictors appeared to explain long-term trends fairly well.
Precipitation, however, is a more complex nonlinear pro-
cess, and no method convincingly captured trends in all
considered regions. A further complicating issue is the low
signal-to-noise ratio: all trends, and all misrepresentations,
are still within the sampling uncertainty.
Weather generators do have an explicit model of the
short-term temporal dependence, but those variants par-
ticipating in this inter-comparison did not include any
meteorological predictors. As a result, these methods
underestimated long-term variability – it was not explic-
itly modelled. Also here improvements are possible,
e.g. by conditioning the weather generator on monthly
aggregates (being generated by the separate monthly WG
or taken from the driving data – e.g. GCM, RCM or
reanalysis) to improve the representation of interannual
variability (Dubrovský et al., 2004).
This study was based on a perfect predictor setting to
isolate downscaling skill. Therefore, we did not investigate
the performance with imperfect predictors or boundary
conditions from free running GCMs. Downscaling meth-
ods – apart from unconditional weather generators – to
a large extent inherit the errors in representing tempo-
ral variability of the driving models (Hall, 2014). The
downscaling performance may therefore drop consid-
erably when driven by imperfect forcing from a GCM.
For MOS, the issue is rather subtle: marginal biases in
present climate are by construction removed, hence it is
difficult to identify fundamental GCM errors such as the
misrepresentation of the large-scale circulation and its
temporal structure. Non-calibrated aspects, in particular
the temporal aspects, should thus also be evaluated.
For PP one typically assumes that large-scale predictors
from the free atmosphere fulfil the PP assumption. This
assumption should be tested for GCMs. Again, evaluating
temporal aspects might be more informative than eval-
uating marginal aspects – often, predictors are based on
anomalies, such that mean biases are implicitly removed.
Moreover, many PP predictors are neither defined at
large scales nor chosen from the free atmosphere. For
instance, those methods that best represented temperature
trends all relied on 2 m–temperature. In the reanalysis,
which has been used as predictors, temperature obser-
vations have been assimilated into the model, such that
grid-box variability and long-term trends are likely correctly
represented in data-rich regions. Local surface feedbacks that modulate temperature variability are thus implicitly accounted for. However, a free-running GCM will likely not correctly represent these feedbacks, such that GCM-simulated 2 m temperature will likely not fulfil the PP assumption. Similar arguments apply to grid-box values of, e.g., 10 m winds.
Even though we investigated the performance in representing observed trends, we can only draw limited conclusions about representing future trends. MOS relies on credibly simulated grid-box trends: the ERA-Interim trends are approximately correct by construction, whereas the RCM shows substantial deficiencies. Also for PP methods, our findings are far from conclusive. For temperature, as discussed before, the PP assumption for the relevant predictors may not be fulfilled. For precipitation, no conclusions are possible at all because of the low signal-to-noise ratio. In any case, a method performing badly with perfect predictors will not perform better with imperfect predictors. Passing this evaluation is therefore a necessary, but not a sufficient, requirement for a method to be applicable under climate change conditions.
This discussion shows that further studies are required
to establish the skill of downscaling under simulated future
conditions. The VALUE community is planning additional experiments (Maraun et al., 2015): GCM predictor experiments to assess the performance under imperfect predictors, and pseudo-reality experiments to establish statistical downscaling skill in simulated future climates. Additionally, we have identified a range of open questions that can be addressed within our perfect predictor experiment, in particular related to the predictor choice of PP methods. The metadata and complete results for individual methods are available from the VALUE portal (www.value-cost.eu/validationportal); they can be downloaded and further analysed. We also encourage dedicated sensitivity studies based on the ensemble at hand.
Acknowledgements
VALUE has been funded as EU COST Action ES1102.
Participation of M. Dubrovský and R. Huth in VALUE
was supported by the Ministry of Education, Youth, and
Sports of the Czech Republic under contracts LD12029
and LD12059, respectively.
Appendix
Performance summary
Similar to the portrait diagram in Sillmann et al. (2013),
Figure 14 summarizes the performance of the different
methods for different indices in one (colour-coded) value.
To make these comparable across methods and indices, a
reference scale has to be dened. This scale cannot simply
be measured in terms of the best and worst performing
methods for an index, as such a scale would only mea-
sure relative performance, not absolute performance. For
instance, one would not be able to distinguish an index that
is well represented from one that is poorly represented by
all methods. Sillmann et al. (2013) dene the variability of
an index in space as reference scale. But this scale cannot
be applied to a single series, and it cannot distinguish
between indices that are well modelled by all methods
across space (e.g. the seasonal cycle) and indices that
are badly modelled (e.g. interannual variability). Thus,
we attempt to dene natural scales for different types of
indices:
• For biases in mean temperature, we define twice the standard deviation of daily variability as the scale. For Gaussian-distributed variables, this range spans roughly 95% of the probability mass.
• For biases of temperature indices that may be expressed as anomalies (such as the 20-year return value or the amplitude of the seasonal cycle), we choose the modulus of the anomaly (i.e. the difference between the return value and the mean temperature, or the amplitude itself) as the reference scale.
• For relative biases, which assume only positive values (such as for temperature variance, precipitation intensity or mean spell length), a natural scale is the observed value itself.
• For the phase of the seasonal cycle, we (somewhat arbitrarily) define 1 month as the reference scale.
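As an illustration of how these scales can be applied, the following sketch (Python) normalizes a method's bias by the index-specific reference scale so that values become comparable across indices. It is an assumed interpretation of the definitions above rather than the actual VALUE portal code, and all names are hypothetical.

def normalized_bias(bias, index_type, obs=None, daily_sd=None, anomaly=None):
    """Normalize a bias by the index-specific reference scale.

    index_type: 'mean_temperature' | 'temperature_anomaly' | 'relative' | 'phase'
    daily_sd  : standard deviation of observed daily variability
    anomaly   : modulus of the observed anomaly (e.g. 20-year return value minus
                mean temperature, or the seasonal-cycle amplitude)
    obs       : observed index value (for relative biases)
    """
    if index_type == "mean_temperature":
        scale = 2.0 * daily_sd        # spans roughly 95% of a Gaussian's probability mass
    elif index_type == "temperature_anomaly":
        scale = abs(anomaly)
    elif index_type == "relative":
        scale = abs(obs)              # bias measured relative to the observed value
    elif index_type == "phase":
        scale = 1.0                   # 1 month, somewhat arbitrarily
    else:
        raise ValueError(f"unknown index type: {index_type}")
    return bias / scale               # signed, so over- and underestimation stay visible

# Hypothetical example: a +0.8 K bias in mean temperature where the daily SD is 4 K
print(normalized_bias(0.8, "mean_temperature", daily_sd=4.0))   # -> 0.1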
Supporting information
The following supporting information is available as part
of the online article:
File S1.
References
Alexandru A, de Elia R, Laprise R. 2007. Internal variability in regional
climate downscaling at the seasonal scale. Mon. Weather Rev. 135(9):
3221– 3238. https://doi.org/10.1175/MWR3456.1.
Bartholy J, Pongrácz R, Kis A. 2015. Projected changes of extreme
precipitation using multi-model approach. Q. J. Hungarian Meteorol.
Serv. 119: 129–142.
Bedia J, Iturbide M, Herrera S, Manzanas R, Gutiérrez J. 2016. downscaleR:
climate data manipulation, bias correction and statistical downscaling.
http://github.com/SantanderMetGroup/downscaleR/wiki (accessed 1 August 2017).
Beersma JJ, Buishand TA. 2003. Multi-site simulation of daily precipita-
tion and temperature conditional on the atmospheric circulation. Clim.
Res. 25: 121– 133.
Benestad RE, Haugen JE. 2007. On complex extremes: flood hazards and
combined high spring-time precipitation and temperature in Norway.
Clim. Change 85(3-4): 381– 406.
Benestad RE, Chen D, Mezghani A, Fan L, Parding K. 2015a. On using
principal components to represent stations in empirical-statistical
downscaling. Tellus A 67(1): 28326.
Benestad R, Mezghani A, Parding K. 2015b. esd: climate analysis and
empirical-statistical downscaling (ESD) package for monthly and daily
data. http://rcg.gvc.gu.se/edu/esd.pdf (accessed 1 August 2017).
Bukovsky MS. 2012. Temperature trends in the NARCCAP regional
climate models. J. Clim. 24: 3985– 3991.
Bürger G, Murdock TQ, Werner AT, Sobie SR, Cannon AJ. 2012.
Downscaling extremes – an intercomparison of multiple statistical
methods for present climate. J. Clim. 25(12): 4366– 4388.
Calanca P. 2007. Climate change and drought occurrence in the Alpine
region: how severe are becoming the extremes? Glob. Planet. Change
57(1-2): 151– 160.
Cannon AJ. 2016. Multivariate bias correction of climate model output:
matching marginal distributions and intervariable dependence struc-
ture. J. Clim. 29(19): 7045– 7064.
Ceppi P, Scherrer SC, Fischer AM, Appenzeller C. 2012. Revisit-
ing Swiss temperature trends 1959– 2008. Int. J. Climatol. 32(2):
203– 213.
Chandler RE, Wheater HS. 2002. Analysis of rainfall variability using
generalized linear models: a case study from the west of Ireland. Water
Resour. Res. 38(10): 1192.
Charles SP, Bates BC, Hughes JP. 1999. A spatiotemporal model for
downscaling precipitation occurrence and amounts. J. Geophys. Res.
104(D24): 31657– 31669.
Christensen JH, Christensen OB. 2007. A summary of the PRUDENCE
model projections of changes in European climate by the end of this
century. Clim. Change 81(S1): 7– 30.
Dee DP, Uppala SM, Simmons AJ, Berrisford P, Poli P, Kobayashi S,
Andrae U, Balmaseda MA, Balsamo G, Bauer P, Bechtold P, Beljaars
ACM, van den Berg L, Bidlot J, Bormann N, Delsol C, Dragani R,
Fuentes M, Geer AJ, Haimberger L, Healy SB, Hersbach H, Hólm
EV, Isaksen L, Kållberg P, Köhler M, Matricardi M, McNally AP,
Monge-Sanz BM, Morcrette J-J, Park B-K, Peubey C, de Rosnay P,
Tavolato C, Thépaut J-N, Vitart F. 2011. The ERA-Interim reanalysis:
configuration and performance of the data assimilation system. Q. J.
R. Meteorol. Soc. 137(656): 553–597.
Dubrovský M, Buchtele J, Žalud Z. 2004. High-frequency and
low-frequency variability in stochastic daily weather generator and its
effect on agricultural and hydrologic modelling. Clim. Change 63(1):
145– 179.
Favre A, Philippon N, Pohl B, Kalognomou E-A, Lennard C, Hewitson
B, Nikulin G, Dosio A, Panitz H-J, Cerezo-Mota R. 2016. Spatial
distribution of precipitation annual cycles over South Africa in 10
CORDEX regional climate model present-day simulations. Clim. Dyn.
46(5-6): 1799– 1818.
Fischer EM, Seneviratne SI, Vidale PL, Lüthi D, Schär C. 2007. Soil
moisture-atmosphere interactions during the 2003 European summer
heat wave. J. Clim. 20(20): 5081–5099.
Fowler HJ, Blenkinsop S, Tebaldi C. 2007. Linking climate change mod-
elling to impacts studies: recent advances in downscaling techniques
for hydrological modelling. Int. J. Climatol. 27(12): 1547– 1578.
Frei C, Christensen JH, Deque M, Jacob D, Jones RG, Vidale PL. 2003.
Daily precipitation statistics in regional climate models: evaluation
and intercomparison for the European Alps. J. Geophys. Res. Atmos.
108(D3): 4124.
Froidevaux P, Schwanbeck J, Weingartner R, Chevalier C, Martius O.
2015. Flood triggering in Switzerland: the role of daily to monthly
preceding precipitation. Hydrol. Earth Syst. Sci. 19(9): 3903–3924.
Frost AJ, Charles SP, Timbal B, Chiew FHS, Mehrotra R, Nguyen
KC, Chandler RE, McGregor JL, Fu G, Kirono DGC et al. 2011. A
comparison of multi-site daily rainfall downscaling techniques under
Australian conditions. J. Hydrol. 408(1–2): 1–18.
Giorgi F, Bi X, Pal J. 2004. Mean, interannual variability and trends
in a regional climate change experiment over Europe. I. Present-day
climate (1961–1990). Clim. Dyn. 22: 733–756.
Goodess CM, Anagnostopoulou C, Bárdossy A, Frei C, Harpham C,
Haylock MR, Hundecha Y, Maheras P, Ribalaygua J, Schmidli J,
Schmith T, Tolika K, Tomozeiu R, Wilby RL. 2010. An intercom-
parison of statistical downscaling methods for Europe and European
regions assessing their performance with respect to extreme weather
events and the implications for climate change applications. Final
Project Report, Climatic Research Unit, University of East Anglia,
Norwich, UK.
Gutiérrez JM, San-Martín D, Brands S, Manzanas R, Herrera S. 2013.
Reassessing statistical downscaling techniques for their robust appli-
cation under climate change conditions. J. Clim. 26(1): 171– 188.
Gutmann E, Pruitt T, Clark MP, Brekke L, Arnold JR, Raff DA, Ras-
mussen RM. 2014. An intercomparison of statistical downscaling
methods used for water resource assessments in the United States.
Water Resour. Res. 50(9): 7167–7186.
Haerter JO, Hagemann S, Moseley C, Piani C. 2011. Climate model bias
correction and the role of timescales. Hydrol. Earth Syst. Sci. 15(3):
1065– 1079.
Hall A. 2014. Projecting regional change. Science 346(6216):
1461– 1462.
Hall A, Qu X, Neelin JD. 2008. Improving predictions of summer climate
change in the United States. Geophys. Res. Lett. 35(1): L01702.
Haylock MR, Cawley GC, Harpham C, Wilby RL, Goodess CM. 2006.
Downscaling heavy precipitation over the United Kingdom: a compar-
ison of dynamical and statistical methods and their future scenarios.
Int. J. Climatol. 26(10): 1397– 1415.
Hempel S, Frieler K, Warszawski L, Schewe J, Piontek F. 2013. A
trend-preserving bias correction – the ISI-MIP approach. Earth Syst.
Dyn. 4(2): 219– 236.
Hertig E, Jacobeit J. 2008. Assessments of Mediterranean precipitation
changes for the 21st century using statistical downscaling techniques.
Int. J. Climatol. 28(8): 1025– 1045.
Hu Y, Maskey S, Uhlenbrook S. 2013. Downscaling daily precipitation
over the Yellow River source region in China: a comparison of three
statistical downscaling methods. Theor. Appl. Climatol. 112(3-4):
447– 460.
Huth R. 2002. Statistical downscaling of daily temperature in central
Europe. J. Clim. 15(13): 1731– 1742.
Huth R, Kyselý J, Dubrovský M. 2001. Time structure of observed,
GCM-simulated, downscaled, and stochastically generated daily tem-
perature series. J. Clim. 14(20): 4047– 4061.
Huth R, Miksovsky J, Stepanek P, Belda M, Farda A, Chladova Z,
Pisoft P. 2015. Comparative validation of statistical and dynamical
downscaling models on a dense grid in central Europe: temperature.
Theor. Appl. Climatol. 120(3–4): 533– 553. https://doi.org/10.1007/
s00704-014-1190-3.
Jacob D, Bärring L, Christensen OB, Christensen JH, de Castro M,
Déqué M, Giorgi F, Hagemann S, Hirschi M, Jones R, Kjellström E,
Lenderink G, Rockel B, Sánchez E, Schär C, Seneviratne SI, Somot S,
van Ulden A, van den Hurk B. 2007. An inter-comparison of regional
climate models for Europe: model performance in present-day climate.
Clim. Change 81(S1): 31– 52.
Kalognomou E-A, Lennard C, Shongwe M, Pinto I, Favre A, Kent
M, Hewitson B, Dosio A, Nikulin G, Panitz H-J, Büchner M. 2013.
A diagnostic evaluation of precipitation in CORDEX models over
southern Africa. J. Clim. 26(23): 9477– 9506. https://doi.org/10.1175/
JCLI-D-12-00703.1.
Keller D, Fischer AM, Frei C, Liniger MA, Appenzeller C, Knutti R.
2015. Implementation and validation of a Wilks-type multi-site daily
precipitation generator over a typical alpine river catchment. Hydrol.
Earth Syst. Sci. 19(5): 2163–2177.
Keller DE, Fischer AM, Liniger MA, Appenzeller C, Knutti R. 2017.
Testing a weather generator for downscaling climate change projec-
tions over Switzerland. Int. J. Climatol. 37(2): 928–942.
Kilsby CG, Jones PD, Burton A, Ford AC, Fowler HJ, Harpham C, James
P, Smith A, Wilby RL. 2007. A daily weather generator for use in
climate change studies. Environ. Model. Softw. 22(12): 1705– 1719.
Klein Tank AMG, Wijngaard JB, Können GP, Böhm R, Demarée G,
Gocheva A, Mileta M, Pashiardis S, Hejkrlik L, Kern-Hansen C, Heino
R, Bessemoulin P, Müller-Westermeier G, Tzanakou M, Szalai S,
Pálsdóttir T, Fitzgerald D, Rubin S, Capaldo M, Maugeri M, Leitass
A, Bukantis A, Aberfeld R, van Engelen AFV, Forland E, Mietus
M, Coelho F, Mares C, Razuvaev V, Nieplova E, Cegnar T, López
JA, Dahlström B, Moberg A, Kirchhofer W, Ceylan A, Pachaliuk O,
Alexander LV, Petrovic P. 2002. Daily dataset of 20th-century surface
air temperature and precipitation series for the European climate
assessment. Int. J. Climatol. 22(12): 1441– 1453.
Kotlarski S, Keuler K, Christensen OB, Colette A, Déqué M, Gobiet A,
Goergen K, Jacob D, Lüthi D, van Meijgaard E, Nikulin G, Schär
C, Teichmann C, Vautard R, Warrach-Sagi K, Wulfmeyer V. 2014.
Regional climate modelling on European scales: a joint standard
evaluation of the EURO-CORDEX RCM ensemble. Geosci. Model.
Dev. Discuss. 7(1): 217–293.
Lhotka O, Kyselý J. 2015. Spatial and temporal characteristics of heat
waves over Central Europe in an ensemble of regional climate model
simulations. Clim. Dyn. 45(9-10): 2351– 2366.
Li H, Sheffield J, Wood EF. 2010. Bias correction of monthly precipitation
and temperature fields from Intergovernmental Panel on Climate
Change AR4 models using equidistant quantile matching. J. Geophys.
Res. 115: D10101.
Lorenz P, Jacob D. 2010. Validation of temperature trends in the
ENSEMBLES regional climate model runs driven by ERA40. Clim.
Res. 44(2-3): 167– 177.
Maraun D. 2013. Bias correction, quantile mapping and downscaling:
revisiting the inflation issue. J. Clim. 26(6): 2137–2143.
Maraun D. 2016. Bias correcting climate change simulations – a critical
review. Curr. Clim. Change Rep. 2(4): 211– 220. https://doi.org/10
.1007/s40641-016-0050-x.
Maraun D, Widmann M. 2018. Statistical Downscaling and Bias Correc-
tion for Climate Research. Cambridge University Press: Cambridge,
UK.
Maraun D, Wetterhall F, Ireson AM, Chandler RE, Kendon EJ, Wid-
mann M, Brienen S, Rust HW, Sauter T, Themeßl M, Venema VKC,
Chun KP, Goodess CM, Jones RG, Onof C, Vrac M, Thiele-Eich I.
2010. Precipitation downscaling under climate change: recent devel-
opments to bridge the gap between dynamical models and the end user.
Rev. Geophys. 48: RG3003.
Maraun D, Widmann M, Gutierrez JM, Kotlarski S, Chandler RE, Hertig
E, Wibig J, Huth R, Wilcke RAI. 2015. VALUE: a framework to
validate downscaling approaches for climate change studies. Earth’s
Future 3(1): 1– 14.
Martynov A, Laprise R, Sushama L, Winger K, Separovic L, Dugas
B. 2013. Reanalysis-driven climate simulation over CORDEX North
America domain using the Canadian regional climate model, version
5: model performance evaluation. Clim. Dyn. 41: 2973–3005.
van Meijgaard E, van Ulft LH, van de Berg WJ, Bosveld FC, van den
Hurk BJJM, Lenderink G, Siebesma AP. 2008. The KNMI regional
atmospheric climate model RACMO version 2.1. Technical Report
302, Royal Dutch Meteorological Institute, De Bilt, The Netherlands.
Moberg A, Jones PD. 2004. Regional climate model simulations of daily
maximum and minimum near-surface temperatures across Europe
compared with observed station data 1961-1990. Clim. Dyn. 23(7-8):
695– 715.
Monjo R, Chust G, Caselles V. 2014. Probabilistic correction of RCM
precipitation in the Basque Country (northern Spain). Theor. Appl.
Climatol. 117(1-2): 317– 329.
Obled C, Bontron G, Garçon R. 2002. Quantitative precipitation fore-
casts: a statistical adaptation of model outputs through an analogues
sorting approach. Atmos. Res. 63(3): 303– 324.
Pielke RA, Wilby RL. 2012. Regional climate downscaling: What’s the
point? Eos 93(5): 52– 53.
Pierce DW, Cayan DR, Maurer EP, Abatzoglou JT, Hegewisch KC.
2015. Improved bias correction techniques for hydrological simula-
tions of climate change. J. Hydrometeorol. 16(6): 2421–2442.
Pongrácz R, Bartholy J, Kis A. 2014. Estimation of future precipitation
conditions for Hungary with special focus on dry periods. Időjárás
118(4): 305–321.
Räisänen J, Räty O. 2013. Projections of daily mean temperature vari-
ability in the future: cross-validation tests with ENSEMBLES regional
climate simulations. Clim. Dyn. 41(5-6): 1553– 1568.
Rajczak J, Kotlarski S, Schär C. 2016. Does quantile mapping of sim-
ulated precipitation correct for biases in transition probabilities and
spell lengths? J. Clim. 29(5): 1605– 1615.
Räty O, Räisänen J, Ylhäisi JS. 2014. Evaluation of delta change
and bias correction methods for future daily precipitation: inter-
model cross-validation using ENSEMBLES simulations. Clim. Dyn.
42(9-10): 2287– 2303.
Raynaud D, Hingray B, Zin I, Anquetin S, Debionne S, Vautard R. 2017.
Atmospheric analogues for physically consistent scenarios of surface
weather in Europe and Maghreb. Int. J. Climatol. 37(4): 2160– 2176.
Richardson CW. 1981. Stochastic simulation of daily precipitation,
temperature, and solar radiation. Water Resour. Res. 17(1): 182–190.
Rosenzweig C, Iglesias A, Yang XB, Epstein PR, Chivian E. 2001.
Climate change and extreme weather events: implications for food
production, plant diseases, and pests. Glob. Change Human Health
2(2): 90– 104.
Rosenzweig C, Solecki WD, Hammer SA, Mehrotra S (eds). 2011.
Climate Change and Cities: First Assessment Report of the Urban
Climate Change Research Network. Cambridge University Press:
Cambridge, UK.
Rummukainen M. 2010. State-of-the-art with regional climate models.
WIRES Clim. Change 1(1): 82– 96. https://doi.org/10.1002/wcc.8.
San-Martín D, Manzanas R, Brands S, Herrera S, Gutiérrez JM.
2017. Reassessing model uncertainty for regional projections of
precipitation with an ensemble of statistical downscaling methods.
J. Clim. 30(1): 203– 223.
Schär C, Lüthi D, Beyerle U, Heise E. 1999. The soil-precipitation feedback: a process study with a regional climate model. J. Clim. 12(3): 722–741. https://doi.org/10.1175/1520-0442(1999)012<0722:TSPFAP>2.0.CO;2.
Schindler A, Maraun D, Luterbacher J. 2007. Validation of the present
day annual cycle in heavy precipitation over the British Islands simu-
lated by 14 RCMs. J. Geophys. Res. 117: D18107.
Schmidli J, Goodess CM, Frei C, Haylock MR, Hundecha Y, Ribal-
aygua J, Schmith T. 2007. Statistical and dynamical downscaling of
precipitation: an evaluation and comparison of scenarios for the Euro-
pean Alps. J. Geophys. Res. Atmos. 112(D4): D04105.
Semenov MA, Brooks RJ, Barrow EM, Richardson CW. 1998. Compar-
ison of the WGEN and LARS-WG stochastic weather generators for
diverse climates. Clim. Res. 10(2): 95– 107.
Semenza JC, Rubin CH, Falter KH, Selanikio JD, Flanders WD, Howe
HL, Wilhelm JL. 1996. Heat-related deaths during the July 1995 heat
wave in Chicago. N. Engl. J. Med. 335(2): 84–90.
Seneviratne S, Lüthi D, Litschi M, Schär C. 2006. Land-atmosphere
coupling and climate change in Europe. Nature 443(7108):
205– 209.
Sillmann J, Kharin VV, Zhang X, Zwiers FW, Bronaugh D. 2013.
Climate extremes indices in the CMIP5 multimodel ensemble: part
1. Model evaluation in the present climate. J. Geophys. Res. 118(4):
1716– 1733.
Soares PMM, Cardoso RM, Miranda PMA, Viterbo P, Belo-Pereira M.
2012. Assessment of the ENSEMBLES regional climate models in the
representation of precipitation variability and extremes over Portugal.
J. Geophys. Res. 117(D7): D071114.
Štěpánek P, Zahradníček P, Farda A, Skalák P, Trnka M, Meitner J,
Rajdl K. 2016. Projection of drought-inducing climate conditions in
the Czech Republic according to EURO-CORDEX models. Clim. Res. 70(2):
179–193.
Stoll S, Hendricks Franssen HJ, Butts M, Kinzelbach W. 2011. Analysis
of the impact of climate change on groundwater related hydrological
fluxes: a multi-model approach including different downscaling meth-
ods. Hydrol. Earth Syst. Sci. 15(1): 21–38.
von Storch H. 1999. On the use of “inflation” in statistical downscaling.
J. Clim. 12(12): 3505– 3506.
Turco M, Quintana-Seguí P, Llasat MC, Herrera S, Gutiérrez JM. 2011.
Testing MOS precipitation downscaling for ENSEMBLES regional
climate models over Spain. J. Geophys. Res. 116(D18): D18109.
Turco M, Llasat MC, Herrera S, Gutiérrez JM. 2017. Bias correction
and downscaling of future RCM precipitation projections using a
MOS-analog technique. J. Geophys. Res. 122(5): 2631– 2648.
Vaittinada Ayar P, Vrac M, Bastin S, Carreau J, Déqué M, Gallardo
C. 2016. Intercomparison of statistical and dynamical downscaling
models under the EURO- and MED-CORDEX initiative framework:
present climate evaluations. Clim. Dyn. 46(3-4): 1301–1329.
Vautard R, Gobiet A, Jacob D, Belda M, Colette A, Déqué M, Fernández
J, García-Díez M, Goergen K, Güttler I, Halenka T, Karacostas T,
Katragkou E, Keuler K, Kotlarski S, Mayer S, van Meijgaard E,
Nikulin G, Patarcic M, Scinocca J, Sobolowski S, Suklitsch M, Teich-
mann C, Warrach-Sagi K, Wulfmeyer V, Yiou P. 2013. The simula-
tion of European heat waves from an ensemble of regional climate
models within the EURO-CORDEX project. Clim. Dyn. 41(9-10):
2555– 2575.
Volosciuk C, Maraun D, Semenov VA, Park W. 2015. Extreme precipita-
tion in an atmosphere general circulation model: impact of horizontal
and vertical model resolutions. J. Clim. 28(3): 1184– 1205.
Volosciuk C, Maraun D, Vrac M, Widmann M. 2017. A combined sta-
tistical bias correction and stochastic downscaling method for precip-
itation. Hydrol. Earth Syst. Sci. 21(3): 1693–1719.
Vrac M, Friederichs P. 2015. Multivariate-intervariable, spatial, and
temporal-bias correction. J. Clim. 28(1): 218– 237.
Vrac M, Drobinski P, Merlo A, Herrmann M, Lavaysee C, Li L, Somot
S. 2012. Dynamical and statistical downscaling of the French Mediter-
ranean climate: uncertainty assessment. Nat. Hazard Earth Syst. Sci.
12(9): 2769– 2784.
Warrach-Sagi K, Schwitalla T, Wulfmeyer V, Bauer H-S. 2013. Evalu-
ation of a climate simulation in Europe based on the WRF–NOAH
model system: precipitation in Germany. Clim. Dyn. 41(3-4):
755–774. https://doi.org/10.1007/s00382-013-1727-7.
Wilby RL, Wigley TML, Conway D, Jones PD, Hewitson BC, Main
J, Wilks DS. 1998. Statistical downscaling of general circulation
model output: a comparison of methods. Water Resour. Res. 34(11):
2995– 3008.
Wilcke RAI, Mendlik T, Gobiet A. 2013. Multi-variable error correction
of regional climate models. Clim. Change 120(4): 871– 887.
Wilks DS. 2010. Use of stochastic weather generators for precipitation
downscaling. WIRES Clim. Change 1(6): 898– 907.
Wilks DS, Wilby RL. 1999. The weather generation game: a review of
stochastic weather models. Prog. Phys. Geogr. 23(3): 329–357.
Wong G, Maraun D, Vrac M, Widmann M, Eden J, Kent T. 2014.
Stochastic model output statistics for bias correcting and downscaling
precipitation including extremes. J. Clim. 27(18): 6940– 6959.
Yang C, Chandler RE, Isham VS. 2005. Spatial-temporal rainfall sim-
ulation using generalized linear models. Water Resour. Res. 41(11):
W11415.
Yang W, Andréasson J, Graham LP, Olsson J, Rosberg J, Wetterhall
F. 2010. Distribution-based scaling to improve usability of regional
climate model projections for hydrological climate change impacts
studies. Hydrol. Res. 41(3-4): 211–229.
Yang W, Gardelin M, Olsson J, Bosshard T. 2015. Multi-variable bias
correction: application of forest fire risk in present and future climate
in Sweden. Nat. Hazards Earth Syst. Sci. 15(9): 2037– 2057.
Zerenner T, Venema V, Friederichs P, Simmer C. 2016. Downscaling
near-surface atmospheric fields with multi-objective genetic program-
ming. Environ. Model. Softw. 84: 85– 98.