ADVANCES IN ATMOSPHERIC SCIENCES, VOL. 26, NO. 2, 2009, 359–372
Ensemble Hindcasts of ENSO Events over the Past 120 Years
Using a Large Number of Ensembles
ZHENG Fei1, ZHU Jiang2, WANG Hui3, and Rong-Hua ZHANG4
1International Center for Climate and Environment Science (ICCES),
Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029
2State Key Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry (LAPC),
Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029
3National Meteorological Center, Beijing 100081
4Earth System Science Interdisciplinary Center (ESSIC), University of Maryland, College Park, Maryland, USA
(Received 20 January 2008; revised 2 November 2008)
Corresponding author: ZHU Jiang, jzhu@mail.iap.ac.cn
ABSTRACT
Based on an intermediate coupled model (ICM), a probabilistic ensemble prediction system (EPS) has
been developed. The ensemble Kalman filter (EnKF) data assimilation approach is used for generating the
initial ensemble conditions, and a linear, first-order Markov-Chain SST anomaly error model is embedded
into the EPS to provide model-error perturbations. In this study, we perform ENSO retrospective forecasts
over the 120 year period 1886–2005 using the EPS with 100 ensemble members and with initial conditions
obtained by only assimilating historic SST anomaly observations.
By examining the retrospective ensemble forecasts and available observations, the verification results
show that the skill of the ensemble mean of the EPS is greater than that of a single deterministic forecast
using the same ICM, with a distinct improvement of both the correlation and root mean square (RMS) error
between the ensemble-mean hindcast and the deterministic scheme over the 12-month prediction period.
The RMS error of the ensemble mean is almost 0.2°C smaller than that of the deterministic forecast at a
lead time of 12 months. The probabilistic skill of the EPS is also high with the predicted ensemble following
the SST observations well, and the areas under the relative operating characteristic (ROC) curves for three
different ENSO states (warm events, cold events, and neutral events) are all above 0.55 out to 12 months
lead time.
However, both deterministic and probabilistic prediction skills of the EPS show an interdecadal variation.
For the deterministic skill, there is high skill in the late 19th century and in the middle-late 20th century
(which includes some artificial skill due to the model training period), and low skill during the period from
1906 to 1961. For probabilistic skill, for the three different ENSO states, there is still a similar interdecadal
variation of ENSO probabilistic predictability during the period 1886–2005. There is high skill in the late
19th century from 1886 to 1905, and a decline to a minimum of skill around 1910–50s, beyond which skill
rebounds and increases with time until the 2000s.
Key words: ENSO, ensemble prediction system, interdecadal predictability, hindcast
Citation: Zheng, F., J. Zhu, H. Wang, and R.-H. Zhang, 2009: Ensemble hindcasts of ENSO events over the
past 120 years using a large number of ensembles. Adv. Atmos. Sci., 26(2), 359–372, doi: 10.1007/s00376-009-0359-7.
1. Introduction
Based on the intermediate coupled model (ICM)
(Keenlyside and Kleeman, 2002; Zhang et al., 2005), a
probabilistic EPS was developed. It has been demon-
strated that this system can be improved for El Niño simulations and predictions through the use of the
ensemble Kalman filter (EnKF; e.g., Evensen, 2003) data assimilation approach for generating the initial
ensemble conditions, as well as a linear, first-order Markov-Chain SST anomaly error model that was
embedded into the ensemble prediction system (EPS) to provide model-error perturbations (Zheng et al.,
2006a). However, the model performance was only verified over a relatively short period with a relatively
small number of events.

Fig. 1. Horizontal distributions of the normalized observation error (a) and the model initial uncertainty (b)
of SST in January 1998. The model initial uncertainty is estimated from the EnKF analysis spread. The
contour interval is 0.05°C in (a) and 0.02°C in (b).
As pointed out by Chen et al. (2004), previous esti-
mations of El Niño’s predictability (e.g., Goswami and
Shukla, 1991; Kirtman and Schopf, 1998; Latif et al.,
1998) were mostly based on retrospective predictions
for the last two or three decades (i.e., the hindcast pe-
riod encompassed a relatively small number of events).
With so few degrees of freedom and short hindcast pe-
riods, the statistical significance of those estimates is
questionable. From available SST observations, Chen
et al. (2004) used the Lamont ENSO prediction model
to perform a retrospective forecast experiment of 148
years from 1856 to 2003, and found that ENSO pre-
dictability clearly had interdecadal variations. This
was, to date, the first work that studied ENSO pre-
dictability by extending realistic forecasts to a pe-
riod over 100 years. Also, Tang et al. (2008) com-
pared ENSO predictabilities using three different mod-
els by performing 120-year retrospective forecasts, and
confirmed that the interdecadal variations in ENSO pre-
dictability were not model dependent.
However, the ENSO predictability in these models
was only verified in the deterministic sense. Indeed, as
considered in classic theories, ENSO should be viewed
as a chaotic or irregular interannual fluctuation in the
tropical Pacific (e.g., Tziperman et al., 1994). So we
need to discuss the ENSO predictability not only in
a deterministic sense but also in a probabilistic sense.
With a realistic ENSO EPS, and newly-developed SST
assimilation approaches (Zheng et al., 2006a), we re-
cently completed a long-term retrospective ensemble
forecast from 1886 to 2005 with 100 members, and an-
alyzed the ENSO predictability and its variations in
both a deterministic and probabilistic sense.
This paper is structured as follows: Section 2 de-
scribes the components of the EPS, and the historic
SST data in detail. Section 3 examines the determin-
istic and probabilistic prediction skills of the EPS for
the whole period from 1886 to 2005. In section 4,
the interdecadal variations of the ensemble prediction
skills in the ENSO EPS are examined in both the de-
terministic and probabilistic sense. A summary and
discussion are given in section 5.
2. Ensemble prediction system components
and dataset
2.1 Basic deterministic model
Our ensemble prediction system mainly contains
three components. The EPS is firstly based on a
deterministic model, and the basic intermediate cou-
pled model was developed by Keenlyside and Klee-
man (2002) and Zhang et al. (2003). Its dynamical
component consists of both linear and non-linear com-
ponents. The former was essentially a McCreary-type
(McCreary, 1981) modal model, but was extended to
include a horizontally-varying background stratifica-
tion. In addition, ten baroclinic modes, along with a
parameterization of the local Ekman-driven upwelling,
were included. An SST anomaly model was embedded within this dynamical framework to simulate the
evolution of the mixed-layer temperature anomalies. As demonstrated by Zhang et al. (2005), having a
realistic parameterization for the temperature of the subsurface water entrained into the mixed layer (Te)
is crucial to the performance of SST simulations in the equatorial Pacific. An empirical Te model was constructed from
historical data and was demonstrated to be effective in
improving the SST simulations. The ocean model was
coupled with a statistical atmospheric model, which
specifically relates wind stress (τ) to SST anomaly
fields. The two empirical models (the Te model and
the atmospheric model) were constructed based on the
historic observations during the period 1963–96 (34
yr of data). All coupled-model components exchange
simulated anomaly fields. Information concerning the
interactions between the atmosphere (τ) and the ocean
(SST) was exchanged once a day.
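As a schematic illustration only, the following sketch shows the kind of once-a-day anomaly exchange described above, with a statistical wind-stress operator acting on SST anomalies and a placeholder ocean step. The function names (tau_from_sst, ocean_step), the operator B, and the toy dynamics are illustrative assumptions, not the actual ICM code.

```python
import numpy as np

def tau_from_sst(sst_anom, B):
    """Statistical atmosphere: map SST anomalies to wind-stress anomalies
    with a fixed regression/SVD-type operator B (illustrative)."""
    return B @ sst_anom

def ocean_step(state, tau_anom, dt_days=1.0):
    """Placeholder for one day of the dynamical ocean + SST anomaly model."""
    # A damped response to the wind stress stands in for the real dynamics.
    return state + dt_days * (0.1 * tau_anom - 0.05 * state)

# Toy sizes: n grid points shared by the SST and tau anomaly fields.
n = 10
rng = np.random.default_rng(0)
B = 0.5 * np.eye(n)                 # assumed statistical tau(SST) operator
sst_anom = rng.normal(0, 0.3, n)

for day in range(30):               # one month of daily coupling
    tau_anom = tau_from_sst(sst_anom, B)       # atmosphere sees today's SST
    sst_anom = ocean_step(sst_anom, tau_anom)  # ocean sees today's stress
```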
2.2 Initial ensemble condition
Based on the ICM, a probabilistic EPS was de-
veloped by Zheng et al. (2006a). The initial ensem-
ble conditions of the EPS were provided by the EnKF
(e.g., Evensen, 2003, 2004) data assimilation approach
through assimilating SST anomaly data into the model
with 100 ensemble members (Zheng et al., 2006a). Fig-
ure 1 shows an example of horizontal distributions of
the normalized observation error and the model ini-
tial uncertainty of SST at the initial time of January
1998. The distribution of the model uncertainty has
the same shape as that of the normalized observation
error. Thus, each initial ensemble member after assim-
ilation represents an equally realistic initial condition.
At the same time, the initial ensemble state variables
are dynamically balanced within the model after a se-
ries of assimilation cycles. Thus, this ensemble initial-
ization approach not only can generate accurate and
dynamically consistent initial ensemble members, but
also can provide reasonable surface initial stochastic
uncertainties for the EPS by combining both back-
ground and observation errors during the assimilation
cycles (Zheng and Zhu, 2008).
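To make the initialization idea concrete, the following is a minimal sketch of a generic perturbed-observation EnKF analysis step for SST anomalies with 100 members, in the spirit of Evensen (2003). The array sizes, observation operator, and error values are assumptions made for the example, not values taken from the EPS.

```python
import numpy as np

def enkf_analysis(ens, y_obs, H, obs_err_std, rng):
    """One perturbed-observation EnKF analysis step.

    ens        : (n_state, n_ens) forecast ensemble
    y_obs      : (n_obs,) observed SST anomalies
    H          : (n_obs, n_state) observation operator
    obs_err_std: observation error standard deviation
    """
    n_state, n_ens = ens.shape
    x_mean = ens.mean(axis=1, keepdims=True)
    X = (ens - x_mean) / np.sqrt(n_ens - 1)        # state perturbations
    HX = H @ X                                     # observation-space perturbations
    R = (obs_err_std ** 2) * np.eye(len(y_obs))
    # Kalman gain from ensemble covariances: K = P H^T (H P H^T + R)^-1
    K = X @ HX.T @ np.linalg.inv(HX @ HX.T + R)
    # Perturb observations so the analysis spread stays statistically consistent
    y_pert = y_obs[:, None] + rng.normal(0, obs_err_std, (len(y_obs), n_ens))
    return ens + K @ (y_pert - H @ ens)

# Toy example: 100-member ensemble, 50 state points, 20 observed points
rng = np.random.default_rng(1)
ens = rng.normal(0, 0.5, (50, 100))
H = np.zeros((20, 50)); H[np.arange(20), np.arange(20)] = 1.0
y = rng.normal(0, 0.5, 20)
analysis = enkf_analysis(ens, y, H, obs_err_std=0.2, rng=rng)
spread = analysis.std(axis=1)   # analogous to the EnKF analysis spread in Fig. 1b
```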
2.3 Stochastic model-error perturbation
As described by Zheng et al. (2006a), due to simu-
lation deficiencies for coupled air-sea interactions and
subsurface thermal effects in the SST anomaly model,
a linear, first-order Markov stochastic model is em-
bedded within the SST anomaly model of the ICM to
represent the model uncertainties of forecasted SST
anomaly fields. This perturbation method was veri-
fied to be capable of effectively simulating the time
evolution of model uncertainties during the ensemble
forecasting procedure (Zheng et al., 2007). Here, we
make further refinements and extensions to the model
error perturbation scheme by carefully analyzing the
forecast errors (408 samples of the observation-minus-
forecast values for 12-month lead time from 1963 to
1996, covering the same analysis period as the train-
ing period of the deterministic model) for the different
lead times by an empirical orthogonal function (EOF)
method, instead of the formulation used in Zheng et al.
(2006a). After these refinements, the time evolution of the
model errors at different lead times can be represented as

    Q_j^(t) = Σ_{i=1}^{M} λ_i^(t) × Ψ_{i,j} + ξ_j^(t) ,

    λ_i^(t) = α_{i,j} × λ_i^(t−1) + √(1 − α_{i,j}²) × v_i^(t) ,        (1)
where Ψ_{i,j} represents the spatial pattern of the ith EOF mode for the (SST anomaly) model error Q at a
lead time of j months, and is a constant horizontal distribution for each mode; λ_i^(t) represents the random
normalized time coefficient of the ith mode at time t; the coefficient α_{i,j} is the time correlation of the
stochastic forcing for the ith mode at a lead time of j months; and v_i^(t) is a random number for the ith
mode at time t, with mean equal to 0 and variance equal to 1. The correlations between the random vectors
of different modes should be zero, to maintain the orthogonality of the modes. Therefore, this equation
ensures that the variance of λ_i^(t) is equal to 1 as long as the variance of λ_i^(t−1) is also equal to 1. The
subscripts i and j indicate the EOF mode number and the lead time, respectively; the number M is the
number of EOF modes used in the stochastic model; and ξ_j^(t) represents a residual random field for Q at
time t, obtained by removing the first M EOF modes from the observation-minus-forecast values.
There are two advantages that should be addressed
here for this simplified representation of the model er-
rors. First, because only the time coefficients of the constant spatial patterns are perturbed, there is no
longer any need to calculate the spatial correlation scales of the model errors at each grid point, as in
Zheng et al. (2006a). Second, the temporal correlation coefficients α for each mode in Eq. (1) of the
stochastic model can be easily obtained by calculating the lagged correlations of the time-coefficient series
from the EOF analysis results.
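A minimal sketch of Eq. (1) is given below: the normalized time coefficient of each EOF mode is advanced with an AR(1) step, and the error field Q is rebuilt from the constant spatial patterns plus a residual field. The pattern amplitudes, residual standard deviation, and grid size are made-up toy values; only the α values quoted for the 12-month lead follow Table 1.

```python
import numpy as np

def step_model_error(lam_prev, alpha_j, Psi_j, xi_std_j, rng):
    """Advance Eq. (1) by one time step for a given lead time j.

    lam_prev : (M,) previous normalized time coefficients lambda_i^(t-1)
    alpha_j  : (M,) temporal correlation coefficients alpha_{i,j} (cf. Table 1)
    Psi_j    : (M, n_grid) EOF spatial patterns Psi_{i,j}
    xi_std_j : (n_grid,) standard deviation of the residual field xi_j
    """
    v = rng.standard_normal(lam_prev.shape)               # unit-variance noise
    lam = alpha_j * lam_prev + np.sqrt(1.0 - alpha_j**2) * v
    xi = rng.standard_normal(xi_std_j.shape) * xi_std_j   # residual random field
    Q = lam @ Psi_j + xi                                   # model-error field Q_j^(t)
    return lam, Q

# Toy example with M = 10 modes on a small grid
rng = np.random.default_rng(2)
M, n_grid = 10, 500
Psi = rng.normal(0, 0.05, (M, n_grid))        # assumed EOF patterns (deg C)
alpha = np.array([0.954, 0.868, 0.836, 0.859, 0.720,
                  0.791, 0.728, 0.714, 0.775, 0.779])   # 12-month row of Table 1
lam = rng.standard_normal(M)                  # unit-variance starting coefficients
lam, Q = step_model_error(lam, alpha, Psi, xi_std_j=np.full(n_grid, 0.02), rng=rng)
```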
This model-error analysis was performed by com-
paring the SST anomalies’ twelve-month observation-
minus-forecast values. The model errors were com-
puted from 408 samples over a 34-year period (co-
inciding with the model training period) starting in
1963 and extending until 1996, without considering
the errors inherent within the initial conditions. The
forecast initialization scheme was a nudging assimila-
tion scheme, which was used to minimize the initial
errors here (Zheng et al., 2006b). The details of the
analysis process for estimating the model errors are
as follows. Firstly, to obtain the approximate “per-
fect” initial fields, the observed SST anomaly data
were nudged into the model at every time step and at
each grid point, and this nudging process was started
each month from December 1962 to November 1996
with a reasonable nudging intensity [i.e., 0.50 follow-
ing Zheng et al. (2006b)] and 12-month nudging time
length. Secondly, twelve-month forecasts were initialized
from the nudging results each month during the 34-
yr period from 1963 to 1996. Thirdly, twelve-month
observation-minus-forecast values of the SST anoma-
lies during this 34-yr period (408 samples) were obtained as model errors.

Fig. 2. Spatial patterns of the first mode (left column), second mode (middle column), and their associated
normalized time coefficients (right column) for the SST anomaly model errors at 3-, 6-, 9-, and 12-month
lead times. The contour interval is 0.2°C for the first mode and 0.1°C for the second mode.

Finally, the properties of the
model errors, such as spatial patterns and their asso-
ciated temporal variations, were analyzed through the
EOF method.
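The following sketch reproduces the spirit of this analysis step with an SVD-based EOF decomposition of observation-minus-forecast samples, returning spatial patterns, normalized time coefficients, explained-variance fractions, and one-month-lagged autocorrelations (the α of Table 1). The 408 samples are mocked with random numbers; the helper name eof_of_errors and the grid size are assumptions.

```python
import numpy as np

def eof_of_errors(omf, n_modes=10):
    """EOF-decompose observation-minus-forecast samples.

    omf : (n_samples, n_grid) O-minus-F fields at one lead time
    Returns spatial patterns Psi (n_modes, n_grid), normalized time
    coefficients (n_samples, n_modes), explained-variance fractions,
    and one-month-lagged autocorrelations of each coefficient series.
    """
    anom = omf - omf.mean(axis=0)
    U, s, Vt = np.linalg.svd(anom, full_matrices=False)
    pcs = U[:, :n_modes] * s[:n_modes]              # raw time coefficients
    std = pcs.std(axis=0)
    pcs_norm = pcs / std                            # unit-variance coefficients
    Psi = Vt[:n_modes] * std[:, None]               # patterns carry the amplitude
    frac = s[:n_modes] ** 2 / np.sum(s ** 2)        # explained variance
    alpha = np.array([np.corrcoef(pc[:-1], pc[1:])[0, 1] for pc in pcs_norm.T])
    return Psi, pcs_norm, frac, alpha

# Mock 408 monthly O-minus-F samples on a small grid (stand-in for 1963-96)
rng = np.random.default_rng(3)
omf = rng.normal(0, 0.5, (408, 300))
Psi, pcs, frac, alpha = eof_of_errors(omf, n_modes=10)
```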
Figure 2 shows the spatial patterns [i.e., Ψ in Eq.
(1)] of the first and second EOF modes for the SST
anomaly model errors at lead times of 3, 6, 9, and
12 months, and the associated time series. The spatial
structure indicates the regions that are not predicted
well by the model. For the first mode, model uncer-
tainties are mainly located over the eastern equatorial
Pacific, and extend into the central basin with longer
lead times. In contrast to the first mode, the model
uncertainties of the second mode are mainly located
over the eastern coastal regions and the central equatorial Pacific. The proportion of the total covariance
explained by the first mode increases from 37.1% to 71.8% as the lead time extends to 12 months, while
the proportion explained by the second mode decreases to 8.3% at the 12-month lead. These results indicate that the first
several modes can explain and describe the variations
of the model errors in the tropical Pacific, and the
contributions of the first mode dominate the model-
error simulations, especially at longer leads. The tem-
poral correlation coefficients [i.e., α in Eq. (1)] for
each mode were obtained by calculating the one-month
lagged correlations for each EOF time-series. Table 1
presents the temporal correlation coefficients that are
used in the stochastic model. The temporal correla-
tion coefficients of each mode increase with increasing
lead time, which indicates a decreased randomness in
the expansion time coefficients, and the temporal cor-
relation coefficient α of the first mode exceeds 0.95 at 12-month lead time. Thus, the variations of the
major modes in the model-error model are allowed to be more random at short lead times, but have more
stable, bias-correction-like properties at longer lead times
(e.g., Evensen, 2003).
After carefully building up a reasonable model-error model, we can use Eqs. (1) and (2) to provide a
simple representation of a non-linear model, by embedding the above model-error system within the
dynamical model to simulate the time evolution of the model errors during the ensemble forecast process:
Table 1. Temporal correlation coefficients of the stochastic model for the first ten EOF modes at one-month to
twelve-month lead times.

Lead time (months)   1st     2nd     3rd     4th     5th     6th     7th     8th     9th     10th
1 0.695 0.603 0.604 0.397 0.408 0.416 0.274 0.403 0.403 0.253
2 0.799 0.840 0.703 0.721 0.689 0.747 0.630 0.714 0.621 0.651
3 0.858 0.863 0.767 0.745 0.754 0.780 0.659 0.732 0.666 0.693
4 0.879 0.875 0.796 0.746 0.787 0.784 0.670 0.696 0.753 0.698
5 0.891 0.877 0.816 0.750 0.795 0.782 0.688 0.686 0.783 0.691
6 0.902 0.888 0.804 0.768 0.792 0.779 0.697 0.699 0.784 0.704
7 0.919 0.888 0.797 0.784 0.774 0.780 0.712 0.700 0.780 0.740
8 0.928 0.879 0.809 0.802 0.756 0.776 0.718 0.694 0.785 0.740
9 0.935 0.877 0.810 0.818 0.737 0.785 0.726 0.703 0.781 0.753
10 0.941 0.875 0.817 0.835 0.714 0.795 0.728 0.704 0.783 0.768
11 0.948 0.872 0.828 0.849 0.710 0.790 0.732 0.704 0.781 0.770
12 0.954 0.868 0.836 0.859 0.720 0.791 0.728 0.714 0.775 0.779
    ψ_t = f(ψ_{t−1}) + Q_t ,        (2)

where ψ_t represents the model state at time t, and f is
the non-linear model operator. In order to achieve rea-
sonable amplitudes, the first ten EOF modes were re-
tained in simulating the random model errors, and the
simulated random model errors of SST anomalies were
generated and added into the physical model daily.
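A toy version of this daily perturbation loop, combining Eqs. (1) and (2), is sketched below. The model_step function, the error patterns, and the common α value are placeholders; in the real EPS the deterministic ICM provides f, and the EOF patterns and α values depend on lead time.

```python
import numpy as np

def model_step(psi):
    """Placeholder for one day of the deterministic ICM, f(psi_{t-1})."""
    return 0.99 * psi                # weak damping stands in for the real dynamics

rng = np.random.default_rng(4)
n_grid, M = 500, 10
Psi = rng.normal(0, 0.05, (M, n_grid))   # assumed first-ten EOF error patterns
alpha = np.full(M, 0.9)                  # assumed temporal correlations
lam = rng.standard_normal(M)
psi = np.zeros(n_grid)                   # SST anomaly state

for day in range(365):                   # one forecast year, daily updates
    v = rng.standard_normal(M)
    lam = alpha * lam + np.sqrt(1 - alpha**2) * v     # Eq. (1)
    Q = lam @ Psi                                     # stochastic SST error
    psi = model_step(psi) + Q                         # Eq. (2)
```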
2.4 Dataset
The data used in this study is the monthly ex-
tended global SST (ERSST) dataset from 1854 to 2006
reconstructed by Smith and Reynolds (2004), with 2°
horizontal resolution. Due to the relatively poor qual-
ity of the dataset prior to 1880, the observed SST
anomalies before 1880 lack annual and seasonal vari-
ations (Smith et al., 2008), so the initial conditions
cannot trigger real annual oscillations and seasonal
variations of the predicted signals (Tang et al., 2008).
Thus we focus on the period from 1886 to 2005 in this
study, and the data domain is configured as the tropi-
cal Pacific Ocean. A very important task in ENSO pre-
dictions is to optimize the oceanic initial conditions,
and the assimilation of subsurface in-situ observations
and satellite altimetry can significantly improve model
skills (e.g., Tang and Hsieh, 2003; Zheng et al., 2007).
However, the oceanic satellite altimetry and subsur-
face observation records are too short for our study.
The only feasible solution is to assimilate SST alone to ini-
tialize forecasts. Zheng et al. (2006a) used the EnKF
data assimilation system to provide an initial condition
ensemble for the ICM with 100 members. This SST-only assimilation approach has been verified to be
able to provide dynamically balanced initial fields and significantly improve El Niño predictions. In this
study, only the observed monthly SST anomaly fields
from Smith and Reynolds (2004) are assimilated into
the ICM with the EnKF once a month. These ob-
servational data are also used for verifying the model
predictions.
3. Retrospective forecast experiments
The retrospective forecast (or hindcast) experi-
ments covering the period 1886–2005 are made and
compared to available observations. A 12-month hind-
cast is initialized each month during this 120-yr period.
For each initial month, an ensemble of 100 hindcasts
is run, yielding a total of 144000 retrospective fore-
casts to be verified. Figure 3 shows the predicted ensemble mean of the Niño-3.4 (5°S–5°N, 120°–170°W)
SST anomalies and the prediction spread at 6-month lead time from 1886 to 2005. The variability of the
ensemble mean follows the Niño-3.4 observations quite well. Apart from a few exceptions, the ensemble
forecasts encompass the observations. This indicates that the EPS is able to predict most of the warm and
cold events that occurred in the past 120 years at 6-month lead time, especially the relatively large El
Niño and La Niña events. The skill of the hindcasts is
examined from both a deterministic and a probabilis-
tic perspective. The skill estimation in this section is
based on the full hindcast period, 1886–2005, which
corresponds to a total of 144000 members.
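For reference, a Niño-3.4 index consistent with the box quoted above (5°S–5°N, 120°–170°W) can be computed as an area-weighted average of SST anomalies. The sketch below assumes a 2° grid and a 0°–360° longitude convention, and the input field is random toy data.

```python
import numpy as np

def nino34_index(sst_anom, lats, lons):
    """Area-weighted SST anomaly average over the Nino-3.4 box
    (5S-5N, 170W-120W), i.e. 190E-240E in a 0-360 longitude convention."""
    lat_mask = (lats >= -5.0) & (lats <= 5.0)
    lon_mask = (lons >= 190.0) & (lons <= 240.0)
    box = sst_anom[np.ix_(lat_mask, lon_mask)]
    weights = np.cos(np.deg2rad(lats[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return np.sum(box * weights) / np.sum(weights)

# Toy 2-degree grid covering the tropical Pacific
lats = np.arange(-19.0, 20.0, 2.0)
lons = np.arange(121.0, 290.0, 2.0)
rng = np.random.default_rng(5)
sst_anom = rng.normal(0, 0.8, (lats.size, lons.size))
print(round(nino34_index(sst_anom, lats, lons), 2))
```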
3.1 Deterministic prediction skill
Firstly, to check the deterministic predictability
of the EPS for the large events, Fig. 4 shows long
lead time deterministic retrospective forecast results
for four of the largest warm episodes (as measured
by the peak Niño-3.4 SST anomalies) of the past 120 years.

Fig. 3. Time series of observed and forecasted Niño-3.4 SST anomalies at 6-month lead time. The dashed
line represents the observed SST anomalies, the solid line represents the ensemble mean, and the shaded
area represents the prediction spread.

Fig. 4. Four of the largest El Niños since 1886. The thick black curves are observed Niño-3.4 SST anomalies,
and the thin curves of red, green, blue, and purple are ensemble mean predictions started respectively 12,
9, 6, and 3 months before the peak of each El Niño.

Fig. 5. Anomaly correlation (top) and RMS error (bottom) of the Niño-3.4 SST anomalies for the model
ensemble mean hindcast (solid line with closed circles), the deterministic hindcast (solid line with open
circles), and persistence (dot-dashed line), shown as functions of lead time.

In all cases, the EPS is able to predict the observed strong El Niño events twelve months in ad-
vance, although some errors still exist in the forecasted
onset and development, and in the magnitude of these
events. The implication is that the signal components
present in initial fields play a critical role in deter-
mining ENSO prediction skills (e.g., Peng and Kumar,
2005; Moore et al., 2006; Zheng et al., 2009).
Figure 5 shows the anomaly correlation and root
mean square (RMS) error between observed and pre-
dicted average SST anomalies over the tropical Pa-
cific Ocean Niño-3.4 region as a function of lead time.
To compare with the original deterministic predic-
tion skill, we also perform a prediction experiment
whose initialization procedure is briefly described here,
wherein the wind stress anomalies reconstructed from
observed SST anomalies via a singular value decom-
position (SVD) based model are used to integrate the
ocean model over the whole forecast period to generate
initial conditions for the dynamical component, and
the SST anomaly model initial conditions are taken
as the observed SST anomalies (Zhang et al., 2003).
The skill scores for the ensemble-mean hindcast are better than those of the original deterministic forecast
scheme; both hindcast schemes have particularly high skill at short lead times and beat persistence at all
lead times, with a correlation coefficient greater than 0.94 for the first month. Beyond 4-month lead
time, there is a distinct difference in RMS error be-
tween the ensemble mean hindcast and the original
scheme. The RMS error of the ensemble mean re-
mains smaller than 0.94°C over the 12-month predic-
tion period, and is almost 0.2°C smaller than that of
the original deterministic forecast scheme at a lead
time of 12 months. Over the whole period, this im-
provement occurs because the advanced assimilation
method can provide more dynamically consistent and
accurate initial conditions than the original initializa-
tion method, and the ensemble mean can remove some
unpredictable stochastic information.
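The deterministic verification used here (anomaly correlation and RMS error as functions of lead time, with a persistence baseline) can be sketched as follows. The hindcast and observation arrays are mocked, and the persistence construction is a simplified stand-in for the real baseline.

```python
import numpy as np

def lead_time_skill(obs, fcst):
    """Correlation and RMS error per lead time.

    obs  : (n_starts, n_leads) observed Nino-3.4 anomalies valid at each lead
    fcst : (n_starts, n_leads) ensemble-mean (or deterministic) forecasts
    """
    corr = np.array([np.corrcoef(obs[:, k], fcst[:, k])[0, 1]
                     for k in range(obs.shape[1])])
    rmse = np.sqrt(np.mean((fcst - obs) ** 2, axis=0))
    return corr, rmse

# Mock hindcast set: 1440 monthly starts, 12-month leads
rng = np.random.default_rng(6)
n_starts, n_leads = 1440, 12
obs = rng.normal(0, 1.0, (n_starts, n_leads))
fcst = obs * np.linspace(0.95, 0.5, n_leads) + rng.normal(0, 0.3, (n_starts, n_leads))
persistence = np.repeat(obs[:, [0]], n_leads, axis=1)  # initial anomaly carried forward

corr_f, rmse_f = lead_time_skill(obs, fcst)
corr_p, rmse_p = lead_time_skill(obs, persistence)
```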
3.2 Probabilistic prediction skill
We use Talagrand diagrams (also known as rank
histograms) to evaluate whether the hindcast and
the verifying observation are sampled from the same
probability distribution (e.g., Talagrand et al., 1998;
Hamill, 2001). The Talagrand diagrams are generated
by ordering at each grid point the forecast values from
each of the ensemble members from smallest to largest.
For our full ensemble, with 100 members, this creates
101 intervals, and the value of the verifying observa-
tion then falls into one of the 101 categories. Figure
6 shows the Talagrand diagram for the SST anoma-
lies over the Niño-3.4 region, and is a diagram of the
frequencies as a function of the category index. For
the SST anomalies, the distribution is flat, although
the two extreme categories are somewhat higher than
their adjacent categories. The 12-month lead hind-
cast is better in this respect than the 3-month lead
hindcast, however. This indicates that the ensemble
spread at longer lead is more reasonable. Also, there
is a small shift of frequencies (i.e., the frequencies in
the upper intervals are decreasing from shorter lead
time to longer lead time, while the frequencies in the
lower and middle intervals are increasing at the same
time) of the verifying observation from the lower cat-
egories to the higher categories at all four lead times.
The Talagrand diagrams indicate that the probability
distribution of observations can be represented by the
ensemble approach.
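A minimal implementation of such a rank histogram for a 100-member scalar ensemble is sketched below. Ties are ignored and the verification data are synthetic, so this only illustrates the bookkeeping, not the EPS results.

```python
import numpy as np

def rank_histogram(ens, obs):
    """Rank histogram (Talagrand diagram) counts.

    ens : (n_cases, n_members) ensemble forecasts of a scalar (e.g. Nino-3.4)
    obs : (n_cases,) verifying observations
    Returns counts over n_members + 1 categories.
    """
    n_cases, n_members = ens.shape
    # Rank of the observation among the sorted ensemble members (0..n_members)
    ranks = np.sum(ens < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

rng = np.random.default_rng(7)
n_cases, n_members = 1440, 100
truth = rng.normal(0, 1.0, n_cases)
ens = truth[:, None] + rng.normal(0, 0.5, (n_cases, n_members))
obs = truth + rng.normal(0, 0.5, n_cases)        # drawn from the same distribution
counts = rank_histogram(ens, obs)
flatness = counts / counts.sum()   # a reliable ensemble gives a near-uniform histogram
```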
As described in section 2, our ensemble members
are generated based on the hypothesis of a Gaussian
distribution, but the standard normally distributed
perturbations are processed at all model grids (not on
regions), and thus we need to check whether probabil-
ity distributions of the Niño-3.4 forecasted ensemble
members accord with the Gaussian distribution. Fig-
ure 7 shows the normalized probability curve of the
forecasted ensemble members over the Niño-3.4 region
based on the entire 120-yr period. At different lead
times, the forecasted ensemble members agree with
the normal distribution quite well, and there are no double- or multi-modal peaks for the ensemble members.

Fig. 6. Talagrand diagram for the full ensemble Niño-3.4 SST anomaly hindcast over the whole 120-year
period: (a) 3-month lead time, (b) 6-month lead time, (c) 9-month lead time, and (d) 12-month lead time
hindcasts. The dashed line marks the theoretical frequency for a perfectly reliable EPS.

Fig. 7. Gaussian distribution diagram for the Niño-3.4 ensemble SST anomaly hindcast: (a) 3-month lead
time, (b) 6-month lead time, (c) 9-month lead time, and (d) 12-month lead time. The dashed line marks a
standard normal probability curve.

Fig. 8. Niño-3.4 ROC curves for a lead time of (a) 3 months, (b) 6 months, (c) 9 months, and (d) 12 months.
Warm events (upper tercile) are denoted with closed squares, normal events (middle tercile) are denoted
with open circles, and cold events (lower tercile) are denoted with asterisks.

This indicates that the generation methods of
the forecast ensemble members are reasonable, and the
ensemble-mean forecast result is the most representa-
tive deterministic forecast, capable of illustrating the
deterministic performance of the EPS.
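One simple way to perform this check is to standardize the members case by case and compare their histogram with the standard normal density, as sketched below. This is only an illustrative diagnostic (no formal normality test), and the synthetic ensemble is an assumption.

```python
import numpy as np

def standardized_member_histogram(ens, bins=np.linspace(-4, 4, 33)):
    """Histogram of ensemble members standardized case by case,
    for comparison with the standard normal density (cf. Fig. 7)."""
    mean = ens.mean(axis=1, keepdims=True)
    std = ens.std(axis=1, keepdims=True)
    z = ((ens - mean) / std).ravel()
    hist, edges = np.histogram(z, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    normal_pdf = np.exp(-0.5 * centers ** 2) / np.sqrt(2 * np.pi)
    return centers, hist, normal_pdf

rng = np.random.default_rng(8)
ens = rng.normal(0, 0.6, (1440, 100)) + rng.normal(0, 1.0, (1440, 1))
centers, hist, normal_pdf = standardized_member_histogram(ens)
max_dev = np.max(np.abs(hist - normal_pdf))  # small deviation suggests near-Gaussian members
```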
To measure the probabilistic prediction skill more
accurately, here we choose the method commonly re-
ferred to as relative operating characteristic (ROC;
e.g., Mason and Graham, 1999) to measure the en-
semble forecast performance by comparing the fraction
of events that were properly forewarned (i.e., the hit
rate) with the fraction of nonevents that occurred after
a warning was issued (i.e., the false alarm rate). The
ratios are determined from contingency tables and the
events are predefined and expressed in binary terms.
Given an ensemble of hindcasts, an ROC curve show-
ing the different combinations of hit and false alarm
rates given different forecast probabilities can be con-
structed. The ROC curve is useful for identifying op-
timum strategies for issuing warnings, by indicating
the trade-off between false alarms and misses. Details
and examples of the ROC calculation can be found in
Mason and Graham (1999).
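A bare-bones version of this ROC construction for one event category (e.g., warm events defined by the upper tercile) is sketched below, including the ROC area obtained by trapezoidal integration. The probability thresholds, the tercile definition on raw (un-normalized) anomalies, and the synthetic data are simplifying assumptions.

```python
import numpy as np

def roc_curve(ens, obs, event_threshold, prob_bins=np.linspace(0, 1, 11)):
    """ROC for a binary event (e.g. warm = upper tercile exceeded).

    ens : (n_cases, n_members) forecasts, obs : (n_cases,) observations.
    Forecast probability = fraction of members exceeding the threshold.
    """
    prob = np.mean(ens > event_threshold, axis=1)
    occurred = obs > event_threshold
    hit_rate, false_alarm = [], []
    for p in prob_bins:
        warned = prob >= p
        hit_rate.append(np.sum(warned & occurred) / max(np.sum(occurred), 1))
        false_alarm.append(np.sum(warned & ~occurred) / max(np.sum(~occurred), 1))
    hr = np.array(hit_rate); fa = np.array(false_alarm)
    area = np.trapz(hr[::-1], fa[::-1])      # ROC area; 0.5 = no skill, 1 = perfect
    return fa, hr, area

rng = np.random.default_rng(9)
truth = rng.normal(0, 1.0, 1440)
ens = truth[:, None] + rng.normal(0, 0.7, (1440, 100))
obs = truth + rng.normal(0, 0.3, 1440)
upper_tercile = np.quantile(obs, 2 / 3)      # warm-event threshold
fa, hr, area = roc_curve(ens, obs, upper_tercile)
```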
ROC curves for the Niño-3.4 hindcasts at lead
times of 3, 6, 9, and 12 months are shown in Fig. 8.
For all lead times, there are three curves representing
three different event types: (i) warm events (upper ter-
cile), (ii) cold events (lower tercile), and (iii) normal
events (middle tercile), where both the retrospective
forecasts and the observations have been normalized
by their local standard deviation. An ideal probabilis-
tic forecast system would have relatively large hit rates
and small false alarm rates so that all the points on
the ROC curve would cluster in the upper-left corner
of the diagram (e.g., Kirtman, 2003). For a relatively
poor forecast system, all the points of the ROC curve
would lie very close to the dashed diagonal line indi-
cating that the hit rate and the false alarm rate were
nearly the same (i.e., no skill). Akin to previous stud-
ies (e.g., Kirtman, 2003; DeWitt, 2005), the EPS has
relatively higher skill for the warm events and cold
events, and it has relatively lower skill for the neutral
events. For 3- and 6-month lead times, both warm and
cold events are fairly well predicted. The false alarm
rates are low and the hit rates are relatively high when
the agreement among the ensemble members is rela-
tively large. For a normal event forecast, the 3-month
lead time also has some skill although smaller than for
the extremes, whereas for 6-, 9-, and 12-month leads,
the ROC curve lies close to the diagonal, indicating
little skill. These results indicate that the EPS can
capture and predict large SST anomaly signals or extreme events over the Niño-3.4 region in different
seasons quite well (Zhang et al., 2005), and the model is
able to predict extreme events. At 9- and 12-month
lead times, there is a considerable drop in skill. High
confidence forecasts for warm and cold events are only
marginally better than those for normal events, sug-
gesting that a confident forecast for a warm or cold
event at 12 months lead time is still not particularly
useful. This also appeared to be the case in earlier
studies (e.g., Barnston et al., 1999; Kirtman,
2003).
The ability to easily verify the hindcast skill of
warm events and cold events separately is one of the
advantages of the ROC calculation, and thus we fur-
ther used the ROC area to verify the probabilistic
skills of the EPS for the three different events. The
ROC area is the area under the ROC curve, and a
perfect forecast system would have a ROC area of 1
while a system with no ability to distinguish in ad-
vance between different events would have a score of
0.5. Figure 9 shows the ROC area of SST anomalies
for warm events, normal events, and cold events over
the Niño-3.4 region, as a function of lead time. Similar
to the analysis results above for the 120-yr hindcast,
the ROC areas for both the warm and cold events are
clearly higher than that of the neutral events during
the 12-month forecast period. This also indicates that
a large (initial) signal can lead to a reliable prediction
and high prediction skill, and that for small predicted
signals, the predicted SST anomalies in our EPS might evolve in a more chaotic way, which would degrade
prediction skill and induce obvious decreases of predictability (e.g., Zheng et al., 2009).

Fig. 9. ROC area of SST anomalies for warm events (solid line), normal events (dot-dashed line), and cold
events (dashed line) over the Niño-3.4 region, shown as a function of lead time.
4. Variation of ENSO ensemble predictability
Similar to previous studies (e.g., Chen et al., 2004),
Fig. 3 shows that the characteristics of the interan-
nual variability have obviously changed with time. To
examine the possible interdecadal variation of ENSO
ensemble predictability, in this section, we calculate
both deterministic and probabilistic prediction skills
of 6 sub-periods of 20 years each.
4.1 Deterministic predictability
ENSO’s deterministic predictability depends on
the time period in which it is estimated (Balmaseda
et al., 1995; Kirtman and Schopf, 1998; Chen et al.,
2004). This is also evident in Fig. 10. For the six
sub-periods of 20 years each, both anomaly correlation and RMS error vary over significant ranges,
especially at longer lead times.

Fig. 10. Anomaly correlation (top) and RMS error (bottom) between the observed and the ensemble-mean
predicted values of the Niño-3.4 index. These are shown as a function of lead time, for six consecutive
20-yr periods since 1886.

Fig. 11. The averaged correlation (top) and RMS error (bottom) between the observed and the predicted
Niño-3.4 SST anomalies at 6-month lead time. The correlation and RMS error are computed at each
running window of 20-yr period from 1886 to 2005. The shaded area represents the 95% confidence interval
via bootstrap procedures.

For example, high prediction
skills appear in the late 19th century and the middle-
late 20th century (i.e., 1886–1905, 1966–85, and 1986–
2005); these periods are dominated by strong and regu-
lar ENSO events. The high scores for the 1966–85 and
1986–2005 periods might not be surprising because the
model is trained using data from part of this period,
and the high scores for the 1886–1905 period, which is
free of artificial skill, indicate that the large El Niño
and La Niña events can be highly predictable, even when
initialized with only SST anomaly data. However, the pe-
riods of 1906–25, 1926–45, and 1946–65 have relatively
low prediction skills. The lower skill in these periods
is consistent with there being fewer and smaller events
to predict.
The consistent temporal variations of the determin-
istic prediction skills of the EPS are further displayed
in Fig. 11, which shows the averaged correlation and
RMS error at 6-month lead time measured by a running 20-yr window from 1886 to 2005 (i.e., 1886–1905,
1887–1906, ···, 1986–2005). For example, the skill at 1896 was calculated using the samples from
1886–1905. The 20-yr window is shifted by one year at a time from 1886 to 2005. There is a
striking interdecadal variation of ENSO deterministic
predictability (in both the correlation and RMS error)
over the past 120 years from 1886 to 2005 in the EPS.
Generally, there is high predictability in the late 19th
century and in the middle-late 20th century, and a low
predictability from 1906 to 1951 (correlation is lower
than 0.50).
A bootstrapped resampling procedure (Efron and
Tibshirani, 1986) is also used to derive useful confi-
dence limits for the skill scores in order to allow mean-
ingful statistical conclusions to be drawn from these
comparisons. The shaded area in Fig. 11 represents
the 95% confidence interval computed using bootstrap
procedures, and indicates the uncertainty of verifica-
tion sampling. Considering the confidence interval,
both the correlation and RMS error results show that verification sampling has a smaller influence on the
forecast skill scores than the differences between the skills in the different decades. This might be because
the ensemble members match the Gaussian distribution quite well at different lead times (Fig. 7), and thus
the resampling process makes little adjustment to the distribution of the forecasted ensembles.
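The running-window verification with bootstrap confidence limits used in Figs. 11 and 13 can be sketched as follows for the correlation score. The resampling of individual verification pairs, the 1000 bootstrap replicates, and the synthetic monthly series are assumptions made for the illustration.

```python
import numpy as np

def running_window_corr(obs, fcst, years, window=20, n_boot=1000, seed=0):
    """Correlation in each 20-yr running window with a bootstrap 95% CI.

    obs, fcst : (n_months,) monthly Nino-3.4 series (observed and forecast)
    years     : (n_months,) calendar year of each month
    """
    rng = np.random.default_rng(seed)
    centers, corr, lo, hi = [], [], [], []
    for start in range(years.min(), years.max() - window + 2):
        sel = (years >= start) & (years < start + window)
        o, f = obs[sel], fcst[sel]
        centers.append(start + window // 2)
        corr.append(np.corrcoef(o, f)[0, 1])
        boot = np.empty(n_boot)
        for b in range(n_boot):                     # resample verification pairs
            idx = rng.integers(0, o.size, o.size)
            boot[b] = np.corrcoef(o[idx], f[idx])[0, 1]
        lo.append(np.percentile(boot, 2.5))
        hi.append(np.percentile(boot, 97.5))
    return np.array(centers), np.array(corr), np.array(lo), np.array(hi)

# Mock 120 years of monthly verification pairs (1886-2005)
rng = np.random.default_rng(10)
years = np.repeat(np.arange(1886, 2006), 12)
obs = rng.normal(0, 1.0, years.size)
fcst = 0.7 * obs + rng.normal(0, 0.6, years.size)
centers, corr, lo, hi = running_window_corr(obs, fcst, years)
```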
4.2 Probabilistic predictability
To verify the variations of the probabilistic pre-
dictability of the EPS, we examine the temporal
changes of the ROC area for the three different event
types. Figure 12 shows the ROC area in the Niño-
3.4 region for warm events, cold events, and normal
events in six consecutive 20-yr periods since 1886. Ob-
viously, the ENSO probabilistic predictability for dif-
ferent events also depends on the time period. For
warm and cold events, high probabilistic prediction
skills still appear in the late 19th century (i.e., 1886–
1905, when the skill for warm events is only a little
higher than that in the early 20th century) and the
middle-late 20th century (1966–85 and 1986–2005),
with the highest skills for warm events in the sub-
period 1966–85, and highest probabilistic prediction
skills for cold events in sub-period 1986–2005.
In order to illustrate the interdecadal features of
the probabilistic prediction skills more clearly, we fur-
ther verify the consistent temporal variations of the
probabilistic prediction skills of the EPS. Figure 13
shows the ROC area for the three different event types
at 6-month lead time measured by a running 20-yr
window from 1886 to 2005. For the warm events,
the highest skill appears in the late 20th century, and
the lowest skill appears from 1910 to 1930. For the
cold events, the highest skill also appears in the late
20th century, and the lowest skill emerges from 1920 to
1950. For the neutral events, the 20-yr averaged skill
decreases from 1896 to 1910, and then increases roughly
linearly from 1910 to 1995. The uncertainty of
verification sampling is also shown in Fig. 13 using the
95% confidence interval computed via bootstrap pro-
cedures. Considering the confidence interval, for the
three different events, the ROC analysis results also
show that verification sampling has a smaller influence on the forecast skill scores than the differences
between the skills in the different decades.

Fig. 12. ROC area in the Niño-3.4 region for (a) warm events (upper tercile), (b) cold events (lower tercile),
and (c) normal events (middle tercile). These are shown as a function of lead time, for six consecutive
20-yr periods since 1886.

Compared to
Fig. 11, the probabilistic verification uncertainties are
larger than the deterministic verification uncertainties.
However, in summary, for the three different event
types, there are still obvious interdecadal variations
of ENSO probabilistic predictability over the past 120
years from 1886 to 2005 in the EPS.
5. Discussions and conclusions
In this paper, long-term retrospective ensemble
forecasts using 100 members covering the past 120
years are performed with an EPS. With the assimi-
lation of only a historic SST dataset, the prediction
skills of the EPS are verified in both a deterministic
and probabilistic sense, and the EPS displays useful
prediction skill. An interesting finding from the retro-
spective ensemble forecasts is that the EPS showed in-
terdecadal variations in both deterministic and proba-
bilistic prediction skills. Both deterministic and prob-
abilistic prediction skills are high in the late 19th cen-
tury from 1886 to 1905, and then decline with time,
reaching a minimum around 1910–50, beyond which
skill rebounds and increases with time from the 1960s
onward. The EPS has relatively high prediction skill
(but also including some artificial skill) from the 1960s
onward, especially in the late 20th century from 1986
to 2005. These results are similar to previous studies
(e.g., Chen et al., 2004; Tang et al., 2008), although
there are still some differences in the prediction skills
among different models [which is also shown in Tang
et al. (2008)]. However, the trends of the interdecadal
variations in different models appear comparable (i.e.,
higher predictability in the late 19th century and in
the middle-late 20th century, and a lower predictabil-
ity in the early 20th century). These results all indicate
that the interdecadal variability of ENSO (determin-
istic and probabilistic) predictability exists generally,
and is not model dependent.
However, it should be noted that the theoretical
framework discussed in this study is based on a rela-
tively simple EPS, and one could argue that our anal-
ysis is not complete since we only use SST data. One
serious question is whether or not this interdecadal
variation in predictability discussed in this paper is
due mainly to the differences of the quality of the data
in different periods. For example, one can guess that
the high prediction skill for the period from 1966 to
2005 is probably due to better data quality because
of improvements of observation systems and the fact
that the model was trained using the data from part
of this period.
To explore this, we can examine the simulation skill
of the model forced by the reconstructed wind stress from the atmospheric τ model.

Fig. 13. The averaged ROC area of Niño-3.4 SST anomalies for (a) warm events, (b) normal events, and
(c) cold events at 6-month lead time, respectively. The ROC areas are computed at each running window
of 20-yr period from 1886 to 2005. The shaded area represents the 95% confidence interval via bootstrap
procedures.

Fig. 14. The averaged correlation (solid line) and RMS error (dashed line) between the observed and the
simulated Niño-3.4 SST anomalies forced by the reconstructed wind stress anomalies. The correlation and
RMS error are computed at each running window of 20-yr period from 1886 to 2005.

Thus, the quality of
initial conditions of predictions and the model perfor-
mance can be indicated by the simulation skill, with
both being tied to the data quality. The existence
of an impact of the data quality on the model’s sim-
ulation skill will be mostly felt through the quality of
initial conditions, such as initial SST anomalies. Fig-
ure 14 shows the averaged correlation and RMS er-
ror between the observed and the simulated Niño-3.4
SST anomalies forced by the reconstructed wind stress
anomalies at each running window of 20-yr period from
1886 to 2005, and indicates that the interdecadal dif-
ference of the simulation skill is not large in the model.
The magnitude of variation is about 0.1 from maxi-
mum to minimum during the entire period for both
correlation and RMS error (units: °C). A comparison
between Figs. 11 and 14 reveals that the interdecadal
variation in predictability does not agree with that in
the simulation skill. Thus, interdecadal variation in
predictability is not due to model performance associ-
ated with data quality. This is further suggested by
the fact that noticeably higher prediction skill also oc-
curs during the period from 1886 to 1905.
The results of the analyses in this paper moti-
vate us to further investigate the possible reasons
and sources of limited ENSO predictability in detail.
These concerns need to be addressed in future work through more comprehensive analyses, and other
possible factors (besides the ENSO signal) controlling ENSO predictability, such as nonlinearity and
stochastic noise, also need to be further discussed. Neverthe-
less, this study is to date the first work to discuss both
ENSO deterministic and probabilistic predictabilities
using ensemble forecasts and long-term predictions.
The results and conclusions found in this EPS might
be helpful for the study of ENSO predictability.
Acknowledgements. The authors wish to thank the
two anonymous reviewers for their very helpful comments
and suggestions. This research is supported by the Chinese
Academy of Sciences (Grant No. KZCX2-YW-202), Na-
tional Basic Research Program of China (2006CB403600)
and National Natural Science Foundation of China (Grant
Nos. 40437017 and 40805033).
REFERENCES
Balmaseda, M. A., M. K. Davey, and D. L. T. Anderson, 1995: Decadal and seasonal dependence of ENSO prediction skill. J. Climate, 8, 2705–2715.
Barnston, A. G., M. Glantz, and Y. He, 1999: Predictive skill of statistical and dynamical climate models in SST forecasts during the 1997–98 El Niño and the 1998 La Niña onset. Bull. Amer. Meteor. Soc., 80, 217–243.
Chen, D., M. A. Cane, A. Kaplan, S. E. Zebiak, and D. Huang, 2004: Predictability of El Niño in the past 148 years. Nature, 428, 733–736.
DeWitt, D. G., 2005: Retrospective forecasts of interannual sea surface temperature anomalies from 1982 to present using a directly coupled atmosphere-ocean general circulation model. Mon. Wea. Rev., 133, 2972–2995.
Efron, B., and R. Tibshirani, 1986: Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1, 54–77.
Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics, 53, 343–367.
Evensen, G., 2004: Sampling strategies and square root analysis schemes for the EnKF. Ocean Dynamics, 54, 539–560.
Goswami, B. N., and J. Shukla, 1991: Predictability of a coupled ocean-atmosphere model. J. Climate, 4, 3–22.
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550–560.
Kirtman, B. P., and P. S. Schopf, 1998: Decadal variability in ENSO predictability and prediction. J. Climate, 11, 2804–2822.
Kirtman, B. P., 2003: The COLA anomaly coupled model: Ensemble ENSO prediction. Mon. Wea. Rev., 131, 2324–2341.
Keenlyside, N., and R. Kleeman, 2002: On the annual cycle of the zonal currents in the equatorial Pacific. J. Geophys. Res., 107, doi: 10.1029/2000JC0007111.
Latif, M., and Coauthors, 1998: A review of the predictability and prediction of ENSO. J. Geophys. Res., 103, 14375–14393.
Mason, S. J., and N. E. Graham, 1999: Conditional probabilities, relative operating characteristics, and relative operating levels. Wea. Forecasting, 14, 713–725.
McCreary, J. P., 1981: A linear stratified ocean model of the equatorial undercurrent. Philosophical Transactions of the Royal Society (London), 298, 603–635.
Moore, A., and Coauthors, 2006: Optimal forcing patterns for coupled models of ENSO. J. Climate, 19, 4683–4699.
Peng, P., and A. Kumar, 2005: A large ensemble analysis of the influence of tropical SSTs on seasonal atmospheric variability. J. Climate, 15, 1068–1085.
Smith, T. M., and R. W. Reynolds, 2004: Improved extended reconstruction of SST (1854–1997). J. Climate, 17, 2466–2477.
Smith, T. M., R. W. Reynolds, T. C. Peterson, and J. Lawrimore, 2008: Improvements to NOAA's historical merged land-ocean surface temperature analysis (1880–2006). J. Climate, 21, 2283–2296.
Talagrand, O., R. Vautard, and B. Strauss, 1998: Evaluation of probabilistic prediction systems. Proc. Seminar on Predictability, Reading, United Kingdom, ECMWF, 1–26.
Tang, Y., and W. W. Hsieh, 2003: ENSO simulation and predictions using a hybrid coupled model with data assimilation. J. Meteor. Soc. Japan, 81, 1–19.
Tang, Y., Z. Deng, X. Zhou, and Y. Cheng, 2008: Interdecadal variation of ENSO predictability in multiple models. J. Climate, 21, 4811–4833.
Tziperman, E., L. Stone, M. A. Cane, and H. Jarosh, 1994: El Niño chaos: Overlapping of resonances between the seasonal cycle and the Pacific ocean-atmosphere oscillator. Science, 264, 72–74.
Zhang, R.-H., S. E. Zebiak, R. Kleeman, and N. Keenlyside, 2003: A new intermediate coupled model for El Niño simulation and prediction. Geophys. Res. Lett., 30(19), 2012, doi: 10.1029/2003GL018010.
Zhang, R.-H., S. E. Zebiak, R. Kleeman, and N. Keenlyside, 2005: Retrospective El Niño forecast using an improved intermediate coupled model. Mon. Wea. Rev., 133, 2777–2802.
Zheng, F., J. Zhu, R.-H. Zhang, and G.-Q. Zhou, 2006a: Ensemble hindcasts of SST anomalies in the tropical Pacific using an intermediate coupled model. Geophys. Res. Lett., 33, L19604, doi: 10.1029/2006GL026994.
Zheng, F., J. Zhu, R.-H. Zhang, and G.-Q. Zhou, 2006b: Improved ENSO forecasts by assimilating sea surface temperature observations into an intermediate coupled model. Adv. Atmos. Sci., 23(4), 615–624, doi: 10.1007/s00376-006-0615-z.
Zheng, F., 2007: Research on ENSO ensemble predictions. Ph. D. dissertation, Institute of Atmospheric Physics, Chinese Academy of Sciences, 159pp. (in Chinese)
Zheng, F., J. Zhu, and R.-H. Zhang, 2007: Impact of altimetry data on ENSO ensemble initializations and predictions. Geophys. Res. Lett., 34, L13611, doi: 10.1029/2007GL030451.
Zheng, F., and J. Zhu, 2008: Balanced multivariate model errors of an intermediate coupled model for ensemble Kalman filter data assimilation. J. Geophys. Res., 113, C07002, doi: 10.1029/2007JC004621.
Zheng, F., H. Wang, and J. Zhu, 2009: Impacts on ENSO ensemble prediction: Initial-error perturbations vs. model-error perturbations. Chinese Science Bulletin. (in press)
... Performing a retrospective forecast (or hindcast) for ENSO is a common practice to assess its deterministic or probabilistic prediction skill. Many efforts have focused on the deterministic skill of ENSO with various intermediate complexity models (ICMs; Chen et al., 2004;Cheng et al., 2010;Liu et al., 2018;Zheng & Yu, 2017;Zheng et al., 2009), hybrid coupled models (Deng & Tang, 2009;Tang & Deng, 2011;Tang et al., 2008) and complicated coupled general circulation models (CGCMs; Huang et al., 2017;Johnson et al., 2019;Kirtman et al., 2014;Lin et al., 2020;Liu et al., 2022;Luo et al., 2008;MacLachlan et al., 2015;Merryfield et al., 2013;Qiao et al., 2013;Saha et al., 2014;Takaya et al., 2017;Weisheimer et al., 2020;Zhu et al., 2017). Kirtman (2003) first emphasized that ENSO prediction should be probabilistic, so as to provide additional uncertainty estimates that could not be gleaned from the ensemble mean. ...
... Kirtman (2003) first emphasized that ENSO prediction should be probabilistic, so as to provide additional uncertainty estimates that could not be gleaned from the ensemble mean. Thereafter, several studies investigated the probabilistic skill with long-term ENSO hindcasts in the ICMs, and they have reported that cold and warm events are more predictable than neutral conditions (Chen & Cane, 2008;Zheng et al., 2009), and that the probabilistic skill is sensitive to how the ensemble is constructed (Cheng et al., 2010). Other efforts have focused on the CGCMs, and have also found the highest probability skill for cold events, followed by warm events and then by neutral events (DeWitt, 2005;Kirtman, 2003). ...
... Because of the small ensemble spread at lead month 3 (Figure 1a), there is a U-type distribution in the rank histogram, whereby observations occur more frequently in the bins at the tails of the distribution. This is consistent with the results reported by Zheng et al. (2009) in ICM. As the ensemble spread develops with increasing lead time, the distribution of the rank histogram gradually flattens. ...
Article
Full-text available
In this study, we investigate probabilistic predictability for the El Niño‐Southern Oscillation (ENSO) by assessing both actual prediction skill and potential predictability using a long‐term retrospective forecast from a complicated coupled general circulation model (CGCM). Our results indicate that above and below normal events are more predictable than neutral events. The probabilistic prediction skill suffers prominent “Spring Predictability Barrier” and undergoes notable interdecadal variation. For the above and below normal events, the lowest probabilistic prediction skills appear during 1920–1940 and the higher prediction skills occur after the 1960s. The seasonal and interdecadal variability of the probabilistic prediction skill stems mainly from the variability of the ENSO signal intensity. There is much room for improvement for the predictability of all three categories of ENSO events. At least an additional 1 or 2 months of skillful probabilistic predictions can be expected to progress in the future. To our knowledge, this is the first study to use a CGCM to evaluate probabilistic predictability for ENSO at various time scales.
... The relatively short retrospective forecast periods only contain a few ENSO cycles, which are inadequate for us to achieve statistically robust cognize for the ENSO predictability, particularly with respect to the interdecadal variation. The existing long-term ENSO retrospective predictions have been mainly conducted with intermediate (Chen et al. 2004;Zheng et al. 2009;Cheng et al. 2010;Liu et al. 2019;Gao et al. 2020) and hybrid coupled models (Tang et al. 2008a;Deng and Tang 2009;Tang and Deng 2011). These efforts revealed the interdecadal variation of the actual prediction skill for the canonical ENSO and the possible reasons accounting for this phenomenon. ...
... Previous studies have shown that the predictability of the EP ENSO has undergone a significant interdecadal variation in intermediate (Chen et al. 2004;Zheng et al. 2009;Cheng et al. 2010) and hybrid coupled models (Tang et al. 2008a;Deng and Tang 2009;Tang and Deng 2011). This is also the case for the CGCM ensemble prediction system, as shown in Fig. 4. ...
Article
Full-text available
In this study, we evaluated the predictability of the two flavors of the El Niño Southern Oscillation (ENSO) based on a long-term retrospective prediction from 1881 to 2017 with the Community Earth System Model. Specifically, the Central-Pacific (CP) ENSO has a more obvious Spring Predictability Barrier and lower deterministic prediction skill than the Eastern-Pacific (EP) ENSO. The potential predictability declines with lead time for both the two flavors of ENSO, and the EP ENSO has a higher upper limit of the prediction skill as compared with the CP ENSO. The predictability of the two flavors of ENSO shows distinct interdecadal variation for both actual skill and potential predictability; however, their trends in the predictability are not synchronized. The signal component controls the seasonal and interdecadal variations of predictability for the two flavors of ENSO, and has larger contribution to the CP ENSO than the EP ENSO. There is significant scope for improvement in predicting the two flavors of ENSO, especially for the CP ENSO.
... This paper adopted the ENSO ensemble prediction system (EPS) developed at the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences Zheng et al., 2009;, to evaluate the key processes of the 2020/ 21 La Niña event. This system utilizes the ensemble Kalman filter (EnKF) data assimilation method (Zheng and Zhu, 2010), which is based on an intermediate coupled model, and establishes ENSO real-time prediction by taking the initial uncertainty and the uncertainty in the prediction process into account (Zheng et al., 2009). ...
... This paper adopted the ENSO ensemble prediction system (EPS) developed at the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences Zheng et al., 2009;, to evaluate the key processes of the 2020/ 21 La Niña event. This system utilizes the ensemble Kalman filter (EnKF) data assimilation method (Zheng and Zhu, 2010), which is based on an intermediate coupled model, and establishes ENSO real-time prediction by taking the initial uncertainty and the uncertainty in the prediction process into account (Zheng et al., 2009). The implementation of a 20-year retrospective 12-month ensemble forecast experiment proved that the EPS can successfully predict the possibility of ENSO events 1 year in advance . ...
Article
Full-text available
The 2020/21 La Niña was not well predicted by most climate models when it started in early-mid 2020. This paper adopted an El Niño-Southern Oscillation (ENSO) ensemble prediction system to evaluate the key physical processes in the development of this cold event by performing a clustering analysis of 100 ensemble member predictions 1 year in advance. The abilities of two clustering approaches were first examined in regard to capturing the development of the 2020/21 La Niña event. One approach was index clustering, which adopted only the 12-month Niño3.4 indices in 2020 as an indicator, and the other was pattern clustering through contrasting the evolution of sea surface temperature (SST) anomalies over the tropical Pacific in 2020 for clustering. Pattern clustering surpasses index clustering in better describing the evolution over the off-equatorial and equatorial regions during the 2020/21 La Niña. Consequently, based on the pattern clustering approach, a comparison of the selected most (five best) and least (five worst) representative ensemble members illustrated that the predominance of anomalous southeasterly winds over the central equatorial Pacific in spring 2020 played a crucial role in initiating the moderate La Niña event in 2020/21, by preventing the development of westerly winds over the warm pool. Moreover, the inherent spring predictability barrier (SPB) was still a major challenge for improving the prediction skill of the 2020/21 La Niña event when the prediction occurred across the spring season.
... We also checked the relative operating characteristic (ROC; Mason and Graham 1999) skill, which is another widely used probabilistic measure (DeWitt, 2005; Chen and Cane, 2008; Zheng et al., 2009) and essentially reflects characteristics similar to the resolution term of the BSS (Yang et al., 2021). Compared with any one particular model, the MME also exhibits the highest ROC skill at each lead time for all three categories of ENSO events (not shown). ...
Article
Full-text available
The El Niño–Southern Oscillation (ENSO) is the primary source of predictability for seasonal climate prediction. To improve ENSO prediction skill, we established a multi-model ensemble (MME) prediction system, which consists of five dynamical coupled models with various complexities, parameterizations, resolutions, initializations and ensemble strategies, to account for the uncertainties as fully as possible. Our results demonstrated the superiority of the MME over individual models, with dramatically reduced root mean square error and improved anomaly correlation skill, which can compete with, or even exceed, the skill of the North American Multi-Model Ensemble. In addition, the MME suffered less from the spring predictability barrier and offered more reliable probabilistic prediction. The real-time MME prediction adequately captured the latest successive La Niña events and the secondary cooling trend six months ahead. Our MME prediction has, since April 2022, forecasted the possible occurrence of a third-year La Niña event. Overall, our MME prediction system offers better skill for both deterministic and probabilistic ENSO prediction than all participating models. These improvements are probably due to the complementary contributions of multiple models, which provide additive predictive information, as well as the large ensemble size that covers a more reasonable uncertainty distribution.
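As a minimal sketch of how an equally weighted multi-model ensemble mean of this kind can be formed from the individual models' ensembles (the model count, member count, array layout, and equal weighting are assumptions for illustration, not details of the system described above):

    # Illustrative multi-model ensemble (MME) mean of Nino3.4 predictions.
    # Shapes and equal weighting are assumptions for this sketch.
    import numpy as np

    n_models, n_members, n_leads = 5, 20, 12
    rng = np.random.default_rng(1)

    # Predicted Nino3.4 anomalies: (model, member, lead month)
    forecasts = rng.standard_normal((n_models, n_members, n_leads))

    # Step 1: ensemble mean of each model
    model_means = forecasts.mean(axis=1)          # (model, lead)

    # Step 2: equally weighted multi-model ensemble mean
    mme_mean = model_means.mean(axis=0)           # (lead,)

    # The full pooled ensemble can also be kept for probabilistic products
    pooled = forecasts.reshape(n_models * n_members, n_leads)
    print(mme_mean.shape, pooled.shape)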
... In this study, we adopted the ensemble prediction system (EPS) developed at the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (Zheng et al., 2006, 2007, 2009; Zheng and Zhu, 2010, 2016), and evaluated its performance in predicting the moderate 2020/21 La Niña event (please refer to the online supplementary file for details). The skill of this ENSO prediction system is documented in Zheng and Yu (2017), in which a 20-year retrospective forecast comparison shows that good forecast skill of the EPS is possible at prediction lead times of up to one year. ...
Article
Full-text available
Several consecutive extreme cold events impacted China during the first half of winter 2020/21, breaking low-temperature records in many cities. How to make accurate climate predictions of extreme cold events is still an urgent issue. The synergistic effect of the warm Arctic and cold tropical Pacific has been demonstrated to intensify the intrusions of cold air from polar regions into the middle-high latitudes, further influencing the cold conditions in China. However, climate models failed to predict these two ocean environments at the expected lead times. Most seasonal climate forecasts only predicted the 2020/21 La Niña after the signal had already become apparent, and significantly underestimated the observed Arctic sea ice loss in autumn 2020 even 1–2 months in advance. In this work, the corresponding physical factors that may help improve the accuracy of seasonal climate predictions are further explored. For the 2020/21 La Niña prediction, through sensitivity experiments involving different atmospheric-oceanic initial conditions, the predominant southeasterly wind anomalies over the equatorial Pacific in spring 2020 are diagnosed to play an irreplaceable role in triggering this cold event. A reasonable inclusion of atmospheric surface winds into the initialization helps the model predict the La Niña development from the early spring of 2020. For predicting the Arctic sea ice loss in autumn 2020, an anomalously cyclonic circulation over the central Arctic Ocean predicted by the model, which swept abnormally warm air over Siberia into the Arctic Ocean, is recognized as an important contributor to successfully predicting the minimum Arctic sea ice extent.
... Data assimilation is a statistical combination technique that merges the forecast result with observations. This technique is used to correct the initial data that are fed into the EICM [13,14]. Assimilating both oceanic and atmospheric data improved El Niño forecasts compared to forecasts produced without data assimilation [15]. ...
Article
Full-text available
The Ensemble Intermediate Coupled Model (EICM) is a model used for studying the El Niño-Southern Oscillation (ENSO) phenomenon in the Pacific Ocean, in which anomalies in the sea surface temperature (SST) are observed. This research aims to implement the Cressman scheme to improve SST forecasts. Two cases are considered in this work: a control case and a Cressman-initialized case. These cases are simulations using different inputs, where the two inputs differ in their resolution and data source. The Cressman method is used to initialize the model with an analysis product based on satellite data and in situ data such as ships, buoys, and Argo floats, with a resolution of 0.25 × 0.25 degrees. The result of this inclusion is the Cressman-Initialized Ensemble Intermediate Coupled Model (CIEICM). Forecasts of the sea surface temperature anomalies were conducted using both the EICM and the CIEICM. The results show that the SST fields calculated from the CIEICM were more accurate than those from the EICM. The forecast using the CIEICM initialization with the higher-resolution satellite-based analysis improved, at a 6-month lead time, the root mean square deviation to 0.794 from 0.808 and the correlation coefficient to 0.630 from 0.611, compared with the control model that was directly initialized with the low-resolution in-situ-based analysis.
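For background, the classic Cressman successive-correction weight has the form w = (R² − d²) / (R² + d²) for distances d < R and zero otherwise; a minimal one-pass, one-dimensional sketch (the grid, influence radius, and data are assumptions) is:

    # One-pass Cressman successive correction on a 1-D grid (illustrative only;
    # grid spacing, influence radius R, and data are assumptions).
    import numpy as np

    grid = np.arange(0.0, 10.0, 0.25)          # analysis grid points (degrees)
    background = np.zeros_like(grid)           # first-guess SST anomaly field
    obs_x = np.array([2.1, 5.4, 7.8])          # observation locations
    obs_val = np.array([0.8, -0.5, 1.2])       # observed SST anomalies
    R = 1.5                                    # radius of influence (degrees)

    analysis = background.copy()
    for i, x in enumerate(grid):
        d = np.abs(obs_x - x)
        w = np.where(d < R, (R**2 - d**2) / (R**2 + d**2), 0.0)
        if w.sum() > 0:
            # Weighted mean of observation increments (obs minus background at obs points)
            increments = obs_val - np.interp(obs_x, grid, background)
            analysis[i] = background[i] + np.sum(w * increments) / np.sum(w)

    print(analysis[:10])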
... For instance, Wang et al. [19] constructed a physically based empirical model built primarily on SST anomalies and found that this model performs well in predicting the western Pacific subtropical high. Currently, the SST anomalies in the equatorial Pacific associated with the most prominent mode of interannual variability, ENSO, can be successfully predicted three seasons ahead [5,20–26]. The SST anomalies in the equatorial Indian Ocean associated with the so-called Indian Ocean Dipole (IOD) can be predicted three to four months in advance [27–30]. ...
Article
Full-text available
The Atlantic Niño/Niña, one of the dominant modes of interannual variability in the equatorial Atlantic, exerts a prominent influence on Earth's climate, but its prediction skill as shown previously was unsatisfactory and limited to two to three months. By diagnosing the recently released North American Multimodel Ensemble (NMME) models, we find that the Atlantic Niño/Niña prediction skills have improved, with the multi-model ensemble (MME) reaching five months. The prediction skills are season-dependent. Specifically, they show a marked dip in boreal spring, suggesting that Atlantic Niño/Niña prediction suffers a "spring predictability barrier" like ENSO. The prediction skill is higher for Atlantic Niña than for Atlantic Niño, and better in the developing phase than in the decaying phase. The amplitude bias of the Atlantic Niño/Niña is primarily attributed to the amplitude bias in the annual cycle of the equatorial sea surface temperature (SST). The anomaly correlation coefficient scores of the Atlantic Niño/Niña, to a large extent, depend on the prediction skill of the Niño3.4 index in the preceding boreal winter, implying that the precedent ENSO may greatly affect the development of the Atlantic Niño/Niña in the following boreal summer.
Book
Full-text available
There are many applications of mathematical physics in several fields of basic science and engineering. Thus, we have prepared the Special Issue "Modern Problems of Mathematical Physics and Their Applications" to cover recent advances in mathematical physics and its applications. In this Special Issue, we have focused on some important and challenging topics, such as integral equations, ill-posed problems, ordinary differential equations, partial differential equations, systems of equations, fractional problems, linear and nonlinear problems, fuzzy problems, numerical methods, analytical methods, semi-analytical methods, convergence analysis, error analysis and mathematical models. In response to our invitation, we received 31 papers from more than 17 countries (Russia, Uzbekistan, China, USA, Kuwait, Bosnia and Herzegovina, Thailand, Pakistan, Turkey, Nigeria, Jordan, Romania, India, Iran, Argentina, Israel, Canada, etc.), of which 19 were published and 12 were rejected.
Article
Full-text available
In this study, we conducted an ensemble retrospective prediction from 1881 to 2017 using the Community Earth System Model to evaluate El Niño–Southern Oscillation (ENSO) predictability and its variability on different time scales. To our knowledge, this is the first assessment of ENSO predictability using a long-term ensemble hindcast with a complicated coupled general circulation model (CGCM). Our results indicate that both the dispersion component (DC) and signal component (SC) contribute to the interannual variation of ENSO predictability (measured by relative entropy). Specifically, the SC is more important for ENSO events, whereas the DC is of comparable importance for short lead times and in weak ENSO signal years. The SC dominates the seasonal variation of ENSO predictability, and an abrupt decrease in signal intensity results in the spring predictability barrier feature of ENSO. At the interdecadal scale, the SC controls the variability of ENSO predictability, while the magnitude of ENSO predictability is determined by the DC. The seasonal and interdecadal variations of ENSO predictability in the CGCM are generally consistent with results based on intermediate complexity and hybrid coupled models. However, the DC has a greater contribution in the CGCM than that in the intermediate complexity and hybrid coupled models. Significance Statement El Niño–Southern Oscillation (ENSO) is a prominent interannual signal in the global climate system with widespread climatic influence. Our current understanding of ENSO predictability is based mainly on long-term retrospective forecasts obtained from intermediate complexity and hybrid coupled models. Compared with those models, complicated coupled general circulation models (CGCMs) include more realistic physical processes and have the potential to reproduce the ENSO complexity. However, hindcast studies based on CGCMs have only focused on the last 20–60 years. In this study, we conducted an ensemble retrospective prediction from 1881 to 2017 using the Community Earth System Model in order to evaluate ENSO predictability and examine its variability on different time scales. To our knowledge, this is the first assessment of ENSO predictability using a long-term ensemble hindcast with a CGCM.
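For reference, in the univariate Gaussian case the relative entropy mentioned above is often written in the following signal/dispersion form (standard notation; the study's exact multivariate formulation may differ):

    % Relative entropy R of a Gaussian prediction p (mean \mu_p, variance \sigma_p^2)
    % relative to the Gaussian climatology q (mean \mu_q, variance \sigma_q^2)
    R = \underbrace{\frac{1}{2}\,\frac{(\mu_p-\mu_q)^2}{\sigma_q^2}}_{\text{signal component (SC)}}
      + \underbrace{\frac{1}{2}\left[\ln\frac{\sigma_q^2}{\sigma_p^2}
      + \frac{\sigma_p^2}{\sigma_q^2} - 1\right]}_{\text{dispersion component (DC)}}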
Article
Full-text available
In this study, El Niño–Southern Oscillation (ENSO) retrospective forecasts were performed for the 120 yr from 1881 to 2000 using three realistic models that assimilate the historic dataset of sea surface temperature (SST). By examining these retrospective forecasts and corresponding observations, as well as the oceanic analyses from which the forecasts were initialized, several important issues related to ENSO predictability have been explored, including its interdecadal variability and the dominant factors that control the interdecadal variability. The prediction skill of the three models showed a very consistent interdecadal variation, with high skill in the late nineteenth century and in the middle–late twentieth century, and low skill during the period from 1900 to 1960. The interdecadal variation in ENSO predictability is in good agreement with that in the signal of interannual variability and in the degree of asymmetry of the ENSO system. A good relationship was also identified between the degree of asymmetry and the signal of interannual variability, with the former highly related to the latter. Generally, high predictability is attained when the ENSO signal strength and the degree of asymmetry are enhanced, and vice versa. The atmospheric noise generally degrades overall prediction skill, especially in terms of mean square error, but can favor some individual prediction cases. The possible reasons why these factors control ENSO predictability are also discussed.
Article
Full-text available
The ensemble Kalman filter (EnKF) depends on a set of ensemble forecasts to calculate the background error covariances. Without model error perturbations and the inflation of forecast ensembles, the spread of the ensemble forecasts can collapse rapidly. There are several ways to generate model perturbations, e.g., perturbations in model parameters/parameterizations, perturbations in the forcing fields of the model, and adding error terms to the right-hand side of the model equations. In this paper, we focus on the "adding model error terms" approach, which utilizes a first-order Markov chain model. This approach is suitable for unforced models, such as coupled atmosphere-ocean models. However, for a multivariate model, the balance between different model variables can be an important issue in building its model-error model. In this paper, we focus on building a balanced error model for an intermediate coupled model for El Niño-Southern Oscillation (ENSO) predictions. A simple approach to building such a model-error model is proposed on the basis of the multivariate empirical orthogonal function method. EnKF data assimilation experiments with different configurations of multivariate model error treatments (no model errors, unbalanced and balanced model errors) are performed using realistic sea surface temperature (SST) and sea level (SL) observations. Results show that it is necessary to develop balanced, multivariate model-error models in order to successfully assimilate both SST and SL observations. The hindcasts initialized from these different assimilation experiment results also demonstrate that the balanced model errors can yield more balanced initial conditions that lead to improved predictions of ENSO events.
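As a minimal sketch of the kind of first-order Markov chain (red-noise) model-error term discussed above, here for a single field and with the decorrelation time, error amplitude, and step size chosen purely as assumptions:

    # First-order Markov chain (AR(1)) stochastic model-error term, added to the
    # model tendency at each step. Amplitude and decorrelation scale are assumptions.
    import numpy as np

    n_steps, n_points = 120, 500
    dt = 1.0                      # time step (months)
    tau = 3.0                     # error decorrelation time (months)
    sigma = 0.1                   # stationary error standard deviation (deg C)
    alpha = np.exp(-dt / tau)     # AR(1) coefficient

    rng = np.random.default_rng(2)
    q = np.zeros(n_points)        # model-error field, one value per grid point
    errors = np.empty((n_steps, n_points))
    for k in range(n_steps):
        # The sqrt(1 - alpha**2) factor keeps the stationary variance at sigma**2
        q = alpha * q + sigma * np.sqrt(1.0 - alpha**2) * rng.standard_normal(n_points)
        errors[k] = q             # this field would be added to the SST tendency

    print(errors.std())           # should be close to sigma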
Article
Full-text available
El Niño/Southern Oscillation (ENSO) predictions strongly depend on the accuracy and dynamical consistency of the coupled initial conditions. Based on the proposed ensemble Kalman filter (EnKF), a new initialization scheme for the ENSO ensemble prediction system (EPS) was designed and tested in an intermediate coupled model (ICM). The inclusion of this scheme in the ICM leads to substantial improvements in ENSO prediction skill via the successful assimilation of both observed sea surface temperature (SST) and TOPEX/Poseidon/Jason-1 (T/P/J) altimeter data into the initial ensemble conditions. Comparisons with the original ensemble hindcast experiment show that the ensemble prediction skill was significantly improved out to a 12-month lead time by improving the sea level (SL) initial conditions for a better parameterization of subsurface thermal effects. It is clearly demonstrated that improvement in forecast skill can result from multivariate and multi-observational ensemble data assimilation.
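For context, a bare-bones stochastic EnKF analysis step of the sort such initialization schemes build on (perturbed observations, sample covariance, no localization or inflation; the sizes, observation operator, and error covariances are all assumptions) can be sketched as:

    # Minimal stochastic EnKF analysis step (perturbed observations, no localization).
    # Ensemble size, state/obs dimensions, H, and R are illustrative assumptions.
    import numpy as np

    n_ens, n_state, n_obs = 100, 200, 20
    rng = np.random.default_rng(3)

    Xf = rng.standard_normal((n_state, n_ens))       # forecast ensemble (columns)
    H = np.zeros((n_obs, n_state)); H[np.arange(n_obs), np.arange(n_obs)] = 1.0
    R = 0.25 * np.eye(n_obs)                         # observation-error covariance
    y = rng.standard_normal(n_obs)                   # observations

    Xf_mean = Xf.mean(axis=1, keepdims=True)
    A = Xf - Xf_mean                                 # ensemble perturbations
    Pf = A @ A.T / (n_ens - 1)                       # sample forecast covariance

    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Kalman gain
    # Perturb observations so the analysis ensemble keeps the right spread
    Y = y[:, None] + rng.multivariate_normal(np.zeros(n_obs), R, n_ens).T
    Xa = Xf + K @ (Y - H @ Xf)                       # analysis ensemble

    print(Xa.shape)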
Article
Full-text available
Based on our developed ENSO (El Niño-Southern Oscillation) ensemble prediction system (EPS), the impacts of stochastic initial-error and model-error perturbations on ENSO ensemble predictions are examined and discussed by performing four sets of 14-yr retrospective forecast experiments in both a deterministic and a probabilistic sense. These forecast schemes are differentiated by whether they consider the initial or model stochastic perturbations. The comparison results suggest that the stochastic model-error perturbations, which are added into the modeled physical fields mainly to represent the uncertainties of the physical model, have significant, positive impacts on improving the ensemble prediction skill during the entire 12-month forecast process. However, the stochastic initial-error perturbations have relatively small impacts on the ensemble prediction system, and their impacts are mainly confined to the first 3 months of the predictions.
Article
The relative operating characteristic (ROC) curve is a highly flexible method for representing the quality of dichotomous, categorical, continuous, and probabilistic forecasts. The method is based on ratios that measure the proportions of events and nonevents for which warnings were provided. These ratios provide estimates of the probabilities that an event will be forewarned and that an incorrect warning will be provided for a nonevent. Some guidelines for interpreting the ROC curve are provided. While the ROC curve is of direct interest to the user, the warning is provided in advance of the outcome and so there is additional value in knowing the probability of an event occurring contingent upon a warning being provided or not provided. An alternative method to the ROC curve is proposed that represents forecast quality when expressed in terms of probabilities of events occurring contingent upon the warnings provided. The ratios used provide estimates of the probability of an event occurring given the forecast that is issued. Some problems in constructing the curve in a manner that is directly analogous to that for the ROC curve are highlighted, and so an alternative approach is proposed. In the context of probabilistic forecasts, the ROC curve provides a means of identifying the forecast probability at which forecast value is optimized. In the context of continuous variables, the proposed relative operating levels curve indicates the exceedance threshold for defining an event at which forecast skill is optimized, and can enable the forecast user to estimate the probabilities of events other than that defined by the forecaster.
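A minimal sketch of the hit-rate and false-alarm-rate pairs behind such a ROC curve, computed from an ensemble's forecast probabilities (the synthetic event definition, data, and probability thresholds are assumptions):

    # Hit rate / false-alarm rate pairs for a ROC curve from forecast probabilities.
    # The synthetic data, event frequency, and thresholds are assumptions.
    import numpy as np

    rng = np.random.default_rng(4)
    n_cases = 500
    event = rng.random(n_cases) < 0.3                    # observed event (e.g. a warm episode)
    # Forecast probabilities, loosely tied to the outcome for illustration
    prob = np.clip(0.3 + 0.4 * event + 0.3 * rng.standard_normal(n_cases), 0.0, 1.0)

    hit_rate, false_alarm_rate = [], []
    for t in np.linspace(0.0, 1.0, 11):                  # warning issued if prob >= t
        warn = prob >= t
        hits = np.sum(warn & event)
        misses = np.sum(~warn & event)
        false_alarms = np.sum(warn & ~event)
        correct_rejections = np.sum(~warn & ~event)
        hit_rate.append(hits / (hits + misses))
        false_alarm_rate.append(false_alarms / (false_alarms + correct_rejections))

    # Area under the ROC curve by the trapezoidal rule (0.5 = no skill, 1.0 = perfect)
    hr = np.array(hit_rate[::-1]); fa = np.array(false_alarm_rate[::-1])
    roc_area = np.sum(0.5 * (hr[1:] + hr[:-1]) * np.diff(fa))
    print(round(float(roc_area), 3))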
Article
A large number of ensemble hindcasts (or retrospective forecasts) of tropical Pacific sea surface temperature (SST) have been made with a coupled atmosphere–ocean general circulation model (CGCM) that does not employ flux correction in order to evaluate the potential skill of the model as a seasonal forecasting tool. Oceanic initial conditions are provided by an ocean data assimilation system. Ensembles of seven forecasts of 6-month length are made starting each month in the 1982 to 2002 period. Skill of the coupled model is evaluated from both a deterministic and a probabilistic perspective. The skill metrics are calculated both using the bulk method, which includes all initial condition months together, and as a function of initial condition month. The latter method allows a more objective evaluation of how the model has performed in the context in which forecasts are actually made and applied. The deterministic metrics used are the anomaly correlation and the root-mean-square error. The coupled model deterministic skill metrics are compared with those from persistence and damped persistence reference forecasts. Despite the fact that the coupled model has a large cold bias in the central and eastern equatorial Pacific, this coupled model is shown to have forecast skill that is competitive with other state-of-the-art forecasting techniques. Potential skill from probabilistic forecasts made using the coupled model ensemble members is evaluated using the relative operating characteristics method. This analysis indicates that for most initial condition months this coupled model has more skill at forecasting cold events than warm or neutral events in the central Pacific. In common with other forecasting systems, the coupled model forecast skill is found to be lowest for forecasts passing through the Northern Hemisphere (NH) spring. Diagnostics of this so-called spring predictability barrier in the context of this coupled model indicate that two factors likely contribute to this predictability barrier. First, the coupled model shows a too-weak coupling of the surface and subsurface temperature anomalies during NH spring. Second, the coupled-model-simulated signal-to-noise ratio for SST anomalies is much lower during NH spring than at other times of the year, indicating that the model’s potential predictability is low at this time.
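For reference, the two deterministic metrics named above, the anomaly correlation and the root-mean-square error of the ensemble-mean forecast, are typically computed as in the following sketch (the synthetic data, sizes, and variable names are assumptions):

    # Anomaly correlation and root-mean-square error of an ensemble-mean forecast
    # against observations, per lead time. Synthetic data are assumptions.
    import numpy as np

    rng = np.random.default_rng(5)
    n_starts, n_members, n_leads = 240, 7, 6
    obs = rng.standard_normal((n_starts, n_leads))                 # observed SSTA
    fcst = obs[:, None, :] + 0.7 * rng.standard_normal((n_starts, n_members, n_leads))

    ens_mean = fcst.mean(axis=1)                                   # (start, lead)
    for lead in range(n_leads):
        f, o = ens_mean[:, lead], obs[:, lead]
        acc = np.corrcoef(f, o)[0, 1]                              # anomaly correlation
        rmse = np.sqrt(np.mean((f - o) ** 2))                      # RMS error
        print(f"lead {lead + 1}: ACC = {acc:.2f}, RMSE = {rmse:.2f}")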
Article
A simple coupled model is used to examine decadal variations in El Niño-Southern Oscillation (ENSO) prediction skill and predictability. Without any external forcing, the coupled model produces regular ENSO-like variability with a 5-yr period. Superimposed on the 5-yr oscillation is a relatively weak decadal amplitude modulation with a 20-yr period. External uncoupled atmospheric "weather noise" that is determined from observations is introduced into the coupled model. Including the weather noise leads to irregularity in the ENSO events, shifts the dominant period to 4 yr, and amplifies the decadal signal. The decadal signal results without any externally prescribed changes to the mean climate of the model. Using the coupled simulation with weather noise as initial conditions and for verification, a large ensemble of prediction experiments was made. The forecast skill and predictability were examined and shown to have a strong decadal dependence. During decades when the amplitude of the interannual variability is large, the forecast skill is relatively high and the limit of predictability is relatively long. Conversely, during decades when the amplitude of the interannual variability is low, the forecast skill is relatively low and the limit of predictability is relatively short. During decades when the predictability is high, the delayed oscillator mechanism drives the sea surface temperature anomaly (SSTA), and during decades when the predictability is low, the atmospheric noise strongly influences the SSTA. Additional experiments indicate that the relative effectiveness of the delayed oscillator mechanism versus the external noise forcing in determining interannual SSTA variability is strongly influenced by much slower timescale (decadal) variations in the state of the coupled model.
Article
Results are described from a large sample of coupled ocean-atmosphere retrospective forecasts during 1980-99. The prediction system includes a global anomaly coupled general circulation model and a state-of-the-art ocean data assimilation system. The retrospective forecasts are initialized in January, April, July, and October of each year, and ensembles of six forecasts are run for each initial month, yielding a total of 480 1-yr predictions. In generating the ensemble members, perturbations are added to the atmospheric initial state only. The skill of the prediction system is analyzed from both a deterministic and a probabilistic perspective. The probabilistic approach is used to quantify the uncertainty in any given forecast. The deterministic measures of skill for eastern tropical Pacific SST anomalies (SSTAs) suggest that the ensemble mean forecasts are useful up to lead times of 7-9 months. At somewhat shorter leads, the forecasts capture some aspects of the variability in the tropical Indian and Atlantic Oceans. The ensemble mean precipitation anomaly has disappointingly low correlation with observed rainfall. The probabilistic measures of skill (relative operating characteristics) indicate that the distribution of the ensemble provides useful forecast information that could not easily be gleaned from the ensemble mean. In particular, the prediction system has more skill at forecasting cold ENSO events compared to warm events. Despite the fact that the ensemble mean rainfall is not well correlated with the observed, the ensemble distribution does indicate significant regions where there is useful information in the forecast ensemble. In fact, it is possible to detect that droughts over land are more predictable than floods. It is argued that probabilistic verification is an important complement to any deterministic verification, and provides a useful and quantitative way to measure uncertainty.
Article
With a 3DVar assimilation scheme, several types of observations—sea surface temperatures (SST), sea level height anomalies (SLHA), and the upper-ocean 400-m depth-averaged heat content anomalies (HCA)—were assimilated into a hybrid coupled model of the tropical Pacific. The ocean analyses, and the prediction skills of the SST anomalies (SSTA) from the assimilation of each type of observation, are presented for 1980–98. SST assimilation, besides improving the simulation of SSTA, also slightly improved the HCA and SLHA simulations in the equatorial Pacific, especially in the east. The ocean analyses with the assimilation of SLHA improved the simulations of SSTA, SLHA and HCA in the equatorial Pacific, while the assimilation of HCA improved the SLHA and HCA simulations. For ENSO predictions, assimilating SST yielded the best prediction skills for the Niño3 region SSTA at lead times of 3 months or shorter, but severely degraded the predictions at longer lead times. The best Niño3 SSTA predictions for lead times longer than 3 months came from the initializations with the assimilation of HCA and SLHA data. Assimilating SLHA yielded prediction skills for the Niño3 SSTA almost as good as assimilating HCA, indicating considerable potential for improving ENSO predictions from altimetry data. In this study, a neural network (NN) approach was used to find the nonlinear statistical relations among model variables for the assimilation of HCA and SLHA. Using the NN yielded better prediction skills than using multiple linear regression.
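For context, the cost function minimized by a standard three-dimensional variational (3DVar) scheme of this kind is conventionally written as follows (generic notation, not necessarily that of the cited study):

    % Standard 3DVar cost function: x_b background state, B background-error covariance,
    % y observations, H observation operator, R observation-error covariance.
    J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathrm T}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
                  + \tfrac{1}{2}\big(\mathbf{y}-H(\mathbf{x})\big)^{\mathrm T}\mathbf{R}^{-1}\big(\mathbf{y}-H(\mathbf{x})\big)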
Article
An improved SST reconstruction for the 1854-1997 period is developed. Compared to the version 1 analysis, more variance is resolved in the new analysis in the western tropical Pacific, the tropical Atlantic, and the Indian Ocean. This improved analysis also uses sea ice concentrations to improve the high-latitude SST analysis and a modified historical bias correction for the 1939-41 period. In addition, the new analysis includes an improved error estimate. Analysis uncertainty is largest in the nineteenth century and during the two world wars due to sparse sampling. The near-global average SST in the new analysis is consistent with the version 1 reconstruction. The 95% confidence uncertainty for the near-global average is 0.4°C or more in the nineteenth century, near 0.2°C for the first half of the twentieth century, and 0.1°C or less after 1950.