# Seeking for the rational basis of the median model: the optimal combination of multi-model ensemble results

**ABSTRACT** In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (with its roots sinking into the Bayes theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.


Atmos. Chem. Phys., 7, 6085–6098, 2007

www.atmos-chem-phys.net/7/6085/2007/

© Author(s) 2007. This work is licensed under a Creative Commons License.

Atmospheric Chemistry and Physics


A. Riccio (1), G. Giunta (1), and S. Galmarini (2)

(1) Dept. of Applied Science, University of Naples “Parthenope”, Napoli, Italy

(2) European Commission – DG Joint Research Centre, Institute for Environment and Sustainability, Ispra, Italy

Received: 23 March 2007 – Published in Atmos. Chem. Phys. Discuss.: 27 April 2007

Revised: 21 November 2007 – Accepted: 1 December 2007 – Published: 11 December 2007


1 Introduction

Standard meteorological/air quality practice, such as the prediction of the future state of the atmosphere, typically proceeds conditionally on one assumed model. The model is the result of the work of many area-expert scientists, e.g. meteorologists, computational scientists, statisticians, and others. Nowadays, several models are available for the forecast of variables of meteorological and/or air quality interest, but, even when using the same ancillary (e.g. initial and boundary) data, they can give different answers to the scientific question at hand. This is a source of uncertainty in drawing conclusions, and the typical approach, that of conditioning on a single model deemed to be “the best”, ignores this source of uncertainty and underestimates the possible effects of a false forecast.

Correspondence to: A. Riccio (angelo.riccio@uniparthenope.it)

Ensemble prediction aims at reducing this uncertainty by means of techniques designed to strategically sample the forecast pdf, e.g. the breeding of growing modes (Toth and Kalnay, 1993) or singular vectors (Molteni et al., 1996) in the weather forecasting field.

Recently, a number of works in air quality modeling (Delle Monache and Stull, 2003; Pagowski et al., 2005, 2006a; Pagowski and Grell, 2006b; Mallet and Sportisse, 2006; Delle Monache et al., 2006a, b, c; Zhang et al., 2007) successfully applied different techniques to demonstrate the advantage of deterministic ensemble forecasts compared with forecasts provided by individual models.

The advantages of ensemble prediction are twofold:

– ensemble estimates average out non-predictable components, and

– they provide reliable information on the uncertainties of predicted parameters from the diversity amongst ensemble members.

Recently, the multi-model ensemble prediction system (Krishnamurti et al., 1999) has been introduced. Instead of conditioning on a single (ensemble) modeling system, the results from different climate forecasting models are combined together. The so-called “superensemble” system proved to be far superior, in terms of forecasts, to any ensemble mean.

The multimodel approach has also been successfully applied to atmospheric dispersion predictions (Galmarini et al., 2001, 2004a, b), where the uncertainty of the weather forecast sums and mixes with that stemming from the description of the dispersion process. The methodology relies on the analysis of the forecasts of several models used operationally by national meteorological services and environmental protection agencies worldwide to forecast the evolution of accidental releases of harmful materials. The objectives are clear: after the release of hazardous material into the atmosphere, it is extremely important to support the decision-making process with any relevant information and to provide a comprehensive analysis of the uncertainties and of the confidence that can be placed in the dispersion forecast. Galmarini et al. (2004a) showed how the intrinsic differences among the models can become a useful asset to be exploited for the sake of a more educated support to decision making, by means of the definition of ad-hoc parameters and treatments of the model predictions. Among them is the so-called Median Model, defined as a new set of model results constructed from the distribution of the model predictions. The Median Model was shown to be able to outperform the results of any single deterministic model in reproducing the cloud measured during the ETEX experiment (Girardi et al., 1998).

Published by Copernicus Publications on behalf of the European Geosciences Union.

At the end of their paper Galmarini et al. (2004b) mention: “At present we are not in the position of providing a rigorous explanation on why the median model should perform better than the single models.” ... “Furthermore the conclusions presented in this paper should be generalized and placed in a more rigorous theoretical framework”.

This work takes the above statements as its starting point. In particular, we focus on the second statement, as the first seems to reach deep into the conundrums of theoretical statistics. More explicitly, the questions tackled here are:

1. Is it possible to place the multimodel ensemble approach within a sound theoretical framework?

2. How can the discrepancies between each ensemble member and observations be quantified?

3. And those between ensemble-based predictions and observations?

4. In the case of ensemble-based simulations, predictions are obtained by merging results from each member. It is reasonable to suppose that ensemble member predictions are correlated. Even in the case of multimodel simulations, it is expected that results from different models are correlated, since they often share similar ancillary data, e.g. input data, physics parameterizations, numerical approaches, and so on. In the case of “correlated models”, we expect that data are “clustered”, thus biasing the ensemble-based results and producing overly optimistic confidence intervals. How can these problems be worked around?

5. Can some of the parameters described in Galmarini et al. (2004a) be presented in a coherent theoretical framework?

In this work we used a well-known statistical approach to multimodel data analysis, i.e. Bayesian Model Averaging (BMA), which is a standard method for combining predictive distributions from different sources. The BMA predictive probability density function (pdf) of any quantity of interest is a weighted average of pdfs centered on the individual bias-corrected forecasts, where the weights are equal to posterior probabilities of the models generating the forecasts.

More specifically, the objectives of this work are:

– the evaluation of the BMA weights, in order to sort the predictive skill of the models;

– the quantification of the systematic bias of each model;

– the estimation of some useful statistical indexes introduced in Galmarini et al. (2004a, b);

– the exploration of similarities and differences between our approach and the “median model”;

– the quantification of the correlations between models, as a measure of interdependency.

First, we introduce the theoretical context (the Bayesian framework) within which ensemble modeling, and much else, can be placed. In Sect. 3 the BMA approach is described; this approach provides the way to interpret the weights used to combine the results of the ensemble members. Next (Sect. 4), we introduce the notion of independence and advance some suggestions about how to take into account the relations among models. In Sect. 5 a Bayesian hierarchical model, implementing the procedure to calculate the weights and the bias of each model, is derived and applied to the test case of the ETEX-1 experiment. The results are analyzed and discussed, bringing the “median model”, heuristically introduced by Galmarini et al. (2004a, b), into a theoretical framework.

2 Bayes theorem and ensemble prediction

The Bayes theorem plays a fundamental role in the fields of ensemble modeling, data assimilation, sensitivity and uncertainty analysis. The Bayesian view has been acknowledged to be the most natural approach for combining various information sources while managing their associated uncertainties in a statistically consistent manner (Berliner, 2003).

The optimal combination of ensemble members has its roots in the Bayes theorem. Essentially, the Bayes theorem may be expressed as

$$p(\text{final analysis} \mid \text{ens data}) \propto p(\text{ens data} \mid \text{final analysis}) \times p(\text{final analysis}).$$

The power of the Bayes theorem lies in the fact that it relates the quantity of interest, the probability that the ‘final analysis’ is true given the data from the ensemble, to the probability that we would have observed the data if the final analysis were true, that is, to the likelihood function. The last term on the right side, p(final analysis), the prior probability, represents our state of knowledge (or ignorance) about the “true state” (the final analysis) before data have been analyzed; p(ens data|final analysis) is the likelihood function; the product of the two yields, up to a normalizing constant, the posterior probability function, that is, our state of knowledge about the truth in the light of the data. In a sense, the Bayes theorem can be seen as a learning process, updating the prior information using the data from the ensemble predictions.

For the sake of clarity, it is useful to briefly review the key equations in an ensemble prediction system. The practical implementation of the Bayes theorem requires the specification of a suitable probability model for each ensemble member. For example, consider two ensemble members. If each $p \times 1$ ensemble member state, $x_{\{1,2\}}$, is (multivariate) normally distributed

$$\begin{aligned} x_1 &= x + \varepsilon_1 \\ x_2 &= x + \varepsilon_2 \end{aligned} \tag{1}$$

where the $p \times 1$ vector $x$ is the “true” (final analysis) state and $\varepsilon_1$ and $\varepsilon_2$ are (multivariate) normally distributed errors with mean zero and covariances $\Sigma_1$ and $\Sigma_2$, respectively, then the Bayesian posterior solution is

$$x \mid x_1, x_2 \sim N(x_a, \Sigma)$$

with the final analysis $x_a$, and corresponding error covariance $\Sigma$, given by

$$\begin{aligned} \Sigma^{-1} x_a &= \Sigma_1^{-1} x_1 + \Sigma_2^{-1} x_2 \\ \Sigma^{-1} &= \Sigma_1^{-1} + \Sigma_2^{-1}. \end{aligned} \tag{2}$$

The notation “$\sim N(\mu, R)$” means distributed as a multivariate normal distribution with mean $\mu$ and covariance $R$.

Therefore, the data from the two ensemble members, $x_1$ and $x_2$, can be merged into an optimal estimate, the final analysis, $x_a$, provided that the linearity and gaussianity assumptions in (1) are a realistic representation of the process and one can estimate the matrices $\Sigma_1$ and $\Sigma_2$. Moreover, the combination of the two members is optimal in the log score sense, i.e.

$$-\mathrm{E}\left[\log p(x_a)\right] \le -\mathrm{E}\left[\log p\left(x_{\{1,2\}}\right)\right]$$

since the precision (i.e. the inverse of the covariance matrix) of the final analysis is the sum of the precisions of the members. In other words, the optimal combination makes the posterior distribution sharper and the MAP (maximum a posteriori) estimate less uncertain.

We can take this analysis a step further by using the Bayes theorem to combine the results of a multimodel ensemble prediction system into a skillful and well-calibrated final analysis. Krishnamurti et al. (2000) termed this a “superensemble approach”.
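The combination rule of Eq. (2) can be illustrated with a minimal sketch for the scalar case ($p = 1$), where covariances reduce to variances. The function name and the numerical values below are hypothetical and serve only as an illustration; they are not taken from the ETEX-1 data.

```python
def combine_two_members(x1, var1, x2, var2):
    """Scalar (p = 1) instance of Eq. (2): the precisions add, and the
    final analysis is the precision-weighted mean of the member states."""
    prec1, prec2 = 1.0 / var1, 1.0 / var2
    prec_a = prec1 + prec2                      # Sigma^-1 = Sigma_1^-1 + Sigma_2^-1
    x_a = (prec1 * x1 + prec2 * x2) / prec_a    # solves Sigma^-1 x_a = Sigma_1^-1 x1 + Sigma_2^-1 x2
    return x_a, 1.0 / prec_a


# Hypothetical member states and error variances (illustrative numbers only).
x_a, var_a = combine_two_members(x1=2.0, var1=1.0, x2=4.0, var2=4.0)
print(x_a, var_a)
```

The analysis variance returned by this sketch is smaller than either member variance, which is the scalar counterpart of the statement that the posterior distribution becomes sharper.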

3 The BMA approach

Consider the following scenario: instead of relying on one assumed model, a researcher gathered data concerning the state of the atmosphere from different meteorological centers. The advantages of comparing different models are evident: each model is an imperfect representation of the real world and contains several approximations, parameterizations, gaps in the representation of the physics, etc. Inferences obtained from a single model are risky, since they do not take the model uncertainties into account. On the other hand, the comparison among several models may highlight the models’ deficiencies, since it is highly unlikely that each physical phenomenon is equally represented by all models. The drawbacks of ignoring model uncertainties were recognized by many authors a long time ago (e.g., see the collection of papers in Dijkstra, 1988), but little attention has been devoted to them until now.

The problem is how to combine the results from different models in a skillful summary. In the statistical literature the problem of comparing/combining results from different models is a long-standing one. In his seminal book, Theory of Probability, Jeffreys (1961) developed a methodology for quantifying the evidence in favor of a given model/hypothesis. He introduced the Bayes factor, which is the posterior odds of two hypotheses when their prior probabilities are equal.

In order to introduce the Bayes factor, assume that data $x$ have arisen from two competing hypotheses/models, $M_1$ and $M_2$, according to a likelihood function $p(x|M_1)$ and $p(x|M_2)$. Given a priori probabilities $p(M_1)$ and $p(M_2)=1-p(M_1)$, the data produce a posteriori probabilities $p(M_1|x)$ and $p(M_2|x)=1-p(M_1|x)$. From the Bayes theorem, we obtain

$$p(M_k|x) = \frac{p(x|M_k)\,p(M_k)}{p(x|M_1)\,p(M_1) + p(x|M_2)\,p(M_2)} \quad \text{for } k=1,2, \tag{3}$$

so that,

$$\frac{p(M_1|x)}{p(M_2|x)} = \frac{p(x|M_1)}{p(x|M_2)}\,\frac{p(M_1)}{p(M_2)},$$

and the transformation from prior to posterior odds is simply the multiplication by the Bayes factor

$$B_{12} = \frac{p(x|M_1)}{p(x|M_2)}.$$

In other words,

posterior odds = Bayes factor × prior odds.

If the two models are equally probable a priori, the Bayes factor immediately provides the evidence for the first model with respect to the second one, by transforming the prior opinion through considerations on the data.
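As an illustration of the relation between prior odds, Bayes factor and posterior odds, the following sketch compares two hypothetical models for a single datum. The two normal likelihoods, the datum and the helper names are assumptions made for this example only.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used here as the likelihood p(x|M)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_odds(like1, like2, prior1):
    """Return the Bayes factor B12 and the posterior odds of M1 against M2."""
    bayes_factor = like1 / like2
    prior_odds = prior1 / (1.0 - prior1)
    return bayes_factor, bayes_factor * prior_odds


# Hypothetical setting: M1 says x ~ N(0, 1), M2 says x ~ N(2, 1); observed x = 0.5.
x = 0.5
like1 = normal_pdf(x, 0.0, 1.0)
like2 = normal_pdf(x, 2.0, 1.0)
b12, odds = posterior_odds(like1, like2, prior1=0.5)

# Posterior probability of M1 recovered from the odds; it agrees with Eq. (3).
post1 = odds / (1.0 + odds)
print(b12, post1)
```

With equal prior probabilities the posterior odds coincide with the Bayes factor, as stated in the text.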

In the case of multiple competing models, Eq. (3) can be easily generalized to

$$p(M_k|x) = \frac{p(x|M_k)\,p(M_k)}{\sum_{i=1}^{K} p(x|M_i)\,p(M_i)} \quad \text{for } k=1,2,\ldots,K, \tag{4}$$

and, as usual in any Bayesian analysis, the posterior inference of a quantity of interest, say $\theta$, e.g. a future observation or a model parameter, can be obtained from its ppd (posterior predictive distribution), i.e.

$$p(\theta|x) = \sum_{k=1}^{K} p(\theta|M_k,x)\,p(M_k|x). \tag{5}$$

In this case, the ppd is the average of the posterior distributions under each model, each weighted by its posterior model probability. The weights come from (4) and can be used to assess the usefulness of ensemble members, i.e. as a basis for selecting the most skillful model ensemble members: the posterior model probability, $p(M_k|x)$, provides the quantitative basis to estimate the usefulness of model $k$ in predicting the parameter of interest (the closer to one, the more useful), thus playing the same role as Bayes factors for multiple competing models.

Model (5) is known as BMA (Bayesian Model Average) in the statistical literature. BMA works around the problem of conditioning on a single model by taking into account the information from different models.
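Equations (4) and (5) can be sketched as follows. The log-likelihood values, the equal priors and the normal per-model predictive pdfs are hypothetical choices made for illustration; working in log space (the log-sum-exp device) is a common numerical precaution and is not claimed to be the procedure used in this paper.

```python
import math


def posterior_model_probs(log_likelihoods, priors):
    """Eq. (4) evaluated in log space (log-sum-exp) for numerical stability."""
    log_terms = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    shift = max(log_terms)
    unnorm = [math.exp(t - shift) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def normal_pdf(t, mean, sd):
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bma_density(theta, weights, means, sds):
    """Eq. (5) with normal per-model predictive pdfs (an assumption of this sketch)."""
    return sum(w * normal_pdf(theta, m, s) for w, m, s in zip(weights, means, sds))


# Hypothetical log-likelihoods log p(x|M_k) for K = 3 models, with equal priors.
log_likes = [-10.0, -11.0, -14.0]
priors = [1.0 / 3.0] * 3
weights = posterior_model_probs(log_likes, priors)

# Hypothetical per-model predictive means and standard deviations.
means, sds = [0.8, 1.5, 3.0], [0.5, 0.5, 1.0]
density_at_1 = bma_density(1.0, weights, means, sds)
print(weights, density_at_1)
```

Because the weights sum to one and each component is a proper pdf, the mixture returned by `bma_density` is itself a proper pdf.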

Recently, Raftery and Zheng (2003) reviewed the properties of BMA. There are also several realistic simulation studies on the performance of BMA in different contexts, e.g. in linear regression (Raftery et al., 1997), loglinear models (Clyde, 1999), logistic regression (Viallefont et al., 2001), wavelets (Clyde and George, 2000) and medium-range weather forecasting models (Raftery et al., 2005).

3.1 The properties of BMA

In their paper, Raftery et al. (2005) developed an EM-based (Expectation Maximization) algorithm to estimate the parameters in Eq. (5). They were interested in the calibration of the University of Washington mesoscale short-range multimodel ensemble system (Grimit and Mass, 2002). They used normal distributions to model the uncertainty of each ensemble member, but different distributions may be used as well. A plug-in implementing BMA is freely available for the R statistical software.

Apart from implementation details, several analytical results can be derived. It can be shown that the posterior BMA mean and variance are:

$$\begin{aligned} \mathrm{E}[\theta|x] &= \sum_{k=1}^{K} \hat\theta_k\, p(M_k|x) \\ \mathrm{Var}[\theta|x] &= \sum_{k=1}^{K} \left[ \left( \hat\theta_k - \sum_{i=1}^{K} \hat\theta_i\, p(M_i|x) \right)^2 + \mathrm{Var}[\theta_k|M_k,x] \right] p(M_k|x), \end{aligned} \tag{6}$$

where $\hat\theta_k = \mathrm{E}[\theta|M_k,x]$, i.e. the expected value of $\theta$ conditional on model $k$ alone, i.e. having assumed $p(M_k|x) = 1$.

As can be seen from Eq. (6), the expected value is the weighted average over all models, and the variance is decomposed into two terms: the first term takes into account the between-models ensemble variance, i.e. the spread of the ensemble prediction, while the second term the within-models ensemble variance, i.e. the internal uncertainty of each model. Verbally,

Predictive variance = between ens. variance + within ens. variance
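The decomposition in Eq. (6) can be checked numerically with a short sketch. The weights, per-model means and per-model variances below are hypothetical numbers chosen for illustration.

```python
def bma_mean_variance(weights, means, variances):
    """Eq. (6): BMA mean plus the between-model and within-model variance terms."""
    mean = sum(w * m for w, m in zip(weights, means))
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    return mean, between, within


# Hypothetical posterior model probabilities, conditional means and variances.
weights = [0.5, 0.3, 0.2]
means = [1.0, 2.0, 4.0]
variances = [0.5, 1.0, 2.0]

mean, between, within = bma_mean_variance(weights, means, variances)
predictive_variance = between + within

# Cross-check against the second moment of the mixture: E[theta^2] - E[theta]^2.
second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
print(mean, between, within, predictive_variance, second_moment - mean ** 2)
```

The cross-check uses only the definition of variance for a mixture, so agreement between the two numbers confirms that the two terms of Eq. (6) account for the whole predictive variance.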

It can be presumed that the ensemble spread alone does not capture all the sources of uncertainty. In an ensemble approach, the estimation of confidence intervals based only on the ensemble spread may be optimistic, because it does not properly take into account the internal variability of the models, so that the output for any predicted variable may not be calibrated. By calibrated we mean simply that intervals or events that we claim to have probability $p$ happen a proportion $p$ of the time on average in the long run. For example, a 90% prediction interval verifying at a given time and place is defined so that 90% of verification observations actually lie between the 90% upper and lower bounds. Uncalibrated ensemble predictions tend to be under-dispersive, and this behavior has often been observed (see Coelho et al., 2004, as an example of an application of a model ensemble approach to a climatological problem). Of course, BMA is well calibrated on the training dataset, but it has been shown that it also gives satisfactory results for the predicted observations (Raftery et al., 2005).
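The notion of calibration can be made concrete with a small synthetic experiment. The sketch below assumes zero-mean normal forecasts and normal "observations"; the standard deviations, sample size and seed are hypothetical, and the experiment is not related to the ETEX-1 data.

```python
import random
from statistics import NormalDist


def empirical_coverage(claimed_sd, true_sd, level=0.90, n=20000, seed=0):
    """Fraction of simulated observations that fall inside the central
    `level` prediction interval of a zero-mean normal forecast."""
    rng = random.Random(seed)
    # Half-width of the central interval claimed by the forecast pdf.
    half_width = NormalDist(0.0, claimed_sd).inv_cdf(0.5 + level / 2.0)
    hits = sum(abs(rng.gauss(0.0, true_sd)) <= half_width for _ in range(n))
    return hits / n


# Forecast spread equal to the true spread: coverage close to the nominal 90%.
calibrated = empirical_coverage(claimed_sd=2.0, true_sd=2.0)

# Forecast spread half the true spread: an under-dispersive forecast, whose
# nominal 90% interval contains far fewer than 90% of the observations.
under_dispersive = empirical_coverage(claimed_sd=1.0, true_sd=2.0)
print(calibrated, under_dispersive)
```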

Another interesting result is the correlation of the model ensemble error with the ensemble spread. Equation (6) provides a theoretical basis for this finding, since it relates the predictive model ensemble variance to the between-model ensemble variance. Whitaker and Loughe (1998) provide several examples from real-world meteorological ensemble data, showing the relationship between error and spread; see also Raftery et al. (2005) for a more in-depth discussion of error-spread correlation in BMA modeling.

4 Independence and correlation

If different models are used to simulate the same phenomenon, e.g. weather, climate or the dispersion of radioactive material, they will probably give similar responses. Now, suppose that all model results agree in giving a wrong prediction; without any observational support, this situation cannot be discerned. Potentially, model ensemble results may lead to erroneous interpretations, and this is more probable if models are strongly dependent (i.e. all biased toward the wrong answer). We can say that a dependent model does not convey “newly fresh information”, but replicates the (wrong/right) answer given by the previous models.

Technically, independence can be defined by the joint/marginal probability densities. Let us denote by $p(y_1,y_2)$ the joint pdf of two random variables, $y_1$ and $y_2$; denote by $p_1(y_1)$ the marginal pdf of $y_1$, and similarly for $y_2$. Then $y_1$ and $y_2$ are independent if, and only if, the joint pdf is factorizable in the product of the corresponding marginal pdfs, i.e.

$$p(y_1,y_2) = p_1(y_1)\,p_2(y_2). \tag{7}$$

The extension to any number $K$ of random variables can be straightforwardly defined, in which case the joint density is the product of $K$ terms.

This definition can be used to derive an important property of independent random variables. Given two functions, $f_1$ and $f_2$, we have

$$\mathrm{E}[f_1(y_1)f_2(y_2)] = \mathrm{E}[f_1(y_1)]\,\mathrm{E}[f_2(y_2)]. \tag{8}$$

This can be easily proved by applying (7).

$$\begin{aligned} \mathrm{E}[f_1(y_1)f_2(y_2)] &= \iint f_1(y_1)f_2(y_2)\,p(y_1,y_2)\,dy_1\,dy_2 \\ &= \int f_1(y_1)\,p_1(y_1)\,dy_1 \int f_2(y_2)\,p_2(y_2)\,dy_2 \\ &= \mathrm{E}[f_1(y_1)]\,\mathrm{E}[f_2(y_2)]. \end{aligned}$$

Equality in Eq. (7) means that the statistical properties of any random variable cannot be predicted from the others; for example, if a relationship such as $y_2=f(y_1)$ holds, the joint pdf is not factorizable because $p(y_2|y_1) \ne p(y_2)$.

In the case of independent random variables the interpretation of BMA weights is meaningful. For example, if we have three independent models, then

$$\mathrm{E}[\pi_1 y_1 + \pi_2 y_2 + \pi_3 y_3] = \pi_1 \mathrm{E}[y_1] + \pi_2 \mathrm{E}[y_2] + \pi_3 \mathrm{E}[y_3]. \tag{9}$$

But, if we suppose that the third model is linearly related to the others, i.e. $y_3 = a_{31}y_1 + a_{32}y_2$, it is straightforward to show that

$$\mathrm{E}[\pi_1 y_1 + \pi_2 y_2 + \pi_3 y_3] = (\pi_1 + a_{31}\pi_3)\,\mathrm{E}[y_1] + (\pi_2 + a_{32}\pi_3)\,\mathrm{E}[y_2]. \tag{10}$$

This example shows the difficulties in the interpretation of BMA weights: if models are linearly dependent, they cannot be strictly identified.
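The identifiability problem expressed by Eq. (10) can be made explicit with a small numerical sketch. The coefficients $a_{31}$, $a_{32}$ and the two weight vectors below are hypothetical; they are chosen so that two different sets of weights produce exactly the same combination.

```python
def effective_coefficients(pi, a31, a32):
    """Coefficients multiplying E[y1] and E[y2] in Eq. (10) when
    y3 = a31 * y1 + a32 * y2."""
    pi1, pi2, pi3 = pi
    return (pi1 + a31 * pi3, pi2 + a32 * pi3)


# Hypothetical linear dependence: the third model is the mean of the first two.
a31, a32 = 0.5, 0.5

# Two different weight vectors, both summing to one.
weights_a = (0.50, 0.50, 0.00)
weights_b = (0.25, 0.25, 0.50)

coeff_a = effective_coefficients(weights_a, a31, a32)
coeff_b = effective_coefficients(weights_b, a31, a32)
print(coeff_a, coeff_b)
```

Since both weight vectors lead to the same effective coefficients, the expected value of the combination alone cannot distinguish between them.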

The concept of independence is central in information theory, and several measures of independence have been developed, for example mutual information or negentropy; e.g. see Cover and Thomas (1991) or Papoulis (1991).

Usually variables are not independent, but it is possible to find a proper transformation, say $z_1=g_1(y_1,y_2)$ and $z_2=g_2(y_1,y_2)$, so that the transformed variables are independent. Unfortunately, there is no general way to select the proper transformation, nor can the mutual information or negentropy be easily calculated, but, if the definition of independence is relaxed, some general and interesting results can be obtained.

A weaker form of independence is uncorrelatedness. Two random variables are uncorrelated if their covariance is zero:

$$\mathrm{E}[y_1 y_2] = \mathrm{E}[y_1]\,\mathrm{E}[y_2], \tag{11}$$

which follows directly from (8), taking $f_1(y_1)=y_1$ and $f_2(y_2)=y_2$. On the other hand, uncorrelatedness does not imply independence. For example, as shown by Hyvarinen and Oja (2000), assume that $(y_1,y_2)$ are discrete-valued variables and follow such a distribution that the pairs are, with probability 1/4, equal to any of the following values: $(0,1)$, $(0,-1)$, $(1,0)$, $(-1,0)$. Then $y_1$ and $y_2$ are uncorrelated, as can be simply calculated, but

$$\mathrm{E}[y_1^2 y_2^2] = 0 \ne \frac{1}{4} = \mathrm{E}[y_1^2]\,\mathrm{E}[y_2^2]. \tag{12}$$

Because the condition in Eq. (8) is violated, $y_1$ and $y_2$ are not independent.
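The expectations in this example can be verified by direct enumeration. The sketch below uses exact rational arithmetic; the helper name `expect` is introduced here for illustration only.

```python
from fractions import Fraction

# The four equally probable pairs (y1, y2) of the example.
points = [(0, 1), (0, -1), (1, 0), (-1, 0)]
prob = Fraction(1, 4)


def expect(f):
    """Exact expectation of f(y1, y2) under the discrete distribution."""
    return sum(prob * f(y1, y2) for y1, y2 in points)


# Covariance: E[y1 y2] - E[y1] E[y2], zero for uncorrelated variables (Eq. 11).
covariance = expect(lambda a, b: a * b) - expect(lambda a, b: a) * expect(lambda a, b: b)

# Left and right sides of Eq. (12), with f1(y1) = y1^2 and f2(y2) = y2^2.
lhs = expect(lambda a, b: a ** 2 * b ** 2)
rhs = expect(lambda a, b: a ** 2) * expect(lambda a, b: b ** 2)
print(covariance, lhs, rhs)
```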

In some special cases, uncorrelatedness implies independence. This is the case for normally (or lognormally) distributed data. For example, denote by $\Sigma$ the covariance matrix of $K$-dimensional normally distributed data, then

$$p(y) \propto \exp\left[-\frac{1}{2}(y - \bar y)^T \Sigma^{-1} (y - \bar y)\right]. \tag{13}$$

If the $y$s are uncorrelated, $\Sigma^{-1}$ is a diagonal matrix. Then, by the properties of the exponential function, Eq. (13) can be written as the product of $K$ functions, each dependent on only one component, i.e.:

$$\exp\left[-\frac{1}{2}(y - \bar y)^T \Sigma^{-1} (y - \bar y)\right] = \prod_{k=1}^{K} \exp\left[-\frac{1}{2}(y_k - \bar y_k)^T \Sigma_k^{-1} (y_k - \bar y_k)\right] \tag{14}$$

where $\Sigma_k$ is the $k$-th diagonal element of $\Sigma$, satisfying the definition of independence in Eq. (7). Even if variables are correlated, they can be made uncorrelated if the frame of reference is properly roto-translated. Let $U \Lambda U^T = \Sigma$ be the eigendecomposition of the covariance matrix. The projection of the original variables onto the directions represented by the eigenvectors of $\Sigma$, i.e. $(z - \bar z) = U^T (y - \bar y)$, yields independently distributed variables, as can be easily proved:

$$\begin{aligned} \exp\left[-\frac{1}{2}(y - \bar y)^T \Sigma^{-1} (y - \bar y)\right] &= \exp\left[-\frac{1}{2}(y - \bar y)^T U \Lambda^{-1} U^T (y - \bar y)\right] \\ &= \exp\left[-\frac{1}{2}(z - \bar z)^T \Lambda^{-1} (z - \bar z)\right]. \end{aligned} \tag{15}$$

See Fig. 1 for a fictitious example of bivariate, normally distributed, data.

Fig. 1. An example of bivariate normally distributed data. On the left the data in the original frame of reference; on the right the same data, projected onto the eigenvectors of the covariance matrix, so that the two new directions are uncorrelated. The arrows indicate the axes of the ellipsoid. [Two panels; both axes range from −4 to 12.]

Other measures, such as mutual information or negentropy, are much more difficult to calculate than correlations; so the eigendecomposition of the covariance matrix may be seen as a viable approximation to explore dependences between data or highlight the role of systematic deficiencies of model results, as will be shown in Sect. 6.
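The decorrelation obtained by projecting data onto the eigenvectors of the covariance matrix can be sketched for the bivariate case, where the rotation angle has a closed form. The synthetic data, the seed and the function names are hypothetical; for more than two variables one would normally rely on a library eigen-solver (for instance `numpy.linalg.eigh`) rather than on the closed-form angle used here.

```python
import math
import random


def sample_cov(u, v):
    """Unbiased sample covariance of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def decorrelate(y1, y2):
    """Rotate centred bivariate data onto the eigenvectors of its 2x2
    sample covariance matrix, i.e. (z - zbar) = U^T (y - ybar)."""
    a, b, c = sample_cov(y1, y1), sample_cov(y1, y2), sample_cov(y2, y2)
    theta = 0.5 * math.atan2(2.0 * b, a - c)   # orientation of the principal axis
    cs, sn = math.cos(theta), math.sin(theta)
    m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
    z1 = [cs * (p - m1) + sn * (q - m2) for p, q in zip(y1, y2)]
    z2 = [-sn * (p - m1) + cs * (q - m2) for p, q in zip(y1, y2)]
    return z1, z2


# Synthetic, strongly correlated bivariate normal data (illustrative only).
rng = random.Random(1)
noise = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]
y1 = [4.0 + 2.0 * u for u, _ in noise]
y2 = [4.0 + 1.6 * u + 1.2 * v for u, v in noise]

z1, z2 = decorrelate(y1, y2)
print(sample_cov(y1, y2), sample_cov(z1, z2))
```

The rotation leaves the total variance unchanged and only redistributes it along the two uncorrelated directions.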

5 The estimation procedure

Now we have all the elements to proceed with the analysis of the results of the multi-model ensemble that constitutes our case study. The ensemble analysed in this work is an extended version of that originally analysed by Galmarini et al. (2004b). To summarize, we look at 25 simulations of the ETEX-1 release (Girardi et al., 1998) performed by independent groups worldwide. Each simulation, and therefore each ensemble member, is produced with different atmospheric dispersion models and is based on weather fields generated by (most of the time) different Global Circulation Models (GCM). All the simulations relate to the same release conditions. For details on the groups involved in the exercise and the model characteristics refer to Galmarini et al. (2004b). Nine additional sets are presently available for this analysis. These include one set of results from the Danish Meteorological office (DMI), one set from the Korean Atomic Energy Agency, three sets from the Finnish met service (FMI), one set from UK-Metoffice, and three sets from Meteo-France. In this study we also took care to mask the origin of the sets, as we are not interested in ranking the model results. However, in order to allow for the inter-comparability of the present results with those previously obtained by Galmarini et al. (2004b), we have kept the same coding for the original 16 members (m1-m16) that were used therein and added 9 additional codes (m17-m25) for the newly available sets, randomly associated to the new models listed above.

Using Bayes' theorem, model parameters can be estimated from the posterior pdf. Hereafter $z_i$ denotes the $i$th observation and $y_{ik}$ the corresponding predicted value from the $k$th model. The BMA posterior pdf reads

$$p(\theta|\pi_\cdot, y_{\cdot\cdot}, z_\cdot) = \sum_{k=1}^{K} \pi_k\, p(\theta_k|y_{\cdot k}, z_\cdot) \tag{16}$$

$p(\theta_k|y_{\cdot k},z_\cdot)$ is the posterior pdf based on model $k$ alone, and $\pi_k$ is the posterior probability (weight) of model $k$ being correct given the data, and reflects how well model $k$ fits the data. $\theta_k$ is the vector of parameters characterizing the posterior pdf of model $k$.

Fig. 2. Histogram of the differences between model results and corresponding observations for some selected models. From left to right, and then from top to bottom: m20, m02, m19, m12, m04 and m08. Logarithms were taken for both the model results and observations. [Six panels; horizontal axis: log(ng/m3); vertical axis ranging from 0 to 0.4.]

In BMA it is customary to choose the functions $p(\cdot \mid \cdot)$ from the same family; in this work we selected log-normal functions, so, prior to any analysis, we log-transformed observations and model-predicted concentrations, originally expressed as ng/m³. The motivation for this choice was based on the consideration that “errors” appeared to be log-normally distributed. In Fig. 2 the histogram of the differences between (log-transformed) model results and observations is shown; as can be seen, some models behave reasonably well, with data approximately log-normally distributed around the observations. Moreover, the choice of log-normal distributions automatically avoids the problem of getting finite probabilities for negative concentration

Atmos. Chem. Phys., 7, 6085–6098, 2007www.atmos-chem-phys.net/7/6085/2007/


Riccio et al.: Rational basis of the “median model”6091

values. However, there are some models for which devia-

tions from log-normality are pronounced; for example, m08

is extremely diffusive, with a large fraction of results less

than observations (resulting in the negative skewness of the

empirical pdf). Also, note that all these distributions are not

exactly centered on zero, i.e. there is a model-dependent bias.

This is particularly relevant for m04, whose results are sys-

tematically higher than observations.

In order to prevent a large number of small values from exerting a disproportionate influence on the BMA results, we discarded all observations with values less than $10^{-2}$ ng/m³, close to the threshold ($10^{-3}$ ng/m³) of the analytical technique; moreover, model values equal to zero were substituted with very small values (in order to avoid “-Inf” warnings due to the application of logarithms).
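The pre-processing just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `log_residuals` and the replacement value `floor=1e-6` are hypothetical (the text says only "very small values"), and the sample arrays are invented.

```python
import numpy as np

def log_residuals(obs, pred, obs_min=1e-2, floor=1e-6):
    """Return log(pred) - log(obs) for observations at or above obs_min.

    obs   : array of shape (n,), observed concentrations in ng/m3
    pred  : array of shape (n, K), model-predicted concentrations in ng/m3
    floor : hypothetical small value substituted for zero predictions
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    keep = obs >= obs_min                                   # discard very small observations
    pred_kept = np.where(pred[keep] > 0.0, pred[keep], floor)  # avoid log(0) = -inf
    return np.log(pred_kept) - np.log(obs[keep])[:, None]

# Invented example: 3 observations, 2 models
obs = np.array([0.005, 0.5, 2.0])
pred = np.array([[0.1, 0.0], [1.0, 0.0], [2.0, 4.0]])
d = log_residuals(obs, pred)   # first observation is dropped, zeros are floored
```

The returned array holds one row per retained observation and one column per model, which is the layout the later sketches assume for the log-scale differences $y_{ik} - z_i$.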

Markov chain Monte Carlo (McMC) simulation (Gilks et al., 1996) was used to explore the posterior pdf. The basic procedure of Monte Carlo simulation is to draw a large set of samples $\{\theta_k^{(l)}\}_{l=1}^{L}$ from the target distribution (the posterior pdf in this work). One can then approximate the expectation of any function $f(\theta)$ by the sample mean as follows:

$$E(f) = \int p(\theta \mid \cdot)\, f(\theta)\, d\theta \approx \frac{1}{L} \sum_{l=1}^{L} f(\theta^{(l)}), \tag{17}$$

where $L$ is the number of samples from the target distribution.
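As a minimal illustration of Eq. (17), the sketch below approximates two expectations by sample means. The normal draws are only a stand-in for posterior samples, and the helper name `mc_expectation` is hypothetical.

```python
import numpy as np

def mc_expectation(f, samples):
    """Approximate E[f(theta)] by the sample mean of f over the draws (Eq. 17)."""
    return np.mean([f(theta) for theta in samples])

rng = np.random.default_rng(0)
# Stand-in for draws from a posterior: Normal(mean=1, sd=2)
samples = rng.normal(loc=1.0, scale=2.0, size=50_000)

mean_est = mc_expectation(lambda t: t, samples)              # exact value: 1
second_moment_est = mc_expectation(lambda t: t**2, samples)  # exact value: 1 + 4 = 5
```

The accuracy improves roughly as $1/\sqrt{L}$ for independent draws; for correlated McMC output the effective sample size is smaller.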

In this work we exploited a Gibbs sampler (Geman and

Geman, 1984) to explore the posterior pdf. The Gibbs sam-

pler alternates two major phases: obtaining draws for param-

eters from the posterior pdf of each model, and obtaining

draws for the weights given the model parameters.

In the first phase, we drew a sequence of samples $\{(b_k^{(l)}, \sigma_k^{(l)})\}_{l=1}^{L}$ for each model $k$. The Gibbs sampler was implemented as follows:

    for k = 1 : K
        initialize b_k^(1) and σ_k^(1)
        for l = 2 : L
            draw b_k^(l) from p(b_k | σ_k^(l−1), y_·k, z_·)
            draw σ_k^(l) from p(σ_k | b_k^(l), y_·k, z_·)
        end
    end

By its construction (Gilks et al., 1996), the Gibbs sampler algorithm guarantees that the chain generates a sequence of values $\{(b_k^{(l)}, \sigma_k^{(l)})\}_{l=1}^{L}$ which are identically distributed according to $p(b_k, \sigma_k \mid \cdot)$.

Having assumed log-normal distributions and spatio-temporally independent data, the posterior pdf for model $k$ is

$$p(b_k, \sigma_k \mid y_{\cdot k}, z_\cdot) \sim \prod_{i=1}^{n} \mathcal{N}(y_{ik} - z_i, \sigma_k)\, p(b_k)\, p(\sigma_k). \tag{18}$$

$p(b_k)$ and $p(\sigma_k)$ are the prior probabilities for the bias and its variance.

We placed the customary flat prior on the bias and assumed a fairly vague prior for the variance, i.e. we assumed that the prior variance was inverse-gamma distributed with a mean of 9 and variance of 36. In this case Gibbs sampling is easy to apply because it can be demonstrated that the conditional posterior distributions of the Gibbs sampler in the previous algorithm have canonical forms, i.e. a normal distribution for the bias and an inverse-gamma for the variance; for a definition of these functions, and how to draw from them, see Gelman et al. (2003).
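The first phase can be sketched as follows for a single model. This is not the authors' Matlab code: it is a minimal Python sketch assuming that the log-scale differences $d_i = y_{ik} - z_i$ are normal with mean $b$ and variance $\sigma^2$, with a flat prior on $b$ and an inverse-gamma prior on $\sigma^2$. The function names are hypothetical, and the shape/scale parametrisation of the inverse-gamma (mean $\beta/(\alpha-1)$, variance $\beta^2/[(\alpha-1)^2(\alpha-2)]$) is the standard one; a prior mean of 9 and variance of 36 then give $\alpha = 4.25$ and $\beta = 29.25$.

```python
import numpy as np

def inv_gamma_params(mean, var):
    """Shape and scale of an inverse-gamma with the given mean and variance."""
    shape = mean**2 / var + 2.0
    scale = mean * (shape - 1.0)
    return shape, scale

def gibbs_bias_variance(d, n_iter=5500, burn_in=500,
                        prior_mean=9.0, prior_var=36.0, seed=0):
    """Gibbs sampler for d_i ~ N(b, s2): flat prior on b, inverse-gamma prior on s2."""
    rng = np.random.default_rng(seed)
    d = np.asarray(d, dtype=float)
    n = d.size
    a0, b0 = inv_gamma_params(prior_mean, prior_var)
    s2 = d.var()                                  # initial value for the variance
    draws = np.empty((n_iter, 2))
    for l in range(n_iter):
        # b | s2, d ~ Normal(mean(d), s2 / n) under a flat prior on b
        b = rng.normal(d.mean(), np.sqrt(s2 / n))
        # s2 | b, d ~ Inv-Gamma(a0 + n/2, b0 + 0.5 * sum((d - b)^2))
        a_post = a0 + 0.5 * n
        b_post = b0 + 0.5 * np.sum((d - b) ** 2)
        s2 = 1.0 / rng.gamma(shape=a_post, scale=1.0 / b_post)
        draws[l] = b, np.sqrt(s2)                 # store bias and standard deviation
    return draws[burn_in:]

# Synthetic log-residuals (invented values, not ETEX data)
rng = np.random.default_rng(1)
d = rng.normal(-0.9, 2.8, size=2000)
post = gibbs_bias_variance(d)
bias_mean, sigma_mean = post.mean(axis=0)
a0, b0 = inv_gamma_params(9.0, 36.0)
```

With a few thousand data points the vague prior has little influence, so the posterior means stay close to the sample mean and sample standard deviation of the residuals.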

In a preliminary test we ran three chains in parallel; the Gelman and Rubin test (Gelman and Rubin, 1992) suggested that convergence is reached almost immediately (after a few iterations). We then ran a single long chain (5500 iterations) and conservatively discarded the first 500 iterations, well beyond the “burn-in” period suggested by the Gelman and Rubin test. The sample means were estimated from the remaining iterations using Eq. (17), and errors were computed by batching, to account for the correlation in the Markov chain (Roberts, 1996). Table 1 shows the posterior values for the bias and standard deviations, along with their errors (i.e. standard deviations calculated from the McMC sequence).
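Both diagnostics mentioned above can be sketched compactly. The code below is an illustration, not the authors' implementation: `gelman_rubin` computes the basic potential scale reduction factor from several chains, and `batch_means_se` estimates the standard error of a chain mean from non-overlapping batches. The number of batches (25) and the AR(1) test chain are assumptions made for the example.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an array of shape (m_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_plus = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_plus / W)

def batch_means_se(chain, n_batches=25):
    """Standard error of the chain mean from non-overlapping batch means."""
    chain = np.asarray(chain, dtype=float)
    size = chain.size // n_batches
    means = chain[: size * n_batches].reshape(n_batches, size).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)

rng = np.random.default_rng(0)
rhat = gelman_rubin(rng.normal(size=(3, 1000)))   # three well-mixed chains

# AR(1) chain as a stand-in for correlated McMC output
x = np.empty(5000)
x[0] = 0.0
for t in range(1, x.size):
    x[t] = 0.9 * x[t - 1] + rng.normal()
naive_se = x.std(ddof=1) / np.sqrt(x.size)   # ignores autocorrelation
batch_se = batch_means_se(x)                 # accounts for it
```

A factor close to 1 indicates that the chains are sampling the same distribution; for a positively correlated chain the batch estimate is noticeably larger than the naive standard error.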

In the second phase, we sampled the posterior distribution to get a sequence of model weights. If we look at Eq. (16) as the mixture of $K$ competing models, the estimation process can be simplified with the introduction of the binary random variables, $\zeta_{ik}$, with

$$\zeta_{ik} = \begin{cases} 1 & \text{if the } k\text{th model is the ‘best’ model in predicting the } i\text{th observation} \\ 0 & \text{otherwise.} \end{cases}$$

If $\theta_k^{(l)}$ indicates a shorthand notation for $(b_k^{(l)}, \sigma_k^{(l)})$, and $\zeta_i = (\zeta_{i1}, \dots, \zeta_{iK})$, then the selection of the ‘best’ model in explaining the $i$th observation can be viewed as the outcome of a multinomial random process (Gelman et al., 2003), i.e.

$$p(\zeta_i) = \mathrm{Multin}(\zeta_i \mid p_{i1}, \dots, p_{iK}) = \binom{1}{\zeta_{i1}\,\zeta_{i2} \cdots \zeta_{iK}}\, p_{i1}^{\zeta_{i1}} \cdots p_{iK}^{\zeta_{iK}}. \tag{19}$$

The factors $p_{ik}$ in Eq. (19) are the posterior pdf values of each model, re-normalized so that their sum over index $k$ is equal to 1, i.e.

$$p_{ik} = \frac{p(\theta_k^{(l)} \mid y_{ik}, z_i)}{\sum_{k=1}^{K} p(\theta_k^{(l)} \mid y_{ik}, z_i)}, \tag{20}$$

which coincides with the Bayes' factor for the $k$th model in explaining the $i$th observation in Eq. (4).


Fig. 3. Comparison between observations (left) and predictions (right) made by m04 at hours T0+12 and T0+24. Note that observed concentrations are expressed as ng/m³, while m04 results as g/m³.

From the properties of a multinomial random process, a draw for $\zeta_i$ from (19) is a vector with $K-1$ components equal to zero, and one component (that corresponding to the “best” model) equal to one, i.e. $\sum_{k=1}^{K} \zeta_{ik} = 1$ for any $i$. Each model has a probability to be selected as the ‘best’ model equal to $p_{ik}$, given by Eq. (20).

The selection process was repeated for each observation and iterated for each $\theta_k^{(l)}$ sample, as implemented in the following algorithm:

    for l = 1 : L
        set θ_k^(l) = (b_k^(l), σ_k^(l))
        for i = 1 : N
            set p_ik for any k as in Eq. (20)
            draw ζ_i^(l) from p(ζ_i | p_i1, ..., p_iK)
        end
    end
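The selection loop above can be sketched in Python as follows. Two points are assumptions of this sketch rather than statements from the paper: the "posterior pdf value" of model $k$ at observation $i$ is taken to be the normal density of the log-difference $y_{ik} - z_i$ with mean $b_k$ and standard deviation $\sigma_k$, and the weight of each model is estimated as the fraction of times it is selected. All names and the test data are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def sample_selection(d, b_draws, s_draws, seed=0):
    """Draw one-hot indicators zeta of shape (L, n, K).

    d       : (n, K) log-scale differences y_ik - z_i
    b_draws : (L, K) sampled biases
    s_draws : (L, K) sampled standard deviations
    """
    rng = np.random.default_rng(seed)
    L, K = b_draws.shape
    n = d.shape[0]
    zeta = np.zeros((L, n, K), dtype=int)
    for l in range(L):
        dens = normal_pdf(d, b_draws[l], s_draws[l])      # (n, K) density values
        p = dens / dens.sum(axis=1, keepdims=True)        # Eq. (20): each row sums to 1
        u = rng.random((n, 1))
        choice = (u > np.cumsum(p, axis=1)).sum(axis=1)   # one categorical draw per row
        choice = np.minimum(choice, K - 1)                # guard against round-off
        zeta[l, np.arange(n), choice] = 1
    return zeta

# Invented example: model 2 is offset by +3 on the log scale
rng = np.random.default_rng(1)
d0 = rng.normal(0.0, 1.0, size=500)
d = np.column_stack([d0, d0 + 3.0])
b_draws = np.zeros((200, 2))
s_draws = np.ones((200, 2))
zeta = sample_selection(d, b_draws, s_draws)
weights = zeta.mean(axis=(0, 1))   # fraction of times each model is selected
```

In this toy case the first model is selected far more often than the second, and the estimated fractions sum to one by construction.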

Table 1 shows the expected values (with their standard devi-

ations) of the fraction of times each model is selected as the

“best” model, averaged over all McMC iterations.

Computational costs are negligible; 100 iterations took

about 12s (using Matlab as computing environment installed

on a PC with an Intel Centrino Core2 T7200@2GHz CPU

and 2048M of main memory), so that the whole estimation

process can be accomplished within a few minutes. This

makes this data analysis framework suitable for real-time ap-

plications, too.

6 Results

Essentially, the objectives of this work consist in the:

– evaluation of the BMA weights, in order to sort the pre-

dictive skill of models;

– quantification of the systematic bias of each model;

– estimation of some useful statistical indexes, e.g. APL

(Above Percentile Level) or ATL (Above Threshold

Level), introduced in Galmarini et al. (2004a, b),

– exploration of similarities and differences between our

approach and the “median model”,

– quantification of the correlations between models, as a

measure of interdependency.


Fig. 3. (Continued.) Comparison between observations (left) and predictions (right) made by m04 at hours T0+36, T0+48 and T0+60. Note that observed concentrations are expressed as ng/m³, while m04 results as g/m³.

We will show that the results of our theoretical framework provide an answer to all these items. The results of the optimization procedure are reported in Table 1.

As can be seen, the a posteriori values of the weights can be clustered in several groups: the majority of model weights are close to the a priori value (1/25=0.04); a second group (models m04 and m08) presents a below-average value. Correspondingly, there is a group of three models, m02, m19 and m20 (and to a lesser extent model m12, too), for which the weights are significantly higher than the a priori value.

The bias reported in Table 1 is a measure of how much (on the log scale) the model-predicted values should be shifted so that their mean value coincides with the mean value of the observations. Model m04 largely overestimates the observations, with a mean bias of about 11.6 on the log scale (remember that an additive bias on the log scale is equivalent to a multiplicative bias on the linear scale). Also, note that the standard deviation of this bias is considerably larger than those of the other models, suggesting that something probably went wrong with this model. As Fig. 3 shows, the physics of dispersion has been qualitatively captured but, during the first hours after release, the predicted values are extremely high (with a concentration as high as 6 g/m³ close to the site of release), due to a problem with the source emission strength, as pointed out in Galmarini et al. (2004b). The differences between model results and observations tend to disappear during the day after the release, but the highest concentration is predicted over Poland instead of Denmark, as shown by Fig. 3.

Models tend to underestimate observations: the overall

mean bias, excluding model m04, is −0.91, corresponding to

a shrinking factor of about 0.4; even if m04 is included, the

overall mean bias remains negative, i.e. −0.32. It can also

be shown that the bias is not uniformly distributed over time:


Fig. 4. 50th APL from Eq. (21) (left column), observations (middle column), and 50th APL from the “Median Model” (right column) adapted from Galmarini et al. (2004b), at T0+24 (uppermost row), T0+48 (middle row) and T0+60 (lowermost row).

models generally tend to overestimate observations close to

time of release, and underestimate observations during the

day after. We can conjecture that the well-known deficien-

cies of Eulerian models in correctly representing the sub-

grid effects, and the extra-diffusion introduced by numer-

ical approaches, play an important role in determining the

time tendency of the bias. However, our statistical analysis

is not powerful enough to gain an insight into these physi-

cal/numerical aspects.
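The correspondence between an additive bias on the log scale and a multiplicative factor on the linear scale can be checked directly. The sketch below assumes natural logarithms, which is consistent with the quoted value of $-0.91$ corresponding to a factor of about 0.4; the function name is hypothetical.

```python
import math

def multiplicative_factor(log_bias):
    """Convert an additive bias on the natural-log scale into a linear-scale factor."""
    return math.exp(log_bias)

factor_overall = multiplicative_factor(-0.91)   # overall mean bias without m04
factor_m04 = multiplicative_factor(11.63)       # bias of m04 from Table 1
```

Under this assumption the overall bias corresponds to a factor of about 0.40, while the m04 bias corresponds to a factor of the order of $10^5$.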

The sampled weights and parameters can be used to calcu-

late some useful statistics, e.g. APL (Above Percentile Level)

or ATL (Above Threshold Level).

In Galmarini et al. (2004a), the $\mathrm{APL}_p(x,y,t)$ is defined as the $p$th percentile from the $K$ models at a specific time $t$ and spatial location $(x,y)$. The $\mathrm{APL}_p(\cdot,\cdot,t)$ can be graphically represented as a two-dimensional surface, e.g. see Fig. 6 in Galmarini et al. (2004a).
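As an illustration of this definition, the percentile across models can be computed cell by cell with NumPy. The array layout (models along the first axis) and the toy values are assumptions of this sketch.

```python
import numpy as np

def apl(conc, p):
    """p-th percentile across models; conc has shape (K, ny, nx) at one time t."""
    return np.percentile(conc, p, axis=0)

# Invented example: K = 3 models on a 1 x 2 grid
conc = np.array([[[1.0, 0.0]],
                 [[3.0, 2.0]],
                 [[10.0, 4.0]]])
apl50 = apl(conc, 50)   # the "median model" field
```

For $p = 50$ this reduces to the median of the model values in each grid cell.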

The expected value of this index can be straightforwardly estimated from the BMA results, too. For example, the expected APL50 is the concentration $c^{*}$ so that

$$\sum_{k=1}^{K} \pi_k \int_{-\infty}^{\log(c^{*})} p(b_k, \sigma_k \mid y_{ik}, z_i)\, d\log(c) = 0.5 \tag{21}$$

for any spatio-temporal location denoted by index $i$.

It is worth noticing that this value coincides with the

APL50 index defined in Galmarini et al. (2004a) if a weight

equal to 1/K, a bias equal to zero and small standard devi-

ations equal for all models were used in Eq. (21), that is if

the a priori values for weights and parameters were used and

uncertainties were ignored.
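The limiting case described above can be verified numerically. The sketch below solves Eq. (21) by bisection for a weighted mixture of normal densities on the log-concentration scale; treating each model's pdf as a normal centred on its (bias-corrected) log prediction is an assumption of this sketch, and all names and numbers are illustrative.

```python
import math

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, weights, means, sds):
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))

def mixture_quantile(q, weights, means, sds, tol=1e-10):
    """Solve mixture_cdf(x) = q by bisection (x is on the log-concentration scale)."""
    lo = min(m - 10.0 * s for m, s in zip(means, sds))
    hi = max(m + 10.0 * s for m, s in zip(means, sds))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sds) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Equal weights, zero bias, very small equal spreads: the a priori setting
log_pred = [1.0, 2.0, 5.0]
K = len(log_pred)
median_equal = mixture_quantile(0.5, [1.0 / K] * K, log_pred, [1e-3] * K)

# A symmetric two-component mixture has its median at the centre
median_sym = mixture_quantile(0.5, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

With equal weights and negligible spread the solution coincides with the plain median of the model predictions (2.0 in this toy case), which is the behaviour the paragraph describes.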

Figure 4 shows the APL50 index calculated from Eq. (21), compared with observations and the APL50 adapted from Galmarini et al. (2004b). As can be seen, the APL50 index from Eq. (21) gives substantially the same results as those from Galmarini et al. (2004b); roughly speaking, this is due to the fact that weights are approximately the same for the majority of models, and there are largely compensating effects between the biases of the different models, so that this ensemble analysis indicates a complementarity between model results.

The evidence for complementarity of model results is also supported by the following result. Figure 5 plots the contribution of each model in determining the BMA median values. For each model, we calculated the following integral:

$$\frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{\log(c^{*})} \pi_k\, p(b_k, \sigma_k \mid y_{ik}, z_i)\, d\log(c),$$


Table 1. Model weights, bias and standard deviations estimated by the BMA optimization procedure. The corresponding uncertainties (standard deviations) of each parameter are reported within parentheses. The bias and standard deviations are estimated on the log scale. Each model is tagged with an integer number shown in the first column.

| # | Weight | Bias | Std. Dev. |
|---|---|---|---|
| m01 | 0.0387 (±0.0041) | −0.15 (±0.04) | 2.8 (±0.03) |
| m02 | 0.0642 (±0.0055) | 0.53 (±0.03) | 1.77 (±0.02) |
| m03 | 0.0365 (±0.0041) | −0.73 (±0.05) | 2.95 (±0.03) |
| m04 | 0.0109 (±0.0022) | 11.63 (±0.17) | 11 (±0.12) |
| m05 | 0.0398 (±0.0043) | −2.65 (±0.05) | 2.9 (±0.03) |
| m06 | 0.0415 (±0.0043) | −2.10 (±0.04) | 2.77 (±0.03) |
| m07 | 0.0375 (±0.0042) | −0.64 (±0.05) | 3.26 (±0.04) |
| m08 | 0.0162 (±0.0027) | −2.38 (±0.14) | 9.76 (±0.11) |
| m09 | 0.0353 (±0.0041) | −1.01 (±0.05) | 3.1 (±0.03) |
| m10 | 0.0413 (±0.0044) | 0.59 (±0.04) | 2.76 (±0.03) |
| m11 | 0.0359 (±0.0040) | −0.57 (±0.05) | 3.01 (±0.03) |
| m12 | 0.0503 (±0.0048) | 0.37 (±0.04) | 2.27 (±0.03) |
| m13 | 0.0425 (±0.0044) | −0.61 (±0.04) | 2.53 (±0.03) |
| m14 | 0.0358 (±0.0040) | −1.50 (±0.05) | 3.06 (±0.04) |
| m15 | 0.0393 (±0.0043) | −2.45 (±0.05) | 2.91 (±0.03) |
| m16 | 0.0430 (±0.0045) | −0.52 (±0.04) | 2.56 (±0.03) |
| m17 | 0.0294 (±0.0037) | −0.59 (±0.07) | 4.21 (±0.05) |
| m18 | 0.0410 (±0.0043) | −0.11 (±0.04) | 2.79 (±0.03) |
| m19 | 0.0538 (±0.0049) | 0.73 (±0.03) | 2.09 (±0.02) |
| m20 | 0.0694 (±0.0055) | −2.00 (±0.03) | 1.62 (±0.02) |
| m21 | 0.0399 (±0.0042) | −2.04 (±0.04) | 2.81 (±0.03) |
| m22 | 0.0462 (±0.0045) | −0.95 (±0.03) | 2.31 (±0.03) |
| m23 | 0.0357 (±0.0041) | −1.35 (±0.05) | 3.42 (±0.04) |
| m24 | 0.0397 (±0.0043) | −1.87 (±0.04) | 2.78 (±0.03) |
| m25 | 0.0360 (±0.0040) | −3.15 (±0.05) | 3.39 (±0.04) |

where $c^{*}$ is the median concentration calculated from (21) and $n$ is the number of distinct spatio-temporal locations. Apart from models m04 and m08, which contribute to a lesser extent, and model m20, which contributes to a greater extent, all other models contribute in similar proportions. Therefore, at different times and/or spatial locations, models alternately contribute to defining the BMA median result, with no clear dominant subset. This result reflects very closely that found by Galmarini et al. (2004b).
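The per-model contribution can be sketched numerically as follows. As before, treating each model's pdf as a normal density on the log scale is an assumption of this sketch, and the weights, means and spreads are invented. A useful sanity check is that, because $c^{*}$ is the mixture median at every location, the un-normalized contributions must add up to 0.5 when the weights sum to one.

```python
import math

def ncdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def bma_median(weights, means, sds, tol=1e-10):
    """Median of the weighted normal mixture, found by bisection."""
    lo = min(means) - 10.0 * max(sds)
    hi = max(means) + 10.0 * max(sds)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        cdf = sum(w * ncdf(mid, m, s) for w, m, s in zip(weights, means, sds))
        lo, hi = (mid, hi) if cdf < 0.5 else (lo, mid)
    return 0.5 * (lo + hi)

def contributions(weights, means_by_location, sds):
    """Average share of the probability mass below the BMA median, per model."""
    K = len(weights)
    total = [0.0] * K
    for means in means_by_location:
        x_med = bma_median(weights, means, sds)
        for k in range(K):
            total[k] += weights[k] * ncdf(x_med, means[k], sds[k])
    n = len(means_by_location)
    raw = [t / n for t in total]
    norm = sum(raw)
    return raw, [r / norm for r in raw]

# Invented example: 3 models, 2 spatio-temporal locations
weights = [0.5, 0.3, 0.2]
means_by_location = [[0.0, 1.0, -2.0], [3.0, 2.5, 4.0]]
sds = [1.0, 1.5, 2.0]
raw, normalized = contributions(weights, means_by_location, sds)
```

The normalized values sum to one, matching the normalization used for the bar chart of contributions.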

Also, it should be stressed that the specific values of the weights may depend on the selected database, as well as on the assumptions exploited in this work (e.g. log-normal deviations of model predictions from observations); differences in the relative performance are expected using different databases and/or implicit assumptions. However, there is no reason to assume that the ETEX-1 database acts as a ‘special’ case, and we expect that models will continue to behave in a well-balanced manner also using other databases. Results from the ENSEMBLE project (Galmarini et al., 2004a) suggest this is indeed the case.

Fig. 5. Contribution of each model to the determination of the BMA 50th percentile. Values are normalized so that their sum is equal to one. The numbers of the x-axis indicate the model tags.

We can move a step forward in the analysis of differences and similarities between the BMA approach and the Median Model by exploring the distribution of the latent variables $\zeta_{ik}$. As can be seen from Eq. (19), the vector of latent variables $\{\zeta_{i1}, \dots, \zeta_{iK}\}$ is sampled from a multinomial distribution, where each member has a probability to be “extracted” equal to $p_{ik}$, given by Eq. (20). $p_{ik}$ measures the “distance” of the value predicted by the $k$th model from the corresponding $i$th observation, so the $k$th model is selected with a low probability if it is farther than other models from the $i$th observation. We can explore the distribution of the $\zeta_{ik}$ to search for any systematic structure.

This kind of analysis provides information analogous to the ATL or Space Overlap index. In Galmarini et al. (2004b) the ATL is defined as the surface given by the normalized number of models that, at a given time, predict values above a given threshold $c_t$, namely

$$\mathrm{ATL}(x,y,t) = \frac{100}{K} \sum_{k=1}^{K} \delta_k, \quad \text{where} \quad \delta_k = \begin{cases} 1 & \text{if } c_k(x,y,t) \ge c_t \\ 0 & \text{otherwise.} \end{cases} \tag{22}$$
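Eq. (22) translates almost literally into array code. In this sketch the array layout (models along the first axis), the threshold and the concentration values are all invented for illustration.

```python
import numpy as np

def atl(conc, threshold):
    """Percentage of models at or above the threshold; conc has shape (K, ny, nx)."""
    delta = (np.asarray(conc) >= threshold).astype(float)   # delta_k in Eq. (22)
    return 100.0 * delta.mean(axis=0)

# Invented example: K = 4 models on a 1 x 2 grid
conc = np.array([[[0.2, 0.0]],
                 [[1.5, 0.0]],
                 [[3.0, 0.5]],
                 [[0.9, 2.0]]])
atl_map = atl(conc, threshold=1.0)
```

In the first cell two of the four models exceed the threshold (50%), in the second only one does (25%).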

Analogous information can be deduced from the $\zeta_{ik}$ variables, too. We define the PBS (Probability of Being Selected) index as follows:

$$\mathrm{PBS}_{ik} = 1 - \frac{1}{L} \sum_{l=1}^{L} \zeta_{ik}^{(l)}, \tag{23}$$

where $L$ is the total number of McMC iterations. This index is close to 0 if model $k$ performs much better than the other models in explaining the $i$th observation, i.e. if the mean value of $\zeta_{ik}$ tends to 1; conversely, it tends to 1 if model $k$ is one


Fig. 6. The PBS index for m08. The areas for which PBS ≥ 0.985 have been contoured with black solid lines.

of the worst models in explaining the $i$th observation. Figure 6 shows the PBS index for m08. A PBS average value of about 0.98 (≈1.0−0.016) can be deduced for this model (see Table 1).
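Given the sampled indicators, the PBS index of Eq. (23) is a one-line average. The sketch below uses random one-hot selections as a stand-in for the McMC output; the array layout (iterations, observations, models) is an assumption of this example.

```python
import numpy as np

def pbs(zeta):
    """PBS_ik = 1 - mean over iterations of zeta_ik; zeta has shape (L, n, K)."""
    return 1.0 - np.asarray(zeta, dtype=float).mean(axis=0)

rng = np.random.default_rng(0)
L, n, K = 400, 5, 4
choice = rng.integers(0, K, size=(L, n))   # stand-in for the McMC selections
zeta = np.eye(K, dtype=int)[choice]        # one-hot array of shape (L, n, K)
pbs_map = pbs(zeta)
```

Since exactly one model is selected per observation and iteration, the PBS values of the $K$ models at any observation add up to $K - 1$.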

In Fig. 6 the areas for which PBS ≥ 0.985 have been contoured with black lines; the result is a “leopardized” structure. The leopard-like spots are due to the fact that we have not introduced any physical information in our sampling strategy: obviously model results are spatio-temporally correlated, so we could expect a smoothly varying surface of the PBS index, but in Eq. (18) we implicitly assumed that model results are independently distributed in space and time. Notwithstanding this lack of physical coherence, there are some remarkable structures: the “bump” protruding over the Scandinavian region and that over Eastern Romania. It can be shown that these spots are due to high model concentrations which are represented neither by observations nor by the majority of other model results. This finding has already been outlined by Galmarini et al. (2004b) using the ATL index. They showed that the protrusion over the Scandinavian area corresponds to ATL ≈ 1, i.e. a characteristic shown only by m08 (see Figs. 3 and 4 in Galmarini et al., 2004b).

As a final example of the potentialities of this approach, we analyze the information that can be gained from the eigendecomposition of the covariance matrix. Each model in (19) can be independently selected from all others; however, models cannot be completely independent since they simulate the same phenomenon, described by well defined physical laws. As explained in Sect. 4, a viable approximation to quantify dependences among models is correlation. To this aim, we changed model Eq. (18) to

$$p(b, \Sigma \mid y_{\cdot\cdot}, z_\cdot) \sim \prod_{i=1}^{n} \mathcal{N}(y_{i\cdot} - z_i, \Sigma)\, p(b)\, p(\Sigma). \tag{24}$$

Table 2. Components of some selected eigenvectors of the estimated covariance matrix. Values greater than 0.35 (in absolute value) have been reported as bold. See the text for more details.

| # | Eig. 1 | Eig. 2 | Eig. 22 | Eig. 23 | Eig. 24 | Eig. 25 |
|---|---|---|---|---|---|---|
| m01 | −0.090 | 0.013 | −0.001 | 0.129 | −0.025 | −0.011 |
| m09 | −0.050 | −0.032 | 0.035 | −0.003 | 0.032 | 0.037 |
| m18 | −0.067 | −0.035 | −0.024 | −0.010 | 0.007 | 0.010 |
| m02 | −0.040 | −0.037 | 0.070 | −0.150 | **0.834** | **−0.372** |
| m23 | −0.047 | −0.064 | −0.000 | 0.040 | −0.028 | −0.013 |
| m13 | −0.065 | −0.014 | 0.110 | 0.160 | 0.016 | −0.040 |
| m14 | −0.090 | 0.011 | 0.018 | −0.062 | 0.034 | −0.017 |
| m03 | −0.032 | −0.036 | 0.040 | −0.022 | −0.009 | 0.030 |
| m17 | −0.107 | −0.004 | −0.026 | −0.025 | −0.005 | 0.028 |
| m04 | **−0.818** | **0.532** | −0.003 | −0.010 | 0.002 | 0.010 |
| m05 | −0.001 | −0.041 | −0.044 | −0.021 | −0.011 | −0.020 |
| m10 | −0.092 | −0.014 | 0.186 | −0.023 | −0.012 | −0.001 |
| m12 | −0.052 | −0.040 | 0.124 | **0.620** | −0.198 | 0.014 |
| m06 | −0.021 | −0.018 | 0.148 | 0.019 | −0.000 | −0.024 |
| m19 | −0.050 | −0.038 | −0.028 | **−0.671** | **−0.454** | −0.261 |
| m11 | −0.065 | −0.027 | 0.031 | −0.013 | 0.019 | 0.020 |
| m15 | −0.066 | 0.013 | 0.029 | −0.044 | 0.000 | 0.003 |
| m07 | −0.064 | −0.014 | −0.016 | 0.027 | −0.041 | −0.008 |
| m20 | −0.018 | −0.019 | −0.004 | −0.249 | 0.217 | **0.884** |
| m16 | −0.076 | −0.003 | 0.186 | 0.035 | −0.007 | −0.021 |
| m08 | **−0.498** | **−0.836** | −0.004 | 0.003 | −0.009 | 0.014 |
| m21 | −0.036 | −0.018 | 0.083 | −0.099 | 0.000 | 0.031 |
| m22 | −0.054 | −0.016 | **−0.924** | 0.107 | 0.052 | −0.028 |
| m24 | −0.037 | −0.017 | 0.045 | 0.064 | −0.028 | −0.030 |
| m25 | −0.035 | 0.010 | −0.064 | 0.026 | −0.045 | −0.039 |

where now $\Sigma$ is the $K$-dimensional matrix of covariances between models. $p(\Sigma)$ is the prior pdf for $\Sigma$, for which we chose a non-informative inv-Wishart distribution.

The analysis of the expected values of the covariance matrix indicates which models show correlated deviations from the observations.

As shown in Eq. (15) and Fig. 1, the eigenvectors of the covariance matrix correspond to the directions of independent components if data are normally (or log-normally) distributed. The magnitudes of the components of each eigenvector immediately indicate to what extent each model contributes to that independent component.

In Table 2 we report the eigenvectors corresponding to the

two largest eigenvalues (Eig. 1 and Eig. 2) and to the three

smallest eigenvalues (Eig. 23, Eig. 24 and Eig. 25). As can

be seen, the first two eigenvectors are dominated by the com-

ponents corresponding to m04 and m08, and all other models

have negligible projections on these two vectors.

The first two eigenvalues (data not shown) explain about 61% of the total variance; of course this is not surprising, since, as can be seen from Table 1, m04 and m08 are associated with the largest variances. This means that not only are m04 and m08 associated with a large bias, but they also significantly co-vary (i.e. the spatio-temporal pattern of their bias is similar) and are not significantly correlated with all other models, because their projection over the successive eigenvectors is negligible.
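The kind of eigen-analysis discussed here can be sketched with NumPy. Note the simplification: this example decomposes a plain sample covariance of synthetic residuals, not the posterior expected covariance obtained under the inverse-Wishart prior; the data and the function name are invented.

```python
import numpy as np

def eigen_summary(residuals):
    """Eigen-decomposition of the covariance of residuals with shape (n, K)."""
    cov = np.cov(residuals, rowvar=False)       # (K, K) sample covariance
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]            # re-sort from largest to smallest
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = eigval / eigval.sum()           # fraction of total variance
    return eigval, eigvec, explained

rng = np.random.default_rng(0)
resid = rng.normal(0.0, 1.0, size=(2000, 5))    # 5 hypothetical models
resid[:, 3] *= 10.0                             # one model with a much larger spread
eigval, eigvec, explained = eigen_summary(resid)
dominant = int(np.argmax(np.abs(eigvec[:, 0]))) # model dominating the first eigenvector
```

As in the discussion of m04 and m08, the model with the largest variance dominates the leading eigenvector and accounts for most of the explained variance.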


It is worth noticing that, while models m04 and m08 are

positively correlated along the direction of the first eigenvec-

tor (components with the same sign), they are negatively cor-

related along the direction of the second eigenvector. This is

due to the fact that model m08 is extremely diffusive, so that

it predicts positive concentrations even where model m04

shows zero values (remember that model m04 predicts ex-

tremely high values on the mean); the first set of data is clus-

tered along the first eigenvector, and the second set along the

second eigenvector.

There are also significant correlations between models

m02 and m19 and models m02 and m20; Eig. 23 also shows

that model m19 is significantly correlated with model m12.

Remember that these models show the highest BMA weights

(see Table 1). The data from all other models are projected

more uniformly among the remaining eigenvectors.

We conjecture that models m02, m19 and m20 perform better than the others because their data share a similar spatio-temporal pattern, and this similarity is highlighted by the significant correlations between their biases.

In a model selection perspective, the analysis of the covariance matrix can be used to pick those models showing independent features. If a model has to be sacrificed, it is better to discard a model with a low BMA weight that is well correlated with other models.

7 Conclusions and final considerations

The results presented in the previous section highlight the

advantages of the BMA framework:

1. the weights provide the quantitative basis to judge if

there is an “outlier model”, but, instead of disregard-

ing its values, they are bias-corrected, weighted and

included in the final analysis satisfying an optimality

criterion, i.e. so that the posterior probability is maxi-

mized;

2. the McMC approach provides the way to quantify the

uncertainties of each estimated parameter, so that any

decision making or regulatory-purpose activity, can be

supported by an adequate uncertainty analysis;

3. a deeper analysis, based on the distribution of unobserved indicators, $\zeta_{ik}$, allows one to detect the outliers among the model-predicted values, i.e. a very low mean value of $\zeta_{ik}$ indicates that the $i$th observation is very different from the $k$th model-predicted value. This analysis can be projected onto the physical space/time, thus playing a role similar to several other statistical indexes, e.g. the Agreement in Threshold Level or Space Overlap, originally introduced in Galmarini et al. (2004a, b);

4. the analysis of the covariance matrix can be used to inspect the similarities and/or differences between model results. We can look at the values projected onto the eigenvectors of the covariance matrix as “orthogonal” data, i.e. data forecast by independent models, whose variations cannot be explained by the other components. In a model selection perspective, the independent models can be selected as those associated with the most “interesting” (uncorrelated) directions.

As outlined in Galmarini et al. (2004b), the “Median

Model” results provide an estimate that is superior to any

single deterministic model simulation, with obvious benefits

for regulatory-purpose applications or for the support to

decision making. We can look at our ensemble analysis as

the a posteriori justification of the Median Model results.

Edited by: R. Vautard

References

Berliner, L. M.: Physical-statistical modeling in geophysics, J. Geo-

phys. Res., 108, 8776, doi:10.1029/2002JD002865, 2003.

Clyde, M. A.: Bayesian model averaging and model search strate-

gies (with Discussion), in: Bayesian Statistics 6, edited by:

Bernardo, J. M. et al., 157–185, Oxford University Press, Ox-

ford, 1999.

Clyde, M. A. and George, E. I.: Flexible empirical Bayes estimation

for wavelets, J. R. Stat. Soc. Ser. A–G, 62, 681–698, 2000.

Coelho, C. A. S., Pezzulli, S., Balmaseda, M., Doblas-Reyes, F. J.,

and Stephenson, D. B.: Forecast Calibration and Combination:

A Simple Bayesian Approach for ENSO, J. Climate, 17, 1504–

1516, 2004.

Cover, T. M. and Thomas, J. A.: Elements of Information Theory,

Wiley, 1991.

Delle Monache, L. and Stull, R. B.: An ensemble air-quality fore-

cast over western Europe during an ozone episode, Atmos. Env-

iron., 37, 3469–3474, 2003.

Delle Monache, L., Deng, X., Zhou, Y., and Stull, R.: Ozone en-

semble forecasts: 1. A new ensemble design, J. Geophys. Res.,

111, D05307, doi:10.1029/2005JD006310, 2006a.

Delle Monache, L., Nipen, T., Deng, X., Zhou, Y., and

Stull, R. B.: Ozone ensemble forecasts: 2. A Kalman fil-

ter predictor bias correction, J. Geophys. Res., 111, D05308,

doi:10.1029/2005JD006311, 2006b

Delle Monache, L., Hacker, J. P., Zhou, Y., Deng, X., and Stull,

R. B.: Probabilistic aspects of meteorological and ozone re-

gional ensemble forecasts, J. Geophys. Res., 111, D24307,

doi:10.1029/2005JD006917, 2006c.

Dijkstra, T. K.: On Model Uncertainty and its Statistical Implica-

tions, Springer Verlag, Berlin, 1988.

Fritch, J. M., Hilliker, J., Ross, J., and Vislocky, R. L.: Model con-

sensus, Weather Forecast, 15, 571–582, 2000.

Galmarini, S., Bianconi, R., Bellasio, R., and Graziani, G.: Fore-

casting consequences of accidental releases from ensemble dis-

persion modelling, J. Environ. Radioactiv., 57, 203–219, 2001.

Galmarini, S., Bianconi, R., Klug, W., Mikkelsen, T., Addis, R., An-

dronopoulos, S., Astrup, P., Baklanov, A., Bartniki, J., Bartzis, J.

C., Bellasio, R., Bompay, F., Buckley, R., Bouzom, M., Cham-

pion, H., D’Amours, R., Davakis, E., Eleveld, H., Geertsema, G.


T., Glaab, H., Kollax, M., Ilvonen, M., Manning, A., Pechinger,

U., Persson, C., Polreich, E., Potemski, S., Prodanova, M., Saltbones, J., Slaper, H., Sofief, M. A., Syrakov, D., Sorensen, J. H.,

Van der Auwera, L., Valkama, I., and Zelazny, R.: Ensemble

dispersion forecasting–Part I: concept, approach and indicators,

Atmos. Environ., 38, 4607–4617, 2004a.

Galmarini, S., Bianconi, R., Addis, R., Andronopoulos, S., Astrup,

P., Bartzis, J. C., Bellasio, R., Buckley, R., Champion, H., Chino,

M., D’Amours, R., Davakis, E., Eleveld, H., Glaab, H., Man-

ning, A., Mikkelsen, T., Pechinger, U., Polreich, E., Prodanova,

M., Slaper, H., Syrakov, D., Terada, H., and Van der Auwera, L.:

Ensemble dispersion forecasting–Part II: application and evalua-

tion, Atmos. Environ., 38, 4619–4632, 2004b.

Gelman, A. and Rubin, D. B.: Inference from iterative simulation

using multiple sequences, Stat. Sci., 7, 457–472, 1992.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B.: Bayesian

Data Analysis, Chapman and Hall/CRC, Boca Raton, Florida,

2003.

Geman, S. and Geman, D.: Stochastic relaxation, Gibbs distribu-

tions and the Bayesian restoration of images, Transactions on

Pattern Analysis and Machine Intelligence, 6(6), 721–741, 1984.

Gilks, W. R., Richardson, S., and Spiegelhalter, D. J.: Markov

Chain Monte Carlo in Practice, Chapman and Hall/CRC, Boca

Raton, Florida, 1996.

Girardi, F., Graziani, G., van Veltzen, D., Galmarini, S., Mosca,

S., Bianconi, R., Bellasio, R., and Klug, W.: The ETEX project.

EUR Report 181-43 EN. Office for official publications of the

European Communities, Luxembourg, 108pp., 1998.

Grimit, E. P. and Mass, C. F.: Initial results of a mesoscale short-

range ensemble forecasting system over the Pacific Northwest,

Weather Forecast, 17, 192–205, http://isis.apl.washington.edu/

bma/index.jsp, 2002.

Hyvärinen, A. and Oja, E.: Independent Component Analysis:

Algorithms and Applications, Neural Networks, 13, 411–430,

2000.

Hou, D., Kalnay, E., and Droegemeier, K. K.: Objective verification

of the SAMEX‘98 ensemble forecast, Mon. Weather Rev., 129,

73–91, 2001.

Jeffreys, H.: Theory of Probability, 3rd Edition, Oxford University

Press, 1961.

Krishnamurti, T. N., Kishtawal, C. M., Zhang, Z., LaRow, T., Bachiochi, D., Williford, E., Gadgil, S., and Surendran, S.: Multimodel ensemble forecasts for weather and seasonal climate, Mon. Weather Rev., 116, 907–920, 2000.

Mallet, V. and Sportisse, B.: Ensemble-based air quality forecasts: A multimodel approach applied to ozone, J. Geophys. Res., 111, D18302, doi:10.1029/2005JD006675, 2006.

Molteni, F., Buizza, R., Palmer, T. N., and Petroliagis, T.: The ECMWF ensemble system: Methodology and validation, Q. J. Roy. Meteor. Soc., 122, 73–119, 1996.

Pagowski, M., Grell, G. A., McKeen, S. A., et al.: A simple method to improve ensemble-based ozone forecasts, Geophys. Res. Lett., 32, L07814, doi:10.1029/2004GL022305, 2005.

Pagowski, M., Grell, G. A., Devenyi, D., Peckham, S. E., McKeen, S. A., Gong, W., Delle Monache, L., McHenry, J. N., McQueen, J., and Lee, P.: Application of dynamic linear regression to improve the skill of ensemble-based deterministic ozone forecasts, Atmos. Environ., 40, 3240–3250, 2006a.

Pagowski, M. and Grell, G. A.: Ensemble-based ozone forecasts: Skill and economic value, J. Geophys. Res., 111, D23S30, doi:10.1029/2006JD007124, 2006b.

Papoulis, A.: Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 1991.

Raftery, A. E., Madigan, D., and Hoeting, J. A.: Model selection and accounting for model uncertainty in linear regression models, J. Am. Stat. Assoc., 92, 179–191, 1997.

Raftery, A. E. and Zheng, Y.: Long-run performance of Bayesian model averaging, J. Am. Stat. Assoc., 98, 931–938, 2003.

Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M.: Using Bayesian Model Averaging to Calibrate Forecast Ensembles, Mon. Weather Rev., 133, 1155–1174, 2005.

Roberts, W. R.: Markov chain concepts related to sampling algorithms, in: Markov Chain Monte Carlo in Practice, edited by: Gilks, W. R., Richardson, S., and Spiegelhalter, D. J., Chapman and Hall, 45–57, 1996.

Toth, Z. and Kalnay, E.: Ensemble forecasting at the NMC: The generation of perturbations, B. Am. Meteorol. Soc., 74, 2317–2330, 1993.

Viallefont, V., Raftery, A. E., and Richardson, S.: Variable selection and Bayesian model averaging in case-control studies, Statistics in Medicine, 20, 3215–3230, 2001.

Whitaker, J. S. and Loughe, A. F.: The relationship between Ensemble Spread and Ensemble Mean Skill, Mon. Weather Rev., 126, 3292–3302, 1998.

Zhang, F., Bei, N., Nielsen-Gammon, J. W., Li, G., Zhang, R., Stuart, A., and Aksoy, A.: Impacts of meteorological uncertainties on ozone pollution predictability estimated through meteorological and photochemical ensemble forecasts, J. Geophys. Res., 112, D04304, doi:10.1029/2006JD007429, 2007.

Atmos. Chem. Phys., 7, 6085–6098, 2007, www.atmos-chem-phys.net/7/6085/2007/
