Biometrics. DOI: 10.1111/j.1541-0420.2007.01039.x

Bayesian Distributed Lag Models: Estimating Effects of Particulate Matter Air Pollution on Daily Mortality

L. J. Welty,1,∗ R. D. Peng,2 S. L. Zeger,2 and F. Dominici2

1Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine,

680 North Lake Shore Drive, Suite 1102, Chicago, Illinois 60611, U.S.A.

2Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health,

615 North Wolfe Street, Baltimore, Maryland 21205, U.S.A.

∗email: lwelty@northwestern.edu

Summary. A distributed lag model (DLagM) is a regression model that includes lagged exposure variables as covariates; its corresponding distributed lag (DL) function describes the relationship between the lag and the coefficient of the lagged exposure variable. DLagMs have recently been used in environmental epidemiology for quantifying the cumulative effects of weather and air pollution on mortality and morbidity. Standard methods for formulating DLagMs include unconstrained, polynomial, and penalized spline DLagMs. These methods may fail to take full advantage of prior information about the shape of the DL function for environmental exposures, or for any other exposure with effects that are believed to smoothly approach zero as lag increases, and are therefore at risk of producing suboptimal estimates. In this article, we propose a Bayesian DLagM (BDLagM) that incorporates prior knowledge about the shape of the DL function and also allows the degree of smoothness of the DL function to be estimated from the data. We apply our BDLagM to its motivating data from the National Morbidity, Mortality, and Air Pollution Study to estimate the short-term health effects of particulate matter air pollution on mortality from 1987 to 2000 for Chicago, Illinois. In a simulation study, we compare our Bayesian approach with alternative methods that use unconstrained, polynomial, and penalized spline DLagMs. We also illustrate the connection between BDLagMs and penalized spline DLagMs. Software for fitting BDLagM models and the data used in this article are available online.

Key words: Air pollution; Bayes; Distributed lag; Mortality; NMMAPS; Penalized splines; Smoothing;

Time series.

1. Introduction

Distributed lag models (DLagMs; Almon, 1965) are regression

models that include lagged exposure variables, or distributed

lags (DLs), as covariates. They have recently been employed

in environmental epidemiology for estimating short-term cu-

mulative effects of environmental exposures on daily mortal-

ity or morbidity (e.g., Pope et al., 1991; Pope and Schwartz,

1996; Braga et al., 2001; Zanobetti et al., 2002; Kim, Kim,

and Hong, 2003; Bell, McDermott, Zeger, Samet, and Dominici,

2004; Goodman, Dockery, and Clancy, 2004; Welty

and Zeger, 2005). DLagMs are specialized types of varying-

coefficient models (Hastie and Tibshirani, 1993) and dynamic

linear models (Ravines, Schmidt, and Migon, 2006).

For Poisson log-linear DLagMs that estimate the effects

of lagged air pollution levels on daily mortality counts, the

sum of the DL coefficients is interpreted as the percentage

increase in daily mortality associated with a one unit in-

crease in air pollution on each of the previous days. Because

the time from exposure to event will almost certainly vary in

a population, this sum is a more appropriate measure of the

effect of short-term exposure than a single day’s coefficient.

Results from previous time series studies suggest that com-

pared to DLagMs, models with single day pollution exposures

might underestimate the risk of mortality associated with air

pollution (Schwartz, 2000; Zanobetti et al., 2003; Goodman

et al., 2004; Roberts, 2005).

Exposure variables, such as ambient air pollution levels,

may be highly correlated over time, making DL coefficients

difficult to estimate. A general solution is to constrain the co-

efficients as a function of lag. Common constraints include a

polynomial (Almon, 1965) or a spline (Corradi, 1977). Esti-

mating DLagMs as varying-coefficient models constrains the

coefficients to follow a natural cubic spline (Hastie and Tib-

shirani, 1993). The DL function for air pollution and mor-

tality has been estimated with polynomial constraints (e.g.,

Schwartz, 2000; Braga et al., 2001; Kim et al., 2003; Bell,

Samet, and Dominici, 2004; Goodman et al., 2004), spline

constraints (Zanobetti et al., 2000), and without constraints

(Zanobetti et al., 2003).

Each type of constraint on the DL coefficients is an appli-

cation of prior knowledge to model specification. In the con-

text of air pollution and mortality, prior knowledge suggests

that short-term risk of mortality varies smoothly as a func-

tion of lag and decreases to zero. Prior knowledge about the

effects of air pollution on mortality at early lags is limited.

There may be short delays in health effects after exposure,

© 2008, The International Biometric Society


as suggested by studies of single day pollution exposures that

find the largest effect on mortality at lag day 1 (Zmirou et al.,

1988; Katsouyanni et al., 2001; Dominici et al., 2003). In the

scenario of mortality displacement (Schimmel and Murawsky,

1978), in which high air pollution levels may advance by sev-

eral days the deaths of frail individuals, the DL function may

be zero or positive at early lags, then decrease and become

negative (Zanobetti et al., 2000, 2002). If there were both a

delay in health effect and mortality displacement, hypotheses

concerning the sign or smoothness of the DL function at early

lags would be tenuous at best.

For more appropriate model specification and improved es-

timation, it may be advisable to formulate DLagMs so that

(i) coefficients are constrained to approach zero smoothly

with increasing lag and (ii) early coefficients are relatively

unconstrained. Neither polynomial nor spline constraints, the

most common methods for specifying DLagMs, include this

prior information in estimation. In this article, we develop

Bayesian DLagMs (BDLagMs) that incorporate our under-

standing of the relationship between short-term fluctuations

of particulate matter (PM) air pollution and daily fluctuations

in mortality counts. Our prior distribution specifies that as

lag increases, the DL function will have increasing smooth-

ness and approach zero. An advantage of our approach is

that the degree of smoothness of the DL function is estimated

from the data. We note that BDLagMs have been explored in

economics (e.g., Leamer, 1972; Schiller, 1973; Ravines et al.,

2006), and autoregressive priors have been used generally to

smooth time-dependent coefficients in generalized linear mod-

els (e.g., Fahrmeir and Knorr-Held, 1997; Manda and Meyer,

2005). However, our prior is quite different from those using

a constant degree of smoothness (Schiller, 1973), a particu-

lar parametric form (Leamer, 1972; Ravines et al., 2006), or

an autoregressive structure (e.g., Fahrmeir and Knorr-Held,

1997; Manda and Meyer, 2005).

We apply our BDLagM to data from the National Mor-

bidity, Mortality, and Air Pollution Study (NMMAPS) to es-

timate the shape of the DL function between daily PM and

daily deaths for Chicago, Illinois from 1987 to 2000. We exam-

ine the sensitivity of the estimated DL function to the speci-

fication of the BDLagM prior. We compare the air pollution

effect estimated with the BDLagM to that estimated using

unconstrained maximum likelihood (ML). We also compare

air pollution effects estimated under the full formulation of

the BDLagM, computed using a Gibbs sampler, to those es-

timated under an approximate formulation, computed using

a closed form expression.

We also conduct a simulation study comparing BDLagMs

to unconstrained, polynomial, and penalized spline DLagMs.

For penalized spline DLagMs, we compare estimates obtained

using generalized cross validation (GCV) and restricted maxi-

mum likelihood estimation (REML; Ruppert, Wand, and Car-

roll, 2003). We include DLagMs that are consistent with bi-

ological knowledge along with DLagMs for which our BD-

LagMs may be misspecified.

Because constraining DL coefficients is a way of smooth-

ing, we consider how our Bayesian approach relates to pe-

nalized spline DLagMs. We demonstrate that BDLagMs are

analogous to penalized spline DLagMs with a specific penalty

matrix derived from the BDLagM prior.

Though our BDLagM formulation was motivated by a de-

sire to model flexibly the DL function between lagged PM

levels and daily mortality counts, it is relevant to situations

in which the lagged effects of an exposure on an outcome

are unknown for the first few lags but are believed to dissi-

pate with lag. Using BDLagMs with repeated measures data

would require extensions to our approach. For documentation and to encourage implementation, our BDLagM software is available online at http://www.ihapss.jhsph.edu/software/BayesDLM/.

2. Bayesian DLagMs

Let $y_t$ and $x_t$ be the outcome and exposure time series. We consider a generalized linear DLagM $g(E[y_t \mid x_1,\ldots,x_t]) = \sum_{\ell=0}^{L} \theta_\ell x_{t-\ell}$, where $L$ is the maximum lag and $\theta = (\theta_0,\ldots,\theta_L)'$ is the vector of the DL coefficients to be estimated. Initially we will consider the normal linear model $E[y_t \mid x_1,\ldots,x_t] = \sum_{\ell} \theta_\ell x_{t-\ell}$, with $Y_t$ independent normal with constant variance.

The goal is to specify a prior on $\theta = (\theta_0, \theta_1,\ldots,\theta_L)'$ that is uninformative on the DL coefficients for small $\ell$ but that constrains the coefficients with larger $\ell$ to be smoother and approach zero. We assume $\theta \sim N(0, \Omega)$, where $\Omega$ is constructed so that for increasing lag the diagonal elements decrease to zero ($\mathrm{Var}(\theta_\ell) \to 0$) and the off-diagonal elements in its correlation matrix increase to one ($\mathrm{Cor}(\theta_{\ell-1}, \theta_\ell) \to 1$). Care must be taken to construct $\Omega$ so that it remains positive definite. A natural approach is to define $\Omega = ABA$, where $AA'$ is the diagonal matrix of the individual variances of the $\theta_\ell$s, and $B$ is the correlation matrix for $\theta$. Specifying an appropriate $\Omega$ may then be achieved by setting $A$ equal to the Cholesky decomposition of a diagonal matrix with the desired prior variances and setting $B$ equal to the correlation matrix for increasingly correlated normal random variables.

To define $A$, let the parameter $\sigma^2$ be the prior variance of $\theta_0$, and set $\mathrm{Var}(\theta_1) = v_1\sigma^2, \ldots, \mathrm{Var}(\theta_L) = v_L\sigma^2$, where the $v_\ell$s are a decreasing sequence of weights such that $1 \ge v_1 \ge \cdots \ge v_L > 0$. We parameterize them by $v_\ell(\eta_1) = \exp(\eta_1 \ell)$, $\eta_1 \le 0$, so that the hyperparameter $\eta_1$ governs how quickly the prior variances of the $\theta_\ell$s approach zero. Choosing the exponential function is convenient but not required. Let $V(\eta_1)$ be the diagonal matrix with entries $1, v_1(\eta_1)^{1/2}, \ldots, v_L(\eta_1)^{1/2}$. We set $A = \sigma V(\eta_1)$.

To specify the correlation matrix $B$, we similarly define $w_\ell(\eta_2) = \exp(\eta_2 \ell)$, $\eta_2 \le 0$, to be a decreasing sequence of weights, and $M(\eta_2)$ to be the $(L+1) \times (L+1)$ diagonal matrix with entries $1, w_1(\eta_2), \ldots, w_L(\eta_2)$. We let $B = W(\eta_2)$, where $W(\eta_2)$ is the correlation matrix derived from the covariance matrix $M(\eta_2)M(\eta_2)' + \{I_{L+1} - M(\eta_2)\}1_{L+1}1_{L+1}'\{I_{L+1} - M(\eta_2)\}'$, where by $1_{L+1}$ we mean an $(L+1) \times 1$ vector of ones and by $I_{L+1}$ we mean the $(L+1) \times (L+1)$ identity matrix. Then $W(\eta_2)$ is the correlation matrix for the mixture of normal random variables $M(\eta_2)X_1 + \{I_{L+1} - M(\eta_2)\}1_{L+1}X_2$, where $X_1 \sim N(0, I_{L+1})$ and $X_2 \sim N(0, 1)$. The first few elements of the independent $X_1$ are weighted more heavily than the corresponding first few elements of the dependent $1_{L+1}X_2$, and the latter elements of the dependent $1_{L+1}X_2$ are weighted more heavily than the latter elements of the independent $X_1$. The parameter $\eta_2$ controls how quickly the mixture moves from independent to dependent. The final form for the prior on $\theta$ is then $N(0, \sigma^2\Omega(\eta))$, where $\Omega(\eta) = V(\eta_1)W(\eta_2)V(\eta_1)$ and $\eta = (\eta_1, \eta_2)'$.
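As an illustration of the construction above, the following is a minimal sketch of how $\Omega(\eta) = V(\eta_1)W(\eta_2)V(\eta_1)$ could be assembled numerically. It is not the authors' software; the function name `bdlag_prior_omega` and the example values $\eta_1 = \eta_2 = -0.2$ are assumptions chosen only for demonstration.

```python
import numpy as np

def bdlag_prior_omega(L, eta1, eta2):
    """Return Omega(eta) = V(eta1) W(eta2) V(eta1) for lags 0..L (hypothetical helper)."""
    lags = np.arange(L + 1)
    v = np.exp(eta1 * lags)          # variance weights v_l, with v_0 = 1
    w = np.exp(eta2 * lags)          # mixing weights w_l, with w_0 = 1
    V = np.diag(np.sqrt(v))
    # Covariance of M X1 + (I - M) 1 X2, where M = diag(w)
    cov = np.diag(w ** 2) + np.outer(1.0 - w, 1.0 - w)
    sd = np.sqrt(np.diag(cov))
    W = cov / np.outer(sd, sd)       # rescale the covariance to a correlation matrix
    return V @ W @ V

# Example values are illustrative assumptions, not estimates from the paper.
omega = bdlag_prior_omega(L=14, eta1=-0.2, eta2=-0.2)
prior_var = np.diag(omega)                                   # Var(theta_l) / sigma^2
prior_cor = omega / np.sqrt(np.outer(prior_var, prior_var))
adjacent_cor = np.diag(prior_cor, k=1)                       # Cor(theta_{l-1}, theta_l)
```

In this sketch `prior_var` decays with lag and `adjacent_cor` grows with lag, which is the qualitative behavior the prior is designed to have.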

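Let $\hat\theta$ be the ML estimate of the unconstrained DL coefficients and let $\Sigma$ be the sample covariance matrix. For a normal linear DLagM, $\hat\theta$ is $N(\theta, \Sigma)$, so the posterior for $\theta$ conditional on $\eta$ and $\sigma$ is

$$\theta \mid \hat\theta, \eta, \sigma^2 \sim N\!\left( \left( \tfrac{1}{\sigma^2}\Omega(\eta)^{-1} + \Sigma^{-1} \right)^{-1} \Sigma^{-1}\hat\theta,\ \left( \tfrac{1}{\sigma^2}\Omega(\eta)^{-1} + \Sigma^{-1} \right)^{-1} \right). \tag{1}$$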
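The closed form in (1) is a standard normal-normal update, and a short numerical sketch may make it concrete. The function name `bdlag_posterior` and the three-lag inputs below are hypothetical; the diagonal matrices are chosen so that the shrinkage of each coefficient can be checked by hand.

```python
import numpy as np

def bdlag_posterior(theta_hat, Sigma, Omega, sigma2):
    """Posterior mean and covariance of theta as given in equation (1)."""
    Sigma_inv = np.linalg.inv(Sigma)
    prior_precision = np.linalg.inv(Omega) / sigma2
    post_cov = np.linalg.inv(prior_precision + Sigma_inv)
    post_mean = post_cov @ Sigma_inv @ theta_hat
    return post_mean, post_cov

# Hypothetical inputs (not from the paper): three lags, diagonal covariances.
theta_hat = np.array([1.0, 0.5, -0.3])
Sigma = np.diag([0.04, 0.04, 0.04])
Omega = np.diag([1.0, 0.1, 0.001])   # prior variance shrinking with lag
post_mean, post_cov = bdlag_posterior(theta_hat, Sigma, Omega, sigma2=1.0)
```

With diagonal inputs each coefficient is multiplied by (prior variance) / (prior variance + sampling variance), so the last coefficient is pulled almost entirely to zero while the first is left nearly unchanged.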

For a general linear DLagM, the posterior distribution for $\theta$ may not be available in closed form, but it may be computed through Gibbs sampling or other Markov chain Monte Carlo methods (e.g., Carlin and Louis, 2000). We discuss such an approach for our PM air pollution and mortality example, in which the $Y_t$ are Poisson distributed daily mortality counts, $\log(E[y_t \mid x_1,\ldots,x_t]) = \sum_{\ell=0}^{L} \theta_\ell x_{t-\ell}$, and the likelihood for $\hat\theta$ is Poisson.

The influence of the prior distribution in estimating $\theta$ depends on the values of hyperparameters $\sigma^2$ and $\eta = (\eta_1, \eta_2)'$. The hyperparameter $\sigma^2$, the prior variance of $\theta_0$, can be viewed as a tuning parameter determining the starting point of the DL function. In practice there is little information in the data to jointly estimate $\sigma^2$ and $\eta$. We therefore assume $\sigma^2$ is ten times the estimated statistical variance of $\theta_0$ so that even for relatively large values of $\eta$, the prior has little to no influence on the first few DL coefficients. We examine sensitivity of BDLagM estimates to choice of $\sigma$ in Section 5.

Rather than setting values for $\eta = (\eta_1, \eta_2)'$ and directly determining the influence of the prior, we let $\eta = (\eta_1, \eta_2)'$ have a discrete uniform prior on $N_1 \times N_2$, where $N_1$ and $N_2$ are finite sets of possible values for $\eta_1$ and $\eta_2$. Then the posterior distribution for $\theta$ can be defined as the weighted sum $p(\theta \mid \hat\theta) = \sum_{\eta} p(\theta \mid \hat\theta, \eta)\, p(\eta \mid \hat\theta)$, where $p$ denotes a general probability density. Under the assumption that $\hat\theta \sim N(\theta, \Sigma)$, the marginal posterior density of the hyperparameter $\eta$ is available in closed form. For a given $\eta^*$:

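$$p(\eta^* \mid \hat\theta) = \frac{ \left| \sigma^2\Omega(\eta^*)\Sigma^{-1} + I \right|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, \hat\theta' \left[ \Sigma^{-1} - \Sigma^{-1} \left( \Sigma^{-1} + \tfrac{1}{\sigma^2}\Omega(\eta^*)^{-1} \right)^{-1} \Sigma^{-1} \right] \hat\theta \right\} }{ \sum_{\eta} \left| \sigma^2\Omega(\eta)\Sigma^{-1} + I \right|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, \hat\theta' \left[ \Sigma^{-1} - \Sigma^{-1} \left( \Sigma^{-1} + \tfrac{1}{\sigma^2}\Omega(\eta)^{-1} \right)^{-1} \Sigma^{-1} \right] \hat\theta \right\} }. \tag{2}$$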

Sufficiently large ranges for N1 and N2 ensure that the

data drive the strength or weakness of the prior distribution

and therefore the eventual smoothness of the estimated DL

function.
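The weighting over the hyperparameter grid can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the helper names, the seven-lag $\hat\theta$, the covariance $\Sigma$, and the 4 × 4 grid are all hypothetical. The weights are computed on the log scale and normalized after subtracting the maximum, which avoids underflow in the exponentials of (2).

```python
import numpy as np
from itertools import product

def prior_omega(L, eta1, eta2):
    """Omega(eta) = V(eta1) W(eta2) V(eta1), following Section 2 (hypothetical helper)."""
    lags = np.arange(L + 1)
    v, w = np.exp(eta1 * lags), np.exp(eta2 * lags)
    cov = np.diag(w ** 2) + np.outer(1.0 - w, 1.0 - w)
    sd = np.sqrt(np.diag(cov))
    V = np.diag(np.sqrt(v))
    return V @ (cov / np.outer(sd, sd)) @ V

def log_weight(theta_hat, Sigma_inv, Om, sigma2):
    """Log of one numerator term in equation (2)."""
    k = len(theta_hat)
    _, logdet = np.linalg.slogdet(sigma2 * Om @ Sigma_inv + np.eye(k))
    inner = np.linalg.inv(Sigma_inv + np.linalg.inv(Om) / sigma2)
    quad = theta_hat @ (Sigma_inv - Sigma_inv @ inner @ Sigma_inv) @ theta_hat
    return -0.5 * logdet - 0.5 * quad

def eta_posterior(theta_hat, Sigma, sigma2, grid1, grid2):
    """Normalized weights p(eta | theta_hat) over the grid grid1 x grid2."""
    L = len(theta_hat) - 1
    Sigma_inv = np.linalg.inv(Sigma)
    etas = list(product(grid1, grid2))
    logp = np.array([log_weight(theta_hat, Sigma_inv, prior_omega(L, e1, e2), sigma2)
                     for e1, e2 in etas])
    weights = np.exp(logp - logp.max())   # subtract the max for numerical stability
    return etas, weights / weights.sum()

# Hypothetical inputs (not from the paper): 7 lags, independent estimation errors.
theta_hat = np.array([0.9, 0.7, 0.4, 0.2, 0.1, 0.0, 0.05])
Sigma = 0.01 * np.eye(7)
sigma2 = 10 * Sigma[0, 0]                 # ten times the variance of the lag-0 estimate
grid1 = np.linspace(-0.8, -0.1, 4)
grid2 = np.linspace(-0.8, 0.0, 4)
etas, weights = eta_posterior(theta_hat, Sigma, sigma2, grid1, grid2)

# Posterior mean of theta: weighted sum over eta of the conditional means from (1).
Sigma_inv = np.linalg.inv(Sigma)
post_mean = np.zeros_like(theta_hat)
for (e1, e2), wt in zip(etas, weights):
    precision = np.linalg.inv(prior_omega(6, e1, e2)) / sigma2 + Sigma_inv
    post_mean += wt * np.linalg.solve(precision, Sigma_inv @ theta_hat)
```

By the Woodbury identity, the bracketed matrix in (2) equals $(\Sigma + \sigma^2\Omega(\eta))^{-1}$, so each term is proportional to the $N(0, \Sigma + \sigma^2\Omega(\eta))$ density evaluated at $\hat\theta$; that form can be used as a cross-check on the computation.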

3. Bayesian DLagMs and Penalized Splines

Following the well-established connection between nonpara-

metric smoothing and Bayesian modeling (e.g., Silverman,

1985), we illustrate the relationship between normal linear

BDLagMs and p-spline DLagMs. We show that estimating

the normal linear DL function under model (1) is analogous

to fitting a p-spline to DL coefficients with penalty derived

from our prior. An advantage of this connection is that our

method of putting a prior directly on the coefficients may be

viewed as a transparent means for eliciting p-spline penalties,

which are otherwise difficult to relate to biological or other

prior knowledge.

Let $\theta = U\gamma$, where $U$ is a spline basis matrix and $\gamma$ is a vector of spline coefficients. Let $\hat\theta$ be the ML estimate of $\theta$, and assume that $\hat\theta = U\gamma + \nu$, $\nu \sim N(0, \Sigma)$, where $\Sigma$ is the estimated covariance matrix for $\hat\theta$. Under a p-spline approach, we estimate $\gamma$ by minimizing the criterion $(\hat\theta - U\gamma)'\Sigma^{-1}(\hat\theta - U\gamma) + \lambda\gamma' D\gamma$, where $\lambda$ is a penalty parameter and $D$ a positive semidefinite matrix (Eilers and Marx, 1996; Ruppert et al., 2003).

To show the connection between minimizing this criterion and estimating the BDLagM (1), we reformulate the p-spline in its Bayesian form $\hat\theta \mid \gamma \sim N(U\gamma, \Sigma)$ and $\gamma \sim N(0, \Gamma)$, where $\Gamma$ is the prior covariance matrix of $\gamma$. Because $\theta = U\gamma$, the prior on $\gamma$ translates to the prior $\theta \sim N(0, U\Gamma U')$. In (1) we assume $\theta \sim N(0, \sigma^2\Omega(\eta))$, so we need $\Gamma$ such that $U\Gamma U' = \sigma^2\Omega(\eta)$, or $\Gamma(\eta) = R^{-1}Q'\sigma^2\Omega(\eta)Q(R')^{-1}$, where $QR$ is the QR decomposition of $U$. Under this formulation the log posterior for $\gamma$ is, up to a constant, $-\tfrac{1}{2}(\hat\theta - U\gamma)'\Sigma^{-1}(\hat\theta - U\gamma) - \tfrac{1}{2}\gamma' U'(U\Gamma(\eta)U')^{-1}U\gamma$, and maximizing the log posterior for $\gamma$ is equivalent to minimizing the above criterion with $\lambda = 1$ and $D = U'(U\Gamma(\eta)U')^{-1}U$ (Silverman, 1985; Green and Silverman, 1994). For a given value of the hyperparameter $\eta$, the estimated DL coefficients are given by the posterior mean $U\left(U'\Sigma^{-1}U + U'(U\Gamma(\eta)U')^{-1}U\right)^{-1}U'\Sigma^{-1}\hat\theta$, and the equivalent degrees of freedom equal the trace of the smoother matrix $U\left(U'\Sigma^{-1}U + U'(U\Gamma(\eta)U')^{-1}U\right)^{-1}U'\Sigma^{-1}$ (Ruppert et al., 2003).

Though a prior on DL coefficients may be translated to a specific p-spline penalty, the spline approach requires that the DL function follow a specific form, $\theta = U\gamma$. For our air pollution mortality example, we found that using a B-spline basis with $L + 1$ degrees of freedom produced estimates of $\theta$ identical to those from the BDLagM. In the following simulation study, we compare BDLagMs to p-splines with penalties unrelated to the prior.
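The equivalence described in this section can be checked numerically in the special case of a square, full-rank basis. The sketch below is a hedged illustration: the basis `U`, the covariance `Sigma`, and `prior_cov` (standing in for $\sigma^2\Omega(\eta)$) are randomly generated placeholders, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5

# Placeholder inputs: a well-conditioned square basis and two positive definite matrices.
U = np.eye(k) + 0.3 * rng.normal(size=(k, k))
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)
B = rng.normal(size=(k, k))
prior_cov = B @ B.T + k * np.eye(k)       # plays the role of sigma^2 * Omega(eta)
theta_hat = rng.normal(size=k)

# Gamma(eta) = R^{-1} Q' sigma^2 Omega(eta) Q (R')^{-1}, with U = QR.
Q, R = np.linalg.qr(U)
R_inv = np.linalg.inv(R)
Gamma = R_inv @ Q.T @ prior_cov @ Q @ R_inv.T

# Penalized spline fit with lambda = 1 and D = U' (U Gamma U')^{-1} U.
Sigma_inv = np.linalg.inv(Sigma)
D = U.T @ np.linalg.inv(U @ Gamma @ U.T) @ U
hat_inner = np.linalg.inv(U.T @ Sigma_inv @ U + D)
theta_spline = U @ hat_inner @ U.T @ Sigma_inv @ theta_hat

# Posterior mean from equation (1) with the same prior covariance.
theta_bayes = np.linalg.solve(np.linalg.inv(prior_cov) + Sigma_inv, Sigma_inv @ theta_hat)

# Equivalent degrees of freedom: trace of the smoother matrix.
smoother = U @ hat_inner @ U.T @ Sigma_inv
edf = np.trace(smoother)
```

When `U` is square and invertible the two estimates coincide up to rounding error; with a lower-dimensional basis the spline form additionally restricts $\theta$ to the column space of `U`.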

4. Simulation Study

We conducted a simulation study to compare BDLagMs with

four methods for estimating DL functions—unconstrained,

polynomial, p-splines with penalty parameter chosen by GCV,

and p-splines estimated with REML. We generated data un-

der 25 different sets of true DL coefficients, including examples

for which coefficients do not decrease to zero and smoothness

does not increase with lag. We categorize the DL functions

by four characteristics: (1) shape: decaying exponential (E), step function (St), or gamma distribution (G); (2) latency: 0 or 2, the number of initial coefficients equal to zero; (3) oscillation: as described by $(-1)^{\ell \bmod 2}$, to mimic mortality displacement; and (4) maximum nonzero lag: 7 or 14, the lag by which the coefficients are less than 0.01. We also considered a null DL function with all zero coefficients. All DL functions included current day ($\ell = 0$). We set $L = 14$ as in the subsequent air pollution mortality example. Except for the null

model, all the DL functions were normalized so the sum of

squares of the DL coefficients is 1. We refer to the nonnull

functions by [Shape]^o([latency], [max lag]), where the superscript o indicates oscillation.

Under each of the 25 scenarios, we generated 500 outcome series $y_t$ from the model $y_t = \delta \sum_{\ell=0}^{14} \theta_\ell x_{t-\ell} + \epsilon_t$, where $\epsilon_t \sim$ i.i.d. $N(0,1)$, and $\delta$ is a constant to balance signal and noise. For the exposure series $x_t$ we used mean centered PM10 for 1996 from Chicago, Illinois because there were no missing observations and the autocorrelation is similar to what we experience when estimating the association between PM10 and mortality for Chicago for 1987–2000. For simplicity we take the $\epsilon_t$ to be independent $N(0, 1)$, noting that our simulations still apply to situations in which the $\epsilon_t$ are autocorrelated because application of an appropriate linear filter will result in a new DLagM with independent normal errors. We set $\delta = 0.25$ to generate moderate evidence for a total effect, $\sum_\ell \theta_\ell$, in nonnull models (we empirically determined that $\delta = 0.25$ generates $y_t$ such that the t-statistic for the ML estimate for $\sum_\ell \theta_\ell$ is approximately two). Similarly we set $\delta = 0.475$ to generate strong evidence for total effect (we empirically determined that $\delta = 0.475$ generates $y_t$ such that the t-statistic for the ML estimate for $\sum_\ell \theta_\ell$ is approximately four). For each simulated data set we compared the DL functions under five methods: (1) unconstrained ML; (2) the proposed Bayes’ method (Bayes) using the normal posterior as in (1); (3) ML with a polynomial of degree four (Poly); (4) a penalized spline with penalty chosen by GCV (GCV); and (5) a penalized spline estimated with REML (REML). We also considered estimating the DL function using an AR-1 model. With the exception of the null model and St0(2, 14), the AR-1 model was not competitive, and was substantially worse when the DL function oscillates then goes to zero.

Figure 1 shows the estimated DL functions (white) averaged across the 500 simulations with the 95% confidence bands (gray) for 24 of the true DL functions (black) (results not pictured for null model). Results are reported for $\delta = 0.25$. Visual inspection of this figure indicates that the BDLagM performs consistently well and estimates the true DL function with narrower confidence bands than other methods.

To quantify the comparison, we summarize the mean squared errors of the estimated total effect ($\sum_\ell \theta_\ell$) and DL coefficients at lags 0, 7, and 14 under the five estimation methods and for the 25 scenarios. Table 1 summarizes the results for $\delta = 0.25$. Results for $\delta = 0.475$ are available in Web Table 1. Mean squared errors are expressed as percentages of the mean squared error of the corresponding unconstrained ML estimates. Values smaller than 100 favor the proposed estimation methods with respect to unconstrained ML.

When the DL function decreases to zero, BDLagM is 10 to 15% better at estimating the total effect than ML, whereas Poly, GCV, and REML perform comparably to ML. Results are similar for $\delta = 0.25$ and $\delta = 0.475$. The better performance of the Bayesian method with respect to its competitors is mainly due to its greater flexibility in estimating the DL coefficients at the longer lags. Bayes is consistently 20–30% better than ML for lag 0; GCV and REML may be substantially better or

substantially worse. However, Bayes consistently outperforms

the others in estimating the lag 7 and the lag 14 coefficients

for scenarios in which the coefficients go to zero by lag 7 or 14.

When the BDLagM is misspecified and the DL coefficients do

not decrease smoothly to zero, performance of the BDLagM is

less predictable. Bayes may estimate the total effect only 5%

worse than ML (and Poly and REML), or nearly 15% better

(superior to Poly, GCV, REML).
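One replicate of the normal-outcome simulation design can be sketched as below. Several pieces are stand-ins and should be read as assumptions: the exposure is a synthetic AR(1) series rather than the 1996 Chicago PM10 data, and the true DL function is an illustrative decaying exponential whose rate (0.7) is not taken from the paper. The unconstrained ML estimate is ordinary least squares on the lagged design matrix.

```python
import numpy as np

def lag_matrix(x, L):
    """Rows are (x_t, x_{t-1}, ..., x_{t-L}) for t = L, ..., len(x) - 1."""
    n = len(x)
    return np.column_stack([x[L - l : n - l] for l in range(L + 1)])

rng = np.random.default_rng(1)
L, n, delta = 14, 366, 0.25

# Placeholder exposure: a mean-centered AR(1) series (the paper used observed PM10).
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
x -= x.mean()

# Illustrative true DL function, normalized so the sum of squares is 1.
theta = np.exp(-0.7 * np.arange(L + 1))
theta /= np.sqrt(np.sum(theta ** 2))

# Simulate y_t = delta * sum_l theta_l x_{t-l} + eps_t with eps_t ~ N(0, 1).
X = lag_matrix(x, L)
y = delta * X @ theta + rng.normal(size=X.shape[0])

# Unconstrained ML (least squares) estimate and its estimated covariance.
XtX_inv = np.linalg.inv(X.T @ X)
theta_hat = XtX_inv @ X.T @ y
resid = y - X @ theta_hat
s2 = resid @ resid / (X.shape[0] - (L + 1))
Sigma = s2 * XtX_inv

# Total effect and its t-statistic, the quantity used in the text to calibrate delta.
ones = np.ones(L + 1)
total_effect = theta_hat.sum()
total_se = np.sqrt(ones @ Sigma @ ones)
t_stat = total_effect / total_se
```

The pair `(theta_hat, Sigma)` produced here is what the closed form posterior (1) takes as input; repeating the simulation over many replicates and scenarios would give mean squared error comparisons of the kind reported in Table 1.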

Mortality counts are often modeled with Poisson log-linear regression, so we also examine how our results extend to the Poisson case. We simulated data from $Y_t \sim \mathrm{Poisson}(\mu_t)$, $\log(\mu_t) = \log(100) + \sum_{\ell=0}^{14} x_{t-\ell}\theta_\ell/100$. The offset and division by 100 were determined empirically to approximate Chicago mortality levels in 1996. For each set of DL coefficients, we generated 1000 mortality series. We estimated the posterior distribution for $\theta$ two ways: using (1) (approximating $\hat\theta$ as normal) or a Gibbs sampler. Web Table 2 compares the mean squared errors of the total effects. The errors are comparable, suggesting that the simulation results for normal outcomes are not necessarily misleading for Poisson outcomes.

5. Application to Particulate Matter Air Pollution and Mortality

In this section, we apply BDLagMs to daily time series of

PM with aerodynamic diameter less than 10 microns (PM10)

and nonaccidental deaths for Chicago, Illinois for the period

1987–2000. The data were collected from publicly available

sources as part of the NMMAPS. NMMAPS contains daily

time series of age classified mortality, temperature, dew point,

and PM10 for 109 U.S. cities from 1987 to 2000. We ana-

lyzed the time series for Chicago because it is the largest U.S.

city in NMMAPS with few missing PM10 values. Additional

details regarding NMMAPS data assembly are available at

http://www.ihapss.jhsph.edu/ and are discussed in previ-

ous NMMAPS analyses (Samet, Zeger, Dominici, Curriero,

Dockery, Schwartz, and Zanobetti, 2000; Samet, Zeger, Do-

minici, Schwartz, and Dockery, 2000; Dominici et al., 2003).

Poisson log-linear regression is frequently used to estimate

the association between day-to-day variations in mortality

counts and day-to-day variations in ambient air pollution lev-

els. We accordingly assume that the mortality in Chicago on

day $t$, $t = 1,\ldots,5114$, is a Poisson random variable $Y_t$ with expectation $E[Y_t] = \mu_t$. As above, we let $\theta = (\theta_0,\ldots,\theta_L)'$ be the unknown DL coefficients we wish to estimate. We let $x_t$ denote the PM10 time series and for $t > L$ we let $\mathbf{x}_t$ denote the length $L + 1$ vector of lagged PM10 values $(x_t,\ldots,x_{t-L})'$.

Multisite time series studies of single day exposure PM10

and mortality have found strong evidence of an association

between PM10 at lags l = 0, 1, and 2 and daily mortality

(e.g., Zmirou et al., 1988; Burnett, Cakmak, and Brook, 1998;

Katsouyanni et al., 2001; Dominici et al., 2003); single city

studies with DLagMs have similarly found the largest effects

in the first seven lags (e.g., Schwartz, 2000; Zanobetti et al.,

2003; Goodman et al., 2004). Though lags beyond two weeks

may have some influence on daily mortality (e.g., mortality

displacement), it is unlikely that lags beyond 2 weeks have

substantial influence on mortality compared to lags less than

2 weeks (Zanobetti et al., 2003). Models containing lags be-

yond 2 weeks are additionally difficult to estimate because

long-term averages of PM10 have strong seasonal variation.

Figure 1. Mean estimated DL functions (white) and 95% posterior bands (gray) under five estimation methods: unconstrained ML, the proposed Bayesian method (Bayes), ML with a polynomial of degree four (Poly), a penalized spline with penalty chosen by GCV (GCV), and a penalized spline estimated with REML (REML). Outcome series were simulated under moderately strong evidence for the sum of the DL coefficients (δ = 0.25).

We set L = 14 to capture the majority of short-term effects

of PM10 on mortality without confounding estimation of DL

coefficients with seasonal trends in mortality.

When estimating air pollution health effects from time se-

ries studies it is important to account for potential time-

varying confounders such as weather, seasonality, and in-

fluenza epidemics (e.g., Schwartz, 1993; Samet et al., 1998;

Braga, Zanobetti, and Schwartz, 2000; Samoli et al., 2001;

Bell, Samet, and Dominici, 2004; Dominici, McDermott, and

Hastie, 2004; Peng, Dominici, and Louis, 2005; Welty and

Zeger, 2005). We let $\mathbf{z}_t$ denote the vector of time-varying covariates to include in the model, and we specify $\mathbf{z}_t$ as in previous NMMAPS analyses (Dominici et al., 2003). The exact specification is documented in the associated R code, available at http://www.ihapss.jhsph.edu/software/BayesDLM/.

Our goal is to estimate the DL coefficients θ as part of the

generalized linear model

$$\log(\mu_t) = \mathbf{x}_t'\theta + \mathbf{z}_t'\beta. \tag{3}$$

The estimate for $1000 \times \theta_\ell$ corresponds to the percentage increase in daily mortality associated with a 10 µg/m³ increase in PM10 at lag $\ell$, and $1000 \times \sum_{\ell=0}^{14} \theta_\ell$ corresponds to the percentage increase in daily mortality associated with a 10 µg/m³ increase in PM10 at lags $\ell = 0,\ldots,14$.

Bayesian estimation of the generalized linear model in (3) with our proposed prior for the DL coefficients $\theta$ requires two


Table 1

Mean squared errors of the estimates of the total effect and of the DL coefficients at lags 0, 7, and 14 obtained under four

estimation methods (Bayesian method (B), a polynomial with four degrees of freedom (P), a p-spline with penalty parameter

chosen by GCV (G), and a p-spline estimated with REML (R)) and for the 25 true DL functions. These results are reported

under the assumption of moderately strong evidence of a total effect (δ = 0.25). Mean squared errors are expressed as

percentages of the mean squared error of the corresponding ML estimates.

Total effect    Lag 0    Lag 7    Lag 14

B P G R    B P G R    B P G R    B P G R

E(0,7) 8999 102 998456175 1296 14 3662 10083 62

E(2,7)

9199 100 9978 47 59779 1131 162 135 94 102

E(0,14)

9199 10399 8147161 576 133683 96 8967

E(2,14)

Eo(0, 7)

Eo(2, 7)

Eo(0, 14)

Eo(2, 14)

9599 99997870 56628 11 22 156 108 9598

8999 10099 8158 119 1676 2242101 12992 78

89 99100997743 70 767 1647 122 14196 89

89 9910099 804874 162155037 262 134 9670

88 99100 9974 44 814911 50 58183 12410283

St(0,7)

97 99102 9975 55 7629 4029 27409 130103 69

St(2,7)99 99 9899 7488 4049 5038 2338 10126 86 75

St(0,14)

106 99 1029973 47 58107 13193 28 9596 37

St(2,14)

Sto(0, 7)

Sto(2, 7)

Sto(0, 14)

Sto(2, 14)

105 99 96 9972 5929 247 13256 30 95 7661

87 100 100 99826768 113 98206 411874 969950

8710010010073 617224 46179 51220597 10237

8699 10099 8152658472 18370135 180 35599248

86999999 7343 65 1533133 31142 188 316 93339

G(0,7)

92 9999 100737064149 1111 19223131 93 106

G(2,7)

92100 99100 75187 5594 162827 33496 8684

G(0,14)

999997 99 75 5727408152310 14968284

G(2,14)

Go(0, 7)

Go(2, 7)

Go(0, 14)

Go(2, 14)

9999 100997589 257118 18 27112014393 71

8810010099 717386 637276092 134 10642

87 999999 744285 13 10156953103 10838

8799100 9976 50 804163 180 591093 10096 40

86100 9999 71 4874 20 47 205482595 115 9235

Null 8999 969974 472110513 2431 9583 37

extensions from the general approach outlined in Section 2.

First, the likelihood for $(Y_t \mid \mathbf{x}_t, \mathbf{z}_t)$ is Poisson, so that $\hat\theta$, the ML estimates of $\theta$, will not be normal and the posterior distribution for $\theta \mid \hat\theta$ will not have a closed form expression.

Second, usual Bayesian estimation requires specifying a joint

prior for θ and β, an untenable approach given the size of the

nonpollutant covariate matrix and its potential relationship

with the pollutant covariate matrix.

We propose two approaches. The first is to fit (3) and

treat the ML estimates $\hat\theta$ as $N(\theta, \Sigma)$, where $\Sigma$ is the sample covariance matrix. This approach ignores the uncertainty

introduced by estimating β and relies on the asymptotic

normality of the Poisson likelihood, but allows us to esti-

mate θ directly using its closed form posterior (1). The sec-

ond approach is to fit the Poisson log-linear model using

a Gibbs sampler; details and code are available at http://www.ihapss.jhsph.edu/software/BayesDLM/.

For both computational methods, we set the hyperprior on $\eta = (\eta_1, \eta_2)$ to be a discrete uniform distribution over $N_1 \times N_2$, where $N_1$ is a length 10 sequence ranging from −0.35 to −0.05 in equal intervals, and $N_2$ is a length 10 sequence ranging from −0.37 to 0 in equal intervals. We selected the interval for $N_1$ so that the ratio of the prior standard deviation of $\theta_0$ to $\theta_L$ is bounded between 2 and 100. We selected the values for $N_2$ so that the prior correlation of $\theta_{L-1}$ and $\theta_L$ is bounded approximately by 0 and 0.99. We also set σ = 0.004, slightly larger than the square root of ten times the estimated variance in the ML estimate of $\theta_0$. The sensitivity of the estimated BDLagMs to choices of σ and $N_1 \times N_2$ is considered

below. We ran the Gibbs sampler for K = 5000 iterations, dis-

carding the first 1000 as burn-in. Diagnostic checks suggested

that the algorithm converged.

Figure 2 shows the posterior mean and the 95% posterior

region of the DL function for the association between PM10

Figure 2. Posterior mean (white) of the DL function for the effect of PM10 on mortality for Chicago, Illinois from 1987 to 2000, using the last 4000 of 5000 iterations of the Gibbs sampler. The gray shaded region denotes the 95% posterior region. Black dots indicate ML estimates for the unconstrained DL coefficients.

and mortality in Chicago from 1987 to 2000. The black dots

indicate the unconstrained ML estimates of the DL coefficients. The strongest association between PM and mortality occurs at lag 3: a 10 µg/m³ increase in PM10 at lag 3 is associated with a 0.17% increase in mortality (95% posterior interval [PI] 0.01%, 0.34%), all other lagged PM10 levels remaining constant. The drop in relative risk from lag 3 to lag 5 suggests the possibility of mortality displacement. We estimate a total effect of −0.24% (95% PI −0.73%, 0.23%). The estimated total effect using unconstrained ML, −0.19%, is similar, but has a wider 95% confidence interval (−0.86%, 0.48%). The joint posterior distribution of $\eta = (\eta_1, \eta_2)$ (see Web Figure 1) favored models for which $\mathrm{Var}(\theta_\ell) \to 0$ quickly and $\mathrm{Cor}(\theta_\ell, \theta_{\ell+1}) \to 1$ moderately or quickly.
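The percentage figures quoted above follow from the log link in model (3): a 10 µg/m³ increase at one lag multiplies expected mortality by exp(10θ), which for small θ is close to 1 + 10θ. The sketch below compares the exact and approximate conversions; the coefficient value 0.00017 is a hypothetical number chosen only so that 1000 × θ reproduces the 0.17% quoted for lag 3, and the function names are illustrative.

```python
import math

def percent_increase_exact(theta, delta_pm=10.0):
    """Exact percentage change in expected mortality for a delta_pm increase in PM10."""
    return 100.0 * (math.exp(delta_pm * theta) - 1.0)

def percent_increase_linear(theta):
    """First-order approximation used in the text: 1000 * theta for a 10 unit increase."""
    return 1000.0 * theta

theta_lag3 = 0.00017   # hypothetical coefficient, not an estimate reported in the paper
exact = percent_increase_exact(theta_lag3)
approx = percent_increase_linear(theta_lag3)
```

For coefficients of this size the two conversions agree to well within the precision at which the results are reported.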

Figure 3 compares posterior distributions of DL coefficients

from the Gibbs sampler (black) and the normal approximation
(gray). The estimates from the two methods differ at
intermediate lags but are similar at early and later lags and
for the overall sum of DL coefficients. This pattern of agreement
and discrepancy is not surprising, given that we expect

the normal approximation and the true posterior distribution

to be most similar where the prior is weakest and the data

drive estimation (early lags) and where the prior is strongest

and drives estimation (later lags). The normal approxima-

tion was computationally faster than the Gibbs sampler (on

an AMD Opteron 848 system with a 2.2 GHz processor, 8.6

seconds versus 15.5 hours for 5000 iterations).
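The speed of the normal approximation follows from its closed form. A minimal sketch, assuming the ML estimate is treated as θ̂ ~ N(θ, Σ̂) and the prior as θ ~ N(0, Ω) for a fixed Ω: the posterior is then normal with covariance (Σ̂⁻¹ + Ω⁻¹)⁻¹ and mean equal to that covariance times Σ̂⁻¹θ̂. The function name, the diagonal stand-in for Ω (prior variance decaying with lag at a hypothetical rate `eta1`), the number of lags, and all numerical inputs below are illustrative assumptions, not the prior or the data used in the analysis.

```python
import numpy as np

def normal_approx_posterior(theta_hat, sigma_hat, omega):
    """Posterior mean and covariance for theta_hat ~ N(theta, sigma_hat), theta ~ N(0, omega)."""
    sigma_inv = np.linalg.inv(sigma_hat)
    post_cov = np.linalg.inv(sigma_inv + np.linalg.inv(omega))
    post_mean = post_cov @ sigma_inv @ theta_hat
    return post_mean, post_cov

n_lags = 15                       # hypothetical number of lags
lags = np.arange(n_lags)
sigma = 0.004                     # prior SD of theta_0, the value quoted in the text
se = sigma / np.sqrt(10.0)        # so that sigma^2 is 10 times the ML variance (approximation)
eta1 = -0.5                       # hypothetical decay rate of the prior variance

rng = np.random.default_rng(42)
theta_hat = rng.normal(scale=se, size=n_lags)      # synthetic "ML estimates"
sigma_hat = np.diag(np.full(n_lags, se**2))        # synthetic ML covariance (diagonal)
omega = np.diag(sigma**2 * np.exp(eta1 * lags))    # stand-in prior, NOT the paper's prior

post_mean, post_cov = normal_approx_posterior(theta_hat, sigma_hat, omega)
shrinkage = post_mean / theta_hat                  # fraction of each ML estimate retained

# Simulate from the approximate posterior, e.g. to summarize the sum of DL coefficients.
draws = rng.multivariate_normal(post_mean, post_cov, size=4000)
total_effect = draws.sum(axis=1)
```

With diagonal Ω and Σ̂, lag ℓ retains the fraction Ωℓℓ/(Ωℓℓ + Σ̂ℓℓ) of its ML estimate, so in this toy setting early lags stay close to the data while later lags are pulled toward zero.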

We examined the sensitivity of the BDLagM estimates to

the specification of the prior on η and the selection of the

value for σ (Web Figure 2). The value for σ2 was initially set
to 10 times the estimated variance of the ML estimate of θ0. Larger values of σ
resulted in BDLagMs that more closely followed the unconstrained
ML estimates at longer lags. Smaller values of σ resulted in
BDLagMs with later DL coefficients shrunk to zero. For σ =
0.04, 0.004, and 0.0004, the initial DL coefficient estimates were
indistinguishable. The original discrete uniform prior on η1
was set so that the ratio of the prior standard deviation of θ0
to that of θL ranged approximately from 2 to 100. We considered two
new priors for η1 so that the ratio ranged from approximately
2 to 50 (more restrictive) or from 2 to 200 (less restrictive). We
did not consider alternate priors on η2 because the prior was

already constructed to be as broad as possible without cre-

ating numerical instability. The BDLagMs estimated across

different prior distributions for η and σ = 0.004 were remark-

ably similar. We concluded that the estimated BDLagM is

not driven strongly by the range of values for η.

6. Discussion

We introduce a Bayesian approach to estimate DL functions

in time series models of air pollution and mortality. This for-

mulation uses prior knowledge about the shape of the DL

function, and allows the degree of smoothness of the DL

function to be estimated from the data. We illustrate in a

simulation study that when prior assumptions are valid, BD-

LagMs estimate DL coefficients with smaller mean squared

errors than three common methods—polynomial, spline, and

unconstrained DLagMs.


Figure 3. Comparison of estimation methods for DL coefficients of the effect of PM10 on mortality for Chicago, Illinois from 1987 to 2000. Distributions of DL coefficient estimates, by lag, and sum of DL coefficients (all in units of 10^−4) are shown for (i) the DL coefficient vector simulated from the normal approximate posterior distribution (gray) and (ii) the estimates of DL coefficients from the last 4000 iterations of the Gibbs sampler (black).

We also show that our approach relates to using penalized

splines to estimate DL functions. Specifically, fitting a penal-

ized spline DLagM with a specific penalty matrix is analogous

to using a BDLagM with a normal prior on the DL coefficients.

An advantage of using the Bayesian approach is the simplicity

of formulating a prior distribution on DL coefficients rather

than specifying a penalty matrix.
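The correspondence between a penalty matrix and a normal prior can be checked numerically in a simplified setting. The sketch below assumes a Gaussian linear model with known noise variance rather than the Poisson model, and penalizes the DL coefficients directly rather than through a spline basis. The simulated exposure series, the true coefficients, and the diagonal prior covariance `omega` are all hypothetical. Under these assumptions, penalized least squares with penalty matrix s²Ω⁻¹ and the posterior mean under the prior θ ~ N(0, Ω) give the same estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_lags = 500, 8                                   # hypothetical series length and lags
x = rng.normal(size=n + n_lags - 1)                  # synthetic exposure series

# Lagged design matrix: column l holds the exposure lagged by l days.
X = np.column_stack(
    [x[n_lags - 1 - l : n_lags - 1 - l + n] for l in range(n_lags)]
)

theta_true = 0.5 * np.exp(-0.6 * np.arange(n_lags))  # hypothetical DL function
noise_sd = 1.0
y = X @ theta_true + rng.normal(scale=noise_sd, size=n)

# Hypothetical prior covariance: variance shrinking with lag.
omega = np.diag(0.25 * np.exp(-0.5 * np.arange(n_lags)))

# (a) Penalized least squares: minimize ||y - X theta||^2 + theta' P theta.
P = noise_sd**2 * np.linalg.inv(omega)
theta_pen = np.linalg.solve(X.T @ X + P, X.T @ y)

# (b) Posterior mean under theta ~ N(0, omega) and y | theta ~ N(X theta, noise_sd^2 I).
post_prec = X.T @ X / noise_sd**2 + np.linalg.inv(omega)
theta_bayes = np.linalg.solve(post_prec, X.T @ y / noise_sd**2)

print(np.max(np.abs(theta_pen - theta_bayes)))
```

Read in this direction, choosing Ω from beliefs about the variances of the DL coefficients fixes the penalty matrix, which is the sense in which a prior can stand in for a hand-specified penalty.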

Using the proposed BDLagM we estimated the association

between lagged exposures of PM10 and mortality for Chicago,
Illinois from 1987 to 2000. We found that the largest effect of
PM10 on mortality occurs at lag 3 and that the total effect is

equal to −0.21% (95% PI −0.86%, 0.41%). The shape of the

DL function is consistent with mortality displacement.

For the Chicago data we found that the BDLagM esti-

mated using the normal approximation to the likelihood (with

a posterior distribution for θ available in closed form) and

the Poisson likelihood (with a Gibbs sampler) yielded simi-

lar estimates for the total effect and for early and later DL

coefficients. The relatively large number of daily deaths in

Chicago (on average, 116) as well as the length of the time

series may account for the agreement between the two meth-

ods. For applications with outcome distributions that are not

approximately normal, we anticipate less agreement between

the two estimation methods and that the normal approximate

posterior would be a less efficient proposal distribution.

The BDLagM formulated for a single city time series study

may be naturally extended to a multicity framework. Multi-

city studies of mortality and air pollution use hierarchical

models to pool individual city relative risks across multiple

cities or counties, and have provided strong evidence for the

association between air pollution and mortality (Zmirou et al.,

1998; Burnett et al., 1998; Schwartz, 2000; Katsouyanni et

al., 2001; Samoli et al., 2001; Zanobetti et al., 2002, 2003;

Dominici et al., 2003). The hierarchical models used to date

have estimated risk for single lag PM exposures or the total

effect, which may not fully describe the relationship between

short-term health risk and air pollution exposure. Estimating

our BDLagM for multiple cities in a hierarchical model of an

overall DL function between air pollution and mortality would

provide additional understanding of the relationship between

air pollution and health (Peng, Dominici, and Welty, 2007).

A challenge to estimating our BDLagMs for multiple cities

is missing data. For many U.S. cities, PM air pollution is

measured only once every 6 days. Before estimating the outlined


BDLagMs for multicity studies, it will be necessary to de-

velop a version that estimates the DL coefficients in the pres-

ence of missing data. Accounting for missingness in the ex-

posure series would expand the applicability of the proposed

BDLagMs.

Given the equivalence between estimating DL functions us-

ing a penalized spline and putting a prior directly on the DL

coefficients, our Bayesian method may be viewed as a means

for eliciting a penalty matrix. P-spline penalties can be in-

terpreted as the size of jumps of a smooth’s third or higher

derivatives, which may be difficult to relate to biological or

other prior knowledge. Our method may be viewed as a trans-

parent or intuitive means for eliciting penalties that are con-

sistent with prior knowledge of the objective function. Our

approach is not limited to functions that increase in smooth-

ness as they approach zero; it could also be applied, for in-

stance, to monotonic functions. However, given the necessity

of choosing a value for σ2 = Var(θ0), it could be imprudent to
use our approach to estimate DL functions for which there
is no prior knowledge about the range of θ0.

7. Supplementary Materials

Web Tables and Figures referenced in Sections 4 and 5 are

available under the Paper Information link on the Biometrics

website at http://www.biometrics.tibs.org.

Acknowledgements

Funding for the authors was provided by NIEHS R01 grant

(ES012054-01), and by NIEHS Center in Urban Environmen-

tal Health (P30 ES 03819).

References

Almon, S. (1965). The distributed lag between capital appro-

priations and expenditures. Econometrica 33, 178–196.

Bell, M. L., Samet, J. M., and Dominici, F. (2004). Time-

series studies of particulate matter. Annual Review of

Public Health 25, 247–280.

Bell, M. L., McDermott, A., Zeger, S. L., Samet, J. M., and

Dominici, F. (2004). Ozone and short-term mortality in

95 US urban communities, 1987–2000. Journal of the

American Medical Association 292, 2372–2378.

Braga, F., Zanobetti, A., and Schwartz, J. (2000). Do respi-

ratory epidemics confound the association between air

pollution and daily deaths? European Respiratory Jour-

nal 16, 723–728.

Braga, F., Luis, A., Zanobetti, A., and Schwartz, J. (2001).

The time course of weather-related deaths. Epidemiology

12, 662–667.

Burnett, R., Cakmak, S., and Brook, J. (1998). The effect of

the urban ambient air pollution mix on daily mortality

rates in 11 Canadian cities. Canadian Journal of Public

Health 89, 152–156.

Carlin, B. P. and Louis, T. A. (2000). Bayes and Empirical

Bayes Methods for Data Analysis. Boca Raton, Florida:

Chapman and Hall.

Corradi, C. (1977). Smooth distributed lag estimators and

smoothing spline functions in Hilbert spaces. Journal of

Econometrics 5, 211–220.

Dominici, F., McDermott, A., Daniels, M. J., Zeger, S. L.,

and Samet, J. M. (2003). Revised Analysis of the Na-

tional Morbidity Mortality Air Pollution Study: Part II.

Cambridge, Massachusetts: The Health Effects Institute.

Dominici, F., McDermott, A., and Hastie, T. (2004). Im-

proved semi-parametric time series models of air pollu-

tion and mortality. Journal of the American Statistical

Association 99, 938–948.

Eilers, P. and Marx, B. (1996). Flexible smoothing with

B-splines and penalties. Statistical Science 11, 89–121.

Fahrmeir, L. and Knorr-Held, L. (1997). Dynamic discrete-

time duration models: Estimation via Markov chain

Monte Carlo. Sociological Methodology 27, 417–452.

Goodman, P. G., Dockery, D. W., and Clancy, L. (2004).

Cause-specific mortality and the extended effects of par-

ticulate pollution and temperature exposure. Environ-

mental Health Perspectives 112, 179–185.

Green, P. J. and Silverman, B. W. (1994). Nonparametric

Regression and Generalized Linear Models. Boca Raton,

Florida: Chapman and Hall.

Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models.
Journal of the Royal Statistical Society, Series B 55,
757–796.

Katsouyanni, K., Touloumi, G., Samoli, E., Gryparis, A.,

Le Tertre, A., Monopolis, Y., Rossi, G., Zmirou, D.,

Ballester, F., Boumghar, A., Anderson, H. R., Woj-

tyniak, B., Paldy, A., Braunstein, R., Pekkanen, J.,

Schindler, C., and Schwartz, J. (2001). Confounding and

effect modification in the short-term effects of ambi-

ent particles on total mortality: Results from 29 Euro-

pean cities within the APHEA2 project. Epidemiology

12, 521–531.

Kim, H., Kim, Y., and Hong, Y. (2003). The lag-effect pat-

tern in the relationship of particulate air pollution to

daily mortality in Seoul, Korea. International Journal of

Biometeorology 48, 25–30.

Leamer, E. E. (1972). A class of informative priors and dis-

tributed lag analysis. Econometrica 40, 1059–1081.

Manda, S. and Meyer, R. (2005). Age at first marriage in

Malawi: A Bayesian multilevel analysis using a discrete

time-to-event model. Journal of the Royal Statistical So-

ciety, Series A 168, 439–455.

Peng, R. D., Dominici, F., and Louis, T. A. (2005). Model

choice in time series studies of air pollution and mortal-

ity. Journal of the Royal Statistical Society, Series A 169,

179–203.

Peng, R. D., Dominici, F., and Welty, L. J. (2007). A Bayesian
hierarchical model for constrained distributed lag functions:
Estimating the time course of hospitalization associated
with air pollution exposure. Technical Report 128,
Department of Biostatistics, Johns Hopkins University,
Baltimore, Maryland.

Pope, C. A. and Schwartz, J. (1996). Time series for the

analysis of pulmonary health data. American Journal

of Respiratory and Critical Care Medicine 154, S229–

S233.

Pope, C. A. R., Dockery, D. W., Spengler, J. D., and

Raizenne, M. E. (1991). Respiratory health and PM10

pollution. A daily time series analysis. American Review

of Respiratory Diseases 144, 668–674.


Ravines, R. R., Schmidt, A. M., and Migon, H. S. (2006). Re-

visiting distributed lag models through a Bayesian per-

spective. Applied Stochastic Models in Business and In-

dustry 22, 193–210.

Roberts, S. (2005). An investigation of distributed lag models

in the context of air pollution and mortality time series

analysis. Journal of the Air and Waste Management As-

sociation 55, 273–282.

Ruppert, D., Wand, M. P., and Carroll, R. J. (2003).

Semiparametric Regression. Cambridge, U.K.: Cambridge

University Press.

Samet, J., Zeger, S., Kelsall, J., Xu, J., and Kalkstein, L.

(1998). Does weather confound or modify the association

of particulate air pollution with mortality? Environmental

Research 77, 9–19.

Samet, J. M., Zeger, S. L., Dominici, F., Curriero, F., Dock-

ery, D. W., Schwartz, J., and Zanobetti, A. (2000). The

National Morbidity Mortality Air Pollution Study: Part II.

Cambridge, Massachusetts: The Health Effects Institute.

Samet, J. M., Zeger, S. L., Dominici, F., Schwartz, J., and

Dockery, D. W. (2000). The National Morbidity Mortality

Air Pollution Study: Part I. Cambridge, Massachusetts:

The Health Effects Institute.

Samoli, E., Schwartz, J., Wojtyniak, B., et al. (2001). Inves-

tigating regional differences in short-term effects of air

pollution on daily mortality in the APHEA project: A

sensitivity analysis for controlling long-term trends and

seasonality. Environmental Health Perspectives 109, 349–

353.

Schiller, R. J. (1973). A distributed lag estimator derived from

smoothness priors. Econometrica 41, 775–788.

Schimmel, B. and Murawsky, T. (1978). The relation of air

pollution to mortality. Journal of Occupational Medicine

18, 316–333.

Schwartz, J. (1993). Methodological issues in studies of air

pollution and daily counts of deaths or hospital admis-

sions. American Journal of Epidemiology 137, 1136–1147.

Schwartz, J. (2000). The distributed lag between air pollution

and daily deaths. Epidemiology 11, 320–326.

Silverman, B. W. (1985). Some aspects of the spline smooth-

ing approach to non-parametric regression curve fitting.

Journal of the Royal Statistical Society, Series B 47, 1–

52.

Welty, L. J. and Zeger, S. L. (2005). Are the acute effects

of particulate matter on mortality in the National Mor-

bidity, Mortality, and Air Pollution study the result of

inadequate control for weather and season? A sensitivity

analysis using flexible distributed lag models. American

Journal of Epidemiology 162, 80–88.

Zanobetti, A., Wand, M. P., Schwartz, J., et al. (2000). Gener-

alized additive distributed lag models: Quantifying mor-

tality displacement. Biostatistics 1, 279–292.

Zanobetti, A., Schwartz, J., Samoli, E., Gryparis, A.,

Touloumi, G., Atkinson, R., Le Tertre, A., Bobros,

J., Celko, M., Goren, A., Forsberg, B., Michelozzi, P.,

Rabczenko, D., Aranguez Ruiz, E., and Katsouyanni, K.

(2002). The temporal pattern of mortality responses to

air pollution: A multicity assessment of mortality dis-

placement. Epidemiology 13, 87–93.

Zanobetti, A., Schwartz, J., Samoli, E., Gryparis, A.,

Touloumi, G., Peacock, J., Anderson, R. H., LeTertre,

A., Bobros, J., Celko, M., Goren, A., Forsberg, B., Mich-

elozzi, P., Rabczenko, D., Hoyos, S. P., Wichmann, H.

E., and Katsouyanni K. (2003). The temporal pattern

of respiratory and heart disease mortality in response

to air pollution. Environmental Health Perspectives 111,

1188–1193.

Zmirou, D., Schwartz, J., Saez, M., Zanobetti, A., Wojtyniak,
B., Touloumi, G., Spix, C., Ponce de León, A., Le
Moullec, Y., Bacharova, L., Schouten, J., Pönkä, A., and
Katsouyanni, K. (1998). Time-series analysis of air pollution
and cause-specific mortality. Epidemiology 9, 495–503.

Received December 2005. Revised January 2008.

Accepted January 2008.