Bayesian Distributed Lag Models: Estimating Effects of Particulate Matter Air Pollution on Daily Mortality

Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, 680 North Lake Shore Drive, Suite 1102, Chicago, Illinois 60611, USA.
Biometrics (Impact Factor: 1.57). 05/2008; 65(1):282-91. DOI: 10.1111/j.1541-0420.2007.01039.x
Source: PubMed


A distributed lag model (DLagM) is a regression model that includes lagged exposure variables as covariates; its corresponding distributed lag (DL) function describes the relationship between the lag and the coefficient of the lagged exposure variable. DLagMs have recently been used in environmental epidemiology for quantifying the cumulative effects of weather and air pollution on mortality and morbidity. Standard methods for formulating DLagMs include unconstrained, polynomial, and penalized spline DLagMs. These methods may fail to take full advantage of prior information about the shape of the DL function for environmental exposures, or for any other exposure with effects that are believed to smoothly approach zero as lag increases, and are therefore at risk of producing suboptimal estimates. In this article, we propose a Bayesian DLagM (BDLagM) that incorporates prior knowledge about the shape of the DL function and also allows the degree of smoothness of the DL function to be estimated from the data. We apply our BDLagM to its motivating data from the National Morbidity, Mortality, and Air Pollution Study to estimate the short-term health effects of particulate matter air pollution on mortality from 1987 to 2000 for Chicago, Illinois. In a simulation study, we compare our Bayesian approach with alternative methods that use unconstrained, polynomial, and penalized spline DLagMs. We also illustrate the connection between BDLagMs and penalized spline DLagMs. Software for fitting BDLagM models and the data used in this article are available online.
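As a toy illustration of the unconstrained DLagM described in the abstract (a minimal sketch, not the authors' software: the lag length, noise level, and exponentially decaying true DL function are all assumed for the example), one can regress an outcome on a matrix of lagged exposure columns, with one free coefficient per lag, and sum the coefficients to obtain the cumulative effect:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 7                                    # maximum lag (hypothetical choice)
n = 2000
x = rng.normal(size=n + L)               # daily exposure series, e.g. PM10
theta = np.exp(-np.arange(L + 1) / 2.0)  # true DL function, smoothly -> 0

# Design matrix of lagged exposures: column l holds x_{t-l}
X = np.column_stack([x[L - l : L - l + n] for l in range(L + 1)])
y = X @ theta + rng.normal(scale=0.5, size=n)

# Unconstrained DLM: estimate one coefficient per lag by least squares
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
cumulative_effect = beta_hat.sum()       # total effect of a unit exposure
```

With ample data the unconstrained fit recovers the lag curve, but with many lags and autocorrelated exposure it becomes unstable, which is what motivates the smoothness priors and penalties discussed in the citing articles below.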

    • "These two problems together often cause unstable estimates that are difficult to interpret. We can make the following two assumptions (see Welty et al. (2009)) on interpretable and plausible shapes of the lag effects. First, we view β_l, l = 1, …, L, as evaluations of a lag coefficient curve β(l) and restrict the roughness of β(l) by assuming a certain degree of regularity."
    ABSTRACT: Heavy long-lasting rainfall can trigger earthquake swarms. We are interested in the specific shape of lagged rain influence on the occurrence of earthquakes at different depths at Mount Hochstaufen, Bavaria. We present a novel penalty structure for interpretable and flexible estimates of lag coefficients based on spline representations. We provide an easy-to-use implementation of our flexible distributed lag approach that can be used directly in the established R package mgcv for estimation of generalized additive models. This allows our approach to be immediately included in complex additive models for generalized responses even in hierarchical or longitudinal data settings, making use of established stable and well-tested inference algorithms. The benefit of flexible distributed lag modelling is shown in a detailed simulation study.
    Article · Sep 2014 · Journal of the Royal Statistical Society, Series C (Applied Statistics)
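The roughness restriction described in the snippet above can be sketched in a few lines (a simplified ridge-type version with an assumed smoothing parameter, not the paper's mgcv implementation): penalize squared second differences of the lag coefficients so that the estimated β(l) varies smoothly across lags.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n = 12, 1500
x = rng.normal(size=n + L)
theta = np.exp(-np.arange(L + 1) / 3.0)          # smooth true lag curve

X = np.column_stack([x[L - l : L - l + n] for l in range(L + 1)])
y = X @ theta + rng.normal(scale=1.0, size=n)

# Second-difference matrix: D @ beta holds the discrete roughness of beta(l)
D = np.diff(np.eye(L + 1), n=2, axis=0)
lam = 20.0                                       # smoothing parameter (assumed)

# Penalized least squares: minimize ||y - X beta||^2 + lam ||D beta||^2
beta_pen = np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ y)

# Unconstrained fit, for comparing roughness
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

By construction the penalized minimizer is never rougher than the unconstrained one; choosing lam from the data (e.g. by REML or cross-validation, as mgcv does) is the part this sketch omits.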
    • "We conducted a simulation study to compare the existing geometric (GEO) and negative binomial (NB) delays as used in Toktay et al. (2000) and exponential (E) delay as used in Clottey et al. (2012), with the gamma delay (G), using the M–H algorithm for estimation. Data were generated under six different scenarios for the delay function coefficients based on specifications that had previously been employed to evaluate DLMs (Toktay, 2004; Welty et al., 2009). We categorized the delay functions by two characteristics: (i) type: discrete (i.e., geometric and negative binomial delays) or continuous (i.e., exponential and gamma delays) and (ii) shape: 1, 2, or 3, indicating whether earlier or later lags had the largest coefficients."
    ABSTRACT: An important problem faced in closed-loop supply chains is ensuring a sufficient supply of reusable products (i.e., cores) to support reuse activities. Accurate forecasting of used product returns can assist in effectively managing the sourcing activities for cores. The application of existing forecasting models to actual data provided by an Original Equipment Manufacturer (OEM) remanufacturer resulted in the following challenges: (i) inherent difficulties in estimation due to long return lags in the data and (ii) required adjustments for initial conditions. This article develops methods to address these issues and illustrates the proposed approach using the data provided by the OEM remanufacturer. The cost implications of using the proposed method to source cores are also investigated. Results of the performed analysis showed that the proposed forecasting approach performed best when the product is in the maturity or decline stages of its life cycle, with the rate of product returns balanced with the demand volume for the remanufactured product. Forecasting product returns can therefore be best leveraged for reducing the acquisition costs of cores in such settings.
    Article · Jun 2014 · IIE Transactions
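To make the delay-function idea concrete (a hypothetical sketch: the shape, scale, return rate, and truncation lag are all assumed, and this is not the cited article's M–H estimation procedure), a continuous gamma delay can be discretized into lag weights and convolved with past sales to give expected product returns:

```python
import numpy as np
from math import gamma as gamma_fn

def gamma_delay_weights(shape, scale, max_lag, return_rate=0.5):
    """Discretized gamma lag distribution, scaled so weights sum to return_rate."""
    lags = np.arange(1, max_lag + 1, dtype=float)
    pdf = lags ** (shape - 1) * np.exp(-lags / scale) / (gamma_fn(shape) * scale ** shape)
    return return_rate * pdf / pdf.sum()

def forecast_returns(sales, weights):
    """Expected returns at t = sum over lags l of weights[l-1] * sales[t-l]."""
    L, T = len(weights), len(sales)
    return np.array([sum(weights[l - 1] * sales[t - l]
                         for l in range(1, min(L, t) + 1)) for t in range(T)])

sales = np.array([100.0] * 24)                     # constant sales, for illustration
w = gamma_delay_weights(shape=2.0, scale=3.0, max_lag=12)
returns = forecast_returns(sales, w)
```

With constant sales of 100 and weights summing to a 0.5 return rate, the forecast settles at 50 units per period once all lags are available; the initial-condition adjustments discussed in the abstract concern exactly the early periods where the lag window is incomplete.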
  • Source
    • "How to incorporate multiple lagged exposures in a high-dimensional response-exposure surface remains a problem where no consensus has been reached. Selecting predictors/complex models under a distributed lag structure remains an issue of ongoing research [88,89]. In many health effects studies, where direct measurements of personal exposure to multiple pollutants are not practical, ambient pollutants concentrations are often used as proxies for personal exposure [6,13]. "
    ABSTRACT: As public awareness of consequences of environmental exposures has grown, estimating the adverse health effects due to simultaneous exposure to multiple pollutants is an important topic to explore. The challenges of evaluating the health impacts of environmental factors in a multipollutant model include, but are not limited to: identification of the most critical components of the pollutant mixture, examination of potential interaction effects, and attribution of health effects to individual pollutants in the presence of multicollinearity. In this paper, we reviewed five methods available in the statistical literature that are potentially helpful for constructing multipollutant models. We conducted a simulation study and presented two data examples to assess the performance of these methods on feature selection, effect estimation and interaction identification using both cross-sectional and time-series designs. We also proposed and evaluated a two-step strategy employing an initial screening by a tree-based method followed by further dimension reduction/variable selection by the aforementioned five approaches at the second step. Among the five methods, least absolute shrinkage and selection operator regression performs well in general for identifying important exposures, but will yield biased estimates and slightly larger model dimension given many correlated candidate exposures and modest sample size. Bayesian model averaging and supervised principal component analysis are also useful in variable selection when there is a moderately strong exposure-response association. Substantial improvements on reducing model dimension and identifying important variables have been observed for all five statistical methods using the two-step modeling strategy when the number of candidate variables is large. There is no uniform dominance of one method across all simulation scenarios and all criteria. The performances differ according to the nature of the response variable, the sample size, the number of pollutants involved, and the strength of exposure-response association/interaction. However, the two-step modeling strategy proposed here is potentially applicable under a multipollutant framework with many covariates by taking advantage of both the screening feature of an initial tree-based method and the dimension reduction/variable selection property of the subsequent method. The choice of the method should also depend on the goal of the study: risk prediction, effect estimation or screening for important predictors and their interactions.
    Article · Oct 2013 · Environmental Health
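The two-step strategy above can be sketched with numpy alone (a simplified stand-in on simulated data: marginal-correlation screening substitutes for the paper's tree-based first step, and a basic coordinate-descent LASSO with an assumed penalty substitutes for the second step):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 20
f = rng.normal(size=n)                                   # shared factor -> correlated pollutants
X = 0.6 * f[:, None] + rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(size=n)   # only pollutants 0 and 3 matter

# Standardize predictors and center the response for the LASSO step
Xs = (X - X.mean(0)) / X.std(0)
yc = y - y.mean()

# Step 1: screening (marginal correlation as a stand-in for a tree-based screen)
score = np.abs(Xs.T @ yc) / n
keep = list(np.argsort(score)[::-1][:10])                # keep the 10 strongest candidates

# Step 2: LASSO by coordinate descent on the screened predictors
def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual for feature j
            z = X[:, j] @ r / n
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / (X[:, j] @ X[:, j] / n)
    return beta

beta = lasso_cd(Xs[:, keep], yc, lam=0.1)
selected = [keep[j] for j in np.flatnonzero(np.abs(beta) > 1e-8)]
```

The screening step shrinks the candidate set before the sparser second-stage fit, which is the dimension-reduction benefit the abstract reports for large numbers of correlated candidate exposures.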