David A Stephens

McGill University, Montréal, Quebec, Canada

Publications (83) · 174.42 Total impact

  • Michael P Wallace · Erica E M Moodie · David A Stephens
    ABSTRACT: Dynamic treatment regimens (DTRs) recommend treatments based on evolving subject-level data. The optimal DTR is that which maximizes expected patient outcome and as such its identification is of primary interest in the personalized medicine setting. When analyzing data from observational studies using semi-parametric approaches, there are two primary components which can be modeled: the expected level of treatment and the expected outcome for a patient given their other covariates. In an effort to offer greater flexibility, the so-called doubly robust methods have been developed which offer consistent parameter estimators as long as at least one of these two models is correctly specified. However, in practice it can be difficult to be confident if this is the case. Using G-estimation as our example method, we demonstrate how the property of double robustness itself can be used to provide evidence that a specified model is or is not correct. This approach is illustrated through simulation studies as well as data from the Multicenter AIDS Cohort Study.
    No preview · Article · Jan 2016 · Biometrics
  • Source
    ABSTRACT: Chronic malnutrition, termed stunting, is defined as suboptimal linear growth, affects one third of children in developing countries, and leads to increased mortality and poor developmental outcomes. The causes of childhood stunting are unknown, and strategies to improve growth and related outcomes in children have only had modest impacts. Recent studies have shown that the ecosystem of microbes in the human gut, termed the microbiota, can induce changes in weight. However, the specific changes in the gut microbiota that contribute to growth remain unknown, and no studies have investigated the gut microbiota as a determinant of chronic malnutrition. We performed secondary analyses of data from two well-characterized twin cohorts of children from Malawi and Bangladesh to identify bacterial genera associated with linear growth. In a case-control analysis, we used the graphical lasso to estimate covariance network models of gut microbial interactions from relative genus abundances and used network analysis methods to select genera associated with stunting severity. In longitudinal analyses, we determined associations between these selected microbes and linear growth using between-within twin regression models to adjust for confounding and introduce temporality. Reduced microbiota diversity and increased covariance network density were associated with stunting severity, while increased relative abundance of Acidaminococcus sp. was associated with future linear growth deficits. We show that length growth in children is associated with community-wide changes in the gut microbiota and with the abundance of the bacterial genus, Acidaminococcus. Larger cohorts are needed to confirm these findings and to clarify the mechanisms involved.
    Preview · Article · Dec 2015
  • Ashkan Ertefaie · Masoud Asgharian · David Stephens
    ABSTRACT: In the causal adjustment setting, variable selection techniques based on either the outcome or the treatment allocation model alone can result in the omission of confounders, which leads to bias, or the inclusion of spurious variables, which leads to variance inflation, in the propensity score. We propose a variable selection method based on a penalized objective function which considers the outcome and treatment assignment models simultaneously. The proposed method facilitates confounder selection in high-dimensional settings. We show that under regularity conditions our method attains the oracle property. The selected variables are used to form a doubly robust regression estimator of the treatment effect. Simulation results are presented and economic growth data are analyzed. Specifically, we study the effect of life expectancy as a measure of population health on the average growth rate of gross domestic product per capita.
    No preview · Article · Nov 2015
  • Benjamin Rich · Erica E M Moodie · David A Stephens
    ABSTRACT: There have been considerable advances in the methodology for estimating dynamic treatment regimens, and for the design of sequential trials that can be used to collect unconfounded data to inform such regimens. However, relatively little attention has been paid to how such methodology could be used to advance understanding of optimal treatment strategies in a continuous dose setting, even though it is often the case that considerable patient heterogeneity in drug response along with a narrow therapeutic window may necessitate the tailoring of dosing over time. Such is the case with warfarin, a common oral anticoagulant. We propose novel, realistic simulation models based on pharmacokinetic-pharmacodynamic properties of the drug that can be used to evaluate potentially optimal dosing strategies. Our results suggest that this methodology can lead to a dosing strategy that performs well both within and across populations with different pharmacokinetic characteristics, and may assist in the design of randomized trials by narrowing the list of potential dosing strategies to those which are most promising.
    No preview · Article · Nov 2015 · Biometrical Journal
  • Irene Vrbik · David A. Stephens · Michel Roger · Bluma G. Brenner
    ABSTRACT: Background: In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values which may vary widely depending on the application. Results: This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, thus no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data, closely agree with those found using the threshold approach, while only requiring a fraction of the time to complete the analysis. Conclusions: Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
    Full-text · Article · Nov 2015 · BMC Bioinformatics
  • David A. Stephens

    No preview · Article · Sep 2015 · Biometrics
  • Olli Saarela · Elja Arjas · David A Stephens · Erica E M Moodie
    ABSTRACT: While optimal dynamic treatment regimes (DTRs) can be estimated without specification of a predictive model, a model-based approach, combined with dynamic programming and Monte Carlo integration, enables direct probabilistic comparisons between the outcomes under the optimal DTR and alternative (dynamic or static) treatment regimes. The Bayesian predictive approach also circumvents problems related to frequentist estimators under the nonregular estimation problem. However, the model-based approach is susceptible to misspecification, in particular of the "null-paradox" type, which is due to the model parameters not having a direct causal interpretation in the presence of latent individual-level characteristics. Because it is reasonable to insist on correct inferences under the null of no difference between the alternative treatment regimes, we discuss how to achieve this through a "null-robust" reparametrization of the problem in a longitudinal setting. Since we argue that causal inference can be entirely understood as posterior predictive inference in a hypothetical population without covariate imbalances, we also discuss how controlling for confounding through inverse probability of treatment weighting can be justified and incorporated in the Bayesian setting. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
    No preview · Article · Aug 2015 · Biometrical Journal
  • Benjamin Rich · Erica E M Moodie · David A Stephens
    ABSTRACT: Individualized medicine is an area that is growing, both in clinical and statistical settings, where in the latter, personalized treatment strategies are often referred to as dynamic treatment regimens. Estimation of the optimal dynamic treatment regime has focused primarily on semi-parametric approaches, some of which are said to be doubly robust in that they give rise to consistent estimators provided at least one of two models is correctly specified. In particular, the locally efficient doubly robust g-estimation is robust to misspecification of the treatment-free outcome model so long as the propensity model is specified correctly, at the cost of an increase in variability. In this paper, we propose data-adaptive weighting schemes that serve to decrease the impact of influential points and thus stabilize the estimator. In doing so, we provide a doubly robust g-estimator that is also robust in the sense of Hampel (15).
    No preview · Article · Aug 2015 · The International Journal of Biostatistics
  • Daniel J. Graham · Emma J. McCoy · David A. Stephens
    ABSTRACT: This paper constructs a doubly robust estimator for continuous dose-response estimation. An outcome regression model is augmented with a set of inverse generalized propensity score covariates to correct for potential misspecification bias. From the augmented model we can obtain consistent estimates of mean average potential outcomes for distinct strata of the treatment. A polynomial regression is then fitted to these point estimates to derive a Taylor approximation to the continuous dose-response function. The bootstrap is used for variance estimation. Analytical results and simulations show that our approach can provide a good approximation to linear or nonlinear dose-response functions under various sources of misspecification of the outcome regression or propensity score models. Efficiency in finite samples is good relative to minimum variance consistent estimators.
    Preview · Article · Jun 2015
  • Source
    ABSTRACT: Amblyopia is the commonest visual disorder of childhood in Western societies, affecting, predominantly, spatial visual function. Treatment typically requires a period of refractive correction ('optical treatment') followed by occlusion: covering the nonamblyopic eye with a fabric patch for varying daily durations. Recent studies have provided insight into the optimal amount of patching ('dose'), leading to the adoption of standardized dosing strategies, which, though an advance on previous ad-hoc regimens, take little account of individual patient characteristics. This trial compares the effectiveness of a standardized dosing strategy (that is, a fixed daily occlusion dose based on disease severity) with a personalized dosing strategy (derived from known treatment dose-response functions), in which an initially prescribed occlusion dose is modulated, in a systematic manner, dependent on treatment compliance. A total of 120 children aged between 3 and 8 years of age diagnosed with amblyopia in association with either anisometropia or strabismus, or both, will be randomized to receive either a standardized or a personalized occlusion dose regimen. To avoid confounding by the known benefits of refractive correction, participants will not be randomized until they have completed an optical treatment phase. The primary study objective is to determine whether, at trial endpoint, participants receiving a personalized dosing strategy require fewer hours of occlusion than those in receipt of a standardized dosing strategy. Secondary objectives are to quantify the relationship between observed changes in visual acuity (logMAR, logarithm of the Minimum Angle of Resolution) with age, amblyopia type, and severity of amblyopic visual acuity deficit. 
    This is the first randomized controlled trial of occlusion therapy for amblyopia to compare a treatment arm representative of current best practice with an arm representative of an entirely novel treatment regimen based on statistical modelling of previous trial outcome data. Should the personalized dosing strategy demonstrate superiority over the standardized dosing strategy, then its adoption into routine practice could bring practical benefits in reducing the duration of treatment needed to achieve an optimal outcome. Trial registration: ISRCTN12292232.
    Full-text · Article · Apr 2015 · Trials
  • Ashkan Ertefaie · Masoud Asgharian · David A Stephens
    ABSTRACT: Length bias in survival data occurs in observational studies when, for example, subjects with shorter lifetimes are less likely to be present in the recorded data. In this paper, we consider estimating the causal exposure (treatment) effect on survival time from observational data when, in addition to the lack of randomization and consequent potential for confounding, the data constitute a length-biased sample; we hence term this a double-bias problem. We develop estimating equations that can be used to estimate the causal effect indexing the structural Cox proportional hazard and accelerated failure time models for point exposures in double-bias settings. The approaches rely on propensity score-based adjustments, and we demonstrate that estimation of the propensity score must be adjusted to acknowledge the length-biased sampling. Large sample properties of the estimators are established and their small sample behavior is studied using simulations. We apply the proposed methods to a set of, partly synthesized, length-biased survival data collected as part of the Canadian Study of Health and Aging (CSHA) to compare survival of subjects with dementia among institutionalized patients versus those recruited from the community and depict their adjusted survival curves.
    No preview · Article · Mar 2015 · The International Journal of Biostatistics
  • Source
    ABSTRACT: The nuclei of higher eukaryotic cells display compartmentalization and certain nuclear compartments have been shown to follow a degree of spatial organization. To date, the study of nuclear organization has often involved simple quantitative procedures that struggle with both the irregularity of the nuclear boundary and the problem of handling replicate images. Such studies typically focus on inter-object distance, rather than spatial location within the nucleus. The concern of this paper is the spatial preference of nuclear compartments, for which we have developed statistical tools to quantitatively study and explore nuclear organization. These tools combine replicate images to generate 'aggregate maps' which represent the spatial preferences of nuclear compartments. We present two examples of different compartments in mammalian fibroblasts (WI-38 and MRC-5) that demonstrate new knowledge of spatial preference within the cell nucleus. Specifically, the spatial preference of RNA polymerase II is preserved across normal and immortalized cells, whereas PML nuclear bodies exhibit a change in spatial preference from avoiding the centre in normal cells to exhibiting a preference for the centre in immortalized cells. In addition, we show that SC35 splicing speckles are excluded from the nuclear boundary and localize throughout the nucleoplasm and in the interchromatin space in non-transformed WI-38 cells. This new methodology is thus able to reveal the effect of large-scale perturbation on spatial architecture and preferences that would not be obvious from single cell imaging.
    Full-text · Article · Mar 2015 · Journal of The Royal Society Interface
  • Olli Saarela · David A Stephens · Erica E M Moodie · Marina B Klein
    ABSTRACT: The purpose of inverse probability of treatment (IPT) weighting in estimation of marginal treatment effects is to construct a pseudo-population without imbalances in measured covariates, thus removing the effects of confounding and informative censoring when performing inference. In this article, we formalize the notion of such a pseudo-population as a data generating mechanism with particular characteristics, and show that this leads to a natural Bayesian interpretation of IPT weighted estimation. Using this interpretation, we are able to propose the first fully Bayesian procedure for estimating parameters of marginal structural models using an IPT weighting. Our approach suggests that the weights should be derived from the posterior predictive treatment assignment and censoring probabilities, answering the question of whether and how the uncertainty in the estimation of the weights should be incorporated in Bayesian inference of marginal treatment effects. The proposed approach is compared to existing methods in simulated data, and applied to an analysis of the Canadian Co-infection Cohort. © 2015, The International Biometric Society.
    No preview · Article · Feb 2015 · Biometrics
  • Daniel J. Graham · Emma J. McCoy · David A. Stephens

    No preview · Article · Feb 2015
  • Daniel J. Graham · Emma J. McCoy · David A. Stephens
    ABSTRACT: Road network capacity expansions are frequently proposed as solutions to urban traffic congestion but are controversial because it is thought that they can directly “induce” growth in traffic volumes. This article quantifies causal effects of road network capacity expansions on aggregate urban traffic volume and density in U.S. cities using a mixed model propensity score (PS) estimator. The motivation for this approach is that we seek to estimate a dose-response relationship between capacity and volume but suspect confounding from both observed and unobserved characteristics. Analytical results and simulations show that a longitudinal mixed model PS approach can be used to adjust effectively for time-invariant unobserved confounding via random effects (RE). Our empirical results indicate that network capacity expansions can cause substantial increases in aggregate urban traffic volumes such that even major capacity increases can actually lead to little or no reduction in network traffic densities. This result has important implications for optimal urban transportation strategies. Supplementary materials for this article are available online.
    No preview · Article · Oct 2014 · Journal of the American Statistical Association
  •
    ABSTRACT: To detect hGH doping in sport, the World Anti-Doping Agency (WADA)-accredited laboratories use the ratio of the concentrations of recombinant hGH (‘rec’) versus other ‘natural’ pituitary-derived isoforms of hGH (‘pit’), measured with two different kits developed specifically to detect the administration of exogenous hGH. The current joint compliance decision limits (DLs) for ratios derived from these kits, designed so that they would both be exceeded in fewer than 1 in 10,000 samples from non-doping athletes, are based on data accrued in anti-doping labs up to March 2010, and later confirmed with data up to February–March 2011. In April 2013, WADA asked the authors to analyze the now much larger set of ratios collected in routine hGH testing of athletes, and to document in the peer-reviewed literature a statistical procedure for establishing DLs, so that it can be re-applied as more data become available.
    No preview · Article · Oct 2014 · Growth Hormone & IGF Research
  • Erica E M Moodie · David A Stephens · Marina B Klein
    ABSTRACT: It is often the case that interest lies in the effect of an exposure on each of several distinct event types. For example, we are motivated to investigate the impact of recent injection drug use on deaths due to each of cancer, end-stage liver disease, and overdose in the Canadian Co-infection Cohort (CCC). We develop a marginal structural model that permits estimation of cause-specific hazards in situations where more than one cause of death is of interest. Marginal structural models allow for the causal effect of treatment on outcome to be estimated using inverse-probability weighting under the assumption of no unmeasured confounding; these models are particularly useful in the presence of time-varying confounding variables, which may also mediate the effect of exposures. An asymptotic variance estimator is derived, and a cumulative incidence function estimator is given. We compare the performance of the proposed marginal structural model for multiple-outcome data to that of conventional competing risks models in simulated data and demonstrate the use of the proposed approach in the CCC. Copyright © 2013 John Wiley & Sons, Ltd.
    No preview · Article · Apr 2014 · Statistics in Medicine
  • Ashkan Ertefaie · Masoud Asgharian · David Stephens
    ABSTRACT: The pervasive use of prevalent cohort studies on disease duration increasingly calls for appropriate methodologies to account for the biases that invariably accompany samples formed by such data. It is well-known, for example, that subjects with shorter lifetimes are less likely to be present in such studies. Moreover, certain covariate values could be preferentially selected into the sample, being linked to the long-term survivors. The existing methodology for estimation of the propensity score using data collected on prevalent cases requires the correct conditional survival/hazard function given the treatment and covariates. This requirement can be alleviated if the disease under study has stationary incidence, the so-called stationarity assumption. We propose a nonparametric adjustment technique based on a weighted estimating equation for estimating the propensity score which does not require modeling the conditional survival/hazard function when the stationarity assumption holds. Large sample properties of the estimator are established and its small sample behavior is studied via simulation.
    Full-text · Article · Mar 2014
  • Benjamin Rich · Erica E M Moodie · David A Stephens
    ABSTRACT: Due to the cost and complexity of conducting a sequential multiple assignment randomized trial (SMART), it is desirable to pre-define a small number of personalized regimes to study. We propose a simulation-based approach to studying personalized dosing strategies in contexts for which a therapeutic agent's pharmacokinetic and pharmacodynamic properties are well understood. We take dosing of warfarin as a case study, as its properties are well understood. We consider a SMART in which there are five intervention points at which dosing may be modified, following a loading phase of treatment. Realistic SMARTs are simulated, and two methods of analysis, G-estimation and Q-learning, are used to assess potential personalized dosing strategies. In settings where outcome modelling may be complex due to the highly non-linear nature of the pharmacokinetic and pharmacodynamic mechanisms of the therapeutic agent, G-estimation provides the more promising method of estimating an optimal dosing strategy. Used in combination with the simulated SMARTs, we were able to improve simulated patient outcomes and suggest which patient characteristics were needed to best individually tailor dosing. In particular, our simulations suggest that current dosing should be determined by an individual's current coagulation time as measured by the international normalized ratio (INR), their last measured INR, and their last dose. Tailoring treatment only based on current INR and last warfarin dose provided inferior control of INR over the course of the trial. The ability of the simulated SMARTs to suggest optimal personalized dosing strategies relies on the pharmacokinetic and pharmacodynamic models used to generate the hypothetical patient profiles. This approach is best suited to therapeutic agents whose effects are well studied.
    Prior to investing in a complex randomized trial that involves sequential treatment allocations, simulations should be used where possible in order to guide which dosing strategies to evaluate.
    No preview · Article · Jan 2014 · Clinical Trials
  • Ashkan Ertefaie · Masoud Asgharian · David A. Stephens
    ABSTRACT: In the causal adjustment setting, variable selection techniques based on either the outcome or treatment allocation model can result in the omission of confounders or the inclusion of spurious variables in the propensity score. We propose a variable selection method based on a penalized likelihood which considers the response and treatment assignment models simultaneously. We show that under some conditions our method attains the oracle property. The selected variables are used to form a double robust regression estimator of the treatment effect. Simulation results are presented and data from the National Supported Work Demonstration are analyzed.
    Full-text · Article · Nov 2013

Publication Stats

2k Citations
174.42 Total Impact Points


  • 2007-2016
    • McGill University
      • Department of Mathematics and Statistics
      Montréal, Quebec, Canada
  • 2002-2007
    • Imperial College London
      • Department of Mathematics
      London, England, United Kingdom
  • 1995
    • Athens University of Economics and Business
      • Department of Statistics
      Athínai, Attica, Greece