David A Stephens

McGill University, Montréal, Quebec, Canada

Publications (59) · 115.5 Total Impact Points

  • Benjamin Rich, Erica Em Moodie, David A Stephens
    ABSTRACT: Due to the cost and complexity of conducting a sequential multiple assignment randomized trial (SMART), it is desirable to pre-define a small number of personalized regimes to study. We propose a simulation-based approach to studying personalized dosing strategies in contexts where a therapeutic agent's pharmacokinetic and pharmacodynamic properties are well understood, taking dosing of warfarin as a case study. We consider a SMART with five intervention points at which dosing may be modified, following a loading phase of treatment. Realistic SMARTs are simulated, and two methods of analysis, G-estimation and Q-learning, are used to assess potential personalized dosing strategies. In settings where outcome modelling may be complex due to the highly non-linear nature of the pharmacokinetic and pharmacodynamic mechanisms of the therapeutic agent, G-estimation proved the more promising method for estimating an optimal dosing strategy. Used in combination with the simulated SMARTs, we were able to improve simulated patient outcomes and suggest which patient characteristics were needed to best individually tailor dosing. In particular, our simulations suggest that the current dose should be determined by an individual's current coagulation time as measured by the international normalized ratio (INR), their last measured INR, and their last dose; tailoring treatment based only on current INR and last warfarin dose provided inferior control of INR over the course of the trial. The ability of the simulated SMARTs to suggest optimal personalized dosing strategies relies on the pharmacokinetic and pharmacodynamic models used to generate the hypothetical patient profiles, so the approach is best suited to therapeutic agents whose effects are well studied. Prior to investing in a complex randomized trial that involves sequential treatment allocations, simulations should be used where possible to guide which dosing strategies to evaluate.
    Clinical Trials 01/2014; · 2.20 Impact Factor
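The comparison of G-estimation and Q-learning above rests on backward induction through the decision points. As a minimal illustration of the Q-learning side, the sketch below fits stage-wise regressions on simulated data with two decision points rather than five; the data-generating model, variable names, and linear Q-function forms are hypothetical, not those of the paper.

```python
# A two-stage Q-learning sketch on simulated data. Illustrative only: the
# trial above has five decision points and pharmacokinetically simulated
# patients; the states, treatments and linear Q-models here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Stage 1: state s1 and binary treatment a1; the stage-2 state depends on both.
s1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n)
s2 = 0.5 * s1 + 0.3 * a1 + rng.normal(scale=0.5, size=n)
a2 = rng.integers(0, 2, size=n)
# Outcome rewards treating at stage 2 when s2 exceeds 0.2.
y = s2 + a2 * (s2 - 0.2) + 0.5 * a1 * s1 + rng.normal(scale=0.5, size=n)

def ols(X, y):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage-2 Q-function: regress y on (1, s2, a2, a2*s2).
X2 = np.column_stack([np.ones(n), s2, a2, a2 * s2])
b2 = ols(X2, y)
blip2 = b2[2] + b2[3] * s2                           # estimated benefit of a2 = 1
v2 = X2 @ b2 - a2 * blip2 + np.maximum(blip2, 0.0)   # value under optimal a2

# Stage-1 Q-function: backward induction, regressing the optimal stage-2 value.
X1 = np.column_stack([np.ones(n), s1, a1, a1 * s1])
b1 = ols(X1, v2)
print(f"stage 2: treat when {b2[2]:.2f} + {b2[3]:.2f} * s2 > 0")
print(f"stage 1: treat when {b1[2]:.2f} + {b1[3]:.2f} * s1 > 0")
```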
  • ABSTRACT: Objective: To detect hGH doping in sport, the World Anti-Doping Agency (WADA)-accredited laboratories use the ratio of the concentrations of recombinant hGH (‘rec’) versus other ‘natural’ pituitary-derived isoforms of hGH (‘pit’), measured with two different kits developed specifically to detect the administration of exogenous hGH. The current joint compliance decision limits (DLs) for ratios derived from these kits, designed so that they would both be exceeded in fewer than 1 in 10,000 samples from non-doping athletes, are based on data accrued in anti-doping labs up to March 2010, and later confirmed with data up to February–March 2011. In April 2013, WADA asked the authors to analyze the now much larger set of ratios collected in routine hGH testing of athletes, and to document in the peer-reviewed literature a statistical procedure for establishing DLs, so that it can be re-applied as more data become available. Design: We examined the variation in the rec/pit ratios obtained for 21,943 screened blood (serum) samples submitted to the WADA-accredited laboratories over the period 2009–2013. To fit the relevant sex- and kit-specific centiles of the logs of the ratios, we classified ‘rec/pit’ ratios based on low ‘rec’ and ‘pit’ values as ‘negative’ and fitted statistical distributions to the remaining log-ratios. The flexible data-driven quantile regression approach allowed us to deal with the fact that the location, scale and shape of the distribution of the modeled ‘rec/pit’ ratios varied with the concentrations of the ‘rec’ and ‘pit’ values. The between-kit correlation of the ratios was included in the fitting of the DLs, and bootstrap samples were used to quantify the estimation error in these limits. We examined the performance of these limits by applying them to data obtained from investigator-initiated hGH administration studies, and from athletes in a simulated cycling stage race. Results: The mean and spread of the distribution of the modeled log-ratios depended in different ways on the magnitude of the rec and pit concentrations. Ultimately, however, the estimated limits were almost invariant to the concentrations, and similar to those obtained by fitting simpler (marginal) log-normal and Box–Cox transformed distributions. The estimated limits were similar to the (currently used) limits fitted to the smaller datasets analyzed previously. In the administration studies, the limits distinguished recent use of rec-hGH from non-use. Conclusions: The distributions of the rec/pit ratios varied as a function of the rec and pit concentrations, but the patterns in their medians and spreads largely cancelled each other. Thus, ultimately, the kit- and sex-specific ratio DL obtained from the simpler model was very close to the ‘curve of DLs’ obtained from the more complex one. Both were close to previously established limits.
    Growth Hormone & IGF Research. 01/2014;
  • Erica E M Moodie, David A Stephens, Marina B Klein
    ABSTRACT: It is often the case that interest lies in the effect of an exposure on each of several distinct event types. For example, we are motivated to investigate the impact of recent injection drug use on deaths due to each of cancer, end-stage liver disease, and overdose in the Canadian Co-infection Cohort (CCC). We develop a marginal structural model that permits estimation of cause-specific hazards in situations where more than one cause of death is of interest. Marginal structural models allow the causal effect of treatment on outcome to be estimated using inverse-probability weighting under the assumption of no unmeasured confounding; these models are particularly useful in the presence of time-varying confounding variables, which may also mediate the effect of exposures. An asymptotic variance estimator is derived, and a cumulative incidence function estimator is given. We compare the performance of the proposed marginal structural model for multiple-outcome data to that of conventional competing risks models in simulated data and demonstrate the use of the proposed approach in the CCC.
    Statistics in Medicine 11/2013; · 2.04 Impact Factor
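The inverse-probability weighting step used above reduces, in its simplest form, to the stabilized-weight computation sketched below for one binary time-varying exposure over a few intervals. The simulated data, column names, and logistic models are hypothetical, and the cause-specific hazard fitting from the paper is not shown.

```python
# Stabilized IPT weights: ratio of treatment probabilities given past
# treatment only (numerator) vs. past treatment plus the time-varying
# confounder (denominator), cumulated over intervals within subject.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
rows = []
for i in range(1000):                      # hypothetical longitudinal data
    a_prev = 0
    for t in range(3):
        l = rng.normal() + 0.5 * a_prev    # confounder affected by past exposure
        a = int(rng.random() < 1 / (1 + np.exp(-(0.4 * l + 0.3 * a_prev))))
        rows.append((i, t, a_prev, l, a))
        a_prev = a
df = pd.DataFrame(rows, columns=["id", "t", "a_prev", "l", "a"])

num = LogisticRegression().fit(df[["a_prev"]], df["a"])
den = LogisticRegression().fit(df[["a_prev", "l"]], df["a"])
p_num = np.where(df["a"] == 1, num.predict_proba(df[["a_prev"]])[:, 1],
                 num.predict_proba(df[["a_prev"]])[:, 0])
p_den = np.where(df["a"] == 1, den.predict_proba(df[["a_prev", "l"]])[:, 1],
                 den.predict_proba(df[["a_prev", "l"]])[:, 0])
df["w"] = p_num / p_den
df["sw"] = df.groupby("id")["w"].cumprod()  # cumulative product within subject
print(df["sw"].describe())                  # stabilized weights center near 1
```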
  • ABSTRACT: The pervasive use of prevalent cohort studies on disease duration increasingly calls for appropriate methodologies to account for the biases that invariably accompany samples formed by such data. It is well known, for example, that subjects with shorter lifetimes are less likely to be present in such studies. Moreover, certain covariate values may be preferentially selected into the sample because they are linked to long-term survivors. The existing methodology for estimation of the propensity score using data collected on prevalent cases requires correct specification of the conditional survival/hazard function given the treatment and covariates. This requirement can be alleviated if the disease under study has stationary incidence, the so-called stationarity assumption. We propose a nonparametric adjustment technique based on a weighted estimating equation for estimating the propensity score which does not require modeling the conditional survival/hazard function when the stationarity assumption holds. Large-sample properties of the estimator are established and its small-sample behavior is studied via simulation.
    Stat. 11/2013; 3(1).
  • Ashkan Ertefaie, Masoud Asgharian, David A. Stephens
    ABSTRACT: In the causal adjustment setting, variable selection techniques based on either the outcome or treatment allocation model can result in the omission of confounders or the inclusion of spurious variables in the propensity score. We propose a variable selection method based on a penalized likelihood which considers the response and treatment assignment models simultaneously. We show that under some conditions our method attains the oracle property. The selected variables are used to form a double robust regression estimator of the treatment effect. Simulation results are presented and data from the National Supported Work Demonstration are analyzed.
    11/2013;
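The double robust regression estimator mentioned above combines an outcome regression with a propensity score so that consistency survives misspecification of either one. A minimal AIPW-style sketch on simulated data follows; the penalized joint variable-selection step of the paper is not shown, and all model forms are hypothetical.

```python
# Doubly robust (AIPW) estimate of an average treatment effect.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
n = 4000
x = rng.normal(size=(n, 3))                   # covariates, post selection
a = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
y = a + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)   # true effect = 1

ps = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
m1 = LinearRegression().fit(x[a == 1], y[a == 1]).predict(x)  # E[Y|A=1,X]
m0 = LinearRegression().fit(x[a == 0], y[a == 0]).predict(x)  # E[Y|A=0,X]

# Consistent if either the outcome models or the propensity model is correct.
ate = (np.mean(a * (y - m1) / ps + m1)
       - np.mean((1 - a) * (y - m0) / (1 - ps) + m0))
print(f"AIPW estimate: {ate:.3f} (truth 1.0)")
```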
  • Daniel J. Graham, Emma J. McCoy, David A. Stephens
    ABSTRACT: The paper investigates the link between area-based socio-economic deprivation and the incidence of child pedestrian casualties. The analysis is conducted by using data for small spatial zones within major British cities over the period 2001–2007. Spatial longitudinal generalized linear mixed models, estimated by using frequentist and Bayesian approaches, are used to address issues of confounding, spatial dependence and transmission of deprivation effects across zones (i.e. interference). The results show a consistent strong deprivation effect across model specifications. The incidence of child pedestrian casualties in the most deprived zones is typically greater than 10 times that in the least deprived zones. Modelling interference through a spatially auto-regressive covariate uncovers a substantially larger effect.
    Journal of the Royal Statistical Society Series A (Statistics in Society) 10/2013; 176(4). · 1.36 Impact Factor
  • ABSTRACT: Purpose: To explore compliance with occlusion treatment of amblyopia in the Monitored and Randomised Occlusion Treatment of Amblyopia Studies (MOTAS and ROTAS), using objective monitoring. Methods: Both studies had a 3-phase protocol: initial assessment, refractive adaptation and occlusion. In the occlusion phase, participants were instructed to dose for 6 hrs/day (MOTAS) or randomized to 6 or 12 hrs/day (ROTAS). Dose was monitored continuously using an Occlusion Dose Monitor (ODM). Results: 152 patients (71 male, 81 female; 122 Caucasian, 30 non-Caucasian) of mean ± SD age 68 ± 18 months participated. Amblyopia was defined as an inter-ocular acuity difference of at least 0.1 logMAR and was associated with anisometropia in 50, strabismus in 44, and both (mixed) in 58. Median duration of occlusion was 99 days (interquartile range 72 days). Mean compliance was 44%; the mean proportion of days with no patch worn was 42%. Compliance was lower on weekends (39%) than on weekdays (46%; p=0.04), as was the likelihood of any dosing at all (52% vs. 60%; p=0.028). Compliance was lower when attendance was less frequent (p<0.001) and with prolonged treatment duration (p<0.001). Age, gender, amblyopia type and severity were not associated with compliance. Mixture modelling suggested three subpopulations of patch-day doses: under 30 minutes; doses achieving 30%–80% compliance; and doses achieving around 100% compliance. Conclusions: This study shows that compliance with patching treatment averages less than 50% and is influenced by several factors. A greater understanding of these influences should improve treatment outcome.
    Investigative ophthalmology & visual science 07/2013; · 3.43 Impact Factor
  • ABSTRACT: To explore how stereoacuity changes in patients while they are being treated for amblyopia. The Monitored Occlusion Treatment for Amblyopia Study (MOTAS) comprised 3 distinct phases. In the first phase, baseline, assessments of visual function were made to confirm the initial visual and binocular visual deficit. The second phase, refractive adaptation, now commonly termed "optical treatment," was an 18-week period of spectacle wear with measurements of logMAR visual acuity and stereoacuity with the Frisby test at weeks 0, 6, 12, and 18. In the third phase, occlusion, participants were prescribed 6 hours of patching per day. A total of 85 children were enrolled (mean age, 5.1 ± 1.5 years). In 21 children amblyopia was associated with anisometropia; in 29, with strabismus; and in 35, with both. At study entry, poor stereoacuity was associated with poor visual acuity (P < 0.001) in the amblyopic eye and greater angle of strabismus (P < 0.001). Of 66 participants, 25 (38%) who received refractive adaptation and 19 (29%) who received occlusion improved by at least one octave in stereoacuity, exceeding test-retest variability. Overall, 38 (45%) improved one or more octaves across both treatment phases. Unmeasurable stereoacuity was observed in 56 participants (66%) at study entry and in 37 (43%) at study exit. Stereoacuity improved for almost one half of the study participants. Improvement was observed in both treatment phases. Factors associated with poor or nil stereoacuity at study entry and exit were poor visual acuity of the amblyopic eye and large-angle strabismus.
    Journal of AAPOS: the official publication of the American Association for Pediatric Ophthalmology and Strabismus 04/2013; 17(2):166-73. · 1.07 Impact Factor
  • ABSTRACT: Background: Antimicrobial use is thought to suppress the intestinal microbiota, thereby impairing colonization resistance and allowing Clostridium difficile to infect the gut. Additional risk factors such as proton-pump inhibitors may also alter the intestinal microbiota and predispose patients to Clostridium difficile infection (CDI). This comparative metagenomic study investigates the relationship between epidemiologic exposures, intestinal bacterial populations and subsequent development of CDI in hospitalized patients. We performed a nested case–control study including 25 CDI cases and 25 matched controls. Fecal specimens collected prior to disease onset were evaluated by 16S rRNA gene amplification and pyrosequencing to determine the composition of the intestinal microbiota during the at-risk period. Results: The diversity of the intestinal microbiota was significantly reduced prior to an episode of CDI. Sequences corresponding to the phylum Bacteroidetes and to the families Bacteroidaceae and Clostridiales Incertae Sedis XI were depleted in CDI patients compared to controls, whereas sequences corresponding to the family Enterococcaceae were enriched. In multivariable analyses, cephalosporin and fluoroquinolone use, as well as a decrease in the abundance of Clostridiales Incertae Sedis XI, were significantly and independently associated with CDI development. Conclusions: This study shows that a reduction in the abundance of a specific bacterial family, Clostridiales Incertae Sedis XI, is associated with risk of nosocomial CDI and may represent a target for novel strategies to prevent this life-threatening infection.
    Microbiome. 01/2013; 1(1).
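The reduced-diversity finding above is typically quantified with an index such as Shannon diversity computed from per-sample taxon counts. The sketch below compares simulated cases and controls; the counts, community shapes, and test choice are hypothetical and not taken from the study.

```python
# Shannon diversity from 16S-style taxon counts, cases vs. controls.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)

def shannon(counts):
    """Shannon diversity index of one sample's taxon counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# 25 controls with even communities; 25 cases dominated by a few taxa.
controls = [shannon(rng.multinomial(5000, np.full(50, 1 / 50)))
            for _ in range(25)]
cases = [shannon(rng.multinomial(5000, p))
         for p in rng.dirichlet(np.full(50, 0.05), size=25)]
print(mannwhitneyu(cases, controls, alternative="less"))  # cases less diverse?
```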
  • Vahid Partovi Nia, David Stephens
    ABSTRACT: Clustering may be described as the partitioning of data into homogeneous groups or clusters. Classical clustering techniques, such as the k-means, employ a measure of dissimilarity and optimize a criterion in order to find the allocation of data. Modern approaches are based on a mixture model in which homogeneous groups of data follow the same distribution, fitted using the EM algorithm. The result of clustering obtained by the k-means or the EM algorithm is sensitive to the starting values. One way of making the fitting procedure insensitive to the initial value is to assume that the data grouping is a random variable. This is called Bayesian clustering, and its fitting involves stochastic search or sampling from the grouping posterior using Markov chain Monte Carlo. In Bayesian clustering, labels are used to show the grouping of subjects, and a dendrogram is a tree providing a visual guide to different data groupings. We discuss the R package labeltodendro, which links these two concepts and is made to achieve two goals: the first is to provide a flexible environment for plotting any arbitrary dendrogram, and the second is to summarize a matrix of integer labels produced by a Markov chain Monte Carlo sampler using a dendrogram.
    03/2012;
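The second goal described above, summarizing a matrix of MCMC labels with a dendrogram, can be illustrated by converting label draws into pairwise co-clustering frequencies and clustering on one minus those frequencies. The Python sketch below shows the idea only; it is not the labeltodendro package, and the simulated label matrix is hypothetical.

```python
# From an MCMC label matrix (draws x subjects) to a summary dendrogram.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(5)
truth = np.array([0, 0, 0, 1, 1, 1])         # two groups of three subjects
# 500 posterior label draws, each label flipped with probability 0.1.
labels = np.array([np.where(rng.random(6) < 0.9, truth, 1 - truth)
                   for _ in range(500)])

# Co-clustering probability: share of draws putting subjects i and j together.
n = labels.shape[1]
co = np.array([[np.mean(labels[:, i] == labels[:, j]) for j in range(n)]
               for i in range(n)])

# Average-linkage tree on 1 - co-clustering, summarizing the sampler output.
tree = dendrogram(linkage(squareform(1 - co, checks=False), method="average"),
                  no_plot=True, labels=[f"s{i}" for i in range(n)])
print(tree["ivl"])                            # leaf order recovers the groups
```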
  • ABSTRACT: There is considerable interest in cell biology in determining whether, and to what extent, the spatial arrangement of nuclear objects affects nuclear function. A common approach to address this issue involves analyzing a collection of images produced using some form of fluorescence microscopy. We assume that these images have been successfully pre-processed and a spatial point pattern representation of the objects of interest within the nuclear boundary is available. Typically in these scenarios, the number of objects per nucleus is low, which has consequences for the ability of standard analysis procedures to demonstrate the existence of spatial preference in the pattern. There are broadly two common approaches to look for structure in these spatial point patterns: either the spatial point pattern for each image is analyzed individually, or a simple normalization is performed and the patterns are aggregated. In this paper we demonstrate, using synthetic spatial point patterns drawn from predefined point processes, how difficult it is to distinguish a pattern from complete spatial randomness using these techniques, and hence how easy it is to miss interesting spatial preferences in the arrangement of nuclear objects. The impact of this problem is also illustrated on data related to the configuration of PML nuclear bodies in mammalian fibroblast cells.
    PLoS ONE 01/2012; 7(5):e36841. · 3.73 Impact Factor
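The difficulty described above, detecting departure from complete spatial randomness with only a handful of objects per nucleus, motivates pooling a summary statistic across replicate images and comparing it to a Monte Carlo null. A sketch under strong simplifications follows: points live in the unit square rather than a segmented nuclear boundary, and the statistic and counts are hypothetical.

```python
# Pooled Monte Carlo test of complete spatial randomness (CSR) across
# replicate images, using the mean nearest-neighbour distance.
import numpy as np

rng = np.random.default_rng(6)

def mean_nn(pts):
    """Mean nearest-neighbour distance of one point pattern."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()

def pooled(patterns):
    return float(np.mean([mean_nn(p) for p in patterns]))

# 20 replicate "nuclei", each with only 5 mildly clustered objects.
obs_patterns = [np.clip(rng.normal(0.5, 0.15, size=(5, 2)), 0, 1)
                for _ in range(20)]
obs = pooled(obs_patterns)

# Null distribution: CSR with the same number of points per image.
null = np.array([pooled([rng.random((5, 2)) for _ in range(20)])
                 for _ in range(999)])
# Clustering shortens nearest-neighbour distances, so test the lower tail.
p_value = (1 + np.sum(null <= obs)) / (1 + null.size)
print(f"pooled CSR p-value: {p_value:.3f}")
```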
  • ABSTRACT: Nuclear Magnetic Resonance (NMR) spectra are widely used in metabolomics to obtain profiles of metabolites dissolved in biofluids such as cell supernatants. Methods for estimating metabolite concentrations from these spectra are presently confined to manual peak fitting and to binning procedures for integrating resonance peaks. Extensive information on the patterns of spectral resonance generated by human metabolites is now available in online databases. By incorporating this information into a Bayesian model we can deconvolve resonance peaks from a spectrum and obtain explicit concentration estimates for the corresponding metabolites. Spectral resonances that cannot be deconvolved in this way may also be of scientific interest, so we model them jointly using wavelets. We describe a Markov chain Monte Carlo algorithm which allows us to sample from the joint posterior distribution of the model parameters, using specifically designed block updates to improve mixing. The strong prior on resonance patterns allows the algorithm to identify peaks corresponding to particular metabolites automatically, eliminating the need for manual peak assignment. We assess our method for peak alignment and concentration estimation. Except in cases when the target resonance signal is very weak, alignment is unbiased and precise. We compare the Bayesian concentration estimates to those obtained from a conventional numerical integration method and find that our point estimates have sixfold lower mean squared error. Finally, we apply our method to a spectral dataset taken from an investigation of the metabolic response of yeast to recombinant protein expression. We estimate the concentrations of 26 metabolites and compare to manual quantification by five expert spectroscopists. We discuss the reasons for discrepancies and the robustness of our method's concentration estimates.
    05/2011;
  • ABSTRACT: We investigate simulation methodology for Bayesian inference in Lévy-driven stochastic volatility (SV) models. Typically, Bayesian inference for such models is performed using Markov chain Monte Carlo (MCMC); this is often a challenging task. Sequential Monte Carlo (SMC) samplers are methods that can improve over MCMC; however, there are many user-set parameters to specify. We develop a fully automated SMC algorithm, which substantially improves over the standard MCMC methods in the literature. To illustrate our methodology, we consider a model comprising a Heston model with an independent, additive, variance gamma process in the returns equation. The driving gamma process can capture the stylized behaviour of many financial time series, and a discretized version, fit in a Bayesian manner, has been found to be very useful for modelling equity data. We demonstrate that it is possible to draw exact inference, in the sense of no time-discretization error, from the Bayesian SV model.
    Scandinavian Journal of Statistics 02/2011; 38(1):1-22. · 1.17 Impact Factor
  • Erica E M Moodie, D A Stephens
    ABSTRACT: Longitudinal data are increasingly available to health researchers; these present challenges not encountered in cross-sectional data, not the least of which is the presence of time-varying confounding variables and intermediate effects. We review confounding and mediation in a longitudinal setting and introduce causal graphs to explain the bias that arises from conventional analyses. When both time-varying confounding and mediation are present in the data, traditional regression models result in estimates of effect coefficients that are systematically incorrect, or biased. In a companion paper (Moodie and Stephens in Int J Publ Health, 2010b, this issue), we describe a class of models that yield unbiased estimates in a longitudinal setting.
    International Journal of Public Health 12/2010; 55(6):701-3. · 1.99 Impact Factor
  • ABSTRACT: In the past 20 years, cell biologists have studied the cell nucleus extensively, aided by advances in cell imaging technology and microscopy. Consequently, the volume of image data of the cell nucleus – and the compartments it contains – is growing rapidly. The spatial organisation of these nuclear compartments is thought to be fundamentally associated with nuclear function. However, the rules that oversee nuclear architecture remain unclear and controversial. As a result, there is an increasing need to replace qualitative visual assessment of microscope images with quantitative and automated methods. Such tools can substantially reduce manual labour and, more importantly, remove subjective bias. Quantitative methods can also increase the accuracy, sensitivity and reproducibility of data analysis. In this paper, we describe image processing and analysis methodology for the investigation of nuclear architecture, and the application of these methods to quantitatively explore the promyelocytic leukaemia (PML) nuclear bodies (NBs). PML NBs are linked with numerous nuclear functions including transcription and protein degradation. However, we know very little about the three-dimensional (3-D) architecture of PML NBs in relation to each other or within the general volume of the nucleus. Finally, we emphasise methods for the analysis of replicate images (of a given nuclear compartment across different cell nuclei) in order to aggregate information about nuclear architecture.
    10/2010: pages 173-187;
  • Erica E M Moodie, D A Stephens
    ABSTRACT: In this article, we introduce Marginal Structural Models, which yield unbiased estimates of causal effects of exposures in the presence of time-varying confounding variables that also act as mediators. We describe estimation via inverse probability weighting; estimation may also be accomplished by g-computation (Robins in Latent Variable Modeling and Applications to Causality, Springer, New York, pp 69-117, 1997; van der Wal et al. in Stat Med 28:2325-2337, 2009) or targeted maximum likelihood (Rosenblum and van der Laan in Int J Biostat 6, 2010). When both time-varying confounding and mediation are present in longitudinal data, Marginal Structural Models are a useful tool that provides unbiased estimates.
    International Journal of Public Health 10/2010; 56(1):117-9. · 1.99 Impact Factor
  • Erica E M Moodie, David A Stephens
    ABSTRACT: In a longitudinal study of dose-response, it is often necessary to adjust for confounding or non-compliance, which may otherwise compromise the estimation of the true effect of a treatment. Using an approach based on the generalised propensity score (GPS), a generalisation of the classical binary-treatment propensity score, it is possible to construct a balancing score that provides an estimation procedure for the true (unconfounded) direct effect of dose on response. Previously, the GPS has been applied only in a single-interval setting; in this article, we extend the GPS methodology to the longitudinal setting to estimate the direct effect of a continuous dose on a longitudinal response. The methodology is applied to two simulated examples, and a real longitudinal dose-response investigation, the Monitored Occlusion Treatment of Amblyopia Study (MOTAS). In the treatment of childhood amblyopia, a common ophthalmological condition, occlusion therapy (patching) was for many decades the standard medical treatment, despite the fact that its efficacy was not quantified. MOTAS was revolutionary, as it was the first study to obtain precise measurements of the amount of occlusion each study participant received over the course of the study.
    Statistical Methods in Medical Research 05/2010; 21(2):149-66. · 2.36 Impact Factor
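For the single-interval case that the longitudinal method above extends, the GPS is the conditional density of the observed dose given covariates. The sketch below follows a common normal-linear recipe: model dose given covariates, evaluate the implied density, fit a flexible outcome model in (dose, GPS), and average over subjects to trace the dose-response curve. Data and model forms are hypothetical, and the quadratic outcome model is only an approximation.

```python
# Generalised propensity score for a continuous dose, single interval.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 3000
x = rng.normal(size=(n, 2))                       # baseline covariates
dose = x @ np.array([0.8, -0.4]) + rng.normal(size=n)
y = 1.5 * dose + x[:, 0] + rng.normal(size=n)     # confounded response

# GPS: normal density of the observed dose given covariates.
fit = LinearRegression().fit(x, dose)
sigma = (dose - fit.predict(x)).std(ddof=x.shape[1] + 1)

def gps_at(d):
    """Density of dose value(s) d under each subject's fitted dose model."""
    z = (d - fit.predict(x)) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

def basis(d, g):
    return np.column_stack([d, d**2, g, g**2, d * g])

m = LinearRegression().fit(basis(dose, gps_at(dose)), y)

def adrf(d):
    """Average dose-response at d: average m(d, GPS_i(d)) over subjects."""
    return m.predict(basis(np.full(n, d), gps_at(d))).mean()

print(f"effect of one dose unit: {adrf(1.0) - adrf(0.0):.2f} (truth 1.5)")
```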
  • Ashkan Ertefaie, David A Stephens
    ABSTRACT: In observational studies for causal effects, treatments are assigned to experimental units without the benefits of randomization. As a result, there is the potential for bias in the estimation of the treatment effect. Two methods for estimating the causal effect consistently are Inverse Probability of Treatment Weighting (IPTW) and the Propensity Score (PS). We demonstrate that in many simple cases, the PS method routinely produces estimators with lower Mean-Square Error (MSE). In the longitudinal setting, estimation of the causal effect of a time-dependent exposure in the presence of time-dependent covariates that are themselves affected by previous treatment also requires adjustment approaches. We describe an alternative approach to the classical binary treatment propensity score termed the Generalized Propensity Score (GPS). Previously, the GPS has mainly been applied in a single interval setting; we use an extension of the GPS approach to the longitudinal setting. We compare the strengths and weaknesses of IPTW and GPS for causal inference in three simulation studies and two real data sets. Again, in simulation, the GPS appears to produce estimators with lower MSE.
    The International Journal of Biostatistics 01/2010; 6(2):Article 14. · 1.28 Impact Factor
  • Erica E M Moodie, David A Stephens
    ABSTRACT: We provide a brief editorial introduction to a special issue of The International Journal of Biostatistics dedicated to several papers presented at a workshop held at the Banff International Research Station, Canada, in May 2009.
    The International Journal of Biostatistics 01/2010; 6(2):Article 1. · 1.28 Impact Factor
  • ABSTRACT: In this paper, we discuss model checking with residual diagnostic plots for g-estimation of optimal dynamic treatment regimes. The g-estimation method requires three different model specifications at each treatment interval under consideration: (1) the blip model; (2) the expected counterfactual model; and (3) the propensity model. Of these, the expected counterfactual model is especially difficult to specify correctly in practice and so far there has been little guidance as to how to check for model misspecification. Residual plots are a useful and standard tool for model diagnostics in the classical regression setting; we have adapted this approach for g-estimation. We demonstrate the usefulness of our approach in a simulation study, and apply it to real data in the context of estimating the optimal time to stop breastfeeding.
    The International Journal of Biostatistics 01/2010; 6(2):Article 12. · 1.28 Impact Factor

Publication Stats

914 Citations
115.50 Total Impact Points

Institutions

  • 2007–2014
    • McGill University
      • Department of Mathematics and Statistics
      • Department of Epidemiology, Biostatistics and Occupational Health
      Montréal, Quebec, Canada
    • University of Washington Seattle
      • Department of Biostatistics
      Seattle, WA, United States
  • 2004–2012
    • Imperial College London
      • Department of Mathematics
      • Division of Molecular Biosciences
      London, ENG, United Kingdom
    • Athens University of Economics and Business
      Athínai, Attica, Greece
  • 2005–2007
    • City University London
      • Division of Optometry and Visual Science
      London, ENG, United Kingdom
  • 2006
    • University of Oxford
      • Department of Statistics
      Oxford, England, United Kingdom