David A Stephens

McGill University, Montréal, Quebec, Canada

Publications (66) · 127.59 Total Impact Points

  •
    ABSTRACT: The nuclei of higher eukaryotic cells display compartmentalization and certain nuclear compartments have been shown to follow a degree of spatial organization. To date, the study of nuclear organization has often involved simple quantitative procedures that struggle with both the irregularity of the nuclear boundary and the problem of handling replicate images. Such studies typically focus on inter-object distance, rather than spatial location within the nucleus. The concern of this paper is the spatial preference of nuclear compartments, for which we have developed statistical tools to quantitatively study and explore nuclear organization. These tools combine replicate images to generate 'aggregate maps' which represent the spatial preferences of nuclear compartments. We present two examples of different compartments in mammalian fibroblasts (WI-38 and MRC-5) that demonstrate new knowledge of spatial preference within the cell nucleus. Specifically, the spatial preference of RNA polymerase II is preserved across normal and immortalized cells, whereas PML nuclear bodies exhibit a change in spatial preference from avoiding the centre in normal cells to exhibiting a preference for the centre in immortalized cells. In addition, we show that SC35 splicing speckles are excluded from the nuclear boundary and localize throughout the nucleoplasm and in the interchromatin space in non-transformed WI-38 cells. This new methodology is thus able to reveal the effect of large-scale perturbation on spatial architecture and preferences that would not be obvious from single cell imaging.
    Journal of the Royal Society Interface 03/2015; 12(104).
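    A rough illustration of the aggregation idea, assuming compartment positions have already been mapped into a common normalized nuclear coordinate system: pool positions across replicate nuclei and smooth them into a single 2-D 'aggregate map'. The boundary normalization that the paper develops for irregular nuclei is not shown, and all names here are hypothetical.

    ```python
    import numpy as np
    from scipy.stats import gaussian_kde

    def aggregate_map(replicate_points, grid_size=64):
        """replicate_points: list of (n_i, 2) arrays of compartment positions,
        one per nucleus, assumed already normalized to the unit square."""
        pooled = np.vstack(replicate_points)
        kde = gaussian_kde(pooled.T)                     # smooth the pooled positions
        xs = np.linspace(0, 1, grid_size)
        gx, gy = np.meshgrid(xs, xs)
        grid = np.vstack([gx.ravel(), gy.ravel()])
        return kde(grid).reshape(grid_size, grid_size)   # density over the nucleus
    ```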
  •
    ABSTRACT: The purpose of inverse probability of treatment (IPT) weighting in estimation of marginal treatment effects is to construct a pseudo-population without imbalances in measured covariates, thus removing the effects of confounding and informative censoring when performing inference. In this article, we formalize the notion of such a pseudo-population as a data generating mechanism with particular characteristics, and show that this leads to a natural Bayesian interpretation of IPT weighted estimation. Using this interpretation, we are able to propose the first fully Bayesian procedure for estimating parameters of marginal structural models using an IPT weighting. Our approach suggests that the weights should be derived from the posterior predictive treatment assignment and censoring probabilities, answering the question of whether and how the uncertainty in the estimation of the weights should be incorporated in Bayesian inference of marginal treatment effects. The proposed approach is compared to existing methods in simulated data, and applied to an analysis of the Canadian Co-infection Cohort. © 2015, The International Biometric Society.
    Biometrics 02/2015; · 1.52 Impact Factor
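    A minimal sketch of the IPT-weighting step described above, using a standard frequentist plug-in treatment model rather than the posterior predictive weights the paper derives; the variable names (`Y`, `A`, `L`) are illustrative.

    ```python
    import numpy as np
    import statsmodels.api as sm

    def iptw_effect(Y, A, L):
        """Marginal treatment effect via stabilized IPT weights.
        Y: outcome, A: binary treatment (0/1), L: confounder matrix."""
        X = sm.add_constant(L)
        ps = sm.GLM(A, X, family=sm.families.Binomial()).fit().fittedvalues
        p_marg = A.mean()                               # numerator of stabilized weights
        w = np.where(A == 1, p_marg / ps, (1 - p_marg) / (1 - ps))
        mu1 = np.average(Y[A == 1], weights=w[A == 1])  # weighted means form the
        mu0 = np.average(Y[A == 0], weights=w[A == 0])  # pseudo-population contrast
        return mu1 - mu0
    ```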
  • Daniel J. Graham, Emma J. McCoy, David A. Stephens
    Bayesian Analysis. 01/2015;
  •
    ABSTRACT: Objective: To detect hGH doping in sport, the World Anti-Doping Agency (WADA)-accredited laboratories use the ratio of the concentrations of recombinant hGH (‘rec’) versus other ‘natural’ pituitary-derived isoforms of hGH (‘pit’), measured with two different kits developed specifically to detect the administration of exogenous hGH. The current joint compliance decision limits (DLs) for ratios derived from these kits, designed so that they would both be exceeded in fewer than 1 in 10,000 samples from non-doping athletes, are based on data accrued in anti-doping labs up to March 2010, and later confirmed with data up to February-March 2011. In April 2013, WADA asked the authors to analyze the now much larger set of ratios collected in routine hGH testing of athletes, and to document in the peer-reviewed literature a statistical procedure for establishing DLs, so that it can be re-applied as more data become available. Design: We examined the variation in the rec/pit ratios obtained for 21,943 screened blood (serum) samples submitted to the WADA accredited laboratories over the period 2009–2013. To fit the relevant sex- and kit-specific centiles of the logs of the ratios, we classified ‘rec/pit’ ratios based on low ‘rec’ and ‘pit’ values as ‘negative’ and fitted statistical distributions to the remaining log-ratios. The flexible data-driven quantile regression approach allowed us to deal with the fact that the location, scale and shape of the distribution of the modeled ‘rec/pit’ ratios varied with the concentrations of the ‘rec’ and ‘pit’ values. The between-kit correlation of the ratios was included in the fitting of the DLs, and bootstrap samples were used to quantify the estimation error in these limits. We examined the performance of these limits by applying them to the data obtained from investigator-initiated hGH administration studies, and in athletes in a simulated cycling stage race. Results: The mean and spread of the distribution of the modeled log-ratios depended in different ways on the magnitude of the rec and pit concentrations. Ultimately, however, the estimated limits were almost invariant to the concentrations, and similar to those obtained by fitting simpler (marginal) log-normal and Box-Cox transformed distributions. The estimated limits were similar to the (currently-used) limits fitted to the smaller datasets analyzed previously. In investigator-initiated instances, the limits distinguished recent use of rec-hGH from non-use. Conclusions: The distributions of the rec/pit ratios varied as a function of the rec and pit concentrations, but the patterns in their medians and spreads largely cancelled each other. Thus, ultimately, the kit- and sex-specific ratio DL obtained from the simpler model was very close to the ‘curve of DLs’ obtained from the more complex one. Both were close to previously established limits.
    Growth Hormone & IGF Research 10/2014; · 1.33 Impact Factor
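    A hedged illustration of the simpler marginal approach mentioned in the abstract: fit a log-normal distribution to the log-ratios for one kit/sex combination, take the 1-in-10,000 quantile as a candidate decision limit, and bootstrap to gauge estimation error. The input array is hypothetical, and the paper's quantile-regression model, which lets location, scale and shape depend on the rec and pit concentrations, is not reproduced here.

    ```python
    import numpy as np
    from scipy import stats

    def decision_limit(log_ratios, p=1 - 1e-4, n_boot=2000, seed=1):
        """Candidate DL as the p-th quantile of a normal fitted to log(rec/pit)."""
        rng = np.random.default_rng(seed)
        mu, sigma = log_ratios.mean(), log_ratios.std(ddof=1)
        dl = np.exp(stats.norm.ppf(p, loc=mu, scale=sigma))
        boot = [np.exp(stats.norm.ppf(p, loc=b.mean(), scale=b.std(ddof=1)))
                for b in (rng.choice(log_ratios, size=log_ratios.size, replace=True)
                          for _ in range(n_boot))]
        return dl, np.std(boot)                          # limit and bootstrap SE
    ```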
  • Daniel J. Graham, Emma J. McCoy, David A. Stephens
    ABSTRACT: Road network capacity expansions are frequently proposed as solutions to urban traffic congestion but are controversial because it is thought that they can directly “induce” growth in traffic volumes. This article quantifies causal effects of road network capacity expansions on aggregate urban traffic volume and density in U.S. cities using a mixed model propensity score (PS) estimator. The motivation for this approach is that we seek to estimate a dose-response relationship between capacity and volume but suspect confounding from both observed and unobserved characteristics. Analytical results and simulations show that a longitudinal mixed model PS approach can be used to adjust effectively for time-invariant unobserved confounding via random effects (RE). Our empirical results indicate that network capacity expansions can cause substantial increases in aggregate urban traffic volumes such that even major capacity increases can actually lead to little or no reduction in network traffic densities. This result has important implications for optimal urban transportation strategies. Supplementary materials for this article are available online.
    Journal of the American Statistical Association 10/2014; 109(508). · 2.11 Impact Factor
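    A rough sketch of the mixed-model propensity score idea for a continuous exposure, in the spirit of a generalized propensity score with city-level random intercepts; this is an illustrative simplification, not the authors' estimator, and `df` and all column names are assumptions.

    ```python
    import numpy as np
    import statsmodels.formula.api as smf
    from scipy import stats

    def capacity_effect(df):
        """df: hypothetical pandas DataFrame with columns city, year, capacity,
        volume, income, population (one row per city-year)."""
        # Treatment (capacity) model with a random intercept per city; the random
        # effect absorbs time-invariant unobserved city characteristics.
        trt = smf.mixedlm("capacity ~ income + population + year",
                          data=df, groups="city").fit()
        resid = df["capacity"] - trt.fittedvalues
        gps = stats.norm.pdf(resid, scale=np.sqrt(trt.scale))   # generalized PS
        # Dose-response model for traffic volume, adjusting for the estimated GPS.
        out = smf.ols("volume ~ capacity + gps", data=df.assign(gps=gps)).fit()
        return out.params["capacity"]
    ```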
  • Benjamin Rich, Erica Em Moodie, David A Stephens
    ABSTRACT: Due to the cost and complexity of conducting a sequential multiple assignment randomized trial (SMART), it is desirable to pre-define a small number of personalized regimes to study. We propose a simulation-based approach to studying personalized dosing strategies in contexts for which a therapeutic agent's pharmacokinetic and pharmacodynamic properties are well understood, taking dosing of warfarin as a case study. We consider a SMART in which there are five intervention points at which dosing may be modified, following a loading phase of treatment. Realistic SMARTs are simulated, and two methods of analysis, G-estimation and Q-learning, are used to assess potential personalized dosing strategies. In settings where outcome modelling may be complex due to the highly non-linear nature of the pharmacokinetic and pharmacodynamic mechanisms of the therapeutic agent, G-estimation provides the more promising method of estimating an optimal dosing strategy. Used in combination with the simulated SMARTs, this approach allowed us to improve simulated patient outcomes and to suggest which patient characteristics were needed to best individually tailor dosing. In particular, our simulations suggest that current dosing should be determined by an individual's current coagulation time as measured by the international normalized ratio (INR), their last measured INR, and their last dose. Tailoring treatment based only on current INR and last warfarin dose provided inferior control of INR over the course of the trial. The ability of the simulated SMARTs to suggest optimal personalized dosing strategies relies on the pharmacokinetic and pharmacodynamic models used to generate the hypothetical patient profiles. This approach is best suited to therapeutic agents whose effects are well studied. Prior to investing in a complex randomized trial that involves sequential treatment allocations, simulations should be used where possible in order to guide which dosing strategies to evaluate.
    Clinical Trials 01/2014; · 1.94 Impact Factor
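    A compact two-stage Q-learning sketch of the kind used to analyse the simulated SMARTs: fit a regression for the final stage, back-substitute the value of the optimal dose as a pseudo-outcome, and fit the earlier stage. The linear Q-functions, dose grid and variable names are illustrative assumptions, not the paper's exact models.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def q_learning_two_stage(S1, D1, S2, D2, Y, dose_grid):
        """S1, S2: patient state (e.g. INR) at stages 1 and 2; D1, D2: doses given;
        Y: final outcome (larger is better). Returns the two fitted Q models."""
        # Stage 2: Q2(S2, D2) with a dose-by-state interaction.
        X2 = np.column_stack([S2, D2, S2 * D2])
        q2 = LinearRegression().fit(X2, Y)

        def best_stage2_value(s):
            cand = np.column_stack([np.full(dose_grid.shape, s), dose_grid,
                                    s * dose_grid])
            return q2.predict(cand).max()                # value of dosing optimally

        pseudo = np.array([best_stage2_value(s) for s in S2])
        # Stage 1: regress the pseudo-outcome on stage-1 state and dose.
        X1 = np.column_stack([S1, D1, S1 * D1])
        q1 = LinearRegression().fit(X1, pseudo)
        return q1, q2
    ```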
  • Erica E M Moodie, David A Stephens, Marina B Klein
    ABSTRACT: It is often the case that interest lies in the effect of an exposure on each of several distinct event types. For example, we are motivated to investigate the impact of recent injection drug use on deaths due to each of cancer, end-stage liver disease, and overdose in the Canadian Co-infection Cohort (CCC). We develop a marginal structural model that permits estimation of cause-specific hazards in situations where more than one cause of death is of interest. Marginal structural models allow for the causal effect of treatment on outcome to be estimated using inverse-probability weighting under the assumption of no unmeasured confounding; these models are particularly useful in the presence of time-varying confounding variables, which may also mediate the effect of exposures. An asymptotic variance estimator is derived, and a cumulative incidence function estimator is given. We compare the performance of the proposed marginal structural model for multiple-outcome data to that of conventional competing risks models in simulated data and demonstrate the use of the proposed approach in the CCC. Copyright © 2013 John Wiley & Sons, Ltd.
    Statistics in Medicine 11/2013; 33(8). · 2.04 Impact Factor
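    A simplified sketch of the fitting step for the multiple-outcome marginal structural model: given pre-computed stabilized inverse-probability-of-treatment weights, each cause-specific hazard is estimated by a weighted Cox model that treats the other causes as censoring. Column names and the `lifelines` usage are one possible implementation, not the authors' code.

    ```python
    from lifelines import CoxPHFitter

    def cause_specific_msm(df, causes, weight_col="sw"):
        """df: one row per subject with columns 'time', 'event' (0 = censored,
        otherwise an integer cause label), 'exposure', and stabilized weights."""
        fits = {}
        for cause in causes:
            d = df.copy()
            d["event_k"] = (d["event"] == cause).astype(int)   # other causes -> censored
            cph = CoxPHFitter()
            cph.fit(d[["time", "event_k", "exposure", weight_col]],
                    duration_col="time", event_col="event_k",
                    weights_col=weight_col, robust=True)        # robust SEs with weights
            fits[cause] = cph
        return fits
    ```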
  •
    ABSTRACT: The pervasive use of prevalent cohort studies on disease duration increasingly calls for appropriate methodologies to account for the biases that invariably accompany samples formed by such data. It is well known, for example, that subjects with shorter lifetimes are less likely to be present in such studies. Moreover, certain covariate values could be preferentially selected into the sample, being linked to the long-term survivors. The existing methodology for estimation of the propensity score using data collected on prevalent cases requires the correct conditional survival/hazard function given the treatment and covariates. This requirement can be alleviated if the disease under study has stationary incidence, the so-called stationarity assumption. We propose a nonparametric adjustment technique based on a weighted estimating equation for estimating the propensity score which does not require modeling the conditional survival/hazard function when the stationarity assumption holds. Large-sample properties of the estimator are established and its small-sample behavior is studied via simulation.
    Stat. 11/2013; 3(1).
  • Ashkan Ertefaie, Masoud Asgharian, David A. Stephens
    ABSTRACT: In the causal adjustment setting, variable selection techniques based on either the outcome or treatment allocation model can result in the omission of confounders or the inclusion of spurious variables in the propensity score. We propose a variable selection method based on a penalized likelihood which considers the response and treatment assignment models simultaneously. We show that under some conditions our method attains the oracle property. The selected variables are used to form a double robust regression estimator of the treatment effect. Simulation results are presented and data from the National Supported Work Demonstration are analyzed.
    11/2013;
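    A loose sketch of the idea of letting both models inform selection: a covariate receives a heavy penalty only if it appears weak in both the outcome and the treatment models, implemented here through adaptive-lasso-style feature rescaling. This is an illustrative simplification, not the penalized likelihood developed in the paper.

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV, LogisticRegressionCV

    def joint_selection(X, A, Y, eps=1e-6):
        """X: candidate covariates, A: binary treatment, Y: continuous outcome.
        Returns indices of covariates retained for the propensity score."""
        b_out = LassoCV(cv=5).fit(X, Y).coef_                    # outcome-model signal
        b_trt = LogisticRegressionCV(cv=5, penalty="l1",
                                     solver="liblinear").fit(X, A).coef_.ravel()
        strength = np.maximum(np.abs(b_out), np.abs(b_trt))
        w = 1.0 / (strength + eps)                               # penalty weights
        Xs = X / w                                               # rescaling = adaptive penalty
        sel = LogisticRegressionCV(cv=5, penalty="l1",
                                   solver="liblinear").fit(Xs, A).coef_.ravel()
        return np.flatnonzero(np.abs(sel) > 1e-8)
    ```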
  • Daniel J. Graham, Emma J. McCoy, David A. Stephens
    ABSTRACT: The paper investigates the link between area-based socio-economic deprivation and the incidence of child pedestrian casualties. The analysis is conducted using data for small spatial zones within major British cities over the period 2001–2007. Spatial longitudinal generalized linear mixed models, estimated using frequentist and Bayesian approaches, are used to address issues of confounding, spatial dependence and transmission of deprivation effects across zones (i.e. interference). The results show a consistent strong deprivation effect across model specifications. The incidence of child pedestrian casualties in the most deprived zones is typically greater than 10 times that in the least deprived zones. Modelling interference through a spatially auto-regressive covariate uncovers a substantially larger effect.
    Journal of the Royal Statistical Society Series A (Statistics in Society) 10/2013; 176(4). · 1.36 Impact Factor
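    A small sketch of the spatially auto-regressive covariate mentioned above: build a row-standardized contiguity matrix, form the spatial lag of deprivation, and include it in a Poisson model for casualty counts with a population offset. The adjacency input and column names are hypothetical, and the random effects and Bayesian fitting used in the paper are omitted.

    ```python
    import numpy as np
    import statsmodels.api as sm

    def fit_with_spatial_lag(counts, deprivation, child_pop, W):
        """counts: casualties per zone; deprivation: zone deprivation score;
        child_pop: child population (exposure); W: binary zone adjacency matrix."""
        Wn = W / W.sum(axis=1, keepdims=True)          # row-standardize
        lag = Wn @ deprivation                         # mean deprivation of neighbours
        X = sm.add_constant(np.column_stack([deprivation, lag]))
        model = sm.GLM(counts, X, family=sm.families.Poisson(),
                       offset=np.log(child_pop))
        return model.fit()
    ```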
  •
    ABSTRACT: Purpose: Explore compliance with occlusion treatment of amblyopia in the Monitored and Randomised Occlusion Treatment of Amblyopia Studies (MOTAS and ROTAS), using objective monitoring. Methods: Both studies had a 3-phase protocol: initial assessment, refractive adaptation and occlusion. In the occlusion phase, participants were instructed to dose for 6 hrs/day (MOTAS) or randomized to 6 or 12 hrs/day (ROTAS). Dose was monitored continuously using an Occlusion Dose Monitor (ODM). Results: 152 patients (71 male, 81 female; 122 Caucasian, 30 non-Caucasian) of mean ± sd age 68±18 months participated. Amblyopia was defined as an inter-ocular acuity difference of at least 0.1 logMAR and was associated with anisometropia in 50, strabismus in 44, and both (mixed) in 58. Median duration of occlusion was 99 days (interquartile range 72 days). Mean compliance was 44%, mean proportion of days with no patch worn was 42%. Compliance was lower (39%) on weekends compared to weekdays (46%, p=0.04), as was the likelihood of dosing at all (52% vs. 60%, p=0.028). Compliance was lower when attendance was less frequent (p < 0.001) and with prolonged treatment duration (p<0.001). Age, gender, amblyopia type and severity were not associated with compliance. Mixture modelling suggested three subpopulations of patch day doses: under 30 minutes; doses that achieve 30%-80% compliance; and doses that achieve around 100% compliance. Conclusions: This study shows that compliance with patching treatment averages less than 50% and is influenced by several factors. A greater understanding of these influences should improve treatment outcome.
    Investigative ophthalmology & visual science 07/2013; · 3.43 Impact Factor
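    A minimal sketch of the kind of mixture modelling mentioned in the results: fit a three-component Gaussian mixture to the ODM-recorded daily doses to recover subpopulations of patching behaviour. The input array is hypothetical and the paper's mixture may differ in distributional form.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def dose_subpopulations(daily_minutes, n_components=3, seed=0):
        """daily_minutes: 1-D array of recorded patch minutes per day."""
        x = np.asarray(daily_minutes, dtype=float).reshape(-1, 1)
        gm = GaussianMixture(n_components=n_components, random_state=seed).fit(x)
        order = np.argsort(gm.means_.ravel())          # sort components by mean dose
        return gm.means_.ravel()[order], gm.weights_[order]
    ```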
  •
    ABSTRACT: To explore how stereoacuity changes in patients while they are being treated for amblyopia. The Monitored Occlusion Treatment for Amblyopia Study (MOTAS) comprised 3 distinct phases. In the first phase, baseline, assessments of visual function were made to confirm the initial visual and binocular visual deficit. The second phase, refractive adaptation, now commonly termed "optical treatment," was an 18-week period of spectacle wear with measurements of logMAR visual acuity and stereoacuity with the Frisby test at weeks 0, 6, 12, and 18. In the third phase, occlusion, participants were prescribed 6 hours of patching per day. A total of 85 children were enrolled (mean age, 5.1 ± 1.5 years). In 21 children amblyopia was associated with anisometropia; in 29, with strabismus; and in 35, with both. At study entry, poor stereoacuity was associated with poor visual acuity (P < 0.001) in the amblyopic eye and greater angle of strabismus (P < 0.001). Of 66 participants, 25 (38%) who received refractive adaptation and 19 (29%) who received occlusion improved by at least one octave in stereoacuity, exceeding test-retest variability. Overall, 38 (45%) improved one or more octaves across both treatment phases. Unmeasurable stereoacuity was observed in 56 participants (66%) at study entry and in 37 (43%) at study exit. Stereoacuity improved for almost one half of the study participants. Improvement was observed in both treatment phases. Factors associated with poor or nil stereoacuity at study entry and exit were poor visual acuity of the amblyopic eye and large-angle strabismus.
    Journal of AAPOS 04/2013; 17(2):166-73. · 1.07 Impact Factor
  •
    ABSTRACT: Background Antimicrobial use is thought to suppress the intestinal microbiota, thereby impairing colonization resistance and allowing Clostridium difficile to infect the gut. Additional risk factors such as proton-pump inhibitors may also alter the intestinal microbiota and predispose patients to Clostridium difficile infection (CDI). This comparative metagenomic study investigates the relationship between epidemiologic exposures, intestinal bacterial populations and subsequent development of CDI in hospitalized patients. We performed a nested case–control study including 25 CDI cases and 25 matched controls. Fecal specimens collected prior to disease onset were evaluated by 16S rRNA gene amplification and pyrosequencing to determine the composition of the intestinal microbiota during the at-risk period. Results The diversity of the intestinal microbiota was significantly reduced prior to an episode of CDI. Sequences corresponding to the phylum Bacteroidetes and to the families Bacteroidaceae and Clostridiales Incertae Sedis XI were depleted in CDI patients compared to controls, whereas sequences corresponding to the family Enterococcaceae were enriched. In multivariable analyses, cephalosporin and fluoroquinolone use, as well as a decrease in the abundance of Clostridiales Incertae Sedis XI were significantly and independently associated with CDI development. Conclusions This study shows that a reduction in the abundance of a specific bacterial family - Clostridiales Incertae Sedis XI - is associated with risk of nosocomial CDI and may represent a target for novel strategies to prevent this life-threatening infection.
    Microbiome. 01/2013; 1(1).
  •
    ABSTRACT: There is considerable interest in cell biology in determining whether, and to what extent, the spatial arrangement of nuclear objects affects nuclear function. A common approach to address this issue involves analyzing a collection of images produced using some form of fluorescence microscopy. We assume that these images have been successfully pre-processed and a spatial point pattern representation of the objects of interest within the nuclear boundary is available. Typically in these scenarios, the number of objects per nucleus is low, which has consequences for the ability of standard analysis procedures to demonstrate the existence of spatial preference in the pattern. There are broadly two common approaches to looking for structure in these spatial point patterns: either the spatial point pattern for each image is analyzed individually, or a simple normalization is performed and the patterns are aggregated. In this paper we demonstrate, using synthetic spatial point patterns drawn from predefined point processes, how difficult it is to distinguish a pattern from complete spatial randomness using these techniques and hence how easy it is to miss interesting spatial preferences in the arrangement of nuclear objects. The impact of this problem is also illustrated on data related to the configuration of PML nuclear bodies in mammalian fibroblast cells.
    PLoS ONE 05/2012; 7(5):e36841. · 3.53 Impact Factor
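    A simple Monte Carlo comparison against complete spatial randomness (CSR) of the sort discussed above, using the mean nearest-neighbour distance as the test statistic; for clarity the nuclear boundary is idealized as the unit square, whereas the paper deals with irregular boundaries and replicate images.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def mean_nn_distance(points):
        d, _ = cKDTree(points).query(points, k=2)      # k=2: nearest neighbour, not self
        return d[:, 1].mean()

    def csr_test(points, n_sim=999, seed=0):
        """Two-sided Monte Carlo p-value for departure from CSR on the unit square."""
        rng = np.random.default_rng(seed)
        obs = mean_nn_distance(points)
        sims = np.array([mean_nn_distance(rng.uniform(size=points.shape))
                         for _ in range(n_sim)])
        p_lower = (np.sum(sims <= obs) + 1) / (n_sim + 1)   # evidence of clustering
        p_upper = (np.sum(sims >= obs) + 1) / (n_sim + 1)   # evidence of regularity
        return obs, 2 * min(p_lower, p_upper)
    ```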
  • Vahid Partovi Nia, David Stephens
    ABSTRACT: Clustering may be described as the partitioning of data into homogeneous groups or clusters. Classical clustering techniques employ a measure of dissimilarity and optimize a criterion in order to find the allocation of data, such as the k-means. Modern approaches are based on a mixture model in which homogeneous groups of data follow the same distribution, fitted using the EM algorithm. The result of clustering obtained by the k-means or the EM algorithm is sensitive to the starting values. One way of making the fitting procedure insensitive to the initial value is to assume that the data grouping is a random variable. This is called Bayesian clustering, and its fitting involves stochastic search or sampling from the grouping posterior using Markov chain Monte Carlo. In Bayesian clustering, labels are used to show the grouping of subjects, and a dendrogram is a tree providing a visual guide to different data groupings. We discuss the R package labeltodendro, which links these two concepts and is made to achieve two goals: the first is to provide a flexible environment for plotting any arbitrary dendrogram, and the second is to summarize a matrix of integer labels produced by a Markov chain Monte Carlo sampler using a dendrogram.
    03/2012;
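    A brief sketch of the second goal described above: turn a matrix of MCMC cluster labels into a pairwise co-clustering matrix and summarize it as a dendrogram with average-linkage clustering. This uses scipy rather than the R package labeltodendro that the paper presents.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import squareform

    def labels_to_linkage(label_matrix):
        """label_matrix: (n_samples, n_subjects) integer cluster labels from MCMC."""
        L = np.asarray(label_matrix)
        co = (L[:, :, None] == L[:, None, :]).mean(axis=0)   # co-clustering probabilities
        dist = 1.0 - co
        np.fill_diagonal(dist, 0.0)
        Z = linkage(squareform(dist, checks=False), method="average")
        return Z     # pass to scipy.cluster.hierarchy.dendrogram to plot
    ```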
  •
    ABSTRACT: Nuclear Magnetic Resonance (NMR) spectra are widely used in metabolomics to obtain profiles of metabolites dissolved in biofluids such as cell supernatants. Methods for estimating metabolite concentrations from these spectra are presently confined to manual peak fitting and to binning procedures for integrating resonance peaks. Extensive information on the patterns of spectral resonance generated by human metabolites is now available in online databases. By incorporating this information into a Bayesian model we can deconvolve resonance peaks from a spectrum and obtain explicit concentration estimates for the corresponding metabolites. Spectral resonances that cannot be deconvolved in this way may also be of scientific interest, so we model them jointly using wavelets. We describe a Markov chain Monte Carlo algorithm which allows us to sample from the joint posterior distribution of the model parameters, using specifically designed block updates to improve mixing. The strong prior on resonance patterns allows the algorithm to identify peaks corresponding to particular metabolites automatically, eliminating the need for manual peak assignment. We assess our method for peak alignment and concentration estimation. Except in cases when the target resonance signal is very weak, alignment is unbiased and precise. We compare the Bayesian concentration estimates to those obtained from a conventional numerical integration method and find that our point estimates have sixfold lower mean squared error. Finally, we apply our method to a spectral dataset taken from an investigation of the metabolic response of yeast to recombinant protein expression. We estimate the concentrations of 26 metabolites and compare to manual quantification by five expert spectroscopists. We discuss the reasons for discrepancies and the robustness of our method's concentration estimates.
    05/2011;
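    A heavily simplified, non-Bayesian sketch of the deconvolution step: if template resonance patterns for candidate metabolites are available on the same chemical-shift grid, the spectrum can be approximated as a non-negative combination of templates, giving crude relative concentration estimates. The Bayesian model in the paper additionally handles peak shifts, unassigned resonances (via wavelets) and posterior uncertainty.

    ```python
    import numpy as np
    from scipy.optimize import nnls

    def crude_concentrations(spectrum, templates):
        """spectrum: observed intensities on a common ppm grid (length m);
        templates: (m, k) matrix, one column per candidate metabolite's
        reference resonance pattern."""
        coefs, _ = nnls(templates, spectrum)   # non-negative least squares
        return coefs                           # proportional to concentration
    ```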
  •
    ABSTRACT:   We investigate simulation methodology for Bayesian inference in Lévy-driven stochastic volatility (SV) models. Typically, Bayesian inference from such models is performed using Markov chain Monte Carlo (MCMC); this is often a challenging task. Sequential Monte Carlo (SMC) samplers are methods that can improve over MCMC; however, there are many user-set parameters to specify. We develop a fully automated SMC algorithm, which substantially improves over the standard MCMC methods in the literature. To illustrate our methodology, we look at a model comprised of a Heston model with an independent, additive, variance gamma process in the returns equation. The driving gamma process can capture the stylized behaviour of many financial time series and a discretized version, fit in a Bayesian manner, has been found to be very useful for modelling equity data. We demonstrate that it is possible to draw exact inference, in the sense of no time-discretization error, from the Bayesian SV model.
    Scandinavian Journal of Statistics 02/2011; 38(1):1 - 22. · 1.06 Impact Factor
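    A generic annealed SMC sampler skeleton of the type referred to above (a tempered sequence of targets with importance reweighting, resampling and a random-walk MCMC move), written for a one-dimensional toy parameter; the paper's algorithm additionally automates the tuning choices and targets the Lévy-driven SV posterior.

    ```python
    import numpy as np

    def smc_sampler(logpost, logprior, sample_prior, n=1000,
                    temps=np.linspace(0.0, 1.0, 21), seed=0):
        """Annealed SMC targeting pi_t proportional to prior^(1-t) * post^t.
        logpost/logprior must be vectorized over a 1-D array of particles."""
        rng = np.random.default_rng(seed)
        x = sample_prior(n, rng)                              # particles from the prior
        logw = np.zeros(n)
        for t0, t1 in zip(temps[:-1], temps[1:]):
            logw += (t1 - t0) * (logpost(x) - logprior(x))    # incremental weights
            w = np.exp(logw - logw.max()); w /= w.sum()
            if 1.0 / np.sum(w ** 2) < n / 2:                  # resample when ESS is low
                x = x[rng.choice(n, size=n, p=w)]
                logw = np.zeros(n)
            # One random-walk Metropolis move invariant for the current target.
            prop = x + 0.2 * rng.standard_normal(n)
            cur = (1 - t1) * logprior(x) + t1 * logpost(x)
            new = (1 - t1) * logprior(prop) + t1 * logpost(prop)
            accept = np.log(rng.uniform(size=n)) < (new - cur)
            x = np.where(accept, prop, x)
        return x, logw
    ```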
  • Erica E M Moodie, D A Stephens
    ABSTRACT: Longitudinal data are increasingly available to health researchers; these present challenges not encountered in cross-sectional data, not the least of which is the presence of time-varying confounding variables and intermediate effects. We review confounding and mediation in a longitudinal setting and introduce causal graphs to explain the bias that arises from conventional analyses. When both time-varying confounding and mediation are present in the data, traditional regression models result in estimates of effect coefficients that are systematically incorrect, or biased. In a companion paper (Moodie and Stephens in Int J Publ Health, 2010b, this issue), we describe a class of models that yield unbiased estimates in a longitudinal setting.
    International Journal of Public Health 12/2010; 55(6):701-3. · 1.99 Impact Factor
  •
    ABSTRACT: In the past 20 years cell biologists have studied the cell nucleus extensively, aided by advances in cell imaging technology and microscopy. Consequently, the volume of image data of the cell nucleus – and the compartments it contains – is growing rapidly. The spatial organisation of these nuclear compartments is thought to be fundamentally associated with nuclear function. However, the rules that oversee nuclear architecture remain unclear and controversial. As a result, there is an increasing need to replace qualitative visual assessment of microscope images with quantitative and automated methods. Such tools can substantially reduce manual labour and more importantly remove subjective bias. Quantitative methods can also increase the accuracy, sensitivity and reproducibility of data analysis. In this paper, we describe image processing and analysis methodology for the investigation of nuclear architecture, and the application of these methods to quantitatively explore the promyelocytic leukaemia (PML) nuclear bodies (NBs). PML NBs are linked with numerous nuclear functions including transcription and protein degradation. However, we know very little about the three-dimensional (3-D) architecture of PML NBs in relation to each other or within the general volume of the nucleus. Finally, we emphasise methods for the analysis of replicate images (of a given nuclear compartment across different cell nuclei) in order to aggregate information about nuclear architecture.
    10/2010: pages 173-187;
  • Erica E M Moodie, D A Stephens
    ABSTRACT: In this article, we introduce Marginal Structural Models, which yield unbiased estimates of causal effects of exposures in the presence of time-varying confounding variables that also act as mediators. We describe estimation via inverse probability weighting; estimation may also be accomplished by g-computation (Robins in Latent Variable Modeling and Applications to Causality, Springer, New York, pp 69-117, 1997; van der Wal et al. in Stat Med 28:2325-2337, 2009) or targeted maximum likelihood (Rosenblum and van der Laan in Int J Biostat 6, 2010). When both time-varying confounding and mediation are present in longitudinal data, Marginal Structural Models are a useful tool that provides unbiased estimates.
    International Journal of Public Health 10/2010; 56(1):117-9. · 1.99 Impact Factor

Publication Stats

1k Citations
127.59 Total Impact Points

Institutions

  • 2007–2014
    • McGill University
      • Department of Mathematics and Statistics
      • Department of Epidemiology, Biostatistics and Occupational Health
      Montréal, Quebec, Canada
  • 2004–2008
    • Imperial College London
      • Department of Mathematics
      London, ENG, United Kingdom
    • Athens University of Economics and Business
      Athínai, Attica, Greece
  • 2005–2007
    • City University London
      • Division of Optometry and Visual Science
      London, ENG, United Kingdom
  • 2006
    • University of Oxford
      • Department of Statistics
      Oxford, England, United Kingdom