Mixture Models for Single Cell Assays with Applications to Vaccine Studies

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center (FHCRC), Seattle, WA 98109, USA.
Biostatistics (Impact Factor: 2.24). 07/2013; DOI: 10.1093/biostatistics/kxt024
Source: arXiv

ABSTRACT In immunological studies, the characterization of small, functionally
distinct cell subsets from blood and tissue is crucial to decipher system level
biological changes. An increasing number of studies rely on assays that provide
single-cell measurements of multiple genes and proteins from bulk cell samples.
A common problem in the analysis of such data is to identify biomarkers (or
combinations thereof) that are differentially expressed between two
biological conditions (e.g., before/after vaccination), where expression is
defined as the proportion of cells expressing the biomarker or combination in
the cell subset of interest.
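To make this definition concrete, here is a minimal standard-library Python sketch of two of the frequentist baselines the abstract lists among its comparison methods: a log fold change of proportions and a one-sided Fisher's exact test on the 2x2 table of positive and negative cell counts. The function names, the 0.5 pseudocount and the counts are hypothetical illustrations, not the authors' implementation.

```python
from math import comb, log2


def proportion(positive: int, total: int) -> float:
    """Fraction of cells in the subset that express the biomarker."""
    return positive / total


def log2_fold_change(pos_a: int, tot_a: int, pos_b: int, tot_b: int,
                     pseudo: float = 0.5) -> float:
    """log2 of (proportion in condition A) / (proportion in condition B).

    The pseudocount is an illustrative assumption that keeps the ratio
    finite when a condition has zero positive cells.
    """
    p_a = (pos_a + pseudo) / (tot_a + 2 * pseudo)
    p_b = (pos_b + pseudo) / (tot_b + 2 * pseudo)
    return log2(p_a / p_b)


def fisher_exact_greater(pos_a: int, tot_a: int, pos_b: int, tot_b: int) -> float:
    """One-sided Fisher's exact test p-value for H1: proportion in A > proportion in B.

    Conditioning on the table margins, the number of positive cells in A
    follows a hypergeometric distribution; the p-value is its upper tail.
    """
    n = tot_a + tot_b   # all cells in the table
    k = pos_a + pos_b   # all positive cells
    denom = comb(n, tot_a)
    upper = min(k, tot_a)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(comb(k, x) * comb(n - k, tot_a - x) for x in range(pos_a, upper + 1))
    return tail / denom


# Toy counts (hypothetical): 3 of 3 cells positive in condition A, 0 of 3 in B.
print(fisher_exact_greater(3, 3, 0, 3))        # 0.05
print(round(log2_fold_change(3, 3, 0, 3), 3))  # 2.807
```

Exact integer arithmetic keeps the hypergeometric tail precise, at the cost of speed for the very large cell counts typical of flow cytometry.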
Here, we present a Bayesian hierarchical framework based on a beta-binomial
mixture model for testing for differential biomarker expression using
single-cell assays. Our model allows inference to be subject specific, as is
typically required when assessing vaccine responses, while borrowing strength
across subjects through common prior distributions. We propose two approaches
for parameter estimation: an empirical-Bayes approach using an
Expectation-Maximization algorithm and a fully Bayesian one based on a Markov
chain Monte Carlo algorithm. We compare our method against frequentist
approaches for single-cell assays including Fisher's exact test, a likelihood
ratio test, and basic log-fold changes. Using several experimental assays
measuring proteins or genes at the single-cell level and simulated data, we
show that our method has higher sensitivity and specificity than alternative
methods. Additional simulations show that our framework is also robust to model
misspecification. Finally, we demonstrate how our approach can be extended
to testing multivariate differential expression across multiple biomarker
combinations using a Dirichlet-multinomial model and illustrate this
multivariate approach using single-cell gene expression data and simulations.
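The abstract does not spell out the model, so the following is only a minimal sketch under a stated assumption about one plausible form of a beta-binomial mixture: for each subject, a non-responder shares a single positive-cell proportion p ~ Beta(au, bu) between the condition of interest ("s") and the reference condition ("u"), while a responder has independent proportions p_s ~ Beta(a_s, b_s) and p_u ~ Beta(au, bu). Integrating out the proportions gives closed-form beta-binomial marginals, and EM then estimates the responder fraction w. Here the Beta hyperparameters are fixed by hand rather than estimated, and all names, priors and counts are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, which stays finite for large cell counts."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + exp(-t))
    e = exp(t)
    return e / (1.0 + e)


def log_marginal_null(ns, Ns, nu, Nu, au, bu):
    """Non-responder (assumed form): both conditions share p ~ Beta(au, bu).

    Binomial coefficients are dropped; they are identical under both
    mixture components and cancel in the posterior.
    """
    return log_beta(au + ns + nu, bu + (Ns - ns) + (Nu - nu)) - log_beta(au, bu)


def log_marginal_alt(ns, Ns, nu, Nu, a_s, b_s, au, bu):
    """Responder (assumed form): independent p_s ~ Beta(a_s, b_s), p_u ~ Beta(au, bu)."""
    return (log_beta(a_s + ns, b_s + Ns - ns) - log_beta(a_s, b_s)
            + log_beta(au + nu, bu + Nu - nu) - log_beta(au, bu))


def em_responder_fraction(data, prior_s, prior_u, w=0.5, max_iter=500, tol=1e-10):
    """EM for the mixing weight w with the Beta hyperparameters held fixed.

    data: list of (ns, Ns, nu, Nu) per subject, i.e. positive and total cells
    in condition s, then in condition u. Returns (w, z), where z[i] is the
    posterior probability that subject i is a responder.
    """
    a_s, b_s = prior_s
    au, bu = prior_u
    l1 = [log_marginal_alt(ns, Ns, nu, Nu, a_s, b_s, au, bu) for ns, Ns, nu, Nu in data]
    l0 = [log_marginal_null(ns, Ns, nu, Nu, au, bu) for ns, Ns, nu, Nu in data]
    z = [w] * len(data)
    for _ in range(max_iter):
        w = min(max(w, 1e-12), 1 - 1e-12)  # keep log(w) and log(1 - w) finite
        # E-step: responsibilities from the log posterior odds of "responder"
        z = [sigmoid(log(w) - log(1 - w) + a - b) for a, b in zip(l1, l0)]
        # M-step: the maximum-likelihood mixing weight is the mean responsibility
        new_w = sum(z) / len(z)
        converged = abs(new_w - w) < tol
        w = new_w
        if converged:
            break
    return w, z


# Hypothetical counts: three subjects with a clear increase, three without.
subjects = [(60, 10000, 10, 10000), (55, 10000, 8, 10000), (65, 10000, 12, 10000),
            (12, 10000, 10, 10000), (9, 10000, 11, 10000), (10, 10000, 10, 10000)]
w, z = em_responder_fraction(subjects, prior_s=(6, 994), prior_u=(1, 999))
print(round(w, 3), [round(p, 3) for p in z])
```

Holding the hyperparameters fixed keeps the M-step in closed form; an empirical-Bayes fit that also estimates them would need a numerical optimizer inside the M-step, and a fully Bayesian fit would replace EM with MCMC sampling.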

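In the multivariate extension, the Dirichlet-multinomial plays the role that the beta-binomial plays for a single biomarker: each cell falls into one of K mutually exclusive categories (for example, the biomarker combinations plus an all-negative category), and integrating the category proportions against a Dirichlet prior yields a closed-form marginal likelihood. The Python sketch below is an illustration under assumptions, not the authors' code: the function names, the hand-picked Dirichlet parameters and the counts are hypothetical. It computes that marginal and a log Bayes factor comparing separate compositions for the two conditions against one shared composition.

```python
from math import lgamma


def dm_log_marginal(counts, alpha):
    """log marginal likelihood of a count vector under Dirichlet-multinomial(alpha).

    The multinomial coefficient is omitted because it cancels in the
    Bayes factor below.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    a0, n0 = sum(alpha), sum(counts)
    return (lgamma(a0) - lgamma(a0 + n0)
            + sum(lgamma(a + n) - lgamma(a) for a, n in zip(alpha, counts)))


def dm_log_bayes_factor(stim, unstim, alpha_s, alpha_u):
    """log BF of 'separate compositions' (responder) vs 'one shared composition'.

    Assumed form: under the shared model the pooled counts follow a single
    Dirichlet(alpha_u) composition; under the separate model each condition
    has its own composition.
    """
    pooled = [s + u for s, u in zip(stim, unstim)]
    alt = dm_log_marginal(stim, alpha_s) + dm_log_marginal(unstim, alpha_u)
    null = dm_log_marginal(pooled, alpha_u)
    return alt - null


# Hypothetical counts over (negative, combination A, combination B) cells.
stim, unstim = [9000, 600, 400], [9950, 30, 20]
print(dm_log_bayes_factor(stim, unstim, alpha_s=[990, 5, 5], alpha_u=[990, 5, 5]))
```

With K = 2 the marginal reduces to the beta-binomial form log B(a1 + n1, a2 + n2) - log B(a1, a2), which is a convenient sanity check.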