Mixture Models for Single Cell Assays with Applications to Vaccine Studies

Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center (FHCRC), Seattle, WA 98109, USA.
Biostatistics (Impact Factor: 2.65). 07/2013; DOI: 10.1093/biostatistics/kxt024
Source: arXiv


In immunological studies, the characterization of small, functionally
distinct cell subsets from blood and tissue is crucial to decipher system-level
biological changes. An increasing number of studies rely on assays that provide
single-cell measurements of multiple genes and proteins from bulk cell samples.
A common problem in the analysis of such data is to identify biomarkers (or
combinations thereof) that are differentially expressed between two
biological conditions (e.g., before/after vaccination), where expression is
defined as the proportion of cells expressing the biomarker or combination in
the cell subset of interest.
Here, we present a Bayesian hierarchical framework based on a beta-binomial
mixture model for testing for differential biomarker expression using
single-cell assays. Our model allows inference to be subject specific, as is
typically required when assessing vaccine responses, while borrowing strength
across subjects through common prior distributions. We propose two approaches
for parameter estimation: an empirical-Bayes approach using an
Expectation-Maximization algorithm and a fully Bayesian one based on a Markov
chain Monte Carlo algorithm. We compare our method against frequentist
approaches for single-cell assays including Fisher's exact test, a likelihood
ratio test, and basic log-fold changes. Using several experimental assays
measuring proteins or genes at the single-cell level and simulated data, we
show that our method has higher sensitivity and specificity than alternative
methods. Additional simulations show that our framework is also robust to model
misspecification. Finally, we also demonstrate how our approach can be extended
to testing multivariate differential expression across multiple biomarker
combinations using a Dirichlet-multinomial model and illustrate this
multivariate approach using single-cell gene expression data and simulations.
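
As a rough illustration of the kind of two-component beta-binomial comparison described above, the following minimal sketch computes, for one subject, the posterior probability that the proportion of positive cells differs between two conditions. It is not the authors' implementation: the function names, the two-sided alternative, the fixed hyperparameters, and the mixing weight `w` are all assumptions made for illustration, whereas the abstract states that parameters are estimated by an EM or MCMC algorithm.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_betabinom(k, n, a, b):
    """Log marginal likelihood of k positive cells out of n when p ~ Beta(a, b)."""
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


def posterior_response_prob(k_u, n_u, k_s, n_s,
                            a_u=1.0, b_u=1.0, a_s=1.0, b_s=1.0, w=0.5):
    """Posterior probability of the 'different proportions' component.

    k_u, n_u: positive and total cell counts in the reference condition.
    k_s, n_s: positive and total cell counts in the comparison condition.
    a_*, b_*: Beta hyperparameters (assumed fixed here, hypothetical defaults).
    w: assumed prior weight of the 'different proportions' component.
    """
    # Null component: one shared proportion p ~ Beta(a_u, b_u) for both conditions.
    log_null = (log_choose(n_u, k_u) + log_choose(n_s, k_s)
                + log_beta(k_u + k_s + a_u, (n_u - k_u) + (n_s - k_s) + b_u)
                - log_beta(a_u, b_u))
    # Alternative component: independent proportions, one per condition.
    log_alt = (log_betabinom(k_u, n_u, a_u, b_u)
               + log_betabinom(k_s, n_s, a_s, b_s))
    # Combine on the log scale to avoid underflow with large cell counts.
    m = max(log_null, log_alt)
    num = w * exp(log_alt - m)
    den = num + (1.0 - w) * exp(log_null - m)
    return num / den


if __name__ == "__main__":
    # Hypothetical counts: 10 vs. 100 positive cells out of 100,000 each.
    print(posterior_response_prob(10, 100_000, 100, 100_000))
    # Hypothetical counts: identical proportions in both conditions.
    print(posterior_response_prob(10, 100_000, 10, 100_000))
```

In a hierarchical setting the hyperparameters and `w` would be shared across subjects and estimated from all of them, which is how strength is borrowed; here they are simply passed in.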



Available from: Greg Finak, Oct 16, 2014
  • Source
    • "This allows users to easily share raw FCM data, together with associated analyses, and facilitates the comparison of automated or semi-automated gating approaches with manual gating, and enabling validation of automated gating schemes against expert manual results. Furthermore, these features facilitate collaboration between computational and non-computational researchers and have enabled the development of advanced downstream data analysis algorithms for FCM data in vaccine trials [28], [29], as well as a recent comprehensive comparison of automated gating algorithms via the FlowCAP effort [12]. The framework also facilitates extracting specific cell populations for downstream analysis from any step of a pipeline, as we demonstrate with the two data sets analyzed here [30]. "
    ABSTRACT: Flow cytometry is used increasingly in clinical research for cancer, immunology and vaccines. Technological advances in cytometry instrumentation are increasing the size and dimensionality of data sets, posing a challenge for traditional data management and analysis. Automated analysis methods, despite a general consensus of their importance to the future of the field, have been slow to gain widespread adoption. Here we present OpenCyto, a new BioConductor infrastructure and data analysis framework designed to lower the barrier of entry to automated flow data analysis algorithms by addressing key areas that we believe have held back wider adoption of automated approaches. OpenCyto supports end-to-end data analysis that is robust and reproducible while generating results that are easy to interpret. We have improved the existing, widely used core BioConductor flow cytometry infrastructure by allowing analysis to scale in a memory efficient manner to the large flow data sets that arise in clinical trials, and integrating domain-specific knowledge as part of the pipeline through the hierarchical relationships among cell populations. Pipelines are defined through a text-based csv file, limiting the need to write data-specific code, and are data agnostic to simplify repetitive analysis for core facilities. We demonstrate how to analyze two large cytometry data sets: an intracellular cytokine staining (ICS) data set from a published HIV vaccine trial focused on detecting rare, antigen-specific T-cell populations, where we identify a new subset of CD8 T-cells with a vaccine-regimen specific response that could not be identified through manual analysis, and a CyTOF T-cell phenotyping data set where a large staining panel and many cell populations are a challenge for traditional analysis. The substantial improvements to the core BioConductor flow cytometry packages give OpenCyto the potential for wide adoption. It can rapidly leverage new developments in computational cytometry and facilitate reproducible analysis in a unified environment.
    PLoS Computational Biology 08/2014; 10(8):e1003806. DOI:10.1371/journal.pcbi.1003806 · 4.62 Impact Factor
  • ABSTRACT: Highly multiplexed, single cell technologies reveal important heterogeneity within cell populations. Recently, technologies to simultaneously measure expression of 96 (or more) genes from a single cell have been developed for immunologic monitoring. Here, we report a rigorous, optimized, quantitative methodology for using this technology. Specifically: we describe a unique primer/probe qualification method necessary for quantitative results; we show that primers do not compete in highly multiplexed amplifications; we define the limit of detection for this assay as a single mRNA transcript; and, we show that the technical reproducibility of the system is very high. We illustrate two disparate applications of the platform: a "bulk" approach that measures expression patterns from 100 cells at a time in high throughput to define gene signatures, and a single-cell approach to define the coordinate expression patterns of multiple genes and reveal unique subsets of cells.
    Journal of immunological methods 03/2013; 391(1-2). DOI:10.1016/j.jim.2013.03.002 · 1.82 Impact Factor
  • ABSTRACT: Cutting edge immune monitoring techniques increasingly measure multiple functional outputs for various cell types, such as intracellular cytokine staining (ICS) assays that measure cytokines expressed by T cells. To date, however, there is no precise method to measure virus-specific cytokine production by both T cells as well as NK cells in the same well, which is important to a greater extent given recent identification of NK cells expressing a memory phenotype. This study describes an adaptable and efficient ICS assay platform that can be used to detect antigen-driven cytokine production by human T cells and NK cells, termed "viral ICS". Importantly, this assay uses limited amount of cryopreserved PBMCs along with autologous heat-inactivated serum, thereby allowing for this assay to be performed when sample is scarce as well as geographically distant from the laboratory. Compared to a standard ICS assay that detects antigen-specific T cell cytokine expression alone, the viral ICS assay is comparable in terms of both HIV-specific CD4 and CD8 T cell cytokine response rates and magnitude of response, with the added advantage of ability to detect virus-specific NK cell responses.
    Journal of virological methods 01/2014; 199. DOI:10.1016/j.jviromet.2014.01.003 · 1.78 Impact Factor