The Stanford Data Miner: A novel approach for integrating and exploring heterogeneous immunological data

CytoAnalytics, Denver, CO, USA.
Journal of Translational Medicine (Impact Factor: 3.93). 03/2012; 10(1):62. DOI: 10.1186/1479-5876-10-62
Source: PubMed


Systems-level approaches are increasingly common in both murine and human translational studies. These approaches employ multiple high-information-content assays. As a result, there is a need for tools that integrate heterogeneous types of laboratory and clinical/demographic data, and that allow exploration of those data by aggregating and/or segregating results based on particular variables (e.g., mean cytokine levels by age and gender).
Here we describe the application of standard data warehousing tools to create a novel environment for user-driven upload, integration, and exploration of heterogeneous data. The system presented here currently supports flow cytometry and immunoassays performed in the Stanford Human Immune Monitoring Center, but could be applied more generally.
Users upload assay results contained in platform-specific spreadsheets of a defined format, and clinical and demographic data in spreadsheets of flexible format. Users then map sample IDs to connect the assay results with the metadata. An OLAP (on-line analytical processing) data exploration interface allows filtering and display of various dimensions (e.g., Luminex analytes in rows, treatment group in columns, filtered on a particular study). Statistics such as mean, median, and N can be displayed. The views can be expanded or contracted to aggregate or segregate data at various levels. Individual-level data is accessible with a single click. The result is a user-driven system that permits data integration and exploration in a variety of settings. We show how the system can be used to find gender-specific differences in serum cytokine levels, and compare them across experiments and assay types.
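The mapping and pivoting steps described above can be illustrated with a minimal sketch. This is not the system's actual implementation, which is built on data warehousing and business intelligence tools. The sample IDs, analyte values, field names, and the `pivot` helper below are all hypothetical, chosen only to show the idea of joining assay results to metadata by sample ID and then aggregating mean, median, and N per cell.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical assay results as (sample_id, analyte, value) rows,
# standing in for a platform-specific spreadsheet upload.
assay_rows = [
    ("S1", "IL-6", 4.0),
    ("S2", "IL-6", 6.0),
    ("S3", "IL-6", 10.0),
    ("S1", "TNF-a", 2.0),
    ("S2", "TNF-a", 3.0),
    ("S3", "TNF-a", 7.0),
]

# Hypothetical clinical/demographic metadata keyed by sample ID,
# standing in for the flexible-format metadata spreadsheet.
metadata = {
    "S1": {"gender": "F", "study": "A"},
    "S2": {"gender": "F", "study": "A"},
    "S3": {"gender": "M", "study": "A"},
}


def pivot(assay_rows, metadata, column_key, study=None):
    """Aggregate values into (analyte, column value) cells.

    Rows are analytes, columns are the values of `column_key`
    (e.g. gender), optionally filtered to a single study.
    """
    cells = defaultdict(list)
    for sample_id, analyte, value in assay_rows:
        meta = metadata.get(sample_id)
        if meta is None:
            continue  # sample ID was never mapped to metadata
        if study is not None and meta["study"] != study:
            continue  # filtered out by the study dimension
        cells[(analyte, meta[column_key])].append(value)
    return {
        key: {"mean": mean(values), "median": median(values), "n": len(values)}
        for key, values in cells.items()
    }


table = pivot(assay_rows, metadata, column_key="gender", study="A")
for (analyte, gender), stats in sorted(table.items()):
    print(analyte, gender, stats)
```

Keeping the raw values in each cell until the final step is what would let a view like this drill down to individual-level data; an OLAP engine achieves the same effect with pre-built dimensions rather than an in-memory dictionary.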
We have used the tools and techniques of data warehousing, including open-source business intelligence software, to support investigator-driven data integration and mining of diverse immunological data.



Available from: Wes Munsil
  • ABSTRACT: Purpose: Common variable immunodeficiency (CVID) comprises a heterogeneous group of primary immunodeficiency disorders. Immunophenotyping of memory B cells at the time of diagnosis is increasingly used for the classification of patients into subgroups with different clinical prognoses. The EUROclass classification is a widely used method. Levels of somatic hypermutation (SHM) have proven useful as a prognostic marker for recurrent respiratory tract infections. As time of presentation and diagnosis is highly variable in CVID patients, and diagnostic delay is a common problem, it is important to know whether classification parameters are stable over time. The purpose of the study was to address this question in a cohort of 33 CVID patients followed from 3 to 19 years after diagnosis (average follow-up 8.8 years). Methods: Levels of class-switched memory B cells were analyzed using flow cytometric immunophenotyping, and patients were classified according to the EUROclass criteria. Affinity maturation of B cells was measured using Igκ-REHMA, which assesses somatic hypermutation in kappa light chain transcripts. Clinical manifestations in terms of splenomegaly, autoimmune disease and granulomatous disease were also determined. Results: Switched memory B cells and levels of SHM were not consistently stable markers in a long-term follow-up setting. At a given time during follow-up, 60% of the patients were assigned to the EUROclass group SmB- (less than 2% switched memory B cells), but only 23% were consistently assigned to this group. Associations between clinical manifestations and levels of switched memory B cells or SHM were not observed in our study. Conclusion: Based on our findings, we suggest that immunologic characteristics in CVID patients should be evaluated several times after diagnosis using internationally standardized methods.
    Article · May 2013 · Journal of Clinical Immunology
  • ABSTRACT: Bioanalysts and immunologists can interrogate the immune system with a variety of high-throughput technologies such as gene expression, multiplex bead arrays and flow cytometry. Conceptually, these assays support systems immunology studies, in which phenomena can be measured and correlated across biological compartments. First, however, the resulting high-dimensional data must be combined in a consistent fashion that supports analysis of the data as an integrated whole. Next, analytical methods must be applied to the hundreds or thousands of readouts. We recommend the use of a four-part analytical pipeline, consisting of data integration, hypothesis generation, prediction and hypothesis testing, and validation. We describe a variety of established methods appropriate for these integrated datasets, and highlight their application to human immunological studies. Our goal is to provide bioanalysts, immunologists and data analysts with a valuable perspective with which to approach the multiassay high-dimensional datasets generated by contemporary immunological studies.
    Article · Jan 2014 · Bioanalysis

  • Article · Jan 2014 · Nature Biotechnology