ABSTRACT: Flow cytometry is used increasingly in clinical research for cancer, immunology and vaccines. Technological advances in cytometry instrumentation are increasing the size and dimensionality of data sets, posing a challenge for traditional data management and analysis. Automated analysis methods, despite a general consensus on their importance to the future of the field, have been slow to gain widespread adoption. Here we present OpenCyto, a new Bioconductor infrastructure and data analysis framework designed to lower the barrier to entry for automated flow data analysis algorithms by addressing key areas that we believe have held back wider adoption of automated approaches. OpenCyto supports end-to-end data analysis that is robust and reproducible while generating results that are easy to interpret. We have improved the existing, widely used core Bioconductor flow cytometry infrastructure by allowing analysis to scale in a memory-efficient manner to the large flow data sets that arise in clinical trials, and by integrating domain-specific knowledge into the pipeline through the hierarchical relationships among cell populations. Pipelines are defined through a text-based CSV file, limiting the need to write data-specific code, and are data agnostic to simplify repetitive analysis for core facilities. We demonstrate how to analyze two large cytometry data sets: an intracellular cytokine staining (ICS) data set from a published HIV vaccine trial focused on detecting rare, antigen-specific T-cell populations, where we identify a new subset of CD8 T-cells with a vaccine-regimen-specific response that could not be identified through manual analysis; and a CyTOF T-cell phenotyping data set, where a large staining panel and many cell populations are a challenge for traditional analysis. The substantial improvements to the core Bioconductor flow cytometry packages give OpenCyto the potential for wide adoption. It can rapidly leverage new developments in computational cytometry and facilitate reproducible analysis in a unified environment.
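The idea of defining a gating pipeline as rows of a CSV file, where each population names its parent, can be illustrated with a small sketch. This is not OpenCyto's API (OpenCyto is an R/Bioconductor package): it is a hypothetical Python illustration in which the column names `alias`, `pop`, `parent`, `dims` and `gating_method`, the population names and the method labels are all assumptions chosen for illustration. It shows how the parent column alone encodes the population hierarchy and fixes the order in which gates must be applied.

```python
import csv
import io

# Hypothetical gating template; columns and method labels are illustrative only.
TEMPLATE = """alias,pop,parent,dims,gating_method
nonDebris,nonDebris+,root,FSC-A,mindensity
singlets,singlets+,nonDebris,"FSC-A,FSC-H",singletGate
lymph,lymph+,singlets,"FSC-A,SSC-A",flowClust
cd3,cd3+,lymph,CD3,mindensity
cd4,cd4+,cd3,CD4,mindensity
cd8,cd8+,cd3,CD8,mindensity
"""


def load_hierarchy(text):
    """Parse template rows; return (rows keyed by alias, children keyed by parent)."""
    rows = {}
    children = {"root": []}
    for row in csv.DictReader(io.StringIO(text)):
        parent = row["parent"]
        # A population can only be gated once its parent population exists.
        if parent != "root" and parent not in rows:
            raise ValueError(f"{row['alias']}: parent {parent!r} must be defined first")
        rows[row["alias"]] = row
        children.setdefault(parent, []).append(row["alias"])
        children.setdefault(row["alias"], [])
    return rows, children


def gating_order(children, node="root"):
    """Depth-first order in which gates would be applied (parents before children)."""
    order = []
    for child in children[node]:
        order.append(child)
        order.extend(gating_order(children, child))
    return order


rows, children = load_hierarchy(TEMPLATE)
for alias in gating_order(children):
    r = rows[alias]
    print(f"{r['parent']:>10} -> {alias:<10} {r['gating_method']} on {r['dims']}")
```

Because each row references only its parent, the same template can be applied unchanged to new data sets with the same markers, which is one way a pipeline can stay data agnostic.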
ABSTRACT: Advances in high-throughput, single-cell gene expression are allowing interrogation of cell heterogeneity. However, there is concern that the cell cycle phase of a cell might bias characterizations of gene expression at the single-cell level. We assess the effect of cell cycle phase on gene expression in single cells by measuring 333 genes in 930 cells across three phases and three cell lines. We determine each cell's phase non-invasively without chemical arrest and use it as a covariate in tests of differential expression. We observe bi-modal gene expression, a previously described phenomenon wherein the expression of otherwise abundant genes is either strongly positive or undetectable within individual cells. This bi-modality is likely both biologically and technically driven. Irrespective of its source, we show that it should be modeled to draw accurate inferences from single-cell expression experiments. To this end, we propose a semi-continuous modeling framework based on the generalized linear model, and use it to characterize genes with consistent cell cycle effects across three cell lines. Our new computational framework improves the detection of previously characterized cell cycle genes compared to approaches that do not account for the bi-modality of single-cell data. We use our semi-continuous modeling framework to estimate single-cell gene co-expression networks. These networks suggest that in addition to having phase-dependent shifts in expression (when averaged over many cells), some, but not all, canonical cell cycle genes tend to be co-expressed in groups in single cells. We estimate the amount of single-cell expression variability attributable to the cell cycle. We find that the cell cycle explains only 5%–17% of expression variability, suggesting that the cell cycle will not tend to be a large nuisance factor in analysis of the single-cell transcriptome.
ABSTRACT: The RV144 HIV-1 vaccine trial demonstrated partial efficacy of 31% against HIV-1 infection. Studies into possible correlates of protection found that antibodies specific to the V1/V2 region of envelope correlated inversely with infection risk and that viruses isolated from trial participants contained genetic signatures of vaccine-induced pressure in the V1/V2 region. We explored the hypothesis that the genetic signatures in V1/V2 could be partly attributed to selection by vaccine-primed T cells. We performed a T-cell-based sieve analysis of breakthrough viruses in the RV144 trial and found evidence of predicted HLA binding escape that was greater in vaccine than in placebo recipients. The predicted escape depended on class I HLA A*02- and A*11-restricted epitopes in the MN-strain rgp120 vaccine immunogen. Although we hypothesized that this was indicative of post-acquisition selection pressure, we also found that vaccine efficacy (VE) was greater in A*02(+) than in A*02(-) participants (VE=54% vs. 3%, p=0.05). Vaccine efficacy against viruses with a lysine residue at site 169, important to antibody binding and implicated in vaccine-induced immune pressure, was also greater in A*02(+) participants (VE=74% vs. 15%, p=0.02). Additionally, a reanalysis of vaccine-induced immune responses, focused on those shown to correlate with infection risk, suggested that the humoral response may have differed in A*02(+) participants. These exploratory and hypothesis-generating analyses indicate there may be an association between a class I HLA allele and vaccine efficacy, highlighting the importance of considering HLA alleles and host immune genetics in HIV vaccine trials.
The RV144 trial was the first to show efficacy against HIV-1 infection. Subsequently, much effort has been directed towards understanding the mechanisms of protection, including this T-cell-based sieve analysis, which compared the genetic sequences of viruses isolated from infected vaccine and placebo recipients. Although we hypothesized that the observed sieve effect indicated post-acquisition T-cell selection, we also found that vaccine efficacy was greater for participants who expressed HLA A*02, an allele implicated in the sieve analysis. HLA alleles have been associated with disease progression and viral load in HIV-1 infection, but these data are the first to suggest an association between a class I HLA allele and vaccine efficacy. While these statistical analyses do not provide mechanistic evidence of protection in RV144, they generate testable hypotheses for the HIV vaccine community and highlight the importance of assessing the impact of host immune genetics on vaccine-induced immunity and protection.
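The subgroup efficacy figures above compare infection risk between trial arms within HLA strata. As a rough sketch, VE can be written as one minus the ratio of attack rates in the vaccine and placebo arms and computed separately within each stratum. The counts below are invented for illustration and are not RV144 data, and trial analyses typically estimate VE from hazard ratios rather than from this simple cumulative-risk ratio.

```python
def vaccine_efficacy(cases_vaccine, n_vaccine, cases_placebo, n_placebo):
    """VE = 1 - (attack rate in vaccine arm) / (attack rate in placebo arm)."""
    risk_vaccine = cases_vaccine / n_vaccine
    risk_placebo = cases_placebo / n_placebo
    return 1.0 - risk_vaccine / risk_placebo


# Hypothetical counts per HLA stratum: (vaccine cases, vaccine n, placebo cases, placebo n).
# These numbers are made up for illustration and are NOT trial data.
strata = {
    "A*02+": (10, 4000, 20, 4000),
    "A*02-": (30, 4000, 32, 4000),
}

for name, counts in strata.items():
    print(f"{name}: VE = {vaccine_efficacy(*counts):.0%}")
```

Stratifying this way only describes the difference between subgroups; the abstract's p-values come from formal tests of whether VE differs across strata, which this sketch does not perform.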
ABSTRACT: Semen contains relatively ill-defined regulatory components that likely aid fertilization, but which could also interfere with defense against infection. Each ejaculate contains trillions of exosomes, membrane-enclosed subcellular microvesicles, which have immunosuppressive effects on cells important in the genital mucosa. Exosomes in general are believed to mediate intercellular communication, possibly by transferring small RNA molecules. We found that seminal exosome (SE) preparations contain a substantial amount of RNA from 20 to 100 nucleotides (nts) in length. We sequenced 20–40 and 40–100 nt fractions of SE RNA separately from six semen donors. We found various classes of small non-coding RNA, including microRNA (21.7% of the RNA in the 20–40 nt fraction) as well as abundant Y RNAs and tRNAs present in both fractions. Specific RNAs were consistently present in all donors. For example, 10 (of ∼2600 known) microRNAs constituted over 40% of mature microRNA in SE. Additionally, tRNA fragments were strongly enriched for 5'-ends of 18–19 or 30–34 nts in length; such tRNA fragments repress translation. Thus, SE could potentially deliver regulatory signals to the recipient mucosa via transfer of small RNA molecules.
ABSTRACT: The phase III RV144 HIV-1 vaccine trial estimated vaccine efficacy (VE) to be 31.2%. This trial demonstrated that the presence of HIV-1-specific IgG-binding Abs to envelope (Env) V1V2 inversely correlated with infection risk, while the presence of Env-specific plasma IgA Abs directly correlated with risk of HIV-1 infection. Moreover, Ab-dependent cellular cytotoxicity responses inversely correlated with risk of infection in vaccine recipients with low IgA; therefore, we hypothesized that vaccine-induced Fc receptor-mediated (FcR-mediated) Ab function is indicative of vaccine protection. We sequenced exons and surrounding regions of FcR-encoding genes and found one FCGR2C tag SNP (rs114945036) that was associated with VE against HIV-1 subtype CRF01_AE viruses with lysine at position 169 (169K) in the V2 loop (CRF01_AE 169K). Individuals carrying CC at this SNP had an estimated VE of 15%, while individuals carrying CT or TT had an estimated VE of 91%. Furthermore, the rs114945036 SNP was highly associated with 3 other FCGR2C SNPs (rs138747765, rs78603008, and rs373013207). Env-specific IgG and IgG3 Abs, IgG avidity, and neutralizing Abs inversely correlated with CRF01_AE 169K HIV-1 infection risk only in the CT- or TT-carrying vaccine recipients. These data suggest a potent role of Fc-gamma receptors and Fc-mediated Ab function in conferring protection against transmission in the RV144 VE trial.
ABSTRACT: Standardized assessments of HIV-1 vaccine-elicited neutralizing antibody responses are complicated by the genetic and antigenic variability of the viral envelope glycoproteins (Env). To address these issues, suitable reference strains are needed that are representative of the global epidemic. Several panels have been recommended previously, but no clear answers have been available on how many and which strains are best suited for this purpose. We used a statistical model-selection method to identify a global panel of reference Env clones from among 219 Env-pseudotyped viruses assayed in TZM-bl cells with sera from 205 HIV-1-infected individuals. The Envs and sera were sampled globally from diverse geographic locations and represented all major genetic subtypes and circulating recombinant forms of the virus. Assays with a panel size of only nine viruses adequately represented the spectrum of HIV-1 serum neutralizing activity seen with the larger panel of 219 viruses. An optimal panel of nine viruses was selected and augmented with three additional viruses for greater genetic and antigenic coverage. The spectrum of HIV-1 serum neutralizing activity seen with the final twelve-virus panel closely approximated the activity seen with subtype-matched viruses. Moreover, the final panel was highly sensitive for detecting many of the known broadly neutralizing antibodies. For broader assay applications, all twelve Env clones were converted to infectious molecular clones using a proviral backbone encoding a Renilla luciferase reporter gene (Env.IMC.LucR viruses). This global panel should facilitate highly standardized assessments of vaccine-elicited neutralizing antibodies across multiple HIV-1 vaccine platforms in different parts of the world.
ABSTRACT: A major challenge for the development of a highly effective AIDS vaccine is the identification of mechanisms of protective immunity. To address this question, we used a nonhuman primate challenge model with simian immunodeficiency virus (SIV). We show that antibodies to the SIV envelope are necessary and sufficient to prevent infection. Moreover, sequencing of viruses from breakthrough infections revealed selective pressure against neutralization-sensitive viruses; we identified a two-amino-acid signature that alters antigenicity and confers neutralization resistance. A similar signature confers resistance of human immunodeficiency virus (HIV)-1 to neutralization by monoclonal antibodies against variable regions 1 and 2 (V1V2), suggesting that SIV and HIV share a fundamental mechanism of immune escape from vaccine-elicited or naturally elicited antibodies. These analyses provide insight into the limited efficacy seen in HIV vaccine trials.
ABSTRACT: Recently, mapping studies of expression quantitative trait loci (eQTL), where gene expression levels are viewed as quantitative traits, have provided insight into the biology of gene regulation. Bayesian methods provide natural modelling frameworks for analyzing eQTL studies, where information shared across markers and/or genes can increase the power to detect eQTLs. However, Bayesian approaches tend to be computationally demanding and require specialized software. As a result, most eQTL studies employ univariate methods that treat each gene independently, leading to sub-optimal results.
We present iBMQ, a powerful, computationally optimized, free and open-source R package. Our package implements a joint hierarchical Bayesian model in which all genes and SNPs are modeled concurrently. Model parameters are estimated using a Markov chain Monte Carlo algorithm, and the free, widely used OpenMP parallel library speeds up computation. Using a mouse cardiac data set, we show that iBMQ improves the detection of large trans-eQTL hotspots compared with other state-of-the-art packages for eQTL analysis.
The R package iBMQ is available from the Bioconductor web site at http://bioconductor.org and runs on Linux, Windows and Mac OS X. It is distributed under the terms of the Artistic License 2.0.
SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
ABSTRACT: In immunological studies, the characterization of small, functionally distinct cell subsets from blood and tissue is crucial to decipher system-level biological changes. An increasing number of studies rely on assays that provide single-cell measurements of multiple genes and proteins from bulk cell samples. A common problem in the analysis of such data is to identify biomarkers (or combinations thereof) that are differentially expressed between two biological conditions (e.g., before/after vaccination), where expression is defined as the proportion of cells expressing the biomarker or combination in the cell subset of interest.
Here, we present a Bayesian hierarchical framework based on a beta-binomial mixture model for testing for differential biomarker expression using single-cell assays. Our model allows inference to be subject-specific, as is typically required when assessing vaccine responses, while borrowing strength across subjects through common prior distributions. We propose two approaches for parameter estimation: an empirical-Bayes approach using an Expectation-Maximization algorithm and a fully Bayesian one based on a Markov chain Monte Carlo algorithm. We compare our method against frequentist approaches for single-cell assays, including Fisher's exact test, a likelihood ratio test, and basic log-fold changes. Using several experimental assays measuring proteins or genes at the single-cell level, as well as simulated data, we show that our method has higher sensitivity and specificity than alternative methods. Additional simulations show that our framework is also robust to model misspecification. Finally, we demonstrate how our approach can be extended to testing multivariate differential expression across multiple biomarker combinations using a Dirichlet-multinomial model, and illustrate this multivariate approach using single-cell gene expression data and simulations.
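To make the beta-binomial idea concrete, here is a simplified Python sketch of a two-component mixture for a single subject. Under the null component, stimulated and unstimulated cells share one response proportion drawn from a Beta prior; under the alternative, the two proportions are drawn independently. Integrating out the proportions gives closed-form beta-binomial marginal likelihoods, and the posterior probability of response follows from Bayes' rule. The hyperparameters and the mixing weight are fixed by hand here (the framework described above estimates them by EM or MCMC), and the function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_null(ks, ns, ku, nu, a, b):
    """Both conditions share one proportion p ~ Beta(a, b).
    Binomial coefficients are identical under both components and cancel, so they are omitted."""
    return log_beta(a + ks + ku, b + (ns - ks) + (nu - ku)) - log_beta(a, b)


def log_marginal_alt(ks, ns, ku, nu, a_s, b_s, a_u, b_u):
    """Stimulated and unstimulated proportions are drawn from independent Beta priors."""
    return (log_beta(a_s + ks, b_s + ns - ks) - log_beta(a_s, b_s)
            + log_beta(a_u + ku, b_u + nu - ku) - log_beta(a_u, b_u))


def prob_response(ks, ns, ku, nu, w=0.5, null=(1, 999), alt_s=(1, 99), alt_u=(1, 999)):
    """Posterior probability that the subject belongs to the 'response' component.

    ks/ns: positive and total cells after stimulation; ku/nu: same for the control.
    w: prior mixing weight of the response component (hypothetical fixed value).
    """
    l0 = log_marginal_null(ks, ns, ku, nu, *null)
    l1 = log_marginal_alt(ks, ns, ku, nu, *alt_s, *alt_u)
    m = max(l0, l1)  # subtract the max before exponentiating to avoid underflow
    num = w * exp(l1 - m)
    return num / (num + (1 - w) * exp(l0 - m))


print(prob_response(60, 10000, 8, 10000))  # clear increase after stimulation
print(prob_response(9, 10000, 8, 10000))   # stimulated and control look alike
```

Because every subject is scored against the same priors, information is shared across subjects while each subject still gets its own posterior probability. A one-sided version that only counts increases after stimulation would additionally constrain the alternative, which this sketch does not do.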
ABSTRACT: MNase-Seq and ChIP-Seq have evolved as popular techniques to study chromatin and histone modification. Although many tools have been developed to identify enriched regions, software tools for nucleosome positioning are still limited. We introduce a flexible and powerful open-source R package, PING 2.0, for nucleosome positioning using MNase-Seq data, or MNase- or sonication-based ChIP-Seq data, combined with either single-end or paired-end sequencing. PING uses a model-based approach, which enables nucleosome predictions even in the presence of low read counts. We illustrate PING using two paired-end datasets from Saccharomyces cerevisiae and compare its performance to nucleR and ChIPseqR.
PING 2.0 is available from the Bioconductor website at http://bioconductor.org. It can run on Linux, Mac and Windows.
SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
ABSTRACT: We present an integrated analytical method for analyzing peptide microarray antibody binding data, from normalization through subject-specific positivity calls to data integration and visualization. Current techniques for the normalization of such data sets do not account for non-specific binding activity. A novel normalization technique based on peptide sequence information quickly and effectively reduced systematic biases. We also employed a sliding mean window technique that borrows strength from peptides sharing similar sequences, resulting in reduced signal variability. The smoothed signal aided in the detection of weak antibody binding hotspots. A new, principled false discovery rate (FDR) method for setting positivity thresholds struck a balance between sensitivity and specificity. In addition, we demonstrate the utility and importance of using baseline control measurements when making subject-specific positivity calls. Data sets from two human clinical trials of candidate HIV-1 vaccines were used to validate the effectiveness of our overall computational framework.
Journal of Immunological Methods 06/2013; · 2.35 Impact Factor
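A sliding mean window can be sketched as a centered moving average over peptides ordered by their position along the protein, so that each peptide's signal is averaged with its overlapping neighbors. The window half-width and the edge handling (truncated windows at the ends) are assumptions made for this sketch; the published method may size or weight the window differently.

```python
def sliding_mean(values, half_width=1):
    """Centered moving average over position-ordered peptide signals.

    Each output value averages the peptide with up to `half_width` neighbors on
    each side; windows are truncated at the ends of the sequence.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical normalized intensities for consecutive overlapping peptides.
signal = [0.1, 0.0, 2.1, 1.9, 2.3, 0.2, 0.1]
print([round(x, 2) for x in sliding_mean(signal)])
```

Averaging neighbors lowers peptide-to-peptide noise, while a run of several moderately elevated peptides stands out more clearly as a contiguous binding region.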
ABSTRACT: Highly multiplexed, single-cell technologies reveal important heterogeneity within cell populations. Recently, technologies to simultaneously measure expression of 96 (or more) genes from a single cell have been developed for immunologic monitoring. Here, we report a rigorous, optimized, quantitative methodology for using this technology. Specifically, we describe a unique primer/probe qualification method necessary for quantitative results; show that primers do not compete in highly multiplexed amplifications; define the limit of detection for this assay as a single mRNA transcript; and show that the technical reproducibility of the system is very high. We illustrate two disparate applications of the platform: a "bulk" approach that measures expression patterns from 100 cells at a time in high throughput to define gene signatures, and a single-cell approach to define the coordinate expression patterns of multiple genes and reveal unique subsets of cells.
Journal of Immunological Methods 03/2013; · 2.35 Impact Factor
ABSTRACT: Background. The licensing of Zostavax has demonstrated that therapeutic vaccination can help control chronic viral infection. Unfortunately, HIV therapeutic vaccine trials have shown only marginal efficacy. Methods. Seventeen HIV-infected individuals with viral loads <50 copies/ml and CD4 T cell counts >350 cells/µl were randomized to the vaccine or placebo arm. Vaccine recipients received three intramuscular injections of HIV DNA (4 mg) coding for clade B Gag, Pol, Nef, and clade A, B, C Env, followed by a replication-deficient Ad5 boost (10¹⁰ PFU) encoding all DNA vaccine antigens except Nef. Humoral, total T cell and CD8 cytotoxic T lymphocyte (CTL) responses were studied pre- and post-vaccination. Single copy viral loads and latently infected CD4 T cell frequencies were determined. VRC 101 is a double-blind trial registered with ClinicalTrials.gov (NCT00270465). Results. Vaccination was safe and well tolerated. Significantly stronger HIV-specific T cell responses against Gag, Pol, and Env, with increased polyfunctionality and a broadened epitope-specific CTL repertoire, were observed post-vaccination. No changes in single copy viral load or the frequency of latent infection were observed. Conclusions. Vaccination of individuals with existing HIV-specific immunity improved the magnitude, breadth and polyfunctionality of HIV-specific memory T cell responses, but did not impact markers of viral control.
The Journal of Infectious Diseases 03/2013; · 5.85 Impact Factor
ABSTRACT: Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks: (i) mammalian cell population identification, to determine whether automated algorithms can reproduce expert manual gating, and (ii) sample classification, to determine whether analysis pipelines can identify characteristics that correlate with external variables (such as clinical outcome). This analysis presents the results of the first FlowCAP challenges. Several methods performed well compared with manual gating or external variables, as assessed by statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
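One way to score how well an automated method reproduces manual gating is the F-measure, the harmonic mean of precision and recall for a given population. The abstract does not name its performance measures, so treat this choice as an assumption. The sketch also assumes that automated clusters have already been matched to the manual population labels, cell by cell.

```python
def f_measure(predicted, reference, label):
    """Harmonic mean of precision and recall for one population label.

    predicted: labels assigned to each cell by an automated method
    reference: labels assigned to the same cells by manual gating
    """
    tp = sum(1 for p, r in zip(predicted, reference) if p == label and r == label)
    if tp == 0:
        return 0.0
    precision = tp / sum(1 for p in predicted if p == label)
    recall = tp / sum(1 for r in reference if r == label)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-cell labels for six cells.
manual = ["A", "A", "A", "B", "B", "B"]
automated = ["A", "A", "B", "B", "B", "A"]
print(f_measure(automated, manual, "A"))
```

An F-measure of 1 means the automated population contains exactly the manually gated cells; values near 0 mean little overlap. Scores for several populations can then be averaged, optionally weighted by population size.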
ABSTRACT: The RV144 clinical trial of a prime/boost immunizing regimen using recombinant canary pox (ALVAC-HIV) and two gp120 proteins (AIDSVAX B and E) was previously shown to have a 31.2% efficacy rate. Plasma specimens from vaccine and placebo recipients were used in an extensive set of assays to identify correlates of HIV-1 infection risk. Of six primary variables that were studied, only one displayed a significant inverse correlation with risk of infection: the antibody (Ab) response to a fusion protein containing the V1 and V2 regions of gp120 (gp70-V1V2). This finding prompted a thorough examination of the results generated with the complete panel of 13 assays measuring various V2 Abs in the stored plasma used in the initial pilot studies and those used in the subsequent case-control study. The studies revealed that the ALVAC-HIV/AIDSVAX vaccine induced V2-specific Abs that cross-react with multiple HIV-1 subgroups and recognize both conformational and linear epitopes. The conformational epitope was present on gp70-V1V2, while the predominant linear V2 epitope mapped to residues 165-178, immediately N-terminal to the putative α4β7 binding motif in the mid-loop region of V2. Odds ratios (ORs) were calculated to compare the risk of infection with data from 12 V2 assays, and in 11 of these, the ORs were ≤1, reaching statistical significance for two of the variables: Ab responses to gp70-V1V2 and to overlapping V2 linear peptides. It remains to be determined whether anti-V2 Ab responses were directly responsible for the reduced infection rate in RV144 and whether anti-V2 Abs will prove to be important with other candidate HIV vaccines that show efficacy; however, the results support continued dissection of Ab responses to the V2 region, which may illuminate mechanisms of protection from HIV-1 infection and may facilitate the development of an effective HIV-1 vaccine.
PLoS ONE 01/2013; 8(1):e53629. · 3.73 Impact Factor
ABSTRACT: Neutralizing and non-neutralizing antibodies to linear epitopes on HIV-1 envelope glycoproteins have the potential to mediate antiviral effector functions that could be beneficial to vaccine-induced protection. Here, plasma IgG responses were assessed in three HIV-1 gp120 vaccine efficacy trials (RV144, Vax003, Vax004) and in HIV-1-infected individuals by using arrays of overlapping peptides spanning the entire consensus gp160 of all major genetic subtypes and circulating recombinant forms (CRFs) of the virus. In RV144, where 31.2% efficacy against HIV-1 infection was seen, dominant responses targeted the C1, V2, V3 and C5 regions of gp120. An analysis of RV144 case-control samples showed that IgG to V2 CRF01_AE significantly inversely correlated with infection risk (OR=0.54, p=0.0042), as did the response to other V2 subtypes (OR=0.60-0.63, p=0.016-0.025). The response to V3 CRF01_AE also inversely correlated with infection risk, but only in vaccine recipients who had lower levels of other antibodies, especially Env-specific plasma IgA (OR=0.49, p=0.007) and neutralizing antibodies (OR=0.5, p=0.008). Responses to C1 and C5 showed no significant correlation with infection risk. In Vax003 and Vax004, where no significant protection was seen, serum IgG responses targeted the same epitopes as in RV144, with the exception of an additional C1 reactivity in Vax003 and infrequent V2 reactivity in Vax004. In HIV-1-infected subjects, dominant responses targeted the V3 and C5 regions of gp120, as well as the immunodominant domain, heptad repeat 1 (HR-1) and membrane proximal external region (MPER) of gp41. These results highlight the presence of several dominant linear B cell epitopes on the HIV-1 envelope glycoproteins. They also generate the hypothesis that IgG responses to linear epitopes in the V2 and V3 regions of gp120 are part of a complex interplay of immune responses that contributed to protection in RV144.
PLoS ONE 01/2013; 8(9):e75665. · 3.73 Impact Factor
ABSTRACT: MOTIVATION: Cell populations are never truly homogeneous; individual cells exist in biochemical states that define functional differences between them. New technology based on microfluidic arrays combined with multiplexed quantitative polymerase chain reactions (qPCR) now enables high-throughput single-cell gene expression measurement, allowing assessment of cellular heterogeneity. However, very few analytic tools have been developed specifically for the statistical and analytical challenges of single-cell qPCR data. RESULTS: We present a statistical framework for the exploration, quality control, and analysis of single-cell gene expression data from microfluidic arrays. We assess accuracy and within-sample heterogeneity of single-cell expression and develop quality control criteria to filter unreliable cell measurements. We propose a statistical model accounting for the fact that genes at the single-cell level can be on (and a continuous expression measure is recorded) or dichotomously off (and the recorded expression is zero). Based on this model, we derive a combined likelihood-ratio test for differential expression that incorporates both the discrete and continuous components. Using an experiment that examines treatment-specific changes in expression, we show that this combined test is more powerful than either the continuous or dichotomous component in isolation, or a t-test on the zero-inflated data. While developed for measurements from a specific platform (Fluidigm), these tools are generalizable to other multi-parametric measures over large numbers of events. AVAILABILITY: All results presented here were obtained using the SingleCellAssay R package available on GitHub (http://github.com/RGLab/SingleCellAssay). SUPPLEMENTARY MATERIAL: Supplementary data are available.
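The combined test can be sketched for two groups of cells. A binomial likelihood-ratio statistic compares the fraction of cells in which the gene is on, a normal-model likelihood-ratio statistic compares mean expression among the cells where it is on, and the two 1-df statistics are summed and referred to a chi-square distribution with 2 degrees of freedom. The equal-variance normal model for the continuous part, the use of 0 as the "off" value and the function names are assumptions of this sketch; it is not the SingleCellAssay API, and the published model may differ in detail.

```python
from math import exp, log


def _xlogy(x, y):
    """x * log(y), defined as 0 when x == 0 (the 0 * log 0 convention)."""
    return 0.0 if x == 0 else x * log(y)


def discrete_lrt(x1, n1, x2, n2):
    """LRT statistic (1 df) for equal 'on' frequencies x1/n1 vs x2/n2."""
    p1, p2, p = x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2)
    ll_alt = (_xlogy(x1, p1) + _xlogy(n1 - x1, 1 - p1)
              + _xlogy(x2, p2) + _xlogy(n2 - x2, 1 - p2))
    ll_null = _xlogy(x1 + x2, p) + _xlogy(n1 + n2 - x1 - x2, 1 - p)
    return 2.0 * (ll_alt - ll_null)


def continuous_lrt(y1, y2):
    """LRT statistic (1 df) for equal means of the positive values,
    assuming normal errors with a common variance (MLE variances)."""
    pooled = y1 + y2
    n = len(pooled)
    grand = sum(pooled) / n
    m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
    ss_null = sum((v - grand) ** 2 for v in pooled)
    ss_alt = sum((v - m1) ** 2 for v in y1) + sum((v - m2) ** 2 for v in y2)
    return n * log(ss_null / ss_alt)


def combined_test(group1, group2):
    """Sum of the two statistics and its chi-square (2 df) p-value.

    Requires at least one positive value per group and non-constant positives.
    """
    pos1 = [v for v in group1 if v > 0]
    pos2 = [v for v in group2 if v > 0]
    stat = (discrete_lrt(len(pos1), len(group1), len(pos2), len(group2))
            + continuous_lrt(pos1, pos2))
    # The survival function of a chi-square with 2 df is exp(-x / 2).
    return stat, exp(-stat / 2.0)


# Hypothetical expression values for one gene; 0 means the gene was not detected.
untreated = [0, 0, 0, 0, 0, 0, 0, 1.0, 1.2, 0.8]
treated = [5.0, 5.5, 4.5, 0, 6.0, 5.2, 4.8, 5.1, 0, 4.9]
print(combined_test(untreated, treated))
```

Splitting the data this way means a treatment that only switches cells on, or only raises expression in cells that are already on, still contributes evidence, whereas a t-test on the zero-inflated values mixes the two effects together.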
ABSTRACT: AIMS/HYPOTHESIS: The paucity of information on the epigenetic barriers that block reprogramming protocols, and on what makes a beta cell unique, has hampered efforts to develop novel beta cell sources. Here, we aimed to identify enhancers in pancreatic islets, to understand their developmental ontologies, and to identify enhancers unique to islets to increase our understanding of islet-specific gene expression. METHODS: We combined H3K4me1-based nucleosome predictions with pancreatic and duodenal homeobox 1 (PDX1), neurogenic differentiation 1 (NEUROD1), v-Maf musculoaponeurotic fibrosarcoma oncogene family, protein A (MAFA) and forkhead box A2 (FOXA2) occupancy data to identify enhancers in mouse islets. RESULTS: We identified 22,223 putative enhancer loci in mouse islets in vivo. Our validation experiments suggest that nearly half of these loci are active in regulating islet gene expression, with the remaining regions probably poised for activity. We showed that these loci have at least nine developmental ontologies, and that islet enhancers predominantly acquire H3K4me1 during differentiation. We next discriminated 1,799 enhancers unique to islets and showed that these islet-specific enhancers have reduced association with annotated genes, and identified a subset that is instead associated with novel islet-specific long non-coding RNAs (lncRNAs). CONCLUSIONS/INTERPRETATIONS: Our results indicate that genes with islet-specific expression and function tend to have enhancers devoid of histone methylation marks or, less often, that are bivalent or repressed, in embryonic stem cells and liver. Further, we identify a subset of enhancers unique to islets that are associated with novel islet-specific genes and lncRNAs. We anticipate that these data will facilitate the development of novel sources of functional beta cell mass.
ABSTRACT: With the development of novel assay technologies, biomedical experiments and analyses have gone through substantial evolution. Today, a typical experiment can simultaneously measure hundreds to thousands of individual features (e.g. genes) in dozens of biological conditions, resulting in gigabytes of data that need to be processed and analyzed. Because of the multiple steps involved in data generation and analysis, and the lack of details provided, it can be difficult for independent researchers to reproduce a published study. Following the recent outrage over the halt of a cancer clinical trial due to the lack of reproducibility of the published study, researchers are now facing heavy pressure to ensure that their results are reproducible. Despite the global demand, too many published studies remain non-reproducible, mainly because the experimental protocol, data and/or computer code are not available. Scientific discovery is an iterative process, where a published study generates new knowledge and data, resulting in new follow-up studies or clinical trials based on these results. As such, it is important for the results of a study to be quickly confirmed or discarded to avoid wasting time and money on new projects. The availability of high-quality, reproducible data will also lead to more powerful analyses (or meta-analyses) where multiple data sets are combined to generate new knowledge. In this article, we review some of the recent developments regarding biomedical reproducibility and comparability and discuss some of the areas where the overall field could be improved.
Briefings in Bioinformatics 11/2012; · 5.30 Impact Factor