The SVA package for removing batch effects and other unwanted variation in high-throughput experiments
ABSTRACT: Heterogeneity and latent variables are now widely recognized as major sources of bias and variability in high-throughput experiments. The best-known source of latent variation in genomic experiments is batch effects, which arise when samples are processed on different days, in different groups or by different people. However, many other variables may also have a major impact on high-throughput measurements. Here we describe the sva package for identifying, estimating and removing unwanted sources of variation in high-throughput experiments. The sva package supports surrogate variable estimation with the sva function, direct adjustment for known batch effects with the ComBat function and adjustment for batch and latent variables in prediction problems with the fsva function.
Availability: The R package sva is freely available from http://www.bioconductor.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
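To make "direct adjustment for known batch effects" concrete, the sketch below removes a per-batch mean shift from a single feature. It is a deliberately simplified, location-only illustration and is not the empirical Bayes procedure that ComBat implements; the function name center_batches and the toy data are hypothetical and are not part of the sva package.

```python
from collections import defaultdict
from statistics import mean


def center_batches(values, batches):
    """Remove per-batch mean shifts from one feature's measurements.

    Each value has its batch mean subtracted and the overall mean added
    back, so all batches share a common center while the overall level
    of the feature is preserved. Illustrative only: no variance scaling,
    no shrinkage across features, no biological covariates.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    # Group the measurements by their batch label.
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)

    overall = mean(values)
    batch_means = {batch: mean(vals) for batch, vals in grouped.items()}

    return [
        value - batch_means[batch] + overall
        for value, batch in zip(values, batches)
    ]


# Toy data (hypothetical): one feature measured on two processing days.
expression = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch_labels = ["day1", "day1", "day1", "day2", "day2", "day2"]
adjusted = center_batches(expression, batch_labels)
```

Plain centering of this kind ignores the biological variables of interest, so if biological groups are unevenly distributed across batches it can remove real signal along with the batch shift. That is one reason a model-based adjustment is generally preferred in practice.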
Full text available from: William Evan Johnson, Jan 06, 2014
- Source available from: Sadegh Jamalkandi Azimzadeh
- "Probe level analysis was performed using the "affy" package of the Bioconductor project. After Robust Multi-array Average normalization, batch effect removal was performed using the Surrogate Variable Analysis (SVA) package. The lipid-related probe sets were selected based on the Biological Processes of the Gene Ontology annotation."
ABSTRACT: Chronic obstructive pulmonary disease (COPD) is a heterogeneous and progressive inflammatory condition that has been linked to the dysregulation of many metabolic pathways including lipid biosynthesis. How lipid metabolism could affect disease progression in smokers with COPD remains unclear. We cross-examined the transcriptomics, proteomics, metabolomics, and phenomics data available in the public domain to elucidate the mechanisms by which lipid metabolism is perturbed in COPD. We reconstructed a sputum lipid COPD (SpLiCO) signaling network utilizing active/inactive, and functional/dysfunctional lipid-mediated signaling pathways to explore how lipid metabolism could promote COPD pathogenesis in smokers. SpLiCO was further utilized to investigate signal amplifiers, distributors, propagators, and feed-forward and/or feedback loops that link COPD disease severity and hypoxia to disruption in the metabolism of sphingolipids, fatty acids and energy. Also, hypergraph analysis and calculations for dependency of molecules identified several important nodes in the network with modular regulatory and signal distribution activities. Our systems-based analyses indicate that arachidonic acid is a critical and early signal distributor that is upregulated by the sphingolipid signaling pathway in COPD, while hypoxia plays a critical role in the elevated dependency on glucose as a major energy source. Integration of SpLiCO and clinical data shows a strong association between hypoxia and the upregulation of sphingolipids in smokers with emphysema, vascular disease, hypertension and those with increased risk of lung cancer.
Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids 08/2015; 1851(10):1383-1393. DOI:10.1016/j.bbalip.2015.07.005 · 4.50 Impact Factor
- "QCs) to fit a smoothed model for the intensity levels of certain features, and then to correct all the biological samples accordingly (Dunn et al., 2011). The R package sva includes the ComBat function, which compensates the batch effects on microarray data using an empirical Bayes approach (Johnson et al., 2007; Leek et al., 2012). This method has been applied to normalize gene expression and methylation data (Chen et al., 2013; Leitch et al., 2013)."
ABSTRACT: Liquid chromatography coupled to mass spectrometry (LC/MS) has become widely used in metabolomics. Several artefacts have been identified during the acquisition step in large LC/MS metabolomics experiments, including ion suppression, carryover, and changes in sensitivity and intensity. Several sources have been identified as responsible for these effects. In this context, drift in peak intensity is one of the most frequent effects and may even constitute the main source of variance in the data, resulting in misleading statistical results when the samples are analysed. In this paper, we propose a methodology based on a common variance analysis prior to data normalisation to address this issue. This methodology was tested and compared with four other methods by calculating the Dunn and Silhouette indices of the Quality Control classes. The results showed that our proposed methodology performed better than any of the other four methods. As far as we know, this is the first time that this kind of approach has been applied in the metabolomics context. Availability: The source code of the methods is available as the R package intCor at: http://b2slab.upc.edu/software-and-downloads/intensity-drift-correction/
Bioinformatics 07/2014; 30(20). DOI:10.1093/bioinformatics/btu423 · 4.62 Impact Factor
- "Results with pSVA batch correction were compared to implementations of SVA in the SVA package (Leek et al., 2012). It was also compared to this package's implementation of ComBat, which also fits the model in eq (1) from estimates of both P and Γ with an empirical Bayes procedure as depicted in Figure 1 (Johnson et al., 2007)."
ABSTRACT: Sample source, procurement process, and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry the potential concern of removing intra-group biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori.
Bioinformatics 06/2014; 30(19). DOI:10.1093/bioinformatics/btu375 · 4.62 Impact Factor