Publications (35)
Article: Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size.
Danni Yu, Wolfgang Huber, Olga Vitek
ABSTRACT: MOTIVATION: RNA-Seq experiments produce digital counts of reads that are affected by both biological and technical variation. To distinguish the systematic changes in expression between conditions from noise, the counts are frequently modeled by the Negative Binomial distribution. However, in experiments with small sample size the per-gene estimates of the dispersion parameter are unreliable. METHOD: We propose a simple and effective approach for estimating the dispersions. First, we obtain the initial estimates for each gene using the method of moments. Second, the estimates are regularized, i.e. shrunk towards a common value that minimizes the average squared difference between the initial estimates and the shrinkage estimates. The approach does not require extra modeling assumptions, is easy to compute, and is compatible with the exact test of differential expression. RESULTS: We evaluated the proposed approach using ten simulated and experimental datasets, and compared its performance with that of currently popular packages edgeR, DESeq, baySeq, BBSeq and SAMseq. For these datasets sSeq performed favorably for experiments with small sample size in sensitivity, specificity and computational time. AVAILABILITY: The open source R-based software package sSeq is available at http://www.stat.purdue.edu/~dyu/sSeq. CONTACT: Olga Vitek (ovitek@purdue.edu). Bioinformatics 04/2013; · 5.47 Impact Factor
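The two steps described in the abstract above can be sketched roughly as follows. This is a minimal illustration in Python, not the sSeq package (which is written in R) and not necessarily the authors' exact estimator. The assumptions: the Negative Binomial variance is parameterized as Var = mu + phi * mu^2, the shrinkage is a simple linear pull with a fixed, hand-picked `weight`, and the common target is chosen by a grid search. The function names and the toy counts are hypothetical.

```python
from statistics import mean, variance


def mom_dispersion(counts):
    """Method-of-moments dispersion for one gene, assuming Var = mu + phi * mu^2."""
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # sample variance; needs at least 2 replicates
    # Negative values (variance below the mean) are truncated at zero.
    return max(0.0, (s2 - m) / (m * m))


def shrink(initial, target, weight):
    """Move each initial estimate a fraction `weight` of the way to `target`."""
    return [(1.0 - weight) * phi + weight * target for phi in initial]


def avg_sq_diff(initial, shrunk):
    """Average squared difference between initial and shrunken estimates."""
    return mean((a - b) ** 2 for a, b in zip(initial, shrunk))


def best_target(initial, weight, candidates):
    """Candidate target with the smallest average squared difference."""
    return min(
        candidates,
        key=lambda t: avg_sq_diff(initial, shrink(initial, t, weight)),
    )


# Toy read counts: three genes, three replicates each (made-up numbers).
counts_by_gene = {
    "geneA": [10, 14, 9],
    "geneB": [100, 180, 60],
    "geneC": [5, 5, 6],
}

initial = [mom_dispersion(c) for c in counts_by_gene.values()]
lo, hi = min(initial), max(initial)
candidates = [lo + (hi - lo) * i / 100 for i in range(101)]

weight = 0.5  # assumed fixed shrinkage weight, chosen only for illustration
target = best_target(initial, weight, candidates)
shrunk = shrink(initial, target, weight)

for gene, phi0, phi1 in zip(counts_by_gene, initial, shrunk):
    print(f"{gene}: initial={phi0:.4f} shrunk={phi1:.4f}")
```

With a fixed weight, the grid search simply lands on the candidate nearest the mean of the initial estimates. A data-driven weight would change that behavior, so the sketch should be read only as an illustration of the initial-estimate and shrink-toward-a-common-value structure.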
Source: available from John M. C. Danku
Dataset: Bioinformatics 2011 Yu et al Suppl Materials-Noise reduction in genome-wide perturbation screens using linear mixed-effect models
Article: High-resolution genome-wide scan of genes, gene-networks and cellular systems impacting the yeast ionome.
Danni Yu, John M Danku, Ivan Baxter, Sungjin Kim, Olena K Vatamaniuk, Olga Vitek, Mourad Ouzzani, David E Salt
ABSTRACT: BACKGROUND: To balance the demand for uptake of essential elements with their potential toxicity, living cells have complex regulatory mechanisms. Here, we describe a genome-wide screen to identify genes that impact the elemental composition ('ionome') of the yeast Saccharomyces cerevisiae. Using inductively coupled plasma mass spectrometry (ICP-MS) we quantify Ca, Cd, Co, Cu, Fe, K, Mg, Mn, Mo, Na, Ni, P, S and Zn in 11890 mutant strains, including 4940 haploid and 1127 diploid deletion strains, and 5798 over expression strains. RESULTS: We identified 1065 strains with an altered ionome, including 584 haploid and 35 diploid deletion strains, and 446 over expression strains. Disruption of protein metabolism or trafficking has the highest likelihood of causing large ionomic changes, with gene dosage also being important. Gene over expression produced more extreme ionomic changes, but over expression and loss of function phenotypes are generally not related. Ionomic clustering revealed the existence of only a small number of possible ionomic profiles, suggesting fitness tradeoffs that constrain the ionome. Clustering also identified important roles for the mitochondria, vacuole and ESCRT pathway in regulation of the ionome. Network analysis identified hub genes such as PMR1 in Mn homeostasis, novel members of ionomic networks such as SMF3 in vacuolar retrieval of Mn, and cross-talk between the mitochondria and the vacuole. All yeast ionomic data can be searched and downloaded at www.ionomicshub.org. CONCLUSIONS: Here, we demonstrate the power of high-throughput ICP-MS analysis to functionally dissect the ionome on a genome-wide scale. The information this reveals has the potential to benefit both human health and agriculture. BMC Genomics 11/2012; 13(1):623. · 4.07 Impact Factor
Article: "Add to subtract": a simple method to remove complex background signals from the 1H nuclear magnetic resonance spectra of mixtures.
ABSTRACT: Because of its highly reproducible and quantitative nature and minimal requirements for sample preparation or separation, 1H nuclear magnetic resonance (NMR) spectroscopy is widely used for profiling small-molecule metabolites in biofluids. However, 1H NMR spectra contain many overlapped peaks. In particular, blood serum/plasma and diabetic urine samples contain high concentrations of glucose, which produce strong peaks between 3.2 ppm and 4.0 ppm. Signals from most metabolites in this region are overwhelmed by the glucose background signals and become invisible. We propose a simple "Add to Subtract" background subtraction method and show that it can reduce the glucose signals by 98% to allow retrieval of the hidden information. This procedure includes adding a small drop of concentrated glucose solution to the sample in the NMR tube, mixing, waiting for an equilibration time, and acquiring a second spectrum. The glucose-free spectra are then generated by spectral subtraction using Bruker Topspin software. Subsequent multivariate statistical analysis can then be used to identify biomarker candidate signals for distinguishing different types of biological samples. The principle of this approach is generally applicable for all quantitative spectral data and should find utility in a variety of NMR-based mixture analyses as well as in metabolite profiling. Analytical Chemistry 01/2012; 84(2):994-1002. · 5.86 Impact Factor
Article: A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet.
Kelvin Ma, Olga Vitek, Alexey I Nesvizhskii
ABSTRACT: PeptideProphet is a post-processing algorithm designed to evaluate the confidence in identifications of MS/MS spectra returned by a database search. In this manuscript we describe the "what and how" of PeptideProphet in a manner aimed at statisticians and life scientists who would like to gain a more in-depth understanding of the underlying statistical modeling. The theory and rationale behind the mixture-modeling approach taken by PeptideProphet are discussed from a statistical model-building perspective, followed by a description of how a model can be used to express confidence in the identification of individual peptides or sets of peptides. We also demonstrate how to evaluate the quality of model fit and select an appropriate model from several available alternatives. We illustrate the use of PeptideProphet in association with the Trans-Proteomic Pipeline, a free suite of software used for protein identification. BMC Bioinformatics 01/2012; 13 Suppl 16:S1. · 2.75 Impact Factor