Article

A nonparametric empirical Bayes approach to joint modeling of multiple sources of genomic data

University of Minnesota; University of California, Los Angeles; University of Texas Southwestern Medical Center
Statistica Sinica (impact factor: 1.02). 01/2008; 18:709-729.

ABSTRACT With the rapid accumulation of various high-throughput genomic and proteomic data, one is compelled to develop new statistical methods that can take advantage of multiple existing sources of data. In our motivating example, a chromatin immunoprecipitation (ChIP) microarray experiment was conducted to detect the binding target genes of a broad transcription regulator, the leucine-responsive regulatory protein (Lrp), in E. coli. In addition, a cDNA microarray dataset is available that compares gene expression in wild-type E. coli with that in a mutant from which the Lrp gene has been deleted. It is biologically reasonable to assume that genes with altered expression are more likely to be regulated by Lrp than genes with no expression change. Hence we aim to borrow information from the gene expression data to increase the statistical power to detect the binding targets of Lrp. We propose a novel joint model for protein-DNA binding data and gene expression data; under mild modeling assumptions, we show that the method is optimal, being equivalent to a joint likelihood ratio test. We compare the joint modeling with two existing methods that combine separate analyses. We adopt a nonparametric empirical Bayes (EB) method to draw statistical inference in the joint model; in particular, we propose a new method, maximum likelihood conditional on the binding data, to estimate two prior probabilities for the expression data, which are not identifiable from the expression data alone. We use simulated data to demonstrate the improved performance of the joint modeling over other approaches. Application to the Lrp data also shows that the joint modeling performs better than analyzing the binding data alone.
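To make the idea of borrowing information concrete, here is a minimal Python sketch of a generic two-group joint mixture. It is not the authors' model: the parameterization (`pi1` = P(bound), `p1` = P(expression change | bound), `p0` = P(expression change | not bound)), the conditional independence of the binding statistic `x` and expression statistic `y` given the latent states, and the normal densities used in place of nonparametric EB estimates are all illustrative assumptions, and every function name is hypothetical.

```python
import math


def normal_pdf(z, mu=0.0, sd=1.0):
    """Gaussian density; an illustrative stand-in for densities that a
    nonparametric EB method would estimate from the data (assumption)."""
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def expression_density(y, p, g0, g1):
    """Density of expression statistic y, mixing over 'changed' (g1)
    and 'unchanged' (g0) with P(changed) = p."""
    return p * g1(y) + (1.0 - p) * g0(y)


def joint_likelihood_ratio(x, y, p1, p0, f0, f1, g0, g1):
    """Hypothetical joint LR for bound vs. not bound:
    [f1(x) / f0(x)] * [h1(y) / h0(y)], where h_b is the density of y
    given binding status b (x and y assumed conditionally independent)."""
    h1 = expression_density(y, p1, g0, g1)
    h0 = expression_density(y, p0, g0, g1)
    return (f1(x) / f0(x)) * (h1 / h0)


def posterior_binding(x, y, pi1, p1, p0, f0, f1, g0, g1):
    """P(bound | x, y) under the assumed two-group joint mixture."""
    h1 = expression_density(y, p1, g0, g1)
    h0 = expression_density(y, p0, g0, g1)
    num = pi1 * f1(x) * h1
    return num / (num + (1.0 - pi1) * f0(x) * h0)


def posterior_binding_alone(x, pi1, f0, f1):
    """P(bound | x) using the binding statistic only."""
    num = pi1 * f1(x)
    return num / (num + (1.0 - pi1) * f0(x))


# Illustrative densities (assumptions): null N(0,1), non-null N(3,1).
f0 = lambda z: normal_pdf(z, 0.0, 1.0)
f1 = lambda z: normal_pdf(z, 3.0, 1.0)
g0 = lambda z: normal_pdf(z, 0.0, 1.0)
g1 = lambda z: normal_pdf(z, 3.0, 1.0)

alone = posterior_binding_alone(2.0, 0.1, f0, f1)
joint = posterior_binding(2.0, 3.0, 0.1, 0.6, 0.1, f0, f1, g0, g1)
print(f"binding only: {alone:.3f}  joint: {joint:.3f}")
```

When `p1 == p0` the expression term cancels and the joint posterior reduces to the binding-only posterior; when `p1 > p0`, a gene with a strong expression change gets a higher posterior. Because the posterior odds equal `pi1 / (1 - pi1)` times the joint likelihood ratio, ranking genes by this posterior (with fixed priors) is the same as ranking them by the joint likelihood ratio. Note that `p1` and `p0` enter the expression data only through the mixture above, which is one way to see why they cannot be identified from the expression data alone.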

Keywords

binding data
broad transcription regulator
cDNA microarray dataset
draw statistical inference
expression change
expression data
gene expression data
joint likelihood ratio test
joint model
leucine responsive regulatory protein
Lrp data
Lrp gene deleted
maximum likelihood conditional
motivating example
new method
novel joint model
protein-DNA binding data
proteomic data
rapid accumulation
wild type