Empirical Bayes conditional independence graphs for regulatory network recovery.
ABSTRACT Computational inference methods that use graphical models to extract regulatory networks from gene expression data can have difficulty reconstructing dense regions of a network, a consequence of both computational complexity and unreliable parameter estimation when sample size is small. As a result, identifying hub genes is especially difficult for these methods.
We present a new algorithm, Empirical Light Mutual Min (ELMM), for large network reconstruction that has properties well suited for recovery of graphs with high-degree nodes. ELMM reconstructs the undirected graph of a regulatory network using empirical Bayes conditional independence testing with a heuristic relaxation of independence constraints in dense areas of the graph. This relaxation allows only one gene of a pair with a putative relation to be aware of the network connection, an approach that is aimed at easing multiple testing problems associated with recovering densely connected structures.
Using in silico data, we show that ELMM has better performance than commonly used network inference algorithms including GeneNet, ARACNE, FOCI, GENIE3 and GLASSO. We also apply ELMM to reconstruct a network among 5492 genes expressed in human lung airway epithelium of healthy non-smokers, healthy smokers and individuals with chronic obstructive pulmonary disease assayed using microarrays. The analysis identifies dense sub-networks that are consistent with known regulatory relationships in the lung airway and also suggests novel hub regulatory relationships among a number of genes that play roles in oxidative stress and secretion.
Software for running ELMM is made available at http://mezeylab.cb.bscb.cornell.edu/Software.aspx.
Supplementary data are available at Bioinformatics online.
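The relaxation described in the abstract, in which only one gene of a pair needs to be aware of a connection, can be illustrated with a small sketch. This is not the ELMM implementation: a fixed threshold on first-order partial correlations stands in for the empirical Bayes conditional independence test, and the function names (`partial_corr`, `one_sided_support`, `combine`), the parameters `k` and `threshold`, and the simulated hub data are all hypothetical choices made for illustration.

```python
import numpy as np


def partial_corr(r_ij, r_ik, r_jk):
    """First-order partial correlation of genes i and j given gene k."""
    denom = np.sqrt((1.0 - r_ik ** 2) * (1.0 - r_jk ** 2))
    return (r_ij - r_ik * r_jk) / max(denom, 1e-12)


def one_sided_support(corr, k=2, threshold=0.3):
    """Return a boolean matrix where aware[i, j] is True when gene i still
    'sees' gene j after conditioning on each of gene i's k most correlated
    other genes. The fixed threshold is an assumption standing in for an
    empirical Bayes test."""
    p = corr.shape[0]
    aware = np.zeros((p, p), dtype=bool)
    for i in range(p):
        order = np.argsort(-np.abs(corr[i]))
        top = [g for g in order if g != i][:k]  # gene i's conditioning candidates
        for j in range(p):
            if j == i:
                continue
            stats = [abs(corr[i, j])]  # marginal association
            for c in top:
                if c != j:
                    stats.append(abs(partial_corr(corr[i, j], corr[i, c], corr[j, c])))
            # Keep j only if the weakest association survives the threshold.
            aware[i, j] = min(stats) > threshold
    return aware


def combine(aware, relaxed=True):
    """Relaxed rule: an edge needs support from either gene of the pair.
    Strict rule: both genes must support the edge."""
    return (aware | aware.T) if relaxed else (aware & aware.T)


# Simulated example: gene 0 acts as a hub driving five target genes.
rng = np.random.default_rng(0)
hub = rng.normal(size=200)
targets = hub[:, None] + rng.normal(size=(200, 5))
expr = np.column_stack([hub, targets])
corr = np.corrcoef(expr, rowvar=False)

aware = one_sided_support(corr)
relaxed = combine(aware, relaxed=True)
strict = combine(aware, relaxed=False)
print("edges (relaxed):", relaxed.sum() // 2, "edges (strict):", strict.sum() // 2)
```

Because each gene conditions on its own set of neighbors, `aware` is generally asymmetric. The relaxed rule can only add edges relative to the strict rule, which is the sense in which it eases edge recovery around high-degree nodes in this toy setting.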
Article: Next-generation DNA sequencing.
ABSTRACT: DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude, and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data-analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research, by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.
Nature Biotechnology 11/2008; 26(10):1135-45. · 32.44 Impact Factor
ABSTRACT: Aldo-keto reductase family 1 B10 (AKR1B10), a member of the aldo-keto reductase superfamily, is overexpressed in human hepatocellular carcinoma, lung squamous cell carcinoma and lung adenocarcinoma. Our previous study demonstrated that ectopic expression of AKR1B10 in 293T cells promotes cell proliferation. To evaluate its potential as a target for cancer intervention, in the current study we knocked down AKR1B10 expression in HCT-8 cells, derived from a colorectal carcinoma, using chemically synthesized small interfering RNA (siRNA). siRNA 1, targeted to the coding region, downregulated AKR1B10 expression by more than 60%, and siRNA 2, targeted to the 3' untranslated region, reduced AKR1B10 expression by more than 95%. AKR1B10 silencing resulted in an approximately 50% decrease in cell growth rate and nearly 40% suppression of DNA synthesis. More importantly, AKR1B10 downregulation significantly reduced focus formation rate and colony size in semisolid culture, indicating the critical role of AKR1B10 in HCT-8 cell proliferation. Recombinant AKR1B10 protein showed strong enzymatic activity toward acrolein and crotonaldehyde, with K(m) = 110.1 ± 12.2 µM and V(max) = 3,122.0 ± 64.7 nmol/mg protein/min for acrolein and K(m) = 86.7 ± 14.3 µM and V(max) = 2,647.5 ± 132.2 nmol/mg protein/min for crotonaldehyde. AKR1B10 downregulation enhanced the susceptibility of HCT-8 cells to acrolein (25 µM) and crotonaldehyde (50 µM), resulting in rapid oncotic cell death characterized by lactate dehydrogenase efflux and annexin-V staining. These results suggest that AKR1B10 may regulate cell proliferation and the cellular response to additional carbonyl stress, making it a potential target for cancer intervention.
International Journal of Cancer 12/2007; 121(10):2301-6. · 6.20 Impact Factor
ABSTRACT: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information, gathered over many years of biomedical research, is a useful supplement to standard numerical genomic data such as microarray gene-expression data. How to incorporate information encoded by known biological networks or graphs into the analysis of numerical data raises interesting statistical challenges. In this article, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network. Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by published literature. The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.
Bioinformatics 06/2008; 24(9):1175-82. · 5.47 Impact Factor
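The network-constrained penalty in the abstract above combines an L1 term with a smoothness term defined through the graph Laplacian. A minimal sketch of how such a penalty could be evaluated follows. It assumes the unnormalized Laplacian (degree matrix minus adjacency matrix) and two tuning weights, `lam1` and `lam2`; the abstract specifies neither, so the exact form, the function names and the toy graph are assumptions rather than the authors' formulation.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A (an assumed choice)."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def network_penalty(beta, laplacian, lam1=1.0, lam2=1.0):
    """L1 sparsity term plus a Laplacian quadratic form. With the
    unnormalized Laplacian, the quadratic form equals the sum of squared
    coefficient differences across the edges of the graph."""
    sparsity = np.abs(beta).sum()
    smoothness = beta @ laplacian @ beta
    return lam1 * sparsity + lam2 * smoothness


# Toy path graph on three genes: 0 - 1 - 2.
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
laplacian = graph_laplacian(adjacency)

smooth = network_penalty(np.array([1.0, 1.0, 1.0]), laplacian)   # 3 + 0 = 3
rough = network_penalty(np.array([1.0, -1.0, 1.0]), laplacian)   # 3 + 8 = 11
print(smooth, rough)
```

Both coefficient vectors have the same L1 norm, so the difference in penalty comes entirely from the smoothness term, which favors neighboring genes with similar coefficients.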