Research experience
Jul 2007–present · Research: University of Southern California
University of Southern California · Department of Biological Sciences · Los Angeles, USA
Jan 2006–Jul 2007 · Research: University of Texas at Dallas
University of Texas at Dallas · Department of Computer Science · Dallas, USA
Nov 2002–Jun 2007 · Research: PhD
Nanyang Technological University · School of Computer Engineering · Singapore
Publications (29)
Article: Discovery of multi-dimensional modules by integrative analysis of cancer genomic data.
ABSTRACT: Recent technology has made it possible to simultaneously perform multi-platform genomic profiling (e.g. DNA methylation (DM) and gene expression (GE)) of biological samples, resulting in so-called 'multi-dimensional genomic data'. Such data provide unique opportunities to study the coordination between regulatory mechanisms on multiple levels. However, integrative analysis of multi-dimensional genomic data for the discovery of combinatorial patterns is currently lacking. Here, we adopt a joint matrix factorization technique to address this challenge. This method projects multiple types of genomic data onto a common coordinate system, in which heterogeneous variables weighted highly in the same projected direction form a multi-dimensional module (md-module). Genomic variables in such modules are characterized by significant correlations and likely functional associations. We applied this method to the DM, GE, and microRNA expression data of 385 ovarian cancer samples from The Cancer Genome Atlas project. These md-modules revealed perturbed pathways that would have been overlooked with only a single type of data, uncovered associations between different layers of cellular activities, and allowed the identification of clinically distinct patient subgroups. Our study provides a useful protocol for uncovering hidden patterns and their biological implications in multi-dimensional 'omic' data.
Nucleic Acids Research 08/2012; 40(19):9379-91. · 8.03 Impact Factor
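A minimal sketch of the kind of joint matrix factorization described in this abstract, assuming a nonnegative formulation in which every data type X_I (samples × features) shares one basis matrix W and has its own coefficient matrix H_I, fitted with standard Lee–Seung multiplicative updates. The function names `joint_nmf` and `md_modules`, the iteration count, and the z-score cut-off used to call module members are illustrative assumptions, not the published protocol.

```python
import numpy as np

def joint_nmf(blocks, k, n_iter=200, seed=0, eps=1e-10):
    """Fit X_I ~ W @ H_I for every nonnegative block X_I (samples x features_I),
    with a basis W shared across blocks (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n = blocks[0].shape[0]
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in blocks]
    for _ in range(n_iter):
        # Block-specific coefficients: the objective separates over H_I given W.
        for i, X in enumerate(blocks):
            Hs[i] *= (W.T @ X) / (W.T @ W @ Hs[i] + eps)
        # Shared basis: equivalent to an ordinary NMF update on the
        # column-concatenated data [X_1 ... X_M] ~ W [H_1 ... H_M].
        num = sum(X @ H.T for X, H in zip(blocks, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
    return W, Hs

def md_modules(Hs, z=2.0):
    """For each factor j, list (per block) the features whose weight in row j of
    H_I exceeds mean + z * std; the z threshold is an assumption."""
    modules = []
    for j in range(Hs[0].shape[0]):
        module = []
        for H in Hs:
            row = H[j]
            module.append(np.flatnonzero(row > row.mean() + z * row.std()))
        modules.append(module)
    return modules
```

Because the objective for W is just an ordinary NMF objective on the concatenated blocks, each update is non-increasing in the total squared reconstruction error, which is why the variables that load on the same row of every H_I can be read together as one candidate module.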
Article: Identifying multi-layer gene regulatory modules from multi-dimensional genomic data.
ABSTRACT: Eukaryotic gene expression (GE) is subject to precisely coordinated multi-layer controls across the levels of epigenetic, transcriptional and post-transcriptional regulation. Recently, emerging multi-dimensional genomic datasets have provided unprecedented opportunities to study the cross-layer regulatory interplay. In these datasets, the same set of samples is profiled on several layers of genomic activity, e.g. copy number variation (CNV), DNA methylation (DM), GE and microRNA expression (ME). However, suitable analysis methods for such data are currently scarce. In this article, we introduce a sparse Multi-Block Partial Least Squares (sMBPLS) regression method to identify multi-dimensional regulatory modules from this new type of data. A multi-dimensional regulatory module contains sets of regulatory factors from different layers that are likely to jointly contribute to a local 'gene expression factory'. We demonstrated the performance of our method on simulated data as well as on The Cancer Genome Atlas ovarian cancer datasets, including the CNV, DM, ME and GE data measured on 230 samples. We showed that the majority of identified modules have significant functional and transcriptional enrichment, higher than that observed in modules identified using only a single type of genomic data. Our network analysis of the modules revealed that CNV, DM and microRNA can have a coupled impact on the expression of important oncogenes and tumor suppressor genes. Availability and implementation: The source code, implemented in MATLAB, is freely available at http://zhoulab.usc.edu/sMBPLS/. Contact: xjzhou@usc.edu. Supplementary material is available at Bioinformatics online.
Bioinformatics 08/2012; 28(19):2458-66. · 5.47 Impact Factor
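The following is a heavily simplified, one-component illustration of sparse multi-block PLS; it is not the published sMBPLS algorithm, whose MATLAB implementation is at the URL given in the abstract. It assumes several regulatory blocks (e.g. CNV, DM, ME) sharing samples with one response block (GE), sums the block scores into a single super score, and induces sparsity by soft-thresholding each weight vector at a fixed fraction of its largest entry. The names `soft_threshold`, `sparse_multiblock_pls`, `frac_x` and `frac_y` are hypothetical.

```python
import numpy as np

def soft_threshold(v, frac):
    """Shrink entries toward zero by frac * max|v| (entries below that become
    exactly zero), then rescale to unit length. Assumed sparsity rule."""
    lam = frac * np.max(np.abs(v))
    s = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    norm = np.linalg.norm(s)
    return s / norm if norm > 0 else s

def sparse_multiblock_pls(X_blocks, Y, frac_x=0.5, frac_y=0.5, n_iter=100, tol=1e-8):
    """One latent component: sparse weights per regulatory block and for Y.
    Simplified sketch; block weighting and deflation are omitted."""
    Xs = [X - X.mean(axis=0) for X in X_blocks]   # center every block
    Yc = Y - Y.mean(axis=0)
    u = Yc[:, np.argmax(Yc.var(axis=0))].copy()    # start from the most variable gene
    for _ in range(n_iter):
        ws = [soft_threshold(X.T @ u, frac_x) for X in Xs]   # sparse block weights
        t = sum(X @ w for X, w in zip(Xs, ws))               # super score
        q = soft_threshold(Yc.T @ t, frac_y)                 # sparse response weights
        u_new = Yc @ q                                       # response score
        converged = np.linalg.norm(u_new - u) < tol * max(np.linalg.norm(u), 1.0)
        u = u_new
        if converged:
            break
    return ws, q
```

Features with nonzero entries in the block weights and in `q` would together form one candidate cross-layer module in this simplified reading.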
Article: Integrating many co-splicing networks to reconstruct splicing regulatory modules.
ABSTRACT: Alternative splicing is a ubiquitous gene regulatory mechanism that dramatically increases the complexity of the proteome. However, the mechanism regulating alternative splicing is poorly understood, and the study of coordinated splicing regulation has been limited to individual cases. To study genome-wide splicing regulation, we integrate many human RNA-seq datasets to identify splicing modules, which we define as sets of cassette exons co-regulated by the same splicing factors. We have designed a tensor-based approach to identify co-splicing clusters that appear frequently across multiple conditions and are thus very likely to represent splicing modules, the units of the splicing regulatory network. In particular, we model each RNA-seq dataset as a co-splicing network, where the nodes represent exons and the edges are weighted by the correlations between exon inclusion rate profiles. We apply our tensor-based method to the 38 co-splicing networks derived from human RNA-seq datasets and identify an atlas of frequent co-splicing clusters. We demonstrate that these identified clusters represent potential splicing modules by validating them against four biological knowledge databases. The likelihood that a frequent co-splicing cluster is biologically meaningful increases with its recurrence across multiple datasets, highlighting the importance of the integrative approach. Co-splicing clusters reveal novel functional groups that cannot be identified by co-expression clusters; in particular, they can offer new insights into functions associated with post-transcriptional regulation, and the same exons can dynamically participate in different pathways depending on the conditions and on the other exons with which they are co-spliced. We propose that identifying splicing modules, the units of the splicing regulatory network, can serve as an important step toward deciphering the splicing code.
BMC Systems Biology 07/2012; 6 Suppl 1:S17. · 3.15 Impact Factor
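One way to picture the co-splicing network model in this abstract: each dataset is an exons × samples matrix of inclusion rates, edges are Pearson correlations between exon profiles, and the per-dataset networks are stacked into an exons × exons × datasets tensor. The sketch then counts in how many networks a given exon set is densely connected, as a crude stand-in for "recurrence"; it does not implement the tensor-based cluster search itself. Clipping negative correlations to zero, the density measure, and the names `co_splicing_network`, `build_tensor`, `recurrence` and `min_density` are assumptions.

```python
import numpy as np

def co_splicing_network(psi):
    """psi: exons x samples matrix of inclusion rates from one RNA-seq dataset.
    Returns exon-exon edge weights (Pearson r, negatives clipped to 0: an assumption)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        A = np.corrcoef(psi)            # rows are treated as variables (exons)
    A = np.nan_to_num(A, nan=0.0)       # constant profiles give undefined correlations
    np.fill_diagonal(A, 0.0)            # no self-loops
    return np.clip(A, 0.0, None)

def build_tensor(datasets):
    """Stack one network per dataset into an exons x exons x datasets tensor."""
    return np.stack([co_splicing_network(psi) for psi in datasets], axis=2)

def recurrence(tensor, cluster, min_density=0.5):
    """Number of networks in which the exon set is dense (mean off-diagonal weight)."""
    idx = np.asarray(cluster)
    m = len(idx)
    if m < 2:
        raise ValueError("a cluster needs at least two exons")
    sub = tensor[np.ix_(idx, idx)]      # m x m x datasets
    density = sub.sum(axis=(0, 1)) / (m * (m - 1))
    return int(np.sum(density >= min_density))
```

All datasets must share the same exon ordering for the stacked tensor to be meaningful.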
Article: Algorithm to identify frequent coupled modules from two-layered network series: application to study transcription and splicing coupling.
ABSTRACT: Current network analysis methods all focus on one or multiple networks of the same type. However, cells are organized by multi-layer networks (e.g., transcriptional regulatory networks, splicing regulatory networks, protein-protein interaction networks), which interact and influence each other. Elucidating the coupling mechanisms among these different types of networks is essential to understanding the functions and mechanisms of cellular activities. In this article, we developed the first computational method for pattern mining across many two-layered graphs, with the two layers representing different but coupled types of biological networks. We formulated the problem of identifying frequent coupled clusters between the two layers of networks as a tensor-based computation problem and proposed an efficient algorithm to solve it. We applied the method to 38 two-layered co-transcription and co-splicing networks, derived from 38 RNA-seq datasets. With the identified atlas of coupled transcription-splicing modules, we explored to what extent, for which cellular functions, and by what mechanisms transcription-splicing coupling takes place.
Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 06/2012; 19(6):710-30. · 1.69 Impact Factor
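A naive sketch of what counting a "frequent coupled cluster" could look like, assuming each dataset supplies a gene-gene co-transcription network and an exon-exon co-splicing network as weighted adjacency matrices with zero diagonals. A candidate pair (gene set, exon set) is counted in a dataset only when both sets are dense in their own layer. For simplicity this ignores the inter-layer (gene-exon) links, and it is a brute-force illustration rather than the tensor-based algorithm in the article; `density`, `coupled_frequency` and `min_density` are hypothetical names.

```python
import numpy as np

def density(A, nodes):
    """Mean off-diagonal edge weight among `nodes` (assumes a zero diagonal)."""
    idx = np.asarray(nodes)
    m = len(idx)
    if m < 2:
        raise ValueError("a cluster needs at least two nodes")
    return A[np.ix_(idx, idx)].sum() / (m * (m - 1))

def coupled_frequency(trans_nets, splice_nets, genes, exons, min_density=0.5):
    """Count datasets k in which `genes` is dense in trans_nets[k] AND
    `exons` is dense in splice_nets[k]."""
    count = 0
    for T, S in zip(trans_nets, splice_nets):
        if density(T, genes) >= min_density and density(S, exons) >= min_density:
            count += 1
    return count
```

Enumerating candidate pairs this way scales poorly, which is the practical motivation for casting the search as a single tensor problem.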
Source available from: Wenyuan Li
Conference Proceeding: Systematic reconstruction of splicing regulatory modules by integrating many RNA-seq datasets
ABSTRACT: Alternative splicing is a ubiquitous gene regulatory mechanism that dramatically increases the complexity of the proteome. In this paper we study splicing modules, which we define as sets of cassette exons co-regulated by the same splicing factors. We have designed a tensor-based approach to identify co-splicing clusters that appear frequently across multiple conditions and are thus very likely to represent splicing modules, the units of the splicing regulatory network. In particular, we model each RNA-seq dataset as a co-splicing network, where the nodes represent exons and the edges are weighted by the correlations between exon inclusion rate profiles. We apply our tensor-based method to the 19 co-splicing networks derived from RNA-seq datasets and identify an atlas of frequent co-splicing clusters. We demonstrate that these identified clusters represent splicing modules by validating them against four biological knowledge databases. The likelihood that a frequent co-splicing cluster is biologically meaningful increases with its recurrence across multiple datasets, highlighting the importance of the integrative approach. We also demonstrate that the co-splicing clusters reveal novel functional groups that cannot be identified by co-expression clusters, and that the same exons can dynamically participate in different pathways depending on the conditions and on the other exons with which they are co-spliced.
2011 IEEE International Conference on Systems Biology (ISB); 10/2011