Zhi-Ping Liu

Shanghai Institutes for Biological Sciences, Shanghai, Shanghai Shi, China

Publications (42) · 95.44 Total impact

  • Article: A Sequence-based Computational Approach to Predicting PDZ Domain-Peptide Interactions.
    ABSTRACT: The PDZ domain is one of the most ubiquitous protein domains that are involved in coordinating signaling complex formation and protein networking by reversibly interacting with multiple binding partners. It has been linked to many devastating diseases such as avian influenza, Fraser syndrome, Usher syndrome and Dejerine-Sottas neuropathy. Understanding the selectivity of PDZ domains can help elucidate how defects in PDZ proteins and their binding partners lead to human diseases. Since experimental methods to determine the interaction specificity of the PDZ domains are expensive and labor intensive, an accurate computational method is tremendously needed. Our developed support vector machine-based predictor using dipeptide composition is shown to qualitatively predict PDZ domain-peptide interaction with a high accuracy rate. Furthermore, since most of the dipeptide compositions are redundant and irrelevant, we propose a new hybrid feature selection technique to select only a subset of these compositions for interaction prediction. The experimental results show that only approximately 25% of dipeptide features are needed and that our method improves the prediction results significantly. The selected dipeptide features are also analyzed and shown to play important roles in specificity patterns of PDZ domains. Our method is based only on primary sequence information, and it can be used for the research of drug target and drug design in identifying PDZ domain-ligand interactions. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications.
    Biochimica et Biophysica Acta 04/2013; · 4.66 Impact Factor
  • Article: De novo prediction of RNA-protein interactions from sequence information.
    ABSTRACT: Protein-RNA interactions are fundamentally important in understanding cellular processes. In particular, non-coding RNA-protein interactions play an important role to facilitate biological functions in signalling, transcriptional regulation, and even the progression of complex diseases. However, experimental determination of protein-RNA interactions remains time-consuming and labour-intensive. Here, we develop a novel extended naïve-Bayes-classifier for de novo prediction of protein-RNA interactions, only using protein and RNA sequence information. Specifically, we first collect a set of known protein-RNA interactions as gold-standard positives and extract sequence-based features to represent each protein-RNA pair. To fill the gap between high dimensional features and scarcity of gold-standard positives, we select effective features by cutting a likelihood ratio score, which not only reduces the computational complexity but also allows transparent feature integration during prediction. An extended naïve Bayes classifier is then constructed using these effective features to train a protein-RNA interaction prediction model. Numerical experiments show that our method can achieve the prediction accuracy of 0.77 even though only a small number of protein-RNA interaction data are available. In particular, we demonstrate that the extended naïve-Bayes-classifier is superior to the naïve-Bayes-classifier by fully considering the dependences among features. Importantly, we conduct ncRNA pull-down experiments to validate the predicted novel protein-RNA interactions and identify the interacting proteins of sbRNA CeN72 in C. elegans, which further demonstrates the effectiveness of our method.
    Molecular BioSystems 11/2012; · 3.53 Impact Factor
  • Article: NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference.
    ABSTRACT: MOTIVATION: Reconstruction of gene regulatory networks (GRNs) is of utmost interest to biologists and is vital for understanding the complex regulatory mechanisms within the cell. Despite various methods developed for reconstruction of GRNs from gene expression profiles, they are notorious for high false positive rate due to the noise inherited in the data, especially for the dataset with a large number of genes but a small number of samples. RESULTS: In this work, we present a novel method, namely NARROMI, to improve the accuracy of GRN inference by combining ordinary differential equation based recursive optimization (RO) and information theory based mutual information (MI). In the proposed algorithm, the noisy regulations with low pairwise correlations are first removed by utilizing MI, and the redundant regulations from indirect regulators are further excluded by RO to improve the accuracy of inferred GRNs. In particular, the RO step can help to determine regulatory directions without prior knowledge of regulators. The results on benchmark datasets from DREAM challenge and experimentally determined GRN of Escherichia coli show that NARROMI significantly outperforms other popular methods in terms of false positive rates and accuracy. AVAILABILITY: All the source data and code are available at: http://csb.shu.edu.cn/narromi.htm. CONTACT: lnchen@sibs.ac.cn; hao@info.univ-angers.fr; zhaoxingming@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
    Bioinformatics 10/2012; · 5.47 Impact Factor
  • Article: An integrated approach to identify causal network modules of complex diseases with application to colorectal cancer.
    ABSTRACT: BACKGROUND: Many methods have been developed to identify disease genes and further module biomarkers of complex diseases based on gene expression data. It is generally difficult to distinguish whether the variations in gene expression are causative or merely the effect of a disease. The limitation of relying on gene expression data alone highlights the need to develop new approaches that can explore various data to reflect the causal relationship between network modules and disease traits. METHODS: In this work, we developed a novel network-based approach to identify putative causal module biomarkers of complex diseases by integrating heterogeneous information, for example, epigenomic data, gene expression data, and protein-protein interaction network. We first formulated the identification of modules as a mathematical programming problem, which can be solved efficiently and effectively in an accurate manner. Then, we applied our approach to colorectal cancer (CRC) and identified several network modules that can serve as potential module biomarkers for characterizing CRC. Further validations using three additional gene expression datasets verified their candidate biomarker properties and the effectiveness of the method. Functional enrichment analysis also revealed that the identified modules are strongly related to hallmarks of cancer, and the enriched functions, such as inflammatory response, receptor and signaling pathways, are specific to CRC. RESULTS: Through constructing a transcription factor (TF)-module network, we found that aberrant DNA methylation of genes encoding TF considerably contributes to the activity change of some genes, which may function as causal genes of CRC, and that can also be exploited to develop efficient therapies or effective drugs. CONCLUSION: Our method can potentially be extended to the study of other complex diseases and the multiclassification problem.
    Journal of the American Medical Informatics Association 09/2012; · 3.61 Impact Factor
  • Article: A computational procedure for identifying master regulator candidates: a case study on diabetes progression in Goto-Kakizaki rats.
    ABSTRACT: We have recently identified a number of active regulatory networks involved in diabetes progression in Goto-Kakizaki (GK) rats by network screening. The networks were quite consistent with the previous knowledge of the regulatory relationships between transcription factors (TFs) and their regulated genes. To study the underlying molecular mechanisms directly related to phenotype changes, such as diseases, we also previously developed a computational procedure for identifying transcriptional master regulators (MRs) in conjunction with network screening and network inference, by effectively perturbing the phenotype states. In this work, we further improved our previous method for identifying MR candidates, by listing them in a more reliable manner, and applied the method to reveal the MR candidates for diabetes progression in GK rats from the active networks. Specifically, the active TF-gene pairs for different time periods in GK rats were first extracted from the networks by network screening. Another set of active TF-gene pairs was selected by network inference, by considering the gene expression signatures for those periods between GK and Wistar-Kyoto (WKY) rats. The TF-gene pairs extracted by the two methods were then further selected, from the viewpoints of the emergence specificity of TF in GK rats and the regulated-gene coverage of TF in the expression signature. Finally, we narrowed all of the genes down to only 5 TFs (Etv4, Fus, Nr2f1, Sp2, and Tcfap2b) as the candidates of MRs, with 54 regulated genes, by merging the selected TF-gene pairs. The present method has successfully identified biologically plausible MR candidates, including the TFs related to diabetes in previous reports. Although the experimental verifications of the candidates and the present procedure are beyond the scope of this study, we narrowed down the candidates to 5 TFs, which can be used to perform the verification experiments relatively easily. 
    The numerical results showed that our computational method is an efficient way to detect the key molecules responsible for biological phenomena.
    BMC Systems Biology 07/2012; 6 Suppl 1:S2. · 3.15 Impact Factor
  • Article: Proteome-wide prediction of protein-protein interactions from high-throughput data.
    Zhi-Ping Liu, Luonan Chen
    ABSTRACT: In this paper, we present a brief review of the existing computational methods for predicting proteome-wide protein-protein interaction networks from high-throughput data. The availability of various types of omics data provides great opportunity and also unprecedented challenge to infer the interactome in cells. Reconstructing the interactome or interaction network is a crucial step for studying the functional relationship among proteins and the involved biological processes. The protein interaction network will provide valuable resources and alternatives to decipher the mechanisms of these functionally interacting elements as well as the running system of cellular operations. In this paper, we describe the main steps of predicting protein-protein interaction networks and categorize the available approaches to couple the physical and functional linkages. The future topics and the analyses beyond prediction are also discussed and concluded.
    Protein & Cell 06/2012; 3(7):508-20.
  • Article: Identifying dysregulated pathways in cancers from pathway interaction networks.
    ABSTRACT: Cancers, a group of multifactorial complex diseases, are generally caused by mutation of multiple genes or dysregulation of pathways. Identifying biomarkers that can characterize cancers would help to understand and diagnose cancers. Traditional computational methods that detect genes differentially expressed between cancer and normal samples fail to work due to small sample size and independent assumption among genes. On the other hand, genes work in concert to perform their functions. Therefore, it is expected that dysregulated pathways will serve as better biomarkers compared with single genes. In this paper, we propose a novel approach to identify dysregulated pathways in cancer based on a pathway interaction network. Our contribution is three-fold. Firstly, we present a new method to construct pathway interaction network based on gene expression, protein-protein interactions and cellular pathways. Secondly, the identification of dysregulated pathways in cancer is treated as a feature selection problem, which is biologically reasonable and easy to interpret. Thirdly, the dysregulated pathways are identified as subnetworks from the pathway interaction networks, where the subnetworks characterize very well the functional dependency or crosstalk between pathways. The benchmarking results on several distinct cancer datasets demonstrate that our method can obtain more reliable and accurate results compared with existing state of the art methods. Further functional analysis and independent literature evidence also confirm that our identified potential pathogenic pathways are biologically reasonable, indicating the effectiveness of our method. Dysregulated pathways can serve as better biomarkers compared with single genes. 
    In this work, by utilizing pathway interaction networks and gene expression data, we propose a novel approach that effectively identifies dysregulated pathways, which can not only be used as biomarkers to diagnose cancers but also serve as potential drug targets in the future.
    BMC Bioinformatics 06/2012; 13:126. · 2.75 Impact Factor
  • Article: Coexpression network analysis in chronic hepatitis B and C hepatic lesions reveals distinct patterns of disease progression to hepatocellular carcinoma.
    ABSTRACT: Chronic infections with the hepatitis B virus (HBV) and hepatitis C virus (HCV) are the major risks of hepatocellular carcinoma (HCC), and great efforts have been made towards the understanding of the different mechanisms that link the viral infection of hepatic lesions to HCC development. In this work, we developed a novel framework to identify distinct patterns of gene coexpression networks and inflammation-related modules from genome-scale microarray data upon viral infection, and further classified them into oncogenic and dysfunctional ones. The core of our framework lies in the comparative study on viral infection modules across different disease stages and disease types--the module preservation during disease progression is evaluated according to the change of network connectivity in different stages, while the similarity and difference in HBV and HCV are evaluated by comparing the overlap of gene compositions and functional annotations in HBV and HCV modules. In particular, we revealed two types of driving modules related to infection for carcinogenesis in HBV and HCV, respectively, i.e. pro-apoptosis modules that are oncogenic in HBV, and anti-apoptosis and inflammation modules that are oncogenic in HCV, which are in concordance with the results of previous differential expression-based approaches. Moreover, we found that intracellular protein transmembrane transportation and the transmembrane receptor protein tyrosine kinase signaling pathway act as oncogenic factors in HBV-HCC. Our findings provide novel insights into viral hepatocarcinogenesis and disease progression, and also demonstrate the advantages of an integrative and comparative network analysis over the existing differential expression-based approach and virus-host interactome-based approach.
    Journal of Molecular Cell Biology 06/2012; 4(3):140-152. · 7.67 Impact Factor
  • Article: Prediction of hot spots in protein interfaces using a random forest model with hybrid features.
    ABSTRACT: Prediction of hot spots in protein interfaces provides crucial information for the research on protein-protein interaction and drug design. Existing machine learning methods generally judge whether a given residue is likely to be a hot spot by extracting features only from the target residue. However, hot spots usually form a small cluster of residues which are tightly packed together at the center of protein interface. With this in mind, we present a novel method to extract hybrid features which incorporate a wide range of information of the target residue and its spatially neighboring residues, i.e. the nearest contact residue in the other face (mirror-contact residue) and the nearest contact residue in the same face (intra-contact residue). We provide a novel random forest (RF) model to effectively integrate these hybrid features for predicting hot spots in protein interfaces. Our method can achieve accuracy (ACC) of 82.4% and Matthew's correlation coefficient (MCC) of 0.482 in Alanine Scanning Energetics Database, and ACC of 77.6% and MCC of 0.429 in Binding Interface Database. In a comparison study, performance of our RF model exceeds other existing methods, such as Robetta, FOLDEF, KFC, KFC2, MINERVA and HotPoint. Of our hybrid features, three physicochemical features of target residues (mass, polarizability and isoelectric point), the relative side-chain accessible surface area and the average depth index of mirror-contact residues are found to be the main discriminative features in hot spots prediction. We also confirm that hot spots tend to form large contact surface areas between two interacting proteins. Source data and code are available at: http://www.aporc.org/doc/wiki/HotSpot.
    Protein Engineering Design and Selection 03/2012; 25(3):119-26. · 2.94 Impact Factor
  • Article: Inferring Protein-Protein Interactions Based on Sequences and Interologs in Mycobacterium Tuberculosis
    BMC Bioinformatics 03/2012; · 2.75 Impact Factor
  • Article: Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information.
    Bioinformatics 01/2012; 28:98-104.
  • Article: Comparative analysis of tumor markers and evaluation of their predictive value in patients with colorectal cancer.
    ABSTRACT: We evaluated the feasibility of CEA/CK20 mRNA and CEA/CA19-9 proteins as tumor markers for colorectal cancer by detecting tumor-specific mRNAs in circulating tumor cells and secreted tumor-specific proteins in the peripheral blood of colorectal cancer patients. Peripheral blood was obtained from 23 healthy volunteers and 46 colorectal cancer patients on the day of initiation of adjuvant chemotherapy after surgery (stages I-III, n = 27) or on the first day of chemotherapy after diagnosis (stage IV, n = 19). Levels of CEA/CK20 mRNA in peripheral blood mononuclear cells (PBMCs) were determined with quantitative real-time reverse transcription polymerase chain reaction, and serum CEA/CA19-9 protein levels were determined by radioimmunoassay. The detection sensitivity of CK20 mRNA was approximately 1 tumor cell in 1 × 10(7) PBMCs, and that of CEA mRNA was approximately 1 tumor cell in 1 × 10(6) PBMCs. Patients with stage IV colorectal cancer had higher levels of CEA mRNA, CK20 mRNA, and serum CEA than patients at stages I-III. Peripheral blood CEA mRNA levels were predictive of overall survival, while serum protein levels of CEA and CA19-9 had no predictive value. Peripheral blood CEA mRNA is a useful marker of overall survival in colorectal cancer patients, that is sensitive and specific.
    Onkologie 01/2012; 35(3):108-13. · 0.87 Impact Factor
  • Article: Identifying responsive modules by mathematical programming: an application to budding yeast cell cycle.
    ABSTRACT: High-throughput biological data offer an unprecedented opportunity to fully characterize biological processes. However, how to extract meaningful biological information from these datasets is a significant challenge. Recently, pathway-based analysis has gained much progress in identifying biomarkers for some phenotypes. Nevertheless, these so-called pathway-based methods are mainly individual-gene-based or molecule-complex-based analyses. In this paper, we developed a novel module-based method to reveal causal or dependent relations between network modules and biological phenotypes by integrating both gene expression data and protein-protein interaction network. Specifically, we first formulated the identification problem of the responsive modules underlying biological phenotypes as a mathematical programming model by exploiting phenotype difference, which can also be viewed as a multi-classification problem. Then, we applied it to study cell-cycle process of budding yeast from microarray data based on our biological experiments, and identified important phenotype- and transition-based responsive modules for different stages of cell-cycle process. The resulting responsive modules provide new insight into the regulation mechanisms of cell-cycle process from a network viewpoint. Moreover, the identification of transition modules provides a new way to study dynamical processes at a functional module level. In particular, we found that the dysfunction of a well-known module and two new modules may directly result in cell cycle arresting at S phase. In addition to our biological experiments, the identified responsive modules were also validated by two independent datasets on budding yeast cell cycle.
    PLoS ONE 01/2012; 7(7):e41854. · 4.09 Impact Factor
  • Article: Inferring a protein interaction map of Mycobacterium tuberculosis based on sequences and interologs.
    ABSTRACT: Mycobacterium tuberculosis is an infectious bacterium posing serious threats to human health. Due to the difficulty in performing molecular biology experiments to detect protein interactions, reconstruction of a protein interaction map of M. tuberculosis by computational methods will provide crucial information to understand the biological processes in the pathogenic microorganism, as well as provide the framework upon which new therapeutic approaches can be developed. In this paper, we constructed an integrated M. tuberculosis protein interaction network by machine learning and ortholog-based methods. Firstly, we built a support vector machine (SVM) method to infer the protein interactions of M. tuberculosis H37Rv by gene sequence information. We tested our predictors in Escherichia coli and mapped the genetic codon features underlying its protein interactions to M. tuberculosis. Moreover, the documented interactions of 14 other species were mapped to the interactome of M. tuberculosis by the interolog method. The ensemble protein interactions were validated by various functional relationships, i.e., gene coexpression, evolutionary relationship and functional similarity, extracted from heterogeneous data sources. The accuracy and validation demonstrate the effectiveness and efficiency of our framework. A protein interaction map of M. tuberculosis is inferred from genetic codons and interologs. The prediction accuracy and numerically experimental validation demonstrate the effectiveness and efficiency of our method. Furthermore, our methods can be straightforwardly extended to infer the protein interactions of other bacterial species.
    BMC Bioinformatics 01/2012; 13 Suppl 7:S6. · 2.75 Impact Factor
  • Article: Identifying critical transitions and their leading biomolecular networks in complex diseases.
    ABSTRACT: Identifying a critical transition and its leading biomolecular network during the initiation and progression of a complex disease is a challenging task, but holds the key to early diagnosis and further elucidation of the essential mechanisms of disease deterioration at the network level. In this study, we developed a novel computational method for identifying early-warning signals of the critical transition and its leading network during a disease progression, based on high-throughput data using a small number of samples. The leading network makes the first move from the normal state toward the disease state during a transition, and thus is causally related with disease-driving genes or networks. Specifically, we first define a state-transition-based local network entropy (SNE), and prove that SNE can serve as a general early-warning indicator of any imminent transitions, regardless of specific differences among systems. The effectiveness of this method was validated by functional analysis and experimental data.
    Scientific Reports 01/2012; 2:813.
  • Article: Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers.
    ABSTRACT: Considerable evidence suggests that during the progression of complex diseases, the deteriorations are not necessarily smooth but are abrupt, and may cause a critical transition from one state to another at a tipping point. Here, we develop a model-free method to detect early-warning signals of such critical transitions, even with only a small number of samples. Specifically, we theoretically derive an index based on a dynamical network biomarker (DNB) that serves as a general early-warning signal indicating an imminent bifurcation or sudden deterioration before the critical transition occurs. Based on theoretical analyses, we show that predicting a sudden transition from small samples is achievable provided that there are a large number of measurements for each sample, e.g., high-throughput data. We employ microarray data of three diseases to demonstrate the effectiveness of our method. The relevance of DNBs with the diseases was also validated by related experimental data and functional analysis.
    Scientific Reports 01/2012; 2:342.
  • Article: Identifying disease genes and module biomarkers by differential interactions.
    ABSTRACT: A complex disease is generally caused by the mutation of multiple genes or by the dysfunction of multiple biological processes. Systematic identification of causal disease genes and module biomarkers can provide insights into the mechanisms underlying complex diseases, and help develop efficient therapies or effective drugs. In this paper, we present a novel approach to predict disease genes and identify dysfunctional networks or modules, based on the analysis of differential interactions between disease and control samples, in contrast to the analysis of differential gene or protein expressions widely adopted in existing methods. As an example, we applied our method to the study of three-stage microarray data for gastric cancer. We identified network modules or module biomarkers that include a set of genes related to gastric cancer, implying the predictive power of our method. The results on holdout validation data sets show that our identified module can serve as an effective module biomarker for accurately detecting or diagnosing gastric cancer, thereby validating the efficiency of our method. We proposed a new approach to detect module biomarkers for diseases, and the results on gastric cancer demonstrated that the differential interactions are useful to detect dysfunctional modules in the molecular interaction network, which in turn can be used as robust module biomarkers.
    Journal of the American Medical Informatics Association 12/2011; 19(2):241-8. · 3.61 Impact Factor
  • Article: Identification of dysfunctional modules and disease genes in congenital heart disease by a network-based approach.
    Danning He, Zhi-Ping Liu, Luonan Chen
    ABSTRACT: The incidence of congenital heart disease (CHD) is continuously increasing among infants born alive nowadays, making it one of the leading causes of infant morbidity worldwide. Various studies suggest that both genetic and environmental factors lead to CHD, and therefore identifying its candidate genes and disease-markers has been one of the central topics in CHD research. By using the high-throughput genomic data of CHD which are available recently, network-based methods provide powerful alternatives of systematic analysis of complex diseases and identification of dysfunctional modules and candidate disease genes. In this paper, by modeling the information flow from source disease genes to targets of differentially expressed genes via a context-specific protein-protein interaction network, we extracted dysfunctional modules which were then validated by various types of measurements and independent datasets. Network topology analysis of these modules revealed major and auxiliary pathways and cellular processes in CHD, demonstrating the biological usefulness of the identified modules. We also prioritized a list of candidate CHD genes from these modules using a guilt-by-association approach, which are well supported by various kinds of literature and experimental evidence. We provided a network-based analysis to detect dysfunctional modules and disease genes of CHD by modeling the information transmission from source disease genes to targets of differentially expressed genes. Our method resulted in 12 modules from the constructed CHD subnetwork. We further identified and prioritized candidate disease genes of CHD from these dysfunctional modules. In conclusion, module analysis not only revealed several important findings with regard to the underlying molecular mechanisms of CHD, but also suggested the distinct network properties of causal disease genes which lead to identification of candidate CHD genes.
    BMC Genomics 12/2011; 12:592. · 4.07 Impact Factor
  • Article: Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information.
    ABSTRACT: Reconstruction of gene regulatory networks (GRNs), which explicitly represent the causality of developmental or regulatory process, is of utmost interest and has become a challenging computational problem for understanding the complex regulatory mechanisms in cellular systems. However, all existing methods of inferring GRNs from gene expression profiles have their strengths and weaknesses. In particular, many properties of GRNs, such as topology sparseness and non-linear dependence, are generally present in regulatory mechanisms but are seldom taken into account simultaneously in one computational method. In this work, we present a novel method for inferring GRNs from gene expression data considering the non-linear dependence and topological structure of GRNs by employing path consistency algorithm (PCA) based on conditional mutual information (CMI). In this algorithm, the conditional dependence between a pair of genes is represented by the CMI between them. With the general hypothesis of Gaussian distribution underlying gene expression data, CMI between a pair of genes is computed by a concise formula involving the covariance matrices of the related gene expression profiles. The method is validated on the benchmark GRNs from the DREAM challenge and the widely used SOS DNA repair network in Escherichia coli. The cross-validation results confirmed the effectiveness of our method (PCA-CMI), which significantly outperforms previous methods. Besides its high accuracy, our method is able to distinguish direct (or causal) interactions from indirect associations. All the source data and code are available at: http://csb.shu.edu.cn/subweb/grn.htm. Contact: lnchen@sibs.ac.cn; zpliu@sibs.ac.cn. Supplementary data are available at Bioinformatics online.
    Bioinformatics 11/2011; 28(1):98-104. · 5.47 Impact Factor
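
The last abstract above says that, under a Gaussian assumption, the CMI between a pair of genes can be computed from covariance matrices. The sketch below illustrates that idea using the standard Gaussian identity I(X;Y|Z) = 1/2 · log(|C(X,Z)| · |C(Y,Z)| / (|C(Z)| · |C(X,Y,Z)|)), where C(·) is the sample covariance matrix of the listed variables. The function name `gaussian_cmi` and the synthetic data are illustrative assumptions. This is not the authors' released code, and it may differ from their exact formula.

```python
import numpy as np

def gaussian_cmi(x, y, z=None):
    """Estimate I(X;Y|Z) in nats, assuming the variables are jointly Gaussian.

    x, y, z are arrays with one row per sample; z may be omitted, in which
    case plain mutual information I(X;Y) is returned.
    """
    def as_columns(a):
        a = np.asarray(a, dtype=float)
        return a.reshape(a.shape[0], -1)

    def logdet_cov(*blocks):
        # Log-determinant of the sample covariance of the stacked variables.
        data = np.hstack(blocks)
        cov = np.atleast_2d(np.cov(data, rowvar=False))
        _, value = np.linalg.slogdet(cov)
        return value

    x, y = as_columns(x), as_columns(y)
    if z is None:
        return 0.5 * (logdet_cov(x) + logdet_cov(y) - logdet_cov(x, y))
    z = as_columns(z)
    return 0.5 * (logdet_cov(x, z) + logdet_cov(y, z)
                  - logdet_cov(z) - logdet_cov(x, y, z))

# Hypothetical example: x and y are both driven by z, so they look dependent
# marginally but should be close to independent once z is conditioned on.
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
x = z + rng.normal(size=5000)
y = z + rng.normal(size=5000)
print("I(X;Y)   =", gaussian_cmi(x, y))
print("I(X;Y|Z) =", gaussian_cmi(x, y, z))
```

In a path-consistency style procedure, an edge between two genes would be dropped when an estimate like this falls below a chosen threshold. The threshold and the order in which conditioning sets are tried are design choices not specified here.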

Institutions

  • 2013
    • Shanghai Institutes for Biological Sciences
      Shanghai, Shanghai Shi, China
  • 2012
    • Harbin Medical University
      • Department of Oncology
      Harbin, Heilongjiang Sheng, China
    • Tianjin University of Science and Technology
      Tianjin, Tianjin Shi, China
  • 2007–2012
    • Chinese Academy of Sciences
      • Key Laboratory of Synthetic Biology
      • Academy of Mathematics and Systems Science
      Beijing, Beijing Shi, China
  • 2011
    • Shanghai University
      • Institute of Systems Biology
      Shanghai, Shanghai Shi, China
  • 2004–2006
    • Fudan University
      • Department of Forensic Medicine
      Shanghai, Shanghai Shi, China