A New Method for the Discovery of Essential Proteins

Key Laboratory of High Confidence Software Technologies, Ministry of Education, Peking University, Beijing, China
PLoS ONE (Impact Factor: 3.23). 03/2013; 8(3):e58763. DOI: 10.1371/journal.pone.0058763
Source: PubMed


Experimental methods for identifying essential proteins are costly, time-consuming, and laborious, so it is challenging to determine protein essentiality through experiments alone. With the development of high-throughput technologies, a vast number of protein-protein interactions have become available, which enables the identification of essential proteins at the network level. Many computational methods for this task have been proposed based on the topological properties of protein-protein interaction (PPI) networks. However, the currently available PPI networks for each species are incomplete (they contain false negatives) and very noisy (they contain many false positives), and network topology-based centrality measures are often very sensitive to such noise. Therefore, exploring robust methods for identifying essential proteins would be of great value.
In this paper, a new essential protein discovery method named CoEWC (Co-Expression Weighted by Clustering coefficient) is proposed. CoEWC integrates the topological properties of the PPI network with the co-expression of interacting proteins. Its aim is to capture the common features of essential proteins in both date hubs and party hubs. The performance of CoEWC is validated on the PPI network of Saccharomyces cerevisiae. Experimental results show that CoEWC significantly outperforms the classical centrality measures, and that it also outperforms PeC, a recently proposed essential protein discovery method that itself outperforms 15 other centrality measures on the PPI network of Saccharomyces cerevisiae. In particular, when no more than 500 proteins are predicted, CoEWC achieves improvements of more than 50% over degree centrality (DC), one of the better centrality measures for identifying protein essentiality.
We demonstrate that a more robust essential protein discovery method can be developed by integrating the topological properties of the PPI network and the co-expression of interacting proteins. The proposed centrality measure, CoEWC, is effective for the discovery of essential proteins.
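
The abstract does not state the CoEWC formula, so the following Python sketch should be read as one plausible illustration of the general idea (co-expression of interacting proteins weighted by a clustering coefficient), not as the authors' definition. It assumes that each protein is scored by summing, over its interaction partners, the Pearson correlation of the two expression profiles multiplied by the partner's clustering coefficient. The function names, the toy network, the toy expression profiles, and the toy "essential" labels are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that interact with each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def coexpression_weighted_score(adj, expr, v):
    """ASSUMED scoring rule (not taken from the paper): sum over neighbours u
    of pearson(v, u) * clustering_coefficient(u)."""
    return sum(
        pearson(expr[v], expr[u]) * clustering_coefficient(adj, u)
        for u in adj[v]
    )


def rank_proteins(adj, expr):
    """Proteins sorted from highest to lowest score."""
    return sorted(
        adj,
        key=lambda v: coexpression_weighted_score(adj, expr, v),
        reverse=True,
    )


def precision_at_k(ranking, essential, k):
    """Fraction of the top-k ranked proteins that are labelled essential."""
    return sum(1 for p in ranking[:k] if p in essential) / k


# Toy undirected PPI network (adjacency sets) and expression profiles.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [2, 3, 4, 5],
    "D": [1, 3, 2, 4],
}

ranking = rank_proteins(adj, expr)
toy_essential = {"A", "B"}  # made-up labels, for illustration only
top2_precision = precision_at_k(ranking, toy_essential, 2)
```

Comparing `precision_at_k` for two rankings at the same cutoff (for example, this score against plain degree) is the kind of top-k comparison the abstract reports for CoEWC against DC.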


Available from: Xue Zhang, Dec 14, 2014
  • Source
    • "Consequently, several new methods have been proposed. For example, PeC considers both the edge clustering coefficient and gene expression data (Li et al., 2012), Co-Expression Weighted by Clustering coefficient (CoEWC) (Zhang et al., 2013) integrates clustering coefficient and gene expression data, and another integrated approach proposed by Luo and Ma (2013) combines edge clustering coefficient with complex centrality. "
    ABSTRACT: Predicting essential proteins is highly significant because organisms cannot survive or develop if even one of these proteins is missing. Improvements in high-throughput technologies have resulted in a large number of available protein-protein interactions. By taking advantage of these interaction data, researchers have proposed many computational methods to identify essential proteins at the network level. Most of these approaches focus on the topology of a static protein interaction network. However, the protein interaction network changes with time and condition, and these inherent dynamics are overlooked by previous methods. In this paper, we introduce a new method named CDLC to predict essential proteins by integrating dynamic local average connectivity and the in-degree of proteins in complexes. CDLC is applied to the protein interaction network of Saccharomyces cerevisiae. The results show that CDLC outperforms five other methods: Degree Centrality (DC), the Local Average Connectivity-based method (LAC), Sum of ECC (SoECC), PeC, and Co-Expression Weighted by Clustering coefficient (CoEWC). In particular, CDLC can improve the prediction precision by more than 45% compared with DC. CDLC is also compared with the latest algorithm, CEPPK, and achieves a higher precision. CDLC is available as Supplementary materials. The default settings of the active threshold and the alpha parameter are 0.8 and 0.1, respectively.
    Computational Biology and Chemistry 10/2014; 52. DOI:10.1016/j.compbiolchem.2014.08.022 · 1.12 Impact Factor
  • Source
    ABSTRACT: Transcription factors (TFs) and miRNAs are essential for the regulation of gene expression; however, the global view of human gene regulatory networks remains poorly understood. For example, how is the expression of so many genes regulated by limited cohorts of regulators and how are genes differentially expressed in different tissues despite the genetic code being the same in all tissues? We analyzed the network properties of housekeeping and tissue-specific genes in gene regulatory networks from seven human tissues. Our results show that different classes of genes behave quite differently in these networks. Tissue-specific miRNAs show a higher average target number compared with non-tissue specific miRNAs, which indicates that tissue-specific miRNAs tend to regulate different sets of targets. Tissue-specific TFs exhibit higher in-degree, out-degree, cluster coefficient and betweenness values, indicating that they occupy central positions in the regulatory network and that they transfer genetic information from upstream genes to downstream genes more quickly than other TFs. Housekeeping TFs tend to have higher cluster coefficients compared with other genes that are neither housekeeping nor tissue specific, indicating that housekeeping TFs tend to regulate their targets synergistically. Several topological properties of disease-associated miRNAs and genes were found to be significantly different from those of non-disease-associated miRNAs and genes. Tissue-specific miRNAs, TFs and disease genes have particular topological properties within the transcriptional regulatory networks of the seven human tissues examined. The tendency of tissue-specific miRNAs to regulate different sets of genes shows that a particular tissue-specific miRNA and its target gene set may form a regulatory module to execute particular functions in the process of tissue differentiation. 
The regulatory patterns of tissue-specific TFs reflect their vital role in regulatory networks and their importance to biological functions in their respective tissues. The topological differences between disease and non-disease genes may aid the discovery of new disease genes or drug targets. Determining the network properties of these regulatory factors will help define the basic principles of human gene regulation and the molecular mechanisms of disease.
    BMC Systems Biology 10/2013; 7(1):112. DOI:10.1186/1752-0509-7-112 · 2.44 Impact Factor
  • ABSTRACT: Network theory has been used for modeling biological data as well as social networks, transportation logistics, business transcripts, and many other types of data sets. Identifying important features or parts of these networks for a multitude of applications is becoming increasingly significant as the need for big data analysis techniques grows. When analyzing a network of protein-protein interactions (PPIs), identifying nodes of significant importance can direct the user toward biologically relevant network features. In this work, we propose that a node of structural importance in a network model can correspond to a biologically vital or significant property. This relationship between topological and biological importance can be seen within and between structurally defined nodes, such as hub nodes and driver nodes, within a network and within clusters. This work proposes data mining approaches for the identification and examination of relationships between hub and driver nodes within human, yeast, rat, and mouse PPI networks. Relationships with other types of significant nodes, with direct neighbors, and with the rest of the network were analyzed to determine whether the model can be characterized biologically by its structural makeup. We performed numerous tests on structure with a data-driven mentality, looking for properties that were potentially significant on a network level and then comparing those properties to biological significance. Our results showed that identifying and cross-referencing different types of topologically significant nodes can exemplify properties such as transcription factor enrichment, lethality, clustering, and Gene Ontology (GO) enrichment. Mining the biological networks, we discovered a key relationship between network properties and how sparse or dense a network is, a property we described as "sparseness". Overall, structurally important nodes were found to have significant biological relevance.
    Proceedings of the 2013 IEEE 13th International Conference on Data Mining Workshops; 12/2013