Improved prediction of critical residues for protein function based on network and phylogenetic analyses

University of California, San Francisco, San Francisco, California, United States
BMC Bioinformatics (Impact Factor: 2.58). 02/2005; 6(1):213. DOI: 10.1186/1471-2105-6-213
Source: PubMed


Phylogenetic approaches are commonly used to predict which amino acid residues are critical to the function of a given protein. However, such approaches have inherent limitations, such as the need to identify multiple homologues of the protein under consideration. Complementary or alternative approaches for predicting critical residues would therefore be desirable. Network analyses have been used to model many complex biological systems, but only very recently have they been applied to predicting critical residues from a protein's three-dimensional structure. Here we compare two phylogenetic approaches with several network-based methods for predicting critical residues, and show that a combination of one phylogenetic method and one network-based method is superior to the other methods previously employed.
We associate a network with each protein in a set whose three-dimensional structures are known and whose critical residues have previously been determined experimentally. We show that several network-based centrality measures (connectivity, 2-connectivity, closeness centrality, betweenness and clustering coefficient) accurately detect residues critical for protein function. Phylogenetic approaches yield predictions as reliable as the network-based measures, although, interestingly, the two general approaches tend to predict different sets of critical residues. Hence we propose a hybrid method composed of one network-based calculation, closeness centrality, and one phylogenetic approach, the ConSeq server. This hybrid approach predicts critical residues more accurately than the other methods tested here.
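The centrality measures named above can be computed from a residue contact network in a few dozen lines. The sketch below is illustrative, not the authors' code. It assumes that nodes are residues and that two residues are linked when their Cα atoms lie within a hypothetical cutoff (`CUTOFF = 6.5` Å here; the paper's contact definition may differ). It further assumes that 2-connectivity counts residues reachable within two edges and that closeness is the number of reachable residues divided by the sum of shortest-path distances to them. Betweenness uses Brandes' algorithm. All function names are hypothetical.

```python
import math
from collections import deque

# Hypothetical contact cutoff (angstroms) between C-alpha atoms; not taken from the paper.
CUTOFF = 6.5


def contact_network(coords, cutoff=CUTOFF):
    """Adjacency sets {residue_index: {neighbour indices}} from (x, y, z) C-alpha coordinates."""
    adj = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def hop_distances(adj, source):
    """Breadth-first shortest-path lengths (in edges) from source to each reachable residue."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def brandes_betweenness(adj):
    """Unnormalized betweenness for an undirected graph (Brandes' algorithm)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:  # u lies on a shortest path to w
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends


def centralities(adj):
    """Per-residue degree, 2-connectivity, closeness, clustering coefficient and betweenness."""
    degree = {v: len(adj[v]) for v in adj}
    two_conn, closeness, clustering = {}, {}, {}
    for v in adj:
        dist = hop_distances(adj, v)
        # Assumed definition: residues reachable within two edges, excluding v itself.
        two_conn[v] = sum(1 for d in dist.values() if 1 <= d <= 2)
        total = sum(dist.values())
        closeness[v] = (len(dist) - 1) / total if total else 0.0
        nbrs = list(adj[v])
        k = len(nbrs)
        # Fraction of neighbour pairs that are themselves in contact.
        links = sum(1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]])
        clustering[v] = 2 * links / (k * (k - 1)) if k > 1 else 0.0
    return {"degree": degree, "two_connectivity": two_conn, "closeness": closeness,
            "clustering": clustering, "betweenness": brandes_betweenness(adj)}
```

On a toy chain of five residues spaced 3.8 Å apart (`coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]`), only consecutive residues are in contact, so the middle residue gets the highest closeness (4/6) and betweenness (4.0). To turn scores into predictions, one would rank residues by a measure and flag the top-ranked fraction; the size of that fraction is a further assumption.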
We show that network analysis, when combined with phylogenetic approaches, can improve the prediction of amino acids critical for protein function. We propose that this improvement arises from the complementary nature of the two approaches: network-based methods tend to predict as critical those residues that are highly connected and internal (i.e., buried rather than on the surface), although some surface residues are also identified as critical by network analyses; in contrast, residues selected by phylogenetic approaches are, overall, less likely to be buried (solvent-inaccessible).
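One simple way to exploit this complementarity is to pool the top-ranked residues from each method. The sketch below assumes a union rule over the top fraction of each ranking; it is not necessarily how the authors combined closeness centrality with the phylogenetic server. Conservation scores are assumed to be per-residue numbers where higher means more conserved, and the function names and the default `fraction=0.1` are hypothetical.

```python
def top_fraction(scores, fraction):
    """Residue ids whose score falls in the top `fraction` of the ranking (at least one)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=lambda r: scores[r], reverse=True)
    return set(ranked[:k])


def hybrid_prediction(closeness, conservation, fraction=0.1):
    """Assumed hybrid rule: union of the top-ranked residues from each method."""
    return top_fraction(closeness, fraction) | top_fraction(conservation, fraction)


def sensitivity_precision(predicted, known):
    """Share of known critical residues recovered, and share of predictions that are known."""
    hits = len(predicted & known)
    sensitivity = hits / len(known) if known else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return sensitivity, precision
```

For example, with ten residues and `fraction=0.2`, each method contributes its two highest-scoring residues, so the hybrid set holds up to four residues when the two rankings disagree, which is the situation described above.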



Available from: PubMed Central · License: CC BY
    • "Network analysis facilitates the characterization of such complex systems and their individual components [7, 8]. This provides novel insights into the protein folding mechanism [9, 10], stability [11], function [9, 12, 13], and dynamics [14] and, more specifically, into the study of protein structures. Viewing the protein structure as an intricate network of interacting residues, metastructure analysis has proved to be an effective tool for large-scale (genome-wide) protein sequence analysis, target selection for structural genomics, and the identification of intrinsically unstructured (unfolded) proteins [15]."
    ABSTRACT: The main objective of this study is to explore how complex networks, under different definitions of vertices and edges, describe the structure of proteins. A protein folds into a specific conformation for its function depending on interactions between residues. Consequently, many studies have treated a protein structure as a complex system in which the individual components (residues) are vertices and the interactions between residues are edges. What is the proper way to represent a protein structure as a network? To assess the effect of different definitions of vertices and edges in constructing amino acid interaction networks, protein domains, the structural units of proteins, were described using this method. The identification performance on 2847 proteins containing one or more domains showed that protein structure was described well when RCα was around 5.0–7.5 Å; the optimal cutoff for constructing protein structure networks was 5.0 Å (Cα-Cα distance), and the best community division method in this study was community structure detection based on edge betweenness.
    Article · Feb 2013 · Computational and Mathematical Methods in Medicine
    • "Sequence conservation is probably the most powerful attribute for identifying functional residues, and some form of sequence conservation analysis is present in virtually all of these hybrid methods. Thibert et al. [30] build a graph of interacting residues, calculate the degree, degree-2, clustering coefficient and closeness of each node, and conclude that the best predictor combines closeness centrality with phylogenetic analysis. Here, the degree of a node is the number of nodes directly connected to it, that is, nodes exactly one edge away."
    ABSTRACT: Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used to predict functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and PageRank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. On an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. These 411 residues contain 70 of the 111 annotated catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph-based measures capture the same underlying feature of protein structures, which can be captured more simply and effectively by the distance to the GCM. This also has the added advantage of simplicity and ease of implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, owing to rapid changes in the size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
    Article · Feb 2013 · BMC Bioinformatics
    • "The small-world network approach has been particularly successful in predicting binding site residues [27–29, 46–50], for example by using closeness centrality with solvent accessibility scores [27] or by combining closeness centrality with phylogenetics [29]. Betweenness centrality, when averaged over a patch of residues, was found to improve the predictive power of a model for identifying residues involved in binding RNA [48]."
    ABSTRACT: Small-world network concepts provide many new opportunities to investigate the complex three-dimensional structures of protein molecules. This mini-review surveys the published literature on using small-world network approaches to study protein structure, with emphasis on the different combinations of descriptors that have been tested, on studies of ligand binding in protein-ligand complexes, and on protein-protein complexes. It describes the benefits and successes of small-world network approaches, which shift the focus from specific interactions to the local environment and even to non-local phenomena. The purpose is to show the different ways in which small-world network concepts have been used to build new computational models of protein structure and function, and to extend and improve existing modelling approaches.
    Article · Feb 2013 · Computational and Structural Biotechnology Journal
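The summary figures in the 2013 BMC Bioinformatics abstract above are internally consistent, which can be checked directly from the counts it reports: 9262 input residues, 111 annotated catalytic residues, 411 returned residues, 70 of them catalytic. The function name below is hypothetical; the definitions are the usual ones (sensitivity is recovered catalytic residues over all catalytic residues, precision is catalytic residues over returned residues, and fold enrichment is precision divided by the background catalytic rate).

```python
def enrichment_stats(total, positives, predicted, true_hits):
    """Return (sensitivity, precision, fold enrichment over the background rate)."""
    sensitivity = true_hits / positives   # recovered / all annotated catalytic residues
    precision = true_hits / predicted     # catalytic / all returned residues
    background = positives / total        # catalytic rate in the whole input set
    return sensitivity, precision, precision / background


sens, prec, fold = enrichment_stats(total=9262, positives=111, predicted=411, true_hits=70)
print(f"sensitivity={sens:.0%} precision={prec:.0%} enrichment={fold:.1f}x")
# prints: sensitivity=63% precision=17% enrichment=14.2x
```

The fold enrichment simplifies to (70 × 9262) / (411 × 111) ≈ 14.2, matching the "approximately 14-fold" stated in the abstract.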