Direct-coupling analysis of residue coevolution captures native contacts across many protein families.

Center for Theoretical Biological Physics, University of California at San Diego, La Jolla, CA 92093-0374, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 11/2011; 108(49):E1293-301. DOI: 10.1073/pnas.1111471108
Source: PubMed

ABSTRACT The similarity in the three-dimensional structures of homologous proteins imposes strong constraints on their sequence variability. It has long been suggested that the resulting correlations among amino acid compositions at different sequence positions can be exploited to infer spatial contacts within the tertiary protein structure. Crucial to this inference is the ability to disentangle direct and indirect correlations, as accomplished by the recently introduced direct-coupling analysis (DCA). Here we develop a computationally efficient implementation of DCA, which allows us to evaluate the accuracy of contact prediction by DCA for a large number of protein domains, based purely on sequence information. DCA is shown to yield a large number of correctly predicted contacts, recapitulating the global structure of the contact map for the majority of the protein domains examined. Furthermore, our analysis captures clear signals beyond intradomain residue contacts, arising, e.g., from alternative protein conformations, ligand-mediated residue couplings, and interdomain interactions in protein oligomers. Our findings suggest that contacts predicted by DCA can be used as a reliable guide to facilitate computational predictions of alternative protein conformations, protein complex formation, and even the de novo prediction of protein domain structures, contingent on the existence of a large number of homologous sequences which are being rapidly made available due to advances in genome sequencing.
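
As an illustration of what disentangling direct from indirect correlations can look like, here is a minimal Python sketch. It assumes a mean-field style inversion of a pseudocount-regularized covariance matrix of one-hot encoded alignment columns, and it ranks column pairs by the Frobenius norm of the inferred coupling block; the abstract specifies neither choice. The function names, the toy four-column alignment, and the pseudocount value are all hypothetical.

```python
import numpy as np


def toy_chain_msa(n_seq, q, copy_prob, seed):
    """Hypothetical toy alignment with 4 columns and q states per column.

    Column 1 copies column 0 with probability copy_prob, column 2 copies
    column 1 likewise, and column 3 is independent. Columns 0 and 2 are
    therefore correlated only indirectly, through column 1.
    """
    rng = np.random.default_rng(seed)
    msa = rng.integers(0, q, size=(n_seq, 4))
    for col in (1, 2):
        copy = rng.random(n_seq) < copy_prob
        msa[copy, col] = msa[copy, col - 1]
    return msa


def direct_coupling_scores(msa, q, pseudocount=0.05):
    """Score column pairs by inverting the regularized covariance matrix.

    Assumption: a mean-field style estimate J = -C^{-1}; the pair score is
    the Frobenius norm of each coupling block (a simplification chosen for
    this sketch). The pseudocount default is arbitrary.
    """
    n_seq, n_col = msa.shape
    k = q - 1  # the last state is dropped as the reference state

    # One-hot encode every column, omitting the reference state.
    x = np.zeros((n_seq, n_col, k))
    for a in range(k):
        x[:, :, a] = (msa == a)
    x = x.reshape(n_seq, n_col * k)

    # Single-site and pair frequencies mixed with a uniform pseudocount.
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * (x.T @ x) / n_seq + pseudocount / q**2
    for i in range(n_col):
        s = slice(i * k, (i + 1) * k)
        fij[s, s] = np.diag(fi[s])  # same-site block: f_i(a) on the diagonal

    cov = fij - np.outer(fi, fi)
    couplings = -np.linalg.inv(cov)

    scores = np.zeros((n_col, n_col))
    for i in range(n_col):
        for j in range(i + 1, n_col):
            block = couplings[i * k:(i + 1) * k, j * k:(j + 1) * k]
            scores[i, j] = scores[j, i] = np.linalg.norm(block)
    return scores


if __name__ == "__main__":
    msa = toy_chain_msa(5000, 3, 0.8, seed=0)
    print(np.round(direct_coupling_scores(msa, q=3), 2))
```

In the toy alignment, columns 0 and 2 are correlated only through column 1, so their direct score should come out well below the scores of the pairs (0, 1) and (1, 2), even though their raw correlation is strong.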

  • ABSTRACT: Background: The analysis of membrane proteins is essential in genome-wide and proteome-wide investigations. With the number of available membrane protein sequences growing exponentially, research on sequence motifs and on their biological role has become of great interest. Recently published works have shown that motifs act as stabilizing 'building blocks' or are involved in functional tasks. Such motifs are specific to a membrane protein family, and thus their evolutionary path can be traced. The importance of short membrane sequence motifs has been shown in many works, which emphasizes the value of the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts helps in understanding the processes that occur during protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied to aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and, finally, the investigation of natural variants that cause diseases such as nephrogenic diabetes insipidus. Results: In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts that are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionarily influenced interaction pattern pairs (EIPPs), adapted to mutagenesis investigations of aquaporin 2 (a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus) and of membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs, which can be used for further laboratory mutagenesis investigations.
    Hindawi Publishing Corporation. 11/2014;
  • ABSTRACT: The large influx of biological sequences makes it important to identify and correlate conserved regions in homologous sequences to acquire valuable biological knowledge. These conserved regions contain statistically significant residue associations as sequence patterns. Thus, patterns from two conserved regions that co-occur frequently on the same sequences are inferred to have joint functionality. A method for finding conserved regions in protein families with frequent co-occurrence patterns is proposed. The biological significance of the discovered clusters of conserved regions with co-occurrence patterns can be validated by the three-dimensional closeness of their amino acids and by the biological functionality found in those regions, as supported by published work. Using existing algorithms, we discovered statistically significant amino acid associations as sequence patterns. We then aligned and clustered them into Aligned Pattern Clusters (APCs) corresponding to conserved regions with amino acid conservation and variation. When one APC frequently co-occurs with another APC, the two APCs have high co-occurrence. We then clustered APCs with high co-occurrence into what we refer to as Co-occurrence APC Clusters (Co-occurrence Clusters). Our results show that, for Co-occurrence Clusters, the three-dimensional distance between their amino acids is smaller than the average amino acid distance. For the Co-occurrence Clusters of the ubiquitin and the cytochrome c families, we observed biological significance among the amino acids of the APCs within the same cluster. In ubiquitin, the residues are responsible for ubiquitination as well as for conventional and unconventional ubiquitin binding. In cytochrome c, amino acids in the first co-occurrence cluster contribute to the binding of other proteins in the electron transport chain, and amino acids in the second co-occurrence cluster contribute to the stability of the axial heme ligand. Thus, our co-occurrence clustering algorithm can efficiently find and rank conserved regions that contain patterns that frequently co-occur on the same proteins. Co-occurring patterns are biologically significant due to their three-dimensional closeness and other evidence reported in the literature. These results play an important role in drug discovery, as biologists can quickly identify the target for drugs to conduct detailed preclinical studies.
    BMC Bioinformatics 11/2014; 15 Suppl 12:S2. · 2.67 Impact Factor
  • ABSTRACT: Response regulators are proteins that undergo transient phosphorylation, connecting specific signals to adaptive responses. Remarkably, the molecular mechanism of response regulator activation remains elusive, largely because of the scarcity of structural data on multidomain response regulators and histidine kinase/response regulator complexes. We now address this question by using a combination of crystallographic data and functional analyses in vitro and in vivo, studying DesR and its cognate sensor kinase DesK, a two-component system that controls membrane fluidity in Bacillus subtilis. We establish that phosphorylation of the receiver domain of DesR is allosterically coupled to two distinct exposed surfaces of the protein, controlling noncanonical dimerization/tetramerization, cooperative activation, and DesK binding. One of these surfaces is critical for both homodimerization- and kinase-triggered allosteric activations. Moreover, DesK induces a phosphorylation-independent activation of DesR in vivo, uncovering a novel and stringent level of specificity among kinases and regulators. Our results support a model that helps to explain how response regulators restrict phosphorylation by small-molecule phosphoryl donors, as well as cross talk with noncognate sensors. The ability to sense and respond to environmental variations is an essential property for cell survival. Two-component systems mediate key signaling pathways that allow bacteria to integrate extra- or intracellular signals. Here we focus on the DesK/DesR system, which acts as a molecular thermometer in B. subtilis, regulating the cell membrane's fluidity. Using a combination of complementary approaches, including determination of the crystal structures of active and inactive forms of the response regulator DesR, we unveil novel molecular mechanisms of DesR's activation switch. In particular, we show that the association of the cognate histidine kinase DesK triggers DesR activation beyond the transfer of the phosphoryl group. On the basis of sequence and structural analyses of other two-component systems, this activation mechanism appears to be used in a wide range of sensory systems, contributing a further level of specificity control among different signaling pathways. Copyright © 2014 Trajtenberg et al.
    mBio 12/2014; 5(6). · 6.88 Impact Factor
