Determinants, Discriminants, Conserved Residues - A Heuristic Approach to Detection of Functional Divergence in Protein Families

Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, Singapore.
PLoS ONE (Impact Factor: 3.53). 09/2011; 6(9):e24382. DOI: 10.1371/journal.pone.0024382
Source: PubMed

ABSTRACT In this work, belonging to the field of comparative analysis of protein sequences, we focus on detection of functional specialization on the residue level. As the input, we take a set of sequences divided into groups of orthologues, each group known to be responsible for a different function.
This provides two independent pieces of information: within group conservation and overlap in amino acid type across groups. We build our discussion around the set of scoring functions that keep the two separated and the source of the signal easy to trace back to its source.
We propose a heuristic description of functional divergence that includes residue type exchangeability, both in the conservation and in the overlap measure, and does not make any assumptions on the rate of evolution in the groups other than the one under consideration. Residue types acceptable at a certain position within an orthologous group are described as a distribution which evolves in time, starting from a single ancestral type, and is subject to constraints that can be inferred only indirectly. To estimate the strength of the constraints, we compare the observed degrees of conservation and overlap with those expected in the hypothetical case of a freely evolving distribution.
Our description matches the experiment well, but we also conclude that any attempt to capture the evolutionary behavior of specificity determining residues in terms of a scalar function will be tentative, because no single model can cover the variety of evolutionary behavior such residues exhibit. Especially, models expecting the same type of evolutionary behavior across functionally divergent groups tend to miss a portion of information otherwise retrievable by the conservation and overlap measures they use.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: When comparing sequences of similar proteins, two kinds of questions can be asked, and the related two kinds of inference made. First, one may ask to what degree they are similar, and then, how they differ. In the first case one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the latter case the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts - mutagenesis or pharmacological intervention - can then be pointed in either direction, depending on the context of the study. Cube simplifies this process for users that already have their favorite sets of sequences, and helps them collate the information by visualization of the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection.
    PLoS ONE 11/2013; 8(11):e79480. DOI:10.1371/journal.pone.0079480 · 3.53 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identi-fies highly conserved sites in a set of orthologous sequences using a weighted substitution-matrix-based conservation score and then filters these conserved sites based on the pattern of conservation present in a wider alignment of se-quences from the same family and structural information to identify surface exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be general important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site-directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins. © Proteins 2014;. © 2014 Wiley Periodicals, Inc.
    Proteins Structure Function and Bioinformatics 11/2014; 83(1). DOI:10.1002/prot.24711 · 3.34 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Protein families might evolve paralogous functions on their common tertiary scaffold in two ways. First, the locations of functionally-important sites might be "hard-wired" into the structure, with novel functions evolved by altering the amino acid (e.g. Ala vs Ser) at these positions. Alternatively, the tertiary scaffold might be adaptable, accommodating a unique set of functionally important sites for each paralogous function. To discriminate between these possibilities, we compared the set of functionally important sites in the six largest paralogous subfamilies of the LacI/GalR transcription repressor family. LacI/GalR paralogs share a common tertiary structure, but have low sequence identity (≤30%), and regulate a variety of metabolic processes. Functionally important positions were identified by conservation and co-evolutionary sequence analyses. Results showed that conserved positions use a mixture of the "hard-wired" and "accommodating" scaffold frameworks, but that the co-evolution networks were highly dissimilar between any pair of subfamilies. Therefore, the tertiary structure can accommodate multiple networks of functionally important positions. This possibility should be included when designing and interpreting sequence analyses of other protein families. Software implementing conservation and co-evolution analyses is available at
    PLoS ONE 12/2013; 8(12):e84398. DOI:10.1371/journal.pone.0084398 · 3.53 Impact Factor

Full-text (3 Sources)

Available from
Aug 18, 2014