Determinants, discriminants, conserved residues--a heuristic approach to detection of functional divergence in protein families.

Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, Singapore.
PLoS ONE (Impact Factor: 3.53). 01/2011; 6(9):e24382. DOI: 10.1371/journal.pone.0024382
Source: PubMed

ABSTRACT In this work, belonging to the field of comparative analysis of protein sequences, we focus on detection of functional specialization on the residue level. As the input, we take a set of sequences divided into groups of orthologues, each group known to be responsible for a different function. This provides two independent pieces of information: within group conservation and overlap in amino acid type across groups. We build our discussion around the set of scoring functions that keep the two separated and the source of the signal easy to trace back to its source.We propose a heuristic description of functional divergence that includes residue type exchangeability, both in the conservation and in the overlap measure, and does not make any assumptions on the rate of evolution in the groups other than the one under consideration. Residue types acceptable at a certain position within an orthologous group are described as a distribution which evolves in time, starting from a single ancestral type, and is subject to constraints that can be inferred only indirectly. To estimate the strength of the constraints, we compare the observed degrees of conservation and overlap with those expected in the hypothetical case of a freely evolving distribution.Our description matches the experiment well, but we also conclude that any attempt to capture the evolutionary behavior of specificity determining residues in terms of a scalar function will be tentative, because no single model can cover the variety of evolutionary behavior such residues exhibit. Especially, models expecting the same type of evolutionary behavior across functionally divergent groups tend to miss a portion of information otherwise retrievable by the conservation and overlap measures they use.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: DIVERGE is a software system for phylogeny-based analyses of protein family evolution and functional divergence. It provides a suite of statistical tools for selection and prioritization of the amino acid sites that are responsible for the functional divergence of a gene family. The synergistic efforts of DIVERGE and other methods have convincingly demonstrated that the pattern of rate change at a particular amino acid site may contain insightful information about the underlying functional divergence following gene duplication. These predicted sites may be used as candidates for further experiments. We are now releasing an updated version of DIVERGE with the following improvements: (i) a feasible approach to examining functional divergence in nearly complete sequences by including deletions and insertions (indels); (ii) the calculation of the false discovery rate (FDR) of functionally diverging sites; (iii) estimation of the effective number of functional divergence-related sites that is reliable and insensitive to cutoffs; (iv) a statistical test for asymmetric functional divergence; and (v) a new method to infer functional divergence specific to a given duplicate cluster. In addition, we have made efforts to improve software design and produce a well-written software manual for the general user.
    Molecular Biology and Evolution 04/2013; · 10.35 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: When comparing sequences of similar proteins, two kinds of questions can be asked, and the related two kinds of inference made. First, one may ask to what degree they are similar, and then, how they differ. In the first case one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the latter case the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts - mutagenesis or pharmacological intervention - can then be pointed in either direction, depending on the context of the study. Cube simplifies this process for users that already have their favorite sets of sequences, and helps them collate the information by visualization of the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection.
    PLoS ONE 01/2013; 8(11):e79480. · 3.53 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Protein families might evolve paralogous functions on their common tertiary scaffold in two ways. First, the locations of functionally-important sites might be "hard-wired" into the structure, with novel functions evolved by altering the amino acid (e.g. Ala vs Ser) at these positions. Alternatively, the tertiary scaffold might be adaptable, accommodating a unique set of functionally important sites for each paralogous function. To discriminate between these possibilities, we compared the set of functionally important sites in the six largest paralogous subfamilies of the LacI/GalR transcription repressor family. LacI/GalR paralogs share a common tertiary structure, but have low sequence identity (≤30%), and regulate a variety of metabolic processes. Functionally important positions were identified by conservation and co-evolutionary sequence analyses. Results showed that conserved positions use a mixture of the "hard-wired" and "accommodating" scaffold frameworks, but that the co-evolution networks were highly dissimilar between any pair of subfamilies. Therefore, the tertiary structure can accommodate multiple networks of functionally important positions. This possibility should be included when designing and interpreting sequence analyses of other protein families. Software implementing conservation and co-evolution analyses is available at
    PLoS ONE 01/2013; 8(12):e84398. · 3.53 Impact Factor

Full-text (3 Sources)

1 Download
Available from
Aug 18, 2014