Determinants, Discriminants, Conserved Residues - A Heuristic Approach to Detection of Functional Divergence in Protein Families

Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, Singapore.
PLoS ONE (Impact Factor: 3.23). 09/2011; 6(9):e24382. DOI: 10.1371/journal.pone.0024382
Source: PubMed


In this work, belonging to the field of comparative analysis of protein sequences, we focus on detection of functional specialization on the residue level. As the input, we take a set of sequences divided into groups of orthologues, each group known to be responsible for a different function.
This provides two independent pieces of information: within group conservation and overlap in amino acid type across groups. We build our discussion around the set of scoring functions that keep the two separated and the source of the signal easy to trace back to its source.
We propose a heuristic description of functional divergence that includes residue type exchangeability, both in the conservation and in the overlap measure, and does not make any assumptions on the rate of evolution in the groups other than the one under consideration. Residue types acceptable at a certain position within an orthologous group are described as a distribution which evolves in time, starting from a single ancestral type, and is subject to constraints that can be inferred only indirectly. To estimate the strength of the constraints, we compare the observed degrees of conservation and overlap with those expected in the hypothetical case of a freely evolving distribution.
Our description matches the experiment well, but we also conclude that any attempt to capture the evolutionary behavior of specificity determining residues in terms of a scalar function will be tentative, because no single model can cover the variety of evolutionary behavior such residues exhibit. Especially, models expecting the same type of evolutionary behavior across functionally divergent groups tend to miss a portion of information otherwise retrievable by the conservation and overlap measures they use.

Download full-text


Available from: Zong-Hong Zhang, Aug 18, 2014
  • Source
    • "Similarly, for each pair of paralogous groups, the overlap in the amino acid type choice is turned into a quantity in the same range. In addition, the scoring functions are sensitive to similarity between the amino acid types—both conservation and overlap are measured from their expected values given the distribution of the amino acid types at a position, the overall variability in the alignment, and the average propensity of residue types to mutate into each other [see Supplementatry Data and (11)]. "
    [Show abstract] [Hide abstract]
    ABSTRACT: Cube-DB is a database of pre-evaluated results for detection of functional divergence in human/vertebrate protein families. The analysis is organized around the nomenclature associated with the human proteins, but based on all currently available vertebrate genomes. Using full genomes enables us, through a mutual-best-hit strategy, to construct comparable taxonomical samples for all paralogues under consideration. Functional specialization is scored on the residue level according to two models of behavior after divergence: heterotachy and homotachy. In the first case, the positions on the protein sequence are scored highly if they are conserved in the reference group of orthologs, and overlap poorly with the residue type choice in the paralogs groups (such positions will also be termed functional determinants). The second model additionally requires conservation within each group of paralogs (functional discriminants). The scoring functions are phylogeny independent, but sensitive to the residue type similarity. The results are presented as a table of per-residue scores, and mapped onto related structure (when available) via browser-embedded visualization tool. They can also be downloaded as a spreadsheet table, and sessions for two additional molecular visualization tools. The database interface is available at
    Full-text · Article · Dec 2011 · Nucleic Acids Research
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: DIVERGE is a software system for phylogeny-based analyses of protein family evolution and functional divergence. It provides a suite of statistical tools for selection and prioritization of the amino acid sites that are responsible for the functional divergence of a gene family. The synergistic efforts of DIVERGE and other methods have convincingly demonstrated that the pattern of rate change at a particular amino acid site may contain insightful information about the underlying functional divergence following gene duplication. These predicted sites may be used as candidates for further experiments. We are now releasing an updated version of DIVERGE with the following improvements: (i) a feasible approach to examining functional divergence in nearly complete sequences by including deletions and insertions (indels); (ii) the calculation of the false discovery rate (FDR) of functionally diverging sites; (iii) estimation of the effective number of functional divergence-related sites that is reliable and insensitive to cutoffs; (iv) a statistical test for asymmetric functional divergence; and (v) a new method to infer functional divergence specific to a given duplicate cluster. In addition, we have made efforts to improve software design and produce a well-written software manual for the general user.
    Full-text · Article · Apr 2013 · Molecular Biology and Evolution
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: When comparing sequences of similar proteins, two kinds of questions can be asked, and the related two kinds of inference made. First, one may ask to what degree they are similar, and then, how they differ. In the first case one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the latter case the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts - mutagenesis or pharmacological intervention - can then be pointed in either direction, depending on the context of the study. Cube simplifies this process for users that already have their favorite sets of sequences, and helps them collate the information by visualization of the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection.
    Full-text · Article · Nov 2013 · PLoS ONE
Show more