Resolving the ortholog conjecture: orthologs tend to be weakly, but significantly, more similar in function than paralogs.

ETH Zurich, Department of Computer Science, Zürich, Switzerland.
PLoS Computational Biology (Impact Factor: 4.83). 05/2012; 8(5):e1002514. DOI: 10.1371/journal.pcbi.1002514
Source: PubMed

ABSTRACT The function of most proteins is not determined experimentally, but is extrapolated from homologs. According to the "ortholog conjecture", or standard model of phylogenomics, protein function changes rapidly after duplication, leading to paralogs with different functions, while orthologs retain the ancestral function. We report here that a comparison of experimentally supported functional annotations among homologs from 13 genomes mostly supports this model. We show that to analyze GO annotation effectively, several confounding factors need to be controlled: authorship bias, variation of GO term frequency among species, variation of background similarity among species pairs, and propagated annotation bias. After controlling for these biases, we observe that orthologs have generally more similar functional annotations than paralogs. This is especially strong for sub-cellular localization. We observe only a weak decrease in functional similarity with increasing sequence divergence. These findings hold over a large diversity of species; notably orthologs from model organisms such as E. coli, yeast or mouse have conserved function with human proteins.

  • ABSTRACT: Mitogen Activated Protein Kinase (MAPK) signaling is of critical importance in plants and other eukaryotic organisms. The MAPK cascade plays an indispensable role in the growth and development of plants, as well as in biotic and abiotic stress responses. The MAPKs constitute the most downstream module of the three-tier MAPK cascade and are phosphorylated by upstream MAP kinase kinases (MAPKK), which are in turn phosphorylated by MAP kinase kinase kinases (MAPKKK). The MAPKs play pivotal roles in the regulation of many cytoplasmic and nuclear substrates, thus regulating several biological processes. A total of 589 MAPK genes were identified from a genome-wide analysis of 40 species. Sequence analysis revealed the presence of several N- and C-terminal conserved domains. The MAPKs were previously believed to be characterized by the presence of TEY/TDY activation loop motifs. The present study showed that, in addition to the TEY/TDY activation loop motifs, MAPKs also contain MEY, TEM, TQM, TRM, TVY, TSY, TEC and TQY activation loop motifs. Phylogenetic analysis clustered all predicted MAPKs into six different groups (groups A, B, C, D, E and F), and all predicted MAPKs were assigned specific names based on their orthology-based evolutionary relationships with Arabidopsis or Oryza MAPKs. We conducted a global analysis of the MAPK gene family of plants from lower eukaryotes to higher eukaryotes and analyzed their genomic and evolutionary aspects. Our study showed the presence of several new activation loop motifs and diverse conserved domains in MAPKs. Further study of the newly identified activation loop motifs may provide more information regarding the downstream signaling cascade activated in response to a wide array of stress conditions, as well as during plant growth and development.
    BMC Genomics 01/2015; 16(1):58. DOI:10.1186/s12864-015-1244-7 · 4.04 Impact Factor
  • ABSTRACT: Altered expression profiles of microRNAs (miRNAs) are linked to many diseases including lung cancer. miRNA expression profiling is reproducible and miRNAs are very stable. These characteristics make miRNAs ideal biomarker candidates. This work aims to detect 2- and 3-miRNA groups, together with specific expression ranges of these miRNAs, to form simple linear discriminant rules for biomarker identification and biological interpretation. Our method is based on a novel committee of decision trees to derive 2- and 3-miRNA 100%-frequency rules. This method is applied to a data set of lung miRNA expression profiles of 61 squamous cell carcinoma (SCC) samples and 10 normal tissue samples. A distance separation technique is used to select the most reliable rules, which are then evaluated on a large independent data set. We obtained four 2-miRNA and three 3-miRNA top-ranked rules. One important rule is: if the expression level of miR-98 is above 7.356 and the expression level of miR-205 is below 9.601 (log2 quantile normalized MirVan miRNA Bioarray signals), then the sample is normal rather than cancerous, with specificity and sensitivity both 100%. The classification performance of our best miRNA rules markedly outperformed that of randomly selected miRNA rules. Our data analysis also showed that miR-98 and miR-205 have two common predicted target genes, FZD3 and RPS6KA3, which are genes associated with carcinoma according to the Online Mendelian Inheritance in Man (OMIM) database. We also found that most of the chromosomal loci of these miRNAs have a high frequency of genomic alteration in lung cancer. On the independent data set (with balanced controls), the three miRNAs miR-126, miR-205 and miR-182 from our best rule can separate the two classes of samples with an accuracy of 84.49%, sensitivity of 91.40% and specificity of 77.14%. Our results indicate that rule discovery followed by distance separation is a powerful computational method to identify reliable miRNA biomarkers. The visualization of the rules and the clear separation between the normal and cancer samples by our rules will help biology experts in their analysis and biological interpretation.
    BMC Genomics 12/2014; 15(Suppl 9):S16. DOI:10.1186/1471-2164-15-S9-S16 · 4.04 Impact Factor
  • ABSTRACT: There is a current interest in reducing the in vivo toxicity testing of nanomaterials in animals by increasing toxicity testing using in vitro cellular assays; however, toxicological results are seldom concordant between in vivo and in vitro models. This study compared global multi-walled carbon nanotube (MWCNT)-induced gene expression from human lung epithelial and microvascular endothelial cells in monoculture and coculture with gene expression from mouse lungs exposed to MWCNT. Using a cutoff of 10% false discovery rate and 1.5 fold change, we determined that there were more concordant genes (gene expression both up- or downregulated in vivo and in vitro) expressed in both cell types in coculture than in monoculture. When reduced to only those genes involved in inflammation and fibrosis, known outcomes of in vivo MWCNT exposure, there were more disease-related concordant genes expressed in coculture than monoculture. Additionally, different cellular signaling pathways are activated in response to MWCNT dependent upon culturing conditions. As coculture gene expression better correlated with in vivo gene expression, we suggest that cellular cocultures may offer enhanced in vitro models for nanoparticle risk assessment and the reduction of in vivo toxicological testing. Copyright © 2014. Published by Elsevier Ireland Ltd.
    Toxicology 12/2014; 328. DOI:10.1016/j.tox.2014.12.012 · 3.75 Impact Factor
