Jensen, L. J. et al. eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res. 36, D250-D254

European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
Nucleic Acids Research (Impact Factor: 9.11). 02/2008; 36(Database issue):D250-4. DOI: 10.1093/nar/gkm796
Source: PubMed


The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database ('evolutionary genealogy of genes: Non-supervised Orthologous Groups'), which contains orthologous groups constructed from Smith-Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 coarse-grained orthologous groups of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at
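
The two clustering steps named in the abstract, reciprocal best matches and triangular linkage, can be illustrated with a minimal Python sketch. This is not the eggNOG pipeline itself: the function names, the `(genome, gene_id)` gene representation, and the precomputed score dictionary are all assumptions made for illustration, ties are broken arbitrarily, and in-paralog handling and any refinement steps are left out. Triangles of reciprocal best matches spanning three genomes are merged whenever they share an edge, following the COG-style idea.

```python
def reciprocal_best_hits(scores):
    """Return gene pairs that are each other's best match across genomes.

    scores: dict mapping (query, subject) -> alignment score, where each
    gene is a (genome, gene_id) tuple. Both directions must be present.
    (Hypothetical input format, assumed for this sketch.)
    """
    best = {}  # (query gene, subject genome) -> (score, subject gene)
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # ignore within-genome matches
        key = (query, subject[0])
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)
    pairs = set()
    for (query, _genome), (_score, subject) in best.items():
        back = best.get((subject, query[0]))
        if back is not None and back[1] == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def triangle_groups(pairs):
    """Merge triangles of reciprocal best hits that share an edge."""
    neighbours = {}
    for pair in pairs:
        a, b = tuple(pair)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    parent = {}  # union-find over edges, so merging requires a shared edge

    def find(edge):
        parent.setdefault(edge, edge)
        while parent[edge] != edge:
            parent[edge] = parent[parent[edge]]
            edge = parent[edge]
        return edge

    def union(e1, e2):
        parent[find(e1)] = find(e2)

    for pair in pairs:
        a, b = tuple(pair)
        # Any common neighbour c closes a triangle a-b-c.
        for c in neighbours[a] & neighbours[b]:
            union(frozenset((a, b)), frozenset((a, c)))
            union(frozenset((a, b)), frozenset((b, c)))

    groups = {}
    for edge in list(parent):
        groups.setdefault(find(edge), set()).update(edge)
    return sorted(sorted(genes) for genes in groups.values())
```

Running the union-find over edges rather than genes means two triangles that merely share a single gene are not merged, which is the stricter of the two possible readings of "triangular linkage"; a real implementation may differ.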



Available from: Jean Muller
  • Source
    • "The Clusters of Orthologous Groups (COGs) database is a pioneering study of graph-based methods and is still one of the most popular ortholog databases, although it is no longer updated [5,12]. The eggNOG database was later constructed by extending COGs incrementally using a computational method [13]. Another approach for creating ortholog groups is based on the phylogenetic tree of genes and is called a tree-based method. "
    ABSTRACT: Background: Identification of ortholog groups is a crucial step in comparative analysis of multiple genomes. Although several computational methods have been developed to create ortholog groups, most of those methods do not evaluate orthology at the sub-gene level. In our method for domain-level ortholog clustering, DomClust, proteins are split into domains on the basis of alignment boundaries identified by all-against-all pairwise comparison, but it often fails to determine appropriate boundaries. Results: We developed a method to improve domain-level ortholog classification using multiple alignment information. This method is based on a scoring scheme, the domain-specific sum-of-pairs (DSP) score, which evaluates ortholog clustering results at the domain level as the sum total of domain-level alignment scores. We developed a refinement pipeline to improve domain-level clustering, DomRefine, by optimizing the DSP score. We applied DomRefine to domain-level ortholog groups created by DomClust using a dataset obtained from the Microbial Genome Database for Comparative Analysis (MBGD), and evaluated the results using COG clusters and TIGRFAMs models as the reference data. Thus, we observed that the agreement between the resulting classification and the classifications in the reference databases is improved at almost every step in the refinement pipeline. Moreover, the refined classification showed better agreement than the classifications in the eggNOG databases when TIGRFAMs was used as the reference database. Conclusions: DomRefine is a useful tool for improving the quality of domain-level ortholog classification among microbial genomes. Combining with a rapid domain-level ortholog clustering method, such as DomClust, it can be used to create a high-quality ortholog database that can serve as a solid basis for various comparative genome analyses.
    Full-text · Article · May 2014 · BMC Bioinformatics
  • Source
    • "While a draft genome of E. adhaerens CSBa has recently been reported, to the best of our knowledge only DNA sequences for cobalamin biosynthetic (cob) genes are publicly available [47]. In this study the genome of OV14 was sequenced and functionally annotated by comparing to the already sequenced genomes of C58 and 1021 using the eggNOG database [48,49]. Subsequently, the literature was screened for all genes reported to have a positive effect on A. tumefaciens virulence and then homologs to these genes were sought for in OV14, and also in 1021 for additional comparison. "
    ABSTRACT: Recently it has been shown that Ensifer adhaerens can be used as a plant transformation technology, transferring genes into several plant genomes when equipped with a Ti plasmid. For this study, we have sequenced the genome of Ensifer adhaerens OV14 (OV14) and compared it with those of Agrobacterium tumefaciens C58 (C58) and Sinorhizobium meliloti 1021 (1021); the latter of which has also demonstrated a capacity to genetically transform crop genomes, albeit at significantly reduced frequencies. The 7.7 Mb OV14 genome comprises two chromosomes and two plasmids. All protein coding regions in the OV14 genome were functionally grouped based on an eggNOG database. No genes homologous to the A. tumefaciens Ti plasmid vir genes appeared to be present in the OV14 genome. Unexpectedly, OV14 and 1021 were found to possess homologs to chromosomally based genes cited as essential to A. tumefaciens T-DNA transfer. Of significance, genes that are non-essential but exert a positive influence on virulence and the ability to genetically transform host genomes were identified in OV14 but were absent from the 1021 genome. This study reveals the presence of homologs to chromosomally based Agrobacterium genes that support T-DNA transfer within the genome of OV14 and other alphaproteobacteria. The sequencing and analysis of the OV14 genome increases our understanding of T-DNA transfer by non-Agrobacterium species and creates a platform for the continued improvement of Ensifer-mediated transformation (EMT).
    Full-text · Article · Apr 2014 · BMC Genomics
  • Source
    • "Genes were aligned to the eggNOG (v 3.0) and KEGG (release 59.0) databases using BLASTP (e-value ≤1e-5) for functional annotations [16,17]. Each sequence was assigned to a KEGG orthologous group or to an eggNOG orthologous group based on the highest scoring annotated hit(s) that contained at least one high-scoring segment pair (HSP) scoring over 60 bits. "
    ABSTRACT: Assessment and characterization of human colon microbiota is now a major research area in human diseases, including in patients with hepatitis B liver cirrhosis (HBLC). We recruited 120 patients with HBLC and 120 healthy controls. The fecal microbial community and functions in the two groups were analyzed using high-throughput Solexa sequencing of the complete metagenomic DNA and bioinformatics methods. Community and metabolism-wide changes of the fecal microbiota in 20 HBLC patients and 20 healthy controls were observed and compared. A negative correlation was observed between the Child-Turcotte-Pugh scores and Bacteroidetes (P < 0.01), whereas a positive correlation was observed between the scores and Enterobacteriaceae and Veillonella (P < 0.01). Analysis of the additional 200 fecal microbiota samples demonstrated that these intestinal microbial markers might be useful for distinguishing liver cirrhosis microbiota samples from normal ones. The functional diversity was significantly reduced in the fecal microbiota of cirrhotic patients compared with that of the controls. At the module or pathway levels, the fecal microbiota of the HBLC patients showed enrichment in the metabolism of glutathione, gluconeogenesis, branched-chain amino acid, nitrogen, and lipid (P < 0.05), whereas there was a decrease in the level of aromatic amino acid, bile acid and cell cycle related metabolism (P < 0.05). Extensive differences in the microbiota community and metabolic potential were detected in the fecal microbiota of cirrhotic patients. The intestinal microbial community may act as an independent organ to regulate the body's metabolic balance, which may affect the prognosis for HBLC patients.
    Full-text · Article · Dec 2013 · BMC Gastroenterology
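
The annotation rule quoted in the last excerpt above (BLASTP hits with e-value ≤1e-5, assignment by the highest-scoring annotated hit with an HSP scoring over 60 bits) could look roughly like the following Python sketch. It assumes the standard 12-column BLAST tabular output (e-value in column 11, bit score in column 12) and a hypothetical `protein_to_group` lookup from database protein to orthologous group; neither the function name nor that lookup comes from the cited study, and the study's handling of tied hits is not reproduced here.

```python
import csv


def assign_groups(blast_rows, protein_to_group, max_evalue=1e-5, min_bits=60.0):
    """Assign each query to the group of its best qualifying hit.

    blast_rows: iterable of tab-separated lines in 12-column BLAST
    tabular format (qseqid, sseqid, ..., evalue, bitscore).
    protein_to_group: hypothetical dict, subject protein -> group ID.
    """
    best = {}  # query -> (bit score, group)
    for row in csv.reader(blast_rows, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        query, subject = row[0], row[1]
        evalue, bits = float(row[10]), float(row[11])
        group = protein_to_group.get(subject)
        # Keep only annotated hits passing both thresholds
        # ("over 60 bits" is read here as strictly greater than 60).
        if group is None or evalue > max_evalue or bits <= min_bits:
            continue
        if query not in best or bits > best[query][0]:
            best[query] = (bits, group)
    return {query: group for query, (_bits, group) in best.items()}
```

A caller would pass an open file handle of BLAST output, or any iterable of lines, together with the group lookup for the target database.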