Meeting report: a workshop on Best Practices in Genome Annotation

Informatics, J. Craig Venter Institute, Rockville, MD 20850 USA, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK and The Arabidopsis Information Resource, Carnegie Institution of Washington, Stanford, CA 94305 USA.
Database: The Journal of Biological Databases and Curation (Impact Factor: 3.37). 01/2010; 2010:baq001. DOI: 10.1093/database/baq001
Source: PubMed


Efforts to annotate the genomes of a wide variety of model organisms are currently carried out by sequencing centers, model
organism databases and academic/institutional laboratories around the world. Different annotation methods and tools have been
developed over time to meet the needs of biologists faced with the task of annotating biological data. While standardized
methods are essential for consistent curation within each annotation group, methods and tools can differ between groups, especially
when the groups are curating different organisms. Biocurators from several institutes met at the Third International Biocuration
Conference in Berlin, Germany, April 2009 and hosted the ‘Best Practices in Genome Annotation: Inference from Evidence’ workshop
to share their strategies, pipelines, standards and tools. This article documents the material presented in the workshop.



Available from: Linda I Hannick
    • "Blixem and Dotter are used extensively by the HAVANA group at the Wellcome Trust Sanger Institute and are essential to the manual annotation process. Examples of work published by the HAVANA group that involved the use of Blixem and Dotter include [5-10]. Belvu is used in the curation of high-quality "seed" alignments for the Pfam database [11]."
    ABSTRACT: Background: Manual annotation is essential to create high-quality reference alignments and annotation. Annotators need to be able to view sequence alignments in detail. The SeqTools package provides three tools for viewing different types of sequence alignment: Blixem is a many-to-one browser of pairwise alignments, displaying multiple match sequences aligned against a single reference sequence; Dotter provides a graphical dot-plot view of a single pairwise alignment; and Belvu is a multiple sequence alignment viewer, editor, and phylogenetic tool. These tools were originally part of the AceDB genome database system but have been completely rewritten to make them generally available as a standalone package of greatly improved function. Findings: Blixem is used by annotators to give a detailed view of the evidence for particular gene models. Blixem displays the gene model positions and the match sequences aligned against the genomic reference sequence. Annotators use this for many reasons, including to check the quality of an alignment, to find missing/misaligned sequence and to identify splice sites and polyA sites and signals. Dotter is used to give a dot-plot representation of a particular pairwise alignment. This is used to identify sequence that is not represented (or is misrepresented) and to quickly compare annotated gene models with transcriptional and protein evidence that putatively supports them. Belvu is used to analyse conservation patterns in multiple sequence alignments and to perform a combination of manual and automatic processing of the alignment. High-quality reference alignments are essential if they are to be used as a starting point for further automatic alignment generation. Conclusions: While there are many different alignment tools available, the SeqTools package provides unique functionality that annotators have found to be essential for analysing sequence alignments as part of the manual annotation process.
    Preview · Article · Dec 2016 · BMC Research Notes
    • "To gain full use of the genomic information, the next step after sequencing is to annotate genes encoded in each genome. Several genome annotation pipelines have been established and actively used in a number of genome projects [4-6]. One of the most important products of genome annotation is a set of amino acid sequences translated from predicted protein-coding genes. "
    ABSTRACT: Background: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods. Results: We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genomes. The approach provides rich information concerning the reliability of each predicted gene structure. We have also devised an iterative method that attempts to improve the structures of suspiciously predicted genes based on a spliced alignment algorithm using consensus sequences or reliable homologs as templates. Application of our methods to cytochrome P450 and ribosomal proteins from 47 plant genomes indicated that 50-60% of the annotated gene structures are likely to contain some defects. Whereas more than half of the defect-containing genes may be intrinsically broken, i.e. they are pseudogenes or gene fragments, located in unfinished sequencing areas, or corresponding to non-productive isoforms, the defects found in a majority of the remaining gene candidates can be remedied by our iterative refinement method. Conclusions: Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes. Our method will be applicable to various families of protein-coding genes if their domain structures are evolutionarily stable. It is also feasible to apply our method to gene families from all kingdoms of life, not just plants.
    Full-text · Article · Jun 2014 · BMC Bioinformatics
    • "However, the accuracy of the annotation also relies on the automated pipeline used [38]; some predicted genes could be dissimilar to anything in the reference databases as they could have evolved extensively, represent uncharacterized sequences, or be misidentified [39]. Reference databases and computational methods constituting annotation pipelines are constantly being developed, and there is hence a need to reprocess genome annotations on a regular basis to improve their quality and completeness [39], [40]. As it was not the primary objective in this study, the genomic sequence of the bacterium remains as draft."
    ABSTRACT: Following the isolation, cultivation and characterization of the rumen bacterium Anaerovibrio lipolyticus in the 1960s, it has been recognized as one of the major species involved in lipid hydrolysis in ruminant animals. However, there has been limited characterization of the lipases from the bacterium, despite the importance of understanding lipolysis and its impact on subsequent biohydrogenation of polyunsaturated fatty acids by rumen microbes. This study describes the draft genome of Anaerovibrio lipolytica 5ST, and the characterization of three lipolytic genes and their translated proteins. The uncompleted draft genome was 2.83 Mbp and comprised 2,673 coding sequences with a G+C content of 43.3%. Three putative lipase genes, alipA, alipB and alipC, encoding 492-, 438- and 248-amino-acid peptides respectively, were identified using RAST. Phylogenetic analysis indicated that alipA and alipB clustered with the GDSL/SGNH family II, and alipC clustered with lipolytic enzymes from family V. Subsequent expression and purification of the enzymes showed that they were thermally unstable and had higher activities at neutral to alkaline pH. Substrate specificity assays indicated that the enzymes had higher hydrolytic activity against caprylate (C8), laurate (C12) and myristate (C14).
    Full-text · Article · Aug 2013 · PLoS ONE