Article

Using native and syntenically mapped cDNA alignments to improve

Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.
Bioinformatics (Impact Factor: 4.62). 04/2008; 24(5):637-44. DOI: 10.1093/bioinformatics/btn013

ABSTRACT Computational annotation of protein-coding genes in genomic DNA is a widely used and essential tool for analyzing newly sequenced genomes. However, current methods suffer from inaccuracy and perform poorly on certain types of genes. Including additional sources of evidence for the existence and structure of genes can improve the quality of gene predictions. For many eukaryotic genomes, expressed sequence tags (ESTs) are available as evidence for genes. Related genomes that have been sequenced, annotated and aligned to the target genome provide further evidence of the existence and structure of genes.
We incorporate several different evidence sources into the gene finder AUGUSTUS. The sources of evidence are: gene and transcript annotations from related species, syntenically mapped to the target genome using TransMap; evolutionary conservation of DNA; mRNAs and ESTs of the target species; and retroposed genes. The predictions include alternative splice variants where the evidence supports them. Using only ESTs, we were able to predict at least one splice form exactly in 57% of human genes. When evidence from other species and human mRNAs is also used, this number rises to 77%. Syntenic mapping is well suited to annotating genomes that are closely related to genomes that are already annotated or for which extensive transcript evidence is available. Native cDNA evidence is most helpful when each alignment is used as compound information rather than as independent position-wise information.
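The last claim above, that native cDNA evidence helps most when each alignment is used as compound information, can be illustrated with a toy comparison. The sketch below is a hypothetical illustration, not the AUGUSTUS implementation: the function names, the 1-based inclusive interval convention and the "consecutive intron chain" compatibility test are all assumptions. It derives introns from the aligned blocks of each cDNA, then scores a candidate transcript either intron by intron (position-wise) or by requiring an alignment's whole intron chain to appear unbroken in the transcript (compound).

```python
from typing import List, Tuple

# Assumption: genomic intervals are 1-based and inclusive, (start, end).
Interval = Tuple[int, int]


def introns_from_blocks(blocks: List[Interval]) -> List[Interval]:
    """Derive intron intervals from the aligned exon blocks of one cDNA alignment."""
    blocks = sorted(blocks)
    # Each gap between consecutive aligned blocks is treated as an intron.
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(blocks, blocks[1:])]


def positionwise_support(transcript_introns: List[Interval],
                         alignments: List[List[Interval]]) -> int:
    """Count transcript introns seen in ANY alignment, each intron judged independently."""
    pool = {i for aln in alignments for i in introns_from_blocks(aln)}
    return sum(1 for i in transcript_introns if i in pool)


def compound_support(transcript_introns: List[Interval],
                     alignments: List[List[Interval]]) -> int:
    """Count alignments whose whole intron chain occurs as a consecutive run in the transcript.

    Simplification (assumption): single-exon alignments have no introns and
    therefore never contribute compound support here.
    """
    supported = 0
    for aln in alignments:
        chain = introns_from_blocks(aln)
        k = len(chain)
        if k == 0:
            continue
        if any(transcript_introns[s:s + k] == chain
               for s in range(len(transcript_introns) - k + 1)):
            supported += 1
    return supported


if __name__ == "__main__":
    # Two hypothetical ESTs that share their first exon but splice differently.
    est_a = [(100, 200), (300, 400), (500, 600)]
    est_b = [(100, 200), (350, 400), (700, 800)]
    # A chimeric transcript using the first intron of est_a and the second of est_b.
    chimera = [(201, 299), (401, 699)]
    print(positionwise_support(chimera, [est_a, est_b]))  # 2
    print(compound_support(chimera, [est_a, est_b]))      # 0
```

In this toy case the chimeric transcript receives full position-wise support but no compound support, because no single EST contains both of its introns. Rejecting such mixtures is one plausible reason why treating each alignment as a unit helps.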
AUGUSTUS is open source and available at http://augustus.gobics.de. The gene predictions for human can be browsed and downloaded at the UCSC Genome Browser (http://genome.ucsc.edu).

Available from: Mario Stanke, May 19, 2015
    • "mwas.html) with the tuatara transcriptome dataset from Miller et al. (2012) and an Anolis carolinensis protein dataset from Ensembl (AnoCar2.0.72) as expressed sequence tag and protein evidence, respectively. The software Augustus (Stanke et al. 2008) was used for ab initio gene prediction. "
    ABSTRACT: Major Histocompatibility Complex (MHC) genes are a central component of the vertebrate immune system and usually exist in a single genomic region. However, considerable differences in MHC organization and size exist between different vertebrate lineages. Reptiles occupy a key evolutionary position for understanding how variation in MHC structure evolved in vertebrates, but information on the structure of the MHC region in reptiles is limited. In this study, we investigate the organization and cytogenetic location of MHC genes in the tuatara (Sphenodon punctatus), the sole extant representative of the early-diverging reptilian order Rhynchocephalia. Sequencing and mapping of 12 clones containing class I and II MHC genes from a Bacterial Artificial Chromosome library indicated that the core MHC region is located on chromosome 13q. However, duplication and translocation of MHC genes outside of the core region was evident, as additional class I MHC genes were located on chromosome 4p. We found a total of seven class I sequences, and eleven class II β sequences, with evidence for duplication and pseudogenization of genes within the tuatara lineage. The tuatara MHC is characterized by high repeat content and low gene density compared with other species and we found no antigen processing or MHC framework genes on the MHC gene-containing clones. Our findings indicate substantial differences in MHC organization in tuatara compared with mammalian and avian MHCs and highlight the dynamic nature of the MHC. Further sequencing and annotation of tuatara and other reptile MHCs will determine if the tuatara MHC is representative of non-avian reptiles in general. Copyright © 2015 Author et al.
    G3-Genes Genomes Genetics 05/2015; DOI:10.1534/g3.115.017467 · 2.51 Impact Factor
    • "A second initial gene set was generated by WebAUGUSTUS (Hoff and Stanke, 2013), which ran PASA (Haas et al., 2003) on assembled RNA-Seq data and the unmasked genome from Xenoturbella. Both gene sets were merged (nonredundantly) and used for optimizing AUGUSTUS parameters (Stanke et al., 2008). RNA-Seq reads were mapped against the repeat masked genome, alignments were converted to hints for AUGUSTUS. "
    ABSTRACT: Xenacoelomorpha is, most probably, a monophyletic group that includes three clades: Acoela, Nemertodermatida and Xenoturbellida. The group still has contentious phylogenetic affinities; though most authors place it as the sister group of the remaining bilaterians, some would include it as a fourth phylum within the Deuterostomia. Over the past few years, our group, along with others, has undertaken a systematic study of the microscopic anatomy of these worms; our main aim is to understand the structure and development of the nervous system. This research plan has been aided by the use of molecular/developmental tools, the most important of which has been the sequencing of the complete genomes and transcriptomes of different members of the three clades. The data obtained has been used to analyse the evolutionary history of gene families and to study their expression patterns during development, in both space and time. A major focus of our research is the origin of 'cephalized' (centralized) nervous systems. How complex brains are assembled from simpler neuronal arrays has been a matter of intense debate for at least 100 years. We are now tackling this issue using Xenacoelomorpha models. These represent an ideal system for this work because the members of the three clades have nervous systems with different degrees of cephalization; from the relatively simple sub-epithelial net of Xenoturbella to the compact brain of acoels. How this process of 'progressive' cephalization is reflected in the genomes or transcriptomes of these three groups of animals is the subject of this paper. © 2015. Published by The Company of Biologists Ltd.
    Journal of Experimental Biology 02/2015; 218(Pt 4):618-28. DOI:10.1242/jeb.110379 · 3.00 Impact Factor