PartiGene - constructing partial genomes

School of Biological Sciences, Ashworth Laboratories, King's Buildings, West Mains Rd, University of Edinburgh, Edinburgh EH9 3JT, UK.
Bioinformatics (Impact Factor: 4.62). 07/2004; 20(9):1398-404. DOI: 10.1093/bioinformatics/bth101
Source: PubMed

ABSTRACT: Expressed sequence tags (ESTs) offer a low-cost approach to gene discovery and are being used by an increasing number of laboratories to obtain sequence information for a wide variety of organisms. The challenge lies in processing and organizing these data within a genomic context to facilitate large-scale analyses. Here we present PartiGene, an integrated sequence analysis suite that uses freely available public domain software to (1) process raw trace chromatograms into sequence objects suitable for submission to dbEST; (2) place these sequences within a genomic context; (3) perform customizable first-pass annotation of the data; and (4) present the data as HTML tables and an SQL database resource. PartiGene has been used to create a number of non-model organism database resources, including NEMBASE and LumbriBase. The packages are readily portable, freely available and can be run on simple Linux-based workstations. AVAILABILITY: PartiGene is freely available and also forms part of the EST analysis software associated with the Natural Environment Research Council (UK) Bio-Linux project.

    ABSTRACT: In the past decade, transcriptome data have become an important component of many phylogenetic studies. They are a cost-effective source of protein-coding gene sequences, and have helped projects grow from a few genes to hundreds or thousands of genes. Phylogenetic studies now regularly include genes from newly sequenced transcriptomes, as well as publicly available transcriptomes and genomes. Implementing such a phylogenomic study, however, is computationally intensive, requires the coordinated use of many complex software tools, and includes multiple steps for which no published tools exist. Phylogenomic studies have therefore been manual or semiautomated. In addition to taking considerable user time, this makes phylogenomic analyses difficult to reproduce, compare, and extend. In addition, methodological improvements made in the context of one study often cannot be easily applied and evaluated in the context of other studies. We present Agalma, an automated tool that constructs matrices for phylogenomic analyses. The user provides raw Illumina transcriptome data, and Agalma produces annotated assemblies, aligned gene sequence matrices, a preliminary phylogeny, and detailed diagnostics that allow the investigator to make extensive assessments of intermediate analysis steps and the final results. Sequences from other sources, such as externally assembled genomes and transcriptomes, can also be incorporated in the analyses. Agalma is built on the BioLite bioinformatics framework, which tracks provenance, profiles processor and memory use, records diagnostics, manages metadata, installs dependencies, logs version numbers and calls to external programs, and enables rich HTML reports for all stages of the analysis. Agalma includes a small test data set and a built-in test analysis of these data. In addition to describing Agalma, we here present a sample analysis of a larger seven-taxon data set. 
    Agalma is available for download. It allows complex phylogenomic analyses to be implemented and described unambiguously as a series of high-level commands. This will enable phylogenomic studies to be readily reproduced, modified, and extended. Agalma also facilitates methods development by providing a complete modular workflow, bundled with test data, that will allow further optimization of each step in the context of a full phylogenomic analysis.
    BMC Bioinformatics 11/2013; 14(1):330. DOI:10.1186/1471-2105-14-330 · 2.67 Impact Factor
    ABSTRACT: Next-generation DNA sequencing technologies have made it possible to generate transcriptome data for novel organisms quickly and cheaply, to the extent that the effort required to annotate and publish a new transcriptome is greater than the effort required to sequence it. Often, following publication, details of the annotation effort are only available in summary form, hindering subsequent exploitation of the data. To promote best-practice in annotation and to ensure that data remain accessible, we have written afterParty, a web application that allows users to assemble, annotate and publish novel transcriptomes using only a web browser. afterParty is a robust web application that implements best-practice transcriptome assembly, annotation, browsing, searching, and visualization. Users can turn a collection of reads (from Roche 454 chemistry) or assembled contigs (from any sequencing chemistry, including Illumina Solexa RNA-Seq) into a searchable, browsable transcriptome resource and quickly make it publicly available. Contigs are functionally annotated based on similarity to known sequences and protein domains. Once assembled and annotated, transcriptomes derived from multiple species or libraries can be compared and searched. afterParty datasets can either be created using the existing afterParty server, or using local instances that can be built easily using a virtual machine. afterParty includes powerful visualization tools for transcriptome dataset exploration and uses a flexible annotation architecture which will allow additional types of annotation to be added in the future. afterParty's main use case scenario is one in which a working biologist has generated a large volume of transcribed sequence data and wishes to turn it into a useful resource that has some durability. 
    By reducing the effort, bioinformatics skills, and computational resources needed to annotate and publish a transcriptome, afterParty will facilitate the annotation and sharing of sequence data that would otherwise remain unavailable. A typical metazoan transcriptome containing several tens of thousands of contigs can be annotated in a few minutes of interactive time and a few days of computational time.
    BMC Bioinformatics 10/2013; 14(1):301. DOI:10.1186/1471-2105-14-301 · 2.67 Impact Factor
