GOBASE: an organelle genome database.
ABSTRACT The organelle genome database GOBASE, now in its 21st release (June 2008), contains all published mitochondrion-encoded sequences (approximately 913,000) and chloroplast-encoded sequences (approximately 250,000) from a wide range of eukaryotic taxa. For all sequences, information on related genes, exons, introns, gene products and taxonomy is available, as well as selected genome maps and RNA secondary structures. Recent major enhancements to database functionality include: (i) addition of an interface for RNA editing data, with substitutions, insertions and deletions displayed using multiple alignments; (ii) addition of medically relevant information, such as haplotypes, SNPs and associated disease states, to human mitochondrial sequence data; (iii) addition of fully reannotated genome sequences for Escherichia coli and Nostoc sp., for reference and comparison; and (iv) a number of interface enhancements, such as the availability of both genomic and gene-coding sequence downloads, and a more sophisticated literature reference search functionality with links to PubMed where available. Future projects include the transfer of GOBASE features to NCBI/GenBank, allowing long-term preservation of accumulated expert information. The GOBASE database can be found at http://gobase.bcm.umontreal.ca/. Queries about custom and large-scale data retrievals should be addressed to firstname.lastname@example.org.
ABSTRACT: The taxonomically broad organelle genome database (GOBASE) organizes and integrates diverse data related to organelles (mitochondria and chloroplasts). The current version of GOBASE focuses on the mitochondrial subset of data and contains molecular sequences, RNA secondary structures and genetic maps, as well as taxonomic information for all eukaryotic species represented. The database has been designed so that complex biological queries, especially ones posed in a comparative genomics context, are supported. GOBASE has been implemented as a relational database with a web-based user interface (http://megasun.bch.umontreal.ca/gobase/gobase.html). Custom software tools have been written in house to assist in the population of the database, data validation, nomenclature standardization and front-end design. The database is fully operational and publicly accessible via the World Wide Web, allowing interactive browsing, sophisticated searching and easy downloading of data. Nucleic Acids Research 02/1998; 26(1):138-44.
ABSTRACT: The organelle genome database GOBASE is now in its twelfth release, and includes 350,000 mitochondrial sequences and 118,000 chloroplast sequences, roughly a 3-fold expansion since previously documented. GOBASE also includes a fully reannotated genome sequence of Rickettsia prowazekii, one of the closest bacterial relatives of mitochondria, and will shortly expand to contain more data from bacteria from which organelles originated. All these sequences are now accessible through a single unified interface. Enhancements to the functionality of GOBASE include addition of pages for RNA structures and a page compiling data about the taxonomic distribution of organelle-encoded genes; incorporation of Gene Ontology terms; addition of features deduced from incomplete annotations to sequences in GenBank; marking of type examples in cases where single genes in single species are oversampled within GenBank; and addition of graphics illustrating gene structure and the position of neighbouring genes on a sequence. The database has been reimplemented in PostgreSQL to facilitate development and maintenance, and structural modifications have been made to speed up queries, particularly those related to taxonomy. The GOBASE database can be queried at http://gobase.bcm.umontreal.ca/ and inquiries should be directed to email@example.com. Nucleic Acids Research 02/2006; 34(Database issue):D697-9.
Trends in Genetics 01/2004; 19(12):709-16.
Nucleic Acids Research, 2009, Vol. 37, Database issue. Published online 25 October 2008
GOBASE: an organelle genome database
Emmet A. O’Brien*, Yue Zhang, Eric Wang, Veronique Marie, Wole Badejoko,
B. Franz Lang and Gertraud Burger
Robert-Cedergren Center for Bioinformatics and Genomics, Département de Biochimie, Pavillon Roger-Gaudry,
Université de Montréal, 2900 Edouard-Montpetit, Montreal QC, Canada H3T 1J4
Received September 11, 2008; Revised October 10, 2008; Accepted October 13, 2008
The amount of information available in generalist molecular sequence databases such as GenBank (1) continues to grow, and this information becomes more diverse and complex as new biological phenomena are discovered. There is therefore an increasing need for expert databases specializing in particular areas of molecular biology. By providing expert curation of data, together with flexible and well-integrated access to those data, specialist databases serve a purpose complementary to that of generalist databases such as GenBank.
GOBASE is one such specialist database. It has been collecting, curating and publishing data on mitochondrial and chloroplast genomes since 1995 (2–5). Organelle genomes are of biological interest for a wide range of studies, such as molecular taxonomy, the molecular mechanisms of trans-splicing and RNA editing, and non-Mendelian inherited metabolism-related disease in humans. GOBASE contains several categories of data, including nucleic acid and protein sequences, genetic maps, taxonomic data and RNA secondary structures. All gene and product names are assigned from a locally maintained standard list. Combined with a powerful and flexible interface, this standardization allows a wide range of complex searches. GOBASE was initially designed primarily to address questions of comparative biology, such as the diversity of organelle genome structure in eukaryotes (e.g. 6,7). More recently, we have added functionality specific to the human mitochondrial genome, such as searches by haplotype and disease state, which are of medical interest.
GOBASE release 21 (June 2008) contains 913,000 mitochondrial sequences, including 737,000 genes, and 250,000 chloroplast-encoded sequences, including 174,000 genes. These are derived mostly from GenBank releases up to 164. With 6300 complete mitochondrial genomes and 213 complete chloroplast genomes, GOBASE is a valuable resource for phylogenomics. This number has increased almost 4-fold since the previous report.
More recently (5), we have added bacterial genome sequences for reference purposes. As of release 21, GOBASE includes three complete bacterial genomes:
- Escherichia coli;
- Rickettsia prowazekii strain Madrid E, closely related to the bacterial ancestor of mitochondria; and
- the cyanobacterium Nostoc sp., closely related to the bacterial ancestor of chloroplasts.

To provide a consistent comparative view of these genomes, each has been reannotated using the AutoFACT functional annotation tool (8), including assignment of Gene Ontology terms. GOBASE now contains 10,700 bacterial genes in total.
*To whom correspondence should be addressed. Tel: +1 514 343 6111; Fax: +1 514 343 2210; Email: firstname.lastname@example.org
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
ENHANCEMENTS TO FUNCTIONALITY
RNA editing is a molecular process by which the sequence of a transcribed RNA is modified. It has been observed in the mitochondria of several eukaryotic taxa, such as plants (9) and trypanosomes (10), as well as in chloroplasts (11). At the level of individual changes, the database contains examples of three kinds of modification:
- substitution of one residue for another;
- deletion of residues; and
- insertion of residues, usually uracil.
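The three kinds of change can be made concrete with a short example. The Python sketch below applies a list of edits to an unedited sequence to produce the edited one. The edit record, a tuple of 1-based position, kind and residue, is a hypothetical format chosen for illustration and is not GOBASE's internal schema. Positions refer to the unedited sequence, and an insertion is placed after the position it names.

```python
from typing import Dict, List, Tuple

# Hypothetical edit record (illustration only, not a GOBASE format):
# (position, kind, residue)
#   position: 1-based index in the unedited sequence; for "ins" the residue
#             is placed after that position (0 means before the first residue)
#   kind:     "sub" (substitution), "ins" (insertion) or "del" (deletion)
#   residue:  new residue for "sub"/"ins", the removed residue for "del"
Edit = Tuple[int, str, str]


def apply_edits(unedited: str, edits: List[Edit]) -> str:
    """Return the edited RNA obtained by applying `edits` to `unedited`."""
    by_pos: Dict[int, List[Tuple[str, str]]] = {}
    for pos, kind, residue in edits:
        if kind not in ("sub", "ins", "del"):
            raise ValueError(f"unknown edit kind: {kind}")
        if not 0 <= pos <= len(unedited) or (pos == 0 and kind != "ins"):
            raise ValueError(f"position out of range: {pos}")
        by_pos.setdefault(pos, []).append((kind, residue))

    # Insertions placed before the first residue.
    out = [res for kind, res in by_pos.get(0, []) if kind == "ins"]
    for i, base in enumerate(unedited, start=1):
        kept = base
        inserted = []
        for kind, residue in by_pos.get(i, []):
            if kind == "sub":
                kept = residue
            elif kind == "del":
                # Check that the deleted residue matches the unedited sequence.
                if base != residue:
                    raise ValueError(f"position {i} is {base}, not {residue}")
                kept = ""
            else:  # "ins": goes after this position, in the order listed
                inserted.append(residue)
        out.append(kept)
        out.extend(inserted)
    return "".join(out)


# Usage with an invented toy sequence:
print(apply_edits("ACGUA", [(2, "sub", "U"), (3, "ins", "U"), (5, "del", "A")]))  # prints AUGUU
```

Tying insertions to the preceding unedited position keeps every coordinate on the unedited sequence, so no coordinate shifts as edits accumulate. This matches the result page, where edited positions are numbered relative to the source sequence entry.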
The RNA editing interface in GOBASE is based primarily on the existing RNA query page. It adds editing-specific selection parameters, such as the type of modification (insertion, deletion or substitution). A query result is shown in Figure 1. Besides the sequence itself, the result page displays the edited positions in two ways:
- as a list specifying the exact change made at each position; and
- marked in red on an alignment of the relevant sections of sequence, giving a straightforward visual representation.

Only the regions of the sequence where editing occurs are displayed, and coding and intronic regions are distinguished by background color. Complete unedited and edited sequences can be downloaded from the interface page. Future development will include the option of downloading the alignment as displayed. It will also add multiple rows to the alignment in cases where edits to a sequence are known to occur sequentially, so that observed intermediate stages in the editing process can be represented.
Figure 1. RNA editing result page, showing sequence-specific data, location of edited positions and alignment of gene sequence with edited sequence. Hyperlinks lead to database pages for details of the appropriate Gene Product, Taxonomy, Sequence and Gene, and to the Entrez page for the appropriate gi. Start and end positions of the gene, and locations of edited positions, are numbered relative to the start of the sequence entry containing the gene.
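Listing the edited positions amounts to walking a pairwise alignment of the unedited and edited sequences and reporting each column where they differ. The Python sketch below assumes the alignment is supplied as two equal-length strings, with "-" marking gaps. The function name and output tuples are illustrative only and are not taken from GOBASE. Positions are numbered along the unedited sequence, and an insertion is reported at the unedited position it follows.

```python
from typing import List, Tuple


def list_edits(aligned_unedited: str, aligned_edited: str) -> List[Tuple[int, str, str, str]]:
    """Return (position, kind, before, after) for every differing alignment column."""
    if len(aligned_unedited) != len(aligned_edited):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    pos = 0  # number of unedited residues seen so far (1-based position)
    for a, b in zip(aligned_unedited, aligned_edited):
        if a != "-":
            pos += 1
        if a == b:
            continue  # identical column (including a gap in both rows)
        if a == "-":
            changes.append((pos, "insertion", "-", b))
        elif b == "-":
            changes.append((pos, "deletion", a, "-"))
        else:
            changes.append((pos, "substitution", a, b))
    return changes


# Usage with an invented toy alignment:
print(list_edits("AC-GUA", "AUUGU-"))
# prints [(2, 'substitution', 'C', 'U'), (2, 'insertion', '-', 'U'), (5, 'deletion', 'A', '-')]
```

The same pass could feed both views on the result page: the returned tuples form the list of changes, and their column positions are the cells to highlight on the alignment.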
Information specific to the ~3000 complete human mitochondrial genome sequences in GOBASE has been added from a number of sources, including HmtDB (http://www.hmtdb.uniba.it/) (12), OMIM (http://www.…) and MITOMAP (http://www.mitomap.org/) (14). Two interface pages provide access to these new data.
The Human Sequence query page allows the user to select a set of human mitochondrial sequences based on haplogroup and disease state. More than 450 different haplogroup assignments are available in GOBASE, so a full list would be unwieldy for some queries. Because haplogroup designators always start with a letter, the user can first select one or more initial letters. The user then picks a range of individual haplogroups from the corresponding subset of assignments shown in a menu. The results page (Figure 2) … GOBASE Sequence page. It also uses an alignment to show all the positions at which this sequence differs from the reference human mitochondrial genome as defined in GenBank (accession no. NC_001807). In this alignment, mutations that have been associated with disease are marked in yellow, and other polymorphic mutations are indicated in red.
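The color coding on the Human Sequence result page follows a simple rule. For each position where a sequence differs from the reference, the variant is looked up in a set of disease-associated mutations. Variants found there are marked yellow, and all others are marked red. This Python sketch assumes the sequence is already aligned position by position to the reference with no insertions or deletions, which real mitochondrial genomes do not always satisfy. The disease set in the usage line is invented, and neither the data structures nor the function are GOBASE code.

```python
from typing import List, Set, Tuple


def classify_differences(reference: str, sample: str,
                         disease_variants: Set[Tuple[int, str, str]]) -> List[Tuple[int, str, str, str]]:
    """Return (position, reference_base, sample_base, color) for each mismatch.

    Assumes gap-free sequences of equal length aligned to the reference.
    'yellow' marks variants listed as disease-associated, 'red' all others.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for pos, (r, s) in enumerate(zip(reference, sample), start=1):
        if r != s:
            color = "yellow" if (pos, r, s) in disease_variants else "red"
            result.append((pos, r, s, color))
    return result


# Usage with invented toy data:
print(classify_differences("ACGTACGT", "ACTTACGA", {(3, "G", "T")}))
# prints [(3, 'G', 'T', 'yellow'), (8, 'T', 'A', 'red')]
```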
The Human Mutation query page (Figure 3a) allows the user to search the dataset for mutations of interest within a specified range of positions on the human mitochondrial genome sequence. The range can be given directly as start and end positions, or by selecting one or more genes from a list. The search returns a list of positions at which mutations are documented. For each mutation (Figure 3b), the result page provides:
- data on its disease associations;
- a section of the reference sequence showing the location and neighborhood of the mutation; and
- a list of the sequences in GOBASE that contain this mutation.
Other functional enhancements
The DNA sequence download functionality has been modified so that the user can download either genomic sequence or gene-coding regions, selectable via buttons on the Gene query page. In a small number of unusual cases, such as trans-spliced genes, there is no straightforward correspondence between a single gene and a contiguous linear region of the source sequence record. The GOBASE database structure has now been modified to handle these cases transparently. Sequences of complex gene-coding regions are assembled in advance and stored. They are then made available in query results through the same interface as conventional linear genes.
Figure 2. Human sequence result page, showing the difference between the queried sequence and the reference human mitochondrial genome sequence, both as a list of divergent positions and as an alignment of relevant sections of the sequences.
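Assembling a gene whose coding region is split across non-contiguous segments is the kind of precomputation described above. Each segment is extracted in transcript order, and segments on the minus strand are reverse-complemented. In some trans-spliced genes, segments lie on opposite strands or far apart. The Python sketch below assumes segments are given as (record id, start, end, strand) with 1-based inclusive coordinates and are listed in transcript order. The record identifiers and sequences in the usage line are invented. GOBASE's own assembly procedure is not described in the text and may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical segment description: (record id, 1-based start,
# 1-based inclusive end, strand "+" or "-"), listed in transcript order.
Segment = Tuple[str, int, int, str]

_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_coding_sequence(records: Dict[str, str], segments: List[Segment]) -> str:
    """Concatenate the segments, reverse-complementing minus-strand pieces."""
    parts = []
    for record_id, start, end, strand in segments:
        source = records[record_id]
        if not 1 <= start <= end <= len(source):
            raise ValueError(f"bad coordinates {record_id}:{start}-{end}")
        piece = source[start - 1:end]  # convert to 0-based half-open slice
        parts.append(reverse_complement(piece) if strand == "-" else piece)
    return "".join(parts)


# Usage with invented toy records: a plus-strand segment from one record
# followed by a minus-strand segment from another.
records = {"rec1": "ATGAAACCC", "rec2": "CCCTTAGGG"}
print(assemble_coding_sequence(records, [("rec1", 1, 6, "+"), ("rec2", 4, 6, "-")]))  # prints ATGAAATAA
```

Storing the assembled result, as the text describes, means a download only has to read a stored string and never has to re-run this join per request.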
All sequences retrieved from GOBASE now come with detailed literature references derived from the source GenBank records. Journal, authors and title are provided, along with a direct link to the corresponding PubMed entry where one exists.
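Because the references come from the source GenBank records, extracting them means reading the REFERENCE blocks of the GenBank flat file. In those blocks, each sub-field name (AUTHORS, TITLE, JOURNAL, PUBMED) occupies the first 12 columns, and continuation lines are indented by 12 spaces. The Python sketch below is a simplified parser written for illustration. It is not the GOBASE loader, which the text says is written in Perl. The sketch builds PubMed links with the URL pattern https://pubmed.ncbi.nlm.nih.gov/<PMID>/ and adds no link when a reference has no PUBMED line. The sample record in the usage lines is invented.

```python
from typing import Dict, List, Optional

PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{}/"


def parse_references(flatfile: str) -> List[Dict[str, str]]:
    """Collect the sub-fields of every REFERENCE block in a GenBank flat file."""
    refs: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    field: Optional[str] = None
    for line in flatfile.splitlines():
        if line.startswith("REFERENCE"):
            current, field = {}, None
            refs.append(current)
        elif not line.startswith(" "):
            # Any other top-level keyword (FEATURES, COMMENT, ...) ends the block.
            current, field = None, None
        elif current is not None:
            key, value = line[:12].strip(), line[12:].strip()
            if key:  # new sub-field such as AUTHORS, TITLE, JOURNAL, PUBMED
                field = key
                current[field] = value
            elif field is not None:  # continuation line indented by 12 spaces
                current[field] += " " + value
    for ref in refs:
        if "PUBMED" in ref:  # link only when the record gives a PubMed ID
            ref["link"] = PUBMED_URL.format(ref["PUBMED"])
    return refs


# Usage with an invented record fragment:
sample = "\n".join([
    "REFERENCE   1  (bases 1 to 100)",
    "  AUTHORS   Smith,A. and",
    "            Jones,B.",
    "  TITLE     An invented example title",
    "  JOURNAL   J. Example 1, 1-2 (2000)",
    "   PUBMED   12345",
    "FEATURES             Location/Qualifiers",
])
print(parse_references(sample)[0]["link"])  # prints https://pubmed.ncbi.nlm.nih.gov/12345/
```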
Because of practical constraints, any single query in GOBASE returns at most 5000 results. Users who wish to run custom queries that retrieve larger amounts of data are invited to contact the GOBASE team at gobase@bch.umontreal.ca, so that the query can be run directly on the database via SQL.
The GOBASE database is implemented in version 7.4.1 of the PostgreSQL relational database management system, with a web interface written in version 4.3.8 of the PHP scripting language. The graphics on the gene pages are generated using the GD module for Perl/PHP, version 2.0.25. Perl (5.8.0) scripts download data from GenBank and process it into GOBASE. All procedures run on PCs with two 2.4 GHz or 2.8 GHz Intel Xeon CPUs.
Specialized databases, with all their valuable information, are prone to disappearance (15), mostly because of funding constraints, unless they are transferred to sustainable public databases. We are therefore collaborating with scientists at NCBI to establish a database based on the content of GOBASE as an auxiliary to GenBank. This database will focus on the additional data that expert curation at GOBASE has generated, notably the curated gene and product names and synonyms and the RNA secondary structure data. It will thus provide a permanent repository for two decades of curation of organelle genome data.
Figure 3. (a) Human mutation query page, allowing the user to select the gene(s) of interest and specify the range of positions on the sequence to search for mutations. (b) Result page showing details for an individual mutation.
The authors would like to thank Ilene Mizrachi, Susan Schaefer, Tatiana Tatusova and Jim Ostell at NCBI; Chris Cesaire, Ousman Diallo, and Olivier Tremblay-Savard for contributions to the development of the RNA editing functionality in GOBASE; and Allan Sun for systems administration.
This project was funded by grants MOP-15331 and MOP-84453 from the Canadian Institutes of Health Research (CIHR, Genetics Institute). Funding for open access …
Conflict of interest statement. None declared.
1. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and
Wheeler,D.L. (2008) GenBank. Nucleic Acids Res., 36, D25–D30.
2. Korab-Laskowska,M., Rioux,P., Brossard,N., Littlejohn,T.G.,
Gray,M.W., Lang,B.F. and Burger,G. (1998) The Organelle
Genome Database Project (GOBASE). Nucleic Acids Res., 26, 138–144.
3. Shimko,N., Liu,L., Lang,B.F. and Burger,G. (2001) GOBASE: the
organelle genome database. Nucleic Acids Res., 29, 128–132.
4. O’Brien,E.A., Badidi,E., Barbasiewicz,A., deSousa,C., Lang,B.F.
and Burger,G. (2003) GOBASE – a database of mitochondrial and
chloroplast information. Nucleic Acids Res., 31, 176–178.
5. O’Brien,E.A., Zhang,Y., Yang,L., Wang,E., Marie,V., Lang,B.F.
and Burger,G. (2006) GOBASE – a database of organelle and
bacterial genome information. Nucleic Acids Res., 34, D697–D699.
6. Lang,B.F., Gray,M.W. and Burger,G. (1999) Mitochondrial
genome evolution and the origin of eukaryotes. Annu. Rev. Genet., …
7. Burger,G., Gray,M.W. and Lang,B.F. (2003) Mitochondrial
genomes: anything goes. Trends Genet., 19, 709–716.
8. Koski,L.B., Gray,M.W., Lang,B.F. and Burger,G. (2005)
AutoFACT: an automatic functional annotation and classification
tool. BMC Bioinform., 6, 151.
9. Covello,P.S. and Gray,M.W. (1989) RNA editing in plant
mitochondria. Nature, 341, 662–666.
10. Benne,R., Van den Burg,J., Brakenhoff,J.P., Sloof,P., Van Boom,J.H.
and Tromp,M.C. (1986) Major transcript of the frameshifted coxII
gene from trypanosome mitochondria contains four nucleotides
that are not encoded in the DNA. Cell, 46, 819–826.
11. Hoch,B., Maier,R.M., Appel,K., Igloi,G.L. and Kössel,H. (1991)
Editing of a chloroplast mRNA by creation of an initiation codon.
Nature, 353, 178–180.
12. …, Pappadà,G., Russo,L., Zanchetta,L. and Tommaseo-Ponzetta,M. (2005)
HmtDB, a human mitochondrial genomic resource based on variability
studies supporting population genetics and biomedical research.
BMC Bioinform., 1, S4.
13. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K.,
Chetvernin,V., Church,D.M., Dicuccio,M., Edgar,R., Federhen,S.
et al. (2008) Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res., 36, D13–D21.
14. Ruiz-Pesini,E., Lott,M.T., Procaccio,V., Poole,J.C., Brandon,M.C.,
Mishmar,D., Yi,C., Kreuziger,J., Baldi,P. and Wallace,D.C. (2007)
An enhanced MITOMAP with a global mtDNA mutational
phylogeny. Nucleic Acids Res., 35, D823–D828.
15. Merali,Z. and Giles,G. (2005) Databases in peril. Nature, 23,