Eukaryotic genome size databases.
ABSTRACT Three independent databases of eukaryotic genome size information have been launched or re-released in updated form since 2005: the Plant DNA C-values Database (www.kew.org/genomesize/homepage.html), the Animal Genome Size Database (www.genomesize.com) and the Fungal Genome Size Database (www.zbi.ee/fungal-genomesize/). In total, these databases provide freely accessible genome size data for >10,000 species of eukaryotes assembled from more than 50 years' worth of literature. Such data are of significant importance to the genomics and broader scientific community as fundamental features of genome structure, for genomics-based comparative biodiversity studies, and as direct estimators of the cost of complete sequencing programs.
Eukaryotic genome size databases
T. Ryan Gregory*, James A. Nicol¹, Heidi Tamm², Bellis Kullman², Kaur Kullman³, Ilia J. Leitch⁴, Brian G. Murray⁵, Donald F. Kapraun⁶, Johann Greilhuber⁷ and Michael D. Bennett⁴
Department of Integrative Biology, University of Guelph, Guelph, Ontario, N1G 2W1, Canada, ¹Glossopteris Web Design and Development, Sydney, Australia, ²Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, 181 Riia Street, 51014 Tartu, Estonia, ³Trump Trading Ltd, Tallinn, Estonia, ⁴Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK, ⁵School of Biological Sciences, University of Auckland, Private Bag 92019, Auckland, New Zealand, ⁶Department of Biological Sciences, University of North Carolina-Wilmington, 601 South College Road, Wilmington, NC 28403-3915, USA and ⁷Institute of Botany and Botanical Garden of the University of Vienna, Rennweg 14, A 1030 Vienna, Austria
Received August 14, 2006; Accepted October 4, 2006
Eukaryotic genome size data are becoming increasingly important both as the basis for comparative research into genome evolution and as direct estimators of the cost and difficulty of genome sequencing programs for an expanding sphere of non-model organisms (1–3). Nuclear DNA content data for >10 000 species of plants, animals and fungi are made freely available through three independent databases of eukaryotic genome size that have been either launched or re-released since 2005: the Plant DNA C-values Database (www.kew.org/genomesize/homepage.html) (4), the Animal Genome Size Database (www.genomesize.com) (5) and the Fungal Genome Size Database (www.zbi.ee/fungal-genomesize/) (6).
Genome sizes are typically given as gametic nuclear DNA contents (‘C-values’) either in units of mass (picograms, where 1 pg = 10⁻¹² g) or in number of base pairs (in eukaryotes, most often in megabases, where 1 Mb = 10⁶ bases). These are directly interconvertible as 1 pg = 978 Mb (or 1 Mb = 1.022 × 10⁻³ pg) (7). The majority of modern
genome size estimates are based on either Feulgen densitome-
try (more recently using computerized image analysis) or flow
cytometry, although DNA reassociation kinetics, bulk fluo-
rometry, static fluorometry, electrophoretic methods, quantita-
tive real-time PCR and complete genome sequencing have
also been used. Data from all such measurements are compiled
into the databases along with updated taxonomy, analytical
details and other relevant information (e.g. chromosome
number) where available.
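The mass-to-length conversion quoted above can be sketched in a few lines of Python. This is a minimal illustration rather than code used by any of the databases; the function names are hypothetical, and the only number taken from the text is the factor 1 pg = 978 Mb.

```python
# Conversion factor quoted in the text: 1 pg = 978 Mb (reference 7).
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a DNA amount in picograms to megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases: float) -> float:
    """Convert a DNA amount in megabases to picograms."""
    return megabases / MB_PER_PG
```

For instance, `mb_to_pg(795)` gives roughly 0.81 pg, which agrees with the figure quoted later for the largest fungal genome in the database.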
The first genome size estimates were conducted in the late [...] provided ~25 years later by Sparrow et al. (8). M.D. Bennett and
colleagues carried on this important effort by publishing a
series of lists for botanical genome size data beginning in
1976. Unfortunately, zoological and mycological counterparts
were not forthcoming for another 30 years, aside from a few
taxon-specific compilations based on a small number of
sources [e.g. (9)] or online lists of limited scope [e.g. Database
of Genome Sizes (http://www.cbs.dtu.dk/databases/DOGS/)
and DBA Mammalian Genome Size Database (http://www. [...])]. The databases described below, therefore, provide the first truly comprehensive catalogues of eukaryotic genome size data and represent a much-needed resource for members of the genomics community.
PLANT DNA C-VALUES DATABASE
By 1995, four major lists of angiosperm genome sizes had been published, which together contained data for 2802 species (10–13). Although the lists were well used (collectively, they have been cited >1500 times as of August 2006),
*To whom correspondence should be addressed. Tel: +1 519 824 4120, ext. 58053; Fax: +1 519 767 1656; Email: email@example.com
© 2006 The Author(s).
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/
by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Nucleic Acids Research, 2007, Vol. 35, Database issue Published online 7 November 2006
it became increasingly cumbersome to determine whether a
particular species was listed. It was therefore decided to
pool the values into a single database and release them on
the internet. The resulting Angiosperm DNA C-values Data-
base was compiled by M.D. Bennett and I.J. Leitch and was
coded by the Information Services Department at the Royal
Botanic Gardens, Kew; it went live in April 1997. Between
1997 and 2001, two updates of the Angiosperm DNA
C-values Database were released and the Pteridophyte DNA
C-values Database was added.
Based on the evident utility and high usage of these two
databases, efforts were initiated to construct counterparts
for other plant taxa as data became available. Ultimately,
this led to the assembly of the overarching Plant DNA
C-values Database, which was made available through
Java-based queries of a Sybase database and was released in September 2001. Initially, it contained C-values for 3864 species from the four land plant groups [angiosperms, gymnosperms, pteridophytes (comprising monilophytes and lycophytes) and bryophytes]; available genome size estimates for three algal groups (Rhodophyta, Chlorophyta and Phaeophyta) were added in Release 3.0 in December 2004.
Coverage and features of the Plant DNA
C-values Database, Release 4.0
Release 4.0 of the Plant DNA C-values Database, launched in
October 2005, contains genome sizes for 5150 species includ-
ing 4427 angiosperms, 207 gymnosperms, 87 pteridophytes,
176 bryophytes, and 253 algae compiled from over 550
publications or personal communications (4). Tables 1 and
2 provide a breakdown of absolute and relative coverage of
the major plant groups. Table 1 also gives the minimum,
maximum and mean C-values for each of the major groups
and shows that genome sizes in plants range from 0.01 pg
in some unicellular algae (e.g. Cyanidium caldarium) to
127.4 pg in the tetraploid angiosperm Fritillaria assyriaca.
Most of these data have been acquired through either Feulgen
densitometry of stained root tip squashes (63%) or flow
cytometry of freshly chopped leaf material (31%) using a
range of plant calibration standards. These data are accessible
through a variety of search options that allow users to either
analyze C-value data across different groups of plants
(by clicking on the Plant DNA C-values Database icon), or
by searching within taxonomically specific subsections of
the database (by clicking on the appropriate plant group icon).
The database contains information, where available, for the following:
(v) Taxonomic authority.
(vi) Genome size, for which users have the choice of
outputting data in pg or Mb and choosing among 1C, 2C
and 4C DNA content. (In plants, the measurement of
genome size by Feulgen microdensitometry involves
determining the amount of staining in mitotic or meiotic
dividing cells, typically prepared from actively dividing
root tips. The most suitable stage for measurement was
considered to be prophase, as chromatin in metaphase or
telophase is too condensed. As prophase cells contain a
fully replicated genome with a 4C DNA amount it is
these values that were reported in publications. Today,
2C values are usually given.)
(vii) Ploidy level.
(viii) Chromosome number.
(ix) Method used to estimate the genome size.
(x) Information on taxonomic vouchers that may exist for the species analyzed.
(xi) The full bibliographic reference from which the original data were taken.
At the end of each output, the number of records returned
and summary statistics (minimum, maximum, mean and stan-
dard deviations) of these records are given.
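As a rough illustration of the summary statistics reported with each output, and of how values published at different C-levels relate to one another, the following Python sketch normalises records to 1C and then summarises them. It is not the database's own code: the function names and the sample records are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which form is reported.

```python
from statistics import mean, stdev


def to_1c(amount_pg: float, c_level: int) -> float:
    """Scale a DNA amount reported at 1C, 2C or 4C to its 1C equivalent."""
    if c_level not in (1, 2, 4):
        raise ValueError("c_level must be 1, 2 or 4")
    return amount_pg / c_level


def summarize(values_pg):
    """Return count, minimum, maximum, mean and sample SD of 1C values."""
    values = list(values_pg)
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        # Sample SD is an assumption; it needs at least two values.
        "sd": stdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical records: (DNA amount in pg, C-level at which it was reported).
records = [(4.0, 4), (3.0, 2), (2.0, 1)]
summary = summarize(to_1c(amount, level) for amount, level in records)
```

Normalising first matters because older publications often reported 4C amounts, so mixing raw values would inflate both the mean and the spread.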
Additional search options further enhance the flexibility of the database:
(i) All versus prime estimates. Where multiple genome size
estimates exist for a given species, users have the choice
of outputting all estimates or only the ‘prime’ estimate.
The availability of additional, non-prime estimates for a
Table 1. Minimum, maximum and mean 1C DNA amounts for each of the major plant groups in the Plant DNA C-values Database (Release 4.0, October 2005),
together with the current level of species representation of C-value data
Min. (pg) | Max. (pg) | Mean (pg) | Number of species in the Plant DNA [...] | % Representation in the Plant DNA [...]
aNumbers of species recognized were taken from Ref. (36) for algae; Ref. (37) for bryophytes, lycophytes and monilophytes; Ref. (38) for gymnosperms; and
Ref. (13) for angiosperms.
bIncludes recent data from Ref. (39).
species provides the user with an indication of the range
of values that have been reported. In some cases the
differences point to genuine intraspecific variation
(e.g. Zea mays) but in others they highlight discrepancies
attributable to either taxonomic or methodological errors
in genome size estimation (14,15). Recent reviews
covering potential problems in genome size estimation
include those by Greilhuber (15) for Feulgen densito-
metry and Dolezel and Bartos (16) for flow cytometry.
(ii) Wild card searches. An asterisk (*) can be used to indicate wild cards in searches that include only partial names.
(iii) From/to searches. To restrict searches based on numeric data (i.e. chromosome number, ploidy level, DNA amount), users can set criteria in the ‘from’ and ‘to’ boxes of the query page. As examples, users may use this option:
• To limit the results of a query to taxa with diploid chromosome numbers between 18 and 36 (inclusive), by entering 18 in the ‘from’ box and 36 in the ‘to’ box for chromosome number.
• To limit the search to taxa with only 18 chromosomes, by placing this number in both the ‘from’ and ‘to’ boxes.
• To select all records having a diploid number of 18 or greater, by entering 18 in the ‘from’ box and leaving the ‘to’ box empty.
(iv) Sorting results. The results of searches are automatically sorted by increasing 1C DNA amounts in picograms. To sort the results by family, genus, species, taxonomic authority, chromosome number or ploidy level, users can select the appropriate choice from the drop-down box under the option ‘Sort by’ at the bottom of the query page.
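The wild card, from/to and sorting options described in the list above can be mimicked on a local list of records. The sketch below is only an approximation of that behaviour, not the database's query code: the `Record` fields, the `search` function and the example records are all hypothetical, and the standard-library `fnmatch` module is used here to interpret the asterisk.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Record:
    genus: str
    species: str
    chromosome_number: int  # diploid (2n) number
    c_value_pg: float       # 1C DNA amount in pg


def in_range(value, low: Optional[float] = None, high: Optional[float] = None) -> bool:
    """Inclusive from/to test; None stands for a box left empty."""
    return (low is None or value >= low) and (high is None or value <= high)


def search(records, genus_pattern="*", chrom_from=None, chrom_to=None,
           sort_by="c_value_pg"):
    """Filter by genus wild card and chromosome range, then sort the hits."""
    hits = [
        r for r in records
        if fnmatchcase(r.genus.lower(), genus_pattern.lower())
        and in_range(r.chromosome_number, chrom_from, chrom_to)
    ]
    # Default order mirrors the text: increasing 1C DNA amount.
    return sorted(hits, key=lambda r: getattr(r, sort_by))


# Made-up records for illustration only; not values from the database.
records = [
    Record("Genusa", "x", 18, 2.5),
    Record("Genusb", "y", 36, 1.0),
    Record("Other", "z", 40, 0.5),
]
```

With these records, `search(records, "Genus*")` returns the two matching genera ordered by C-value, and `search(records, chrom_from=18, chrom_to=18)` returns only the record with 18 chromosomes. Note that `fnmatch` also gives `?` and `[...]` special meaning, which the text does not mention for the database itself.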
In addition to searching the entire database in this way,
users can choose to search subsections of the database by
selecting the specific plant group of interest (i.e. angiosperms,
gymnosperms, pteridophytes, bryophytes or algae) from the
homepage. In doing so, the user is provided with additional
options for querying and/or outputting that are of unique
relevance to each taxon:
• Angiosperm group (i.e. monocots, eudicots or basal angiosperms).
• Life cycle type (i.e. annual, biennial or perennial).
• Family. In particular, users have the choice of displaying either the family name given in the original source of the genome size data or the assigned family following the Angiosperm Phylogeny Group (APG) circumscription (17).
• Gymnosperm group [i.e. Cycadales, Ginkgoales, Gnetales, Pinaceae or Coniferales II (all conifer families excluding Pinaceae)].
• Sperm flagella number (i.e. multiflagellate or none).
• Pteridophyte group.
• Spore type (i.e. homosporous or heterosporous).
• Sporangium type (i.e. eusporangiate or leptosporangiate).
• Sperm flagella number (i.e. biflagellate or multiflagellate).
• Bryophyte group (i.e. hornwort, liverwort or moss).
• Algal group (i.e. Chlorophyta, Phaeophyta or Rhodophyta).
Users are required to provide an email address to query the
database, which aids in the tracking of usage and in the pro-
tection of intellectual property, but otherwise there are no
restrictions whatsoever on access.
Besides genome size data, the database includes a sum-
mary of the development and release history of the database,
instructions on how to search the database, author contact
information, links to other databases containing genome
size data, and the meeting reports from the international Plant
Genome Size meetings, two of which have been held to date
(in 1997 and 2003) at the Royal Botanic Gardens, Kew.
USAGE OF THE DATABASE
The Plant DNA C-values Database has been widely used, with >110 000 hits from over 55 countries since its (re-)launch in 2001. On average, the database receives 2000–3000 hits per month and a mean of >60 queries per day; each query downloads on average 110 genome size estimates. As of August 2006, the database has been cited in ~130 publications since its initial launch as the Angiosperm DNA C-values Database in 1997.
ANIMAL GENOME SIZE DATABASE
The first large-scale compilation of animal genome size data
was created for an analysis of the correlation between gen-
ome size and erythrocyte size in mammals (18), which was
later expanded for a similar study in birds (19). Recognizing
the severe limitations on the study of animal genome size
variation posed by the lack of access to such data, these
unpublished datasets were expanded to include data from
both vertebrates and invertebrates and were posted online
as the Animal Genome Size Database on January 10, 2001.
Table 2. Representation of gymnosperms and angiosperms at different taxonomic levels in the Plant DNA C-values Database (Release 4.0, October 2005)
Group | No. in [...] | [...] of species recognized
This initial release consisted only of flat text tables and included ~2900 animal species. As data continued to be added over the ensuing 5 years, the flat table format became increasingly cumbersome both for updates and for the growing number of users.
Coverage and features of the Animal Genome
Size Database, Release 2.0
A completely redesigned Release 2.0 of the Animal Genome Size Database was launched on December 24, 2005, meant to coincide approximately with the 5-year anniversary of the database (5). Rather than flat tables, the database has been converted to a MySQL database accessed through a user- [...] PHP. Its search tools also employ some AJAX (Asynchronous JavaScript and XML) [...] in information display. At the time of this writing, the database contains 5677 records from 601 sources, covering 2953 species of vertebrates and 1323 invertebrates. Reported animal genome sizes range >4000-fold, from ~0.03 pg in the root-knot nematode Meloidogyne graminicola to ~133 pg in the marbled lungfish Protopterus aethiopicus. Table 3 provides more detailed breakdowns of the available data, including the ranges, means, and absolute and relative coverage of the major animal groups.
Animal genome size data are accessed through either
browse or search functions. The browse function allows
users to select an entire group of animals (e.g. mammals,
insects), or to select subsections of the database using pro-
gressive pull-down menus ranging in specificity from phylum
to species. The advanced search feature allows a variety of
queries, including genus, species or common name, as well
as options to select genome sizes equal to, less than/greater
than or between user-specified values. Finally, it is also pos-
sible to retrieve all records generated using a given method,
standard species or cell type.
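The size-based part of the advanced search (equal to, less than, greater than, or between user-specified values) maps naturally onto parameterised SQL. The sketch below uses Python's built-in `sqlite3` as a stand-in for the MySQL back end, and everything in it is an assumption for illustration: the table and column names, the `search_by_size` function and the sample rows are hypothetical, since the real schema is not given in the text.

```python
import sqlite3

# Hypothetical schema; the real database's tables and columns are not described.
SCHEMA = "CREATE TABLE records (genus TEXT, species TEXT, c_value_pg REAL, method TEXT)"

_FILTERS = {
    "eq": "c_value_pg = ?",
    "lt": "c_value_pg < ?",
    "gt": "c_value_pg > ?",
    "between": "c_value_pg BETWEEN ? AND ?",  # inclusive on both ends
}


def search_by_size(conn, op, low, high=None):
    """Return (genus, species, c_value_pg) rows matching a size filter."""
    if op not in _FILTERS:
        raise ValueError(f"unknown operator: {op}")
    if op == "between":
        if high is None:
            raise ValueError("'between' needs both bounds")
        params = (low, high)
    else:
        params = (low,)
    # Only the fixed clause text is interpolated; user values go in as parameters.
    sql = ("SELECT genus, species, c_value_pg FROM records "
           f"WHERE {_FILTERS[op]} ORDER BY c_value_pg")
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Build an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", [
        ("Genusa", "x", 0.5, "flow cytometry"),
        ("Genusb", "y", 1.5, "Feulgen densitometry"),
        ("Genusc", "z", 3.0, "flow cytometry"),
    ])
    return conn
```

Passing user input as bound parameters rather than formatting it into the SQL string is the design point here; MySQL drivers typically use a different placeholder style from `sqlite3`, so the `?` markers would need adapting.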
Data are returned in customizable dynamic tables, with
users specifying the number of records displayed per page
(100, 250, 500 or All). The default results page includes taxo-
nomic details (Phylum/Subphylum, Class, Order, Family,
Genus, Species, common name), C-value in pg, chromosome
number (where available), and the method, cell type and stan-
dard species used in the analysis. The source is given as a
numbered reference with a hotlink to the full citation. Two
courses of action are possible from this results table: (i)
the data can be downloaded and can be viewed using
Excel (with the spreadsheet following the same customized
format as the dynamic tables), or (ii) users can click on spe-
cies names to enter individual species pages. The latter
option provides a detailed record for the species of choice,
including taxonomic and methodological details, the C-
value estimate from the chosen record as well as links to
other available records for the same species, chromosome
number, the full source citation and both internal links
(e.g. to call up data for all members of the genus, family,
order, etc.) and external links [e.g. to NCBI, image searches
and both general (e.g. the Integrated Taxonomic Information
Service) and specific (e.g. FishBase, AmphibiaWeb) taxo-
nomic databases as applicable]. There are no limitations
on browsing or searching the database, but downloading
data to Excel requires users to input a name and valid
email address as a digital signature of a data sharing agree-
ment. A randomized and limited-duration link to the com-
piled spreadsheet is then emailed to the input address as a
means of protecting intellectual property without hindering
access to information.
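One way a randomized, limited-duration download link of this kind could be produced is sketched below. This is not the database's implementation, which the text does not describe: the function names, the in-memory token store and the 24-hour lifetime are all assumptions made for illustration.

```python
import secrets
import time

# Assumed lifetime; the text says only that the link is of limited duration.
LINK_LIFETIME_SECONDS = 24 * 60 * 60

# Hypothetical in-memory store: token -> (email, expiry timestamp).
_issued = {}


def issue_link(email, now=None):
    """Create an unguessable token tied to an email address and an expiry time."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    _issued[token] = (email, now + LINK_LIFETIME_SECONDS)
    return token


def is_valid(token, now=None):
    """A token is valid only if it was issued and has not yet expired."""
    now = time.time() if now is None else now
    entry = _issued.get(token)
    return entry is not None and now < entry[1]
```

The token would be embedded in the emailed URL; a real service would also persist the store and remove expired entries, which is omitted here.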
Release 2.0 of the Animal Genome Size Database also pro-
vides users with up-to-the-minute summary statistics for the
entire database and each major taxonomic group and sub-
group therein, number of species covered, min/max, mean
± standard error, a breakdown of methods, cell types, stan-
dards used for all records in the given group, and a brief
summary of the major patterns and correlates reported to date
for the taxon in question. Other features available to users
include a real-time Flash-based graphical summary of the
total dataset, relevant announcements and a list of the 10 most
recently added records on the main page, as well as a fully
searchable reference list, an FAQ, author contact information,
links to related sites and a genome size discussion forum.
Usage of the database
Traffic at the Animal Genome Size Database has increased
steadily since its launch in 2001, and the main page now
receives 50–100 unique visitors per day. Records regarding
individual queries are not kept, but a typical data download
includes all data for one or more entire groups of animals
(i.e. up to several hundred species for a particular vertebrate
Table 3. Summary of the content of the Animal Genome Size Database as of August 2006, showing the number of records (i.e. including multiple entries for the same species), species coverage in absolute numbers and percentage of described diversity (in parentheses; note that for many invertebrate taxa only a minority of species have been described), ranges in reported genome sizes, and mean of available genome size data
Taxon | Records | No. of [...]
The nearly 3:1 bias in favor of vertebrates and the discrepancy between total
records and species coverage for many groups indicate that available animal
genome size data are derived from an unrepresentative subsample of animal
diversity and that considerable work remains to be performed to correct this.
the database [cf. (3)].
group). The database has been cited in ~90 publications since
FUNGAL GENOME SIZE DATABASE
Development and coverage of the Fungal
Genome Size Database, Release 1.0
In a discussion of the plant and animal genome size databases
penned in mid-2004, it was noted that ‘unfortunately, equiva-
lent databases have not yet been compiled for fungi or "pro-
tists", although this would clearly be a worthy project for
experts in those groups to undertake’ (3). On March 20, 2005, a major portion of this gap was filled with the launch of the Fungal Genome Size Database (6).
Numerous relative genome sizes (i.e. in arbitrary units) had
been estimated in the late 1980s and early 1990s by
researchers at the University of Regensburg in Germany
using a classical cytophotometry technique, including 287
records for Basidiomycetes (20,21) and 743 for Ascomycetes
(22). Using the same method as well as flow cytometry and
image cytometry, and by employing an internal standard
(Saccharomyces cerevisiae), it became possible to convert
these estimates from arbitrary units into far more informative
absolute genome sizes in Mb (23–25). These converted data
formed the basis of the Fungal Genome Size Database, which
has since been expanded to include 1298 records covering
739 species and 335 genera from 40 orders (Table 4) based
on the taxonomy of the Index Fungorum Partnership (www.
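The conversion from arbitrary units to absolute genome size with an internal standard amounts to a simple ratio, sketched below in Python. The function name and the example signal values are hypothetical, and the size assigned to the standard is left as a parameter because the text does not state the value used for Saccharomyces cerevisiae.

```python
def absolute_genome_size_mb(sample_signal: float,
                            standard_signal: float,
                            standard_size_mb: float) -> float:
    """Scale a sample's signal by an internal standard of known genome size.

    Both signals must be in the same arbitrary units and measured under the
    same conditions; standard_size_mb is the assumed size of the standard.
    """
    if standard_signal <= 0:
        raise ValueError("standard_signal must be positive")
    return sample_signal / standard_signal * standard_size_mb


# Hypothetical measurement: the sample fluoresces three times as strongly as
# the standard, so its estimated genome is three times the standard's size.
estimate = absolute_genome_size_mb(30.0, 10.0, standard_size_mb=12.0)
```

Because the estimate scales linearly with `standard_size_mb`, any revision of the standard's accepted genome size shifts every converted value by the same proportion.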
Data from the Fungal Genome Size Database are made
available through queries (PHP, HTML) of a MySQL data-
base. The user and administrative interfaces for the database
are generated by a CMS system developed by Trump Trading
Ltd (TTCMS). The data can be queried by different taxo-
nomic levels (phylum, order, genus, species epithet, variety)
as well as by ploidy level, chromosome number, chromosome
size range, method of genome size estimation, standard speci-
mens used, cell type analyzed and source reference.
Responses to queries are presented as HTML tables, with
detailed information about given records (e.g. herbarium
index, original reference and additional remarks) provided
in a separate pop-up window accessed by clicking on a
given genus or species name in the main table.
Compared with plants and animals, fungi display very small genomes: ~90% of the available fungal data lie within the range of 1C = 10–60 Mb, with an average of ~37 Mb and a median of 28 Mb (Figure 1). The largest fungal genome size reported to date, that of Scutellospora castanea (Diversisporales), is a mere 795 Mb (0.81 pg) (27), whereas the smallest, 6.5 Mb (0.007 pg) in Pneumocystis carinii f. sp. muris (Pneumocystidales), is far smaller than even the most streamlined animal or non-algal plant genomes (www.
As with plants (and to a far lesser but not insignificant
degree with animals), ploidy level variability is an important
consideration in fungi. Ploidy level (x) has been estimated for
1036 (80%) of the records in the database, and varies from 1x
to 50x. Diploidy (2x) is the single most commonly observed
level (36% of records), although haploidy (1x) is also
common; a level of 50x has been reported for only one
species, Neottiella rutilans (22). Chromosome numbers
have been reported for 81 of the species included in the
database, ranging from n = 3 in Schizosaccharomyces pombe (Schizosaccharomycetales) (29) to n = 20 in Ustilago hordei (Ustilaginales) and Batrachochytrium dendrobatidis
In both plants and animals, the majority of variation
among estimates for individual species is attributed to experi-
mental error (3,14,15). In fungi, however, it remains unclear
to what extent apparent intraspecific variation is non-
artifactual as data regarding heteroploidy in this group remain
controversial (20,31,32). There is evidence that interspecific
hybrids may occur in most fungal phyla, with both sexual
and asexual origins evident among the growing list of appar-
ent fungal hybrids (33). Hybrids may be diploid or maintain
Table 4. Number of records in fungal genome size database
Phylum order Number of records
Numerals in boldface indicate records within phyla and numerals in roman
indicate records within orders.
the dikaryotic state, they may undergo karyogamy and nor-
mal meiosis to reconstitute the euploid state, or they may
undergo abnormal meiosis to yield a heteroploid hybrid.
During vegetative growth, chromosomes and chromosome
segments can be lost at random, which would generate legiti-
mate variation in estimated genome sizes.
Electrophoretic karyotyping has shown that variation in
chromosome number and size is a rule rather than an excep-
tion for many, mostly asexual, species (32). This method
indicated that genome size in Pleurotus ostreatus (Agaricales) ranges from 20.8 to 35.1 Mb (0.021–0.036 pg, a relative difference of >60%) and chromosome number ranges from 6 to 11 (34,35). Using flow cytometry, genome size in the same species appears to range from 18.5 to 28.7 Mb (0.019–0.029 pg, a 55% difference) (B. Kullman, unpublished [...] resulted in a reported range of 24.0–27.53 Mb (0.025–0.028 pg, a 15% difference) (21). It bears noting, however,
that even small absolute differences among estimates that
might be considered within the margin of measurement
error in plants or animals (e.g. 0.01 pg) translate into substan-
tial relative differences in species with such tiny genomes.
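The relative differences quoted for Pleurotus ostreatus, and the point about small absolute differences, can be checked with a short calculation. The sketch below assumes the relative difference is taken against the lower bound of each range, which reproduces the quoted percentages; the function names are hypothetical, and the ranges, the 1 pg = 978 Mb factor and the 28 Mb median are taken from the text.

```python
MB_PER_PG = 978.0  # conversion factor quoted in the text


def relative_difference_pct(low: float, high: float) -> float:
    """Difference between high and low, as a percentage of the low value."""
    return (high - low) / low * 100.0


# Ranges in Mb as reported in the text; the method behind reference (21)
# is not named in the surviving passage, so it is keyed by citation only.
ranges_mb = {
    "electrophoretic karyotyping (34,35)": (20.8, 35.1),
    "flow cytometry": (18.5, 28.7),
    "reference (21)": (24.0, 27.53),
}
differences = {name: relative_difference_pct(lo, hi)
               for name, (lo, hi) in ranges_mb.items()}

# A 0.01 pg discrepancy expressed in Mb and relative to the 28 Mb fungal median.
discrepancy_mb = 0.01 * MB_PER_PG
discrepancy_vs_median_pct = discrepancy_mb / 28.0 * 100.0
```

Under these assumptions a 0.01 pg discrepancy is about 9.8 Mb, or roughly a third of the median fungal genome, which illustrates why measurement error that is negligible in plants or animals becomes substantial in fungi.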
Usage of the database
At this early stage, the database receives ~10–20 unique hits
per day, and at the time of this writing has been visited by
>9000 visitors from around the world.
Taken together, the three eukaryotic genome size databases
represent some of the broadest genetic datasets available,
covering >10 000 species. In relative terms, however, this
comprises a very small minority of eukaryotic diversity. It
is therefore a primary objective of modern genome size
research to greatly increase the coverage of taxa in all three
kingdoms. Perhaps the least well studied of all, however,
are the members of the extremely diverse (and paraphyletic)
assemblage commonly known as ‘protists’. The construction
of a database of genome sizes for this group, and subsequent
efforts to fill the gaps therein, represents an equivalently high
priority. Overall, the release of these databases has proved to
be a boon for the advancement of knowledge about eukary-
otic genome structure and evolution, and has made it possible
for the first time to identify the key areas still in need of
The authors wish to thank their many colleagues and
collaborators for assistance with various aspects of the
construction and maintenance of the genome size databases.
by the Natural Sciences and Engineering Research Council of
Canada in the form of several scholarships, fellowships
and grants to T.R.G. Research leading to the development of
the Fungal Genome Size Database was supported by
Estonian Science Foundation grant number 4989 to B.K. The
Open Access publication charges for this article were waived
by Oxford University Press.
Conflict of interest statement. None declared.
1. Bennett,M.D. and Leitch,I.J. (2005) Genome size evolution in plants.
In Gregory,T.R. (ed.), The Evolution of the Genome. Elsevier, San
Diego, CA, pp. 89–162.
Figure 1. Histogram presenting fungal genome sizes (Mb) in the database. A majority of genome size estimates cover the range from 10 to 60 Mb. The odd
values are labeled with species names.
2. Gregory,T.R. (2005) Synergy between sequence and size in large-scale
genomics. Nature Rev. Genet., 6, 699–708.
3. Gregory,T.R. (2005) Genome size evolution in animals.
In Gregory,T.R. (ed.), The Evolution of the Genome. Elsevier, San
Diego, CA, pp. 3–87.
4. Bennett,M.D. and Leitch,I.J. (2005) Plant DNA C-values Database.
5. Gregory,T.R. (2005) Animal Genome Size Database.
6. Kullman,B., Tamm,H. and Kullman,K. (2005) Fungal Genome Size
7. Dolezel,J., Bartos,J., Voglmayr,H. and Greilhuber,J. (2003) Nuclear
DNA content and genome size of trout and human. Cytometry, 51A,
8. Sparrow,A.H., Price,H.J. and Underbrink,A.G. (1972) A survey of DNA
content per cell and per chromosome of prokaryotic and eukaryotic
organisms: some evolutionary considerations. In Smith,H.H. (ed.),
Evolution of Genetic Systems. Gordon and Breach, New York,
9. Tiersch,T.R. and Wachtel,S.S. (1991) On the evolution of genome size
of birds. J. Hered., 82, 363–368.
10. Bennett,M.D. and Smith,J.B. (1976) Nuclear DNA amounts in
angiosperms. Philos. Trans. R. Soc. Lond. Ser. B, 274, 227–274.
11. Bennett,M.D., Smith,J.B. and Heslop-Harrison,J.S. (1982) Nuclear
DNA amounts in angiosperms. Proc. R. Soc. Lond. B, 216,
12. Bennett,M.D. and Smith,J.B. (1991) Nuclear DNA amounts in
angiosperms. Philos. Trans. R. Soc. Lond. Ser. B, 334, 309–345.
13. Bennett,M.D. and Leitch,I.J. (1995) Nuclear DNA amounts in
angiosperms. Ann. Bot., 76, 113–176.
14. Greilhuber,J. (1998) Intraspecific variation in genome size: a critical
reassessment. Ann. Bot., 82 (Suppl. A), 27–35.
15. Greilhuber,J. (2005) Intraspecific variation in genome size in
angiosperms—identifying its existence. Ann. Bot., 95, 91–98.
16. Dolezel,J. and Bartos,J. (2005) Plant DNA flow cytometry and
estimation of nuclear genome size. Ann. Bot., 95, 99–110.
17. Angiosperm Phylogeny Group II (2003) An update of the Angiosperm
Phylogeny Group classification for the orders and families of flowering
plants. Bot. J. Linnean Soc., 141, 399–436.
18. Gregory,T.R. (2000) Nucleotypic effects without nuclei: genome size
and erythrocyte size in mammals. Genome, 43, 895–901.
19. Gregory,T.R. (2002) A bird’s-eye view of the C-value enigma: genome
size, cell size, and metabolic rate in the class Aves. Evolution, 56,
20. Bresinsky,A., Wittmann-Meixner,B., Weber,E. and Fischer,M.
(1987) Karyologische Untersuchungen an Pilzen mittels
Fluoreszenzmikroskopie. Z. Mykol., 53, 303–318.
21. Wittmann-Meixner,B. (1989) Polyploidie bei Pilzen. Biblioth. Mycol.,
22. Weber,E. (1992) Untersuchungen zu Fortpflanzung und Ploidie
verschiedener Ascomyceten. Biblioth. Mycol., 140, 1–186.
23. Kullman,B. (2000) Application of flow cytometry for measurement of
nuclear DNA content in fungi. Folia Cryptog. Estonica, 36, 31–46.
24. Kullman,B. (2002) Nuclear DNA content, life cycle and ploidy in two
Neottiella species (Pezizales, Ascomycetes). Persoonia, 18, 103–115.
25. Kullman,B. and Teterin,W. (2006) Estimation of fungal genome size:
comparison of image cytometry and photometric cytometry. Folia
Cryptog. Estonica, 42, 43–56.
26. Index Fungorum Partnership (2004) Index Fungorum. Custodians
CABI Bioscience, CBS and Landcare Research.
27. Hijri,M. and Sanders,J.R. (2005) Low gene copy number shows that
arbuscular mycorrhizal fungi inherit genetically different nuclei.
Nature, 433, 160–163.
28. Birren,B., Fink,G. and Lander,E. (2002) Fungal Genome Initiative
29. Wood,V., Gwilliam,R. and Rajandream,M.A. and 131 other authors.
(2002) The genome sequence of Schizosaccharomyces pombe. Nature,
30. McCluskey,K. and Mills,D. (1990) Identification and characterization
of chromosome length polymorphisms among strains representing
fourteen races of Ustilago hordei. Mol. Plant- Micr. Interact., 3,
31. Tolmsoff,W.J. (1983) Heteroploidy as a mechanism of variability
among fungi. Annu. Rev. Phytopathol., 21, 317–340.
32. Beadle,J., Wright,M., McNeely,L. and Bennett,J.W. (2003)
Electrophoretic karyotype analysis in fungi. Adv. Appl. Microbiol., 53,
33. Schardl,C.L. and Craven,K.D. (2003) Interspecific hybridization in
plant-associated fungi and oomycetes: a review. Mol. Ecol., 12,
34. Sagawa,I. and Nagata,Y. (1992) Analysis of chromosomal DNA of
mushrooms in genus Pleurotus by pulsed field gel electrophoresis.
J. Gen. Appl. Microbiol., 38, 47–52.
35. Ramírez,L., Larraya,L.M. and Pisabarro,A.G. (2000) Molecular tools
for breeding basidiomycetes. Int. Microbiol., 3, 147–152.
36. Kapraun,D.F. (2005) Nuclear DNA content estimates in multicellular
eukaryotic green, red and brown algae: phylogenetic considerations.
Ann. Bot., 95, 7–44.
37. Qiu,Y.L. and Palmer,J.D. (1999) Phylogeny of early land plants:
insights from genes and genomes. Trends. Plant Sci., 4, 26–30.
38. Murray,B.G., Leitch,I.J. and Bennett,M.D. (2001) Gymnosperm DNA
39. Greilhuber,J., Borsch,T., Müller,K., Worberg,A., Porembski,S. and
Barthlott,W. (2006) Smallest angiosperm genomes found in
Lentibulariaceae with chromosomes of bacterial size. Plant Biol.,