Received 29 July 2002
Accepted 30 September 2002
Published online 8 January 2003
Biological identifications through DNA barcodes
Paul D. N. Hebert*, Alina Cywinska, Shelley L. Ball and Jeremy R. deWaard
Department of Zoology, University of Guelph, Guelph, Ontario N1G 2W1, Canada
Although much biological research depends upon species diagnoses, taxonomic expertise is collapsing.
We are convinced that the sole prospect for a sustainable identification capability lies in the construction
of systems that employ DNA sequences as taxon ‘barcodes’. We establish that the mitochondrial gene
cytochrome c oxidase I (COI) can serve as the core of a global bioidentification system for animals. First,
we demonstrate that COI profiles, derived from the low-density sampling of higher taxonomic categories,
ordinarily assign newly analysed taxa to the appropriate phylum or order. Second, we demonstrate that
species-level assignments can be obtained by creating comprehensive COI profiles. A model COI profile,
based upon the analysis of a single individual from each of 200 closely allied species of lepidopterans, was
100% successful in correctly identifying subsequent specimens. When fully developed, a COI identification
system will provide a reliable, cost-effective and accessible solution to the current problem of species
identification. Its assembly will also generate important new insights into the diversification of life and
the rules of molecular evolution.
Keywords: molecular taxonomy; mitochondrial DNA; animals; insects; sequence diversity; evolution
1. INTRODUCTION
The diversity of life underpins all biological studies, but
it is also a harsh burden. Whereas physicists deal with a
cosmos assembled from 12 fundamental particles, biol-
ogists confront a planet populated by millions of species.
Their discrimination is no easy task. In fact, since few
taxonomists can critically identify more than 0.01% of the
estimated 10–15 million species (Hammond 1992; Hawk-
sworth & Kalin-Arroyo 1995), a community of 15 000
taxonomists will be required, in perpetuity, to identify life
if our reliance on morphological diagnosis is to be sus-
tained. Moreover, this approach to the task of routine
species identification has four significant limitations. First,
both phenotypic plasticity and genetic variability in the
characters employed for species recognition can lead to
incorrect identifications. Second, this approach overlooks
morphologically cryptic taxa, which are common in many
groups (Knowlton 1993; Jarman & Elliott 2000). Third,
since morphological keys are often effective only for a
particular life stage or gender, many individuals cannot be
identified. Finally, although modern interactive versions
represent a major advance, the use of keys often demands
such a high level of expertise that misdiagnoses are com-
mon.
The limitations inherent in morphology-based identifi-
cation systems and the dwindling pool of taxonomists sig-
nal the need for a new approach to taxon recognition.
Microgenomic identification systems, which permit life’s
discrimination through the analysis of a small segment of
the genome, represent one extremely promising approach
to the diagnosis of biological diversity. This concept has
already gained broad acceptance among those working
with the least morphologically tractable groups, such as
viruses, bacteria and protists (Nanney 1982; Pace 1997;
Allander et al. 2001; Hamels et al. 2001). However, the
problems inherent in morphological taxonomy are general
enough to merit the extension of this approach to all life.
In fact, there are a growing number of cases in which
DNA-based identification systems have been applied to
higher organisms (Brown et al. 1999; Bucklin et al. 1999;
Trewick 2000; Vincent et al. 2000).
*Author for correspondence (phebert@uoguelph.ca).
Proc. R. Soc. Lond. B (2003) 270, 313–321. © 2003 The Royal Society. DOI 10.1098/rspb.2002.2218
Genomic approaches to taxon diagnosis exploit diversity
among DNA sequences to identify organisms (Kurtzman
1994; Wilson 1995). In a very real sense, these sequences
can be viewed as genetic ‘barcodes’ that are embedded in
every cell. When one considers the discrimination of life’s
diversity from a combinatorial perspective, it is a modest
problem. The Universal Product Codes, used to identify
retail products, employ 10 alternate numerals at 11 pos-
itions to generate 100 billion unique identifiers. Genomic
barcodes have only four alternate nucleotides at each pos-
ition, but the string of sites available for inspection is huge.
The survey of just 15 of these nucleotide positions creates
the possibility of $4^{15}$ (1 billion) codes, 100 times the number
that would be required to discriminate life if each
taxon was uniquely branded. However, the survey of
nucleotide diversity needs to be more comprehensive
because functional constraints hold some nucleotide pos-
itions constant and intraspecific diversity exists at others.
The impact of functional constraints can be reduced by
focusing on a protein-coding gene, given that most shifts
at the third nucleotide position of codons are weakly con-
strained by selection because of their four-fold degener-
acy. Hence, by examining any stretch of 45 nucleotides,
one gains access to 15 sites weakly affected by selection
and, therefore, 1 billion possible identification labels. In
practice, there is no need to constrain analysis to such
short stretches of DNA because sequence information is
easily obtained for DNA fragments hundreds of base pairs
(bp) long. This ability to inspect longer sequences is sig-
nificant, given two other biological considerations. First,
nucleotide composition at third-position sites is often
strongly biased (A–T in arthropods, C–G in chordates),
reducing information content. However, even if the A–T
or C–G proportion reached 1, the inspection of just 90 bp
would recover the prospect of 1 billion alternatives
($2^{30} = 4^{15}$). The second constraint derives from the limited
use of this potential information capacity, since most
nucleotide positions are constant in comparisons of closely
related species. However, given a modest rate (e.g. 2%
per Myr) of sequence change, one expects 12 diagnostic
nucleotide differences in a 600 bp comparison of species
with just a million year history of reproductive isolation.
As both the fossil record and prior molecular analyses sug-
gest that most species persist for millions of years, the like-
lihood of taxon diagnosis is high. However, there is no
simple formula that can predict the length of sequence
that must be analysed to ensure species diagnosis, because
rates of molecular evolution vary between different seg-
ments of the genome and across taxa. Obviously, the
analysis of rapidly evolving gene regions or taxa will aid
the diagnosis of lineages with brief histories of repro-
ductive isolation, while the reverse will be true for rate-
decelerated genes or species.
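The arithmetic behind this combinatorial argument can be checked with a short sketch. It only restates the figures quoted above; the constant names, the function name and the choice of 10 million species (the lower end of the 10–15 million estimate) are assumptions made for illustration, and the 2% per Myr figure is treated here as a constant pairwise divergence rate.

```python
# Coding capacity of short sequence 'barcodes', using the figures quoted in the text.
UPC_CODES = 10 ** 11            # 10 numerals at 11 positions
CODES_15_SITES = 4 ** 15        # 4 nucleotides at 15 weakly constrained sites
CODES_30_BIASED = 2 ** 30       # 2 effective states at 30 third-position sites (90 bp)
SPECIES_ESTIMATE = 10_000_000   # assumption: lower end of the 10-15 million estimate


def expected_differences(length_bp, divergence_per_myr, isolation_myr):
    """Expected differing sites, assuming a constant pairwise divergence rate."""
    return length_bp * divergence_per_myr * isolation_myr


if __name__ == "__main__":
    print("UPC identifiers:", UPC_CODES)
    print("codes from 15 sites:", CODES_15_SITES)
    print("codes per estimated species:", CODES_15_SITES / SPECIES_ESTIMATE)
    print("biased 90 bp equals 15 free sites:", CODES_30_BIASED == CODES_15_SITES)
    print("differences in 600 bp after 1 Myr:", expected_differences(600, 0.02, 1.0))
```

Under these assumptions, 15 unconstrained sites yield roughly 100 codes per estimated species, and a 600 bp comparison accumulates about 12 differences per million years of isolation.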
Although there has never been an effort to implement
a microgenomic identification system on a large scale,
enough work has been done to indicate key design
elements. It is clear that the mitochondrial genome of ani-
mals is a better target for analysis than the nuclear genome
because of its lack of introns, its limited exposure to
recombination and its haploid mode of inheritance
(Saccone et al. 1999). Robust primers also enable the rou-
tine recovery of specific segments of the mitochondrial
genome (Folmer et al. 1994; Simmons & Weller 2001).
Past phylogenetic work has often focused on mitochon-
drial genes encoding ribosomal (12S, 16S) DNA, but their
use in broad taxonomic analyses is constrained by the
prevalence of insertions and deletions (indels) that greatly
complicate sequence alignments (Doyle & Gaut 2000).
The 13 protein-coding genes in the animal mitochondrial
genome are better targets because indels are rare since
most lead to a shift in the reading frame. There is no com-
pelling a priori reason to focus analysis on a specific gene,
but the cytochrome c oxidase I gene (COI) does have two
important advantages. First, the universal primers for this
gene are very robust, enabling recovery of its 5′ end from
representatives of most, if not all, animal phyla (Folmer
et al. 1994; Zhang & Hewitt 1997). Second, COI appears
to possess a greater range of phylogenetic signal than any
other mitochondrial gene. In common with other protein-
coding genes, its third-position nucleotides show a high
incidence of base substitutions, leading to a rate of mol-
ecular evolution that is about three times greater than that
of 12S or 16S rDNA (Knowlton & Weigt 1998). In fact,
the evolution of this gene is rapid enough to allow the
discrimination of not only closely allied species, but also
phylogeographic groups within a single species (Cox &
Hebert 2001; Wares & Cunningham 2001). Although
COI may be matched by other mitochondrial genes in
resolving such cases of recent divergence, this gene is more
likely to provide deeper phylogenetic insights than alternatives
such as cytochrome b (Simmons & Weller 2001)
because changes in its amino-acid sequence occur more
slowly than those in this, or any other, mitochondrial gene
(Lynch & Jarrell 1993). As a result, by examining amino-acid
substitutions, it may be possible to assign any
unidentified organism to a higher taxonomic group (e.g.
phylum, order), before examining nucleotide substitutions
to determine its species identity.
This study evaluates the potential of COI as a taxo-
nomic tool. We first created a COI ‘profile’ for the seven
most diverse animal phyla, based on the analysis of 100
representative species, and subsequently showed that this
baseline information assigned 96% of newly analysed taxa
to their proper phylum. We then examined the class Hexa-
poda, selected because it represents the greatest concen-
tration of biodiversity on the planet (Novotny et al. 2002).
We created a COI ‘profile’ for eight of the most diverse
orders of insects, based on a single representative from
each of 100 different families, and showed that this ‘pro-
file’ assigned each of 50 newly analysed taxa to its correct
order. Finally, we tested the ability of COI sequences to
correctly identify species of lepidopterans, a group tar-
geted for analysis because sequence divergences are low
among families in this order. As such, the lepidopterans
provide a challenging case for species diagnosis, especially
since this is one of the most speciose orders of insects.
This test, which involved creating a COI ‘profile’ for 200
closely allied species and subsequently using it to assign
150 newly analysed individuals to species, was 100% suc-
cessful in identification.
2. MATERIAL AND METHODS
(a) Sequences
Approximately one-quarter of the COI sequences (172 out of
655) used in this study were obtained from GenBank. The rest
were obtained by preparing a 30 µl total DNA extract from small
tissue samples using the Isoquick (Orca Research Inc,
Bothell, WA, 1997) protocol. The primer pair LCO1490
(5′-GGTCAACAAATCATAAAGATATTGG-3′) and HCO2198
(5′-TAAACTTCAGGGTGACCAAAAAATCA-3′) was subsequently
used to amplify a 658 bp fragment of the COI gene
(Folmer et al. 1994). Each PCR contained 5 µl of 10× PCR
buffer, pH 8.3 (10 mM of Tris–HCl, pH 8.3; 1.5 mM of MgCl₂;
and 50 mM of KCl; 0.01% NP-40), 35 µl of distilled water,
200 µM of each dNTP, 1 unit of Taq polymerase, 0.3 µM of
each primer and 1–4 µl of DNA template. The PCR thermal
regime consisted of one cycle of 1 min at 94 °C; five cycles of
1 min at 94 °C, 1.5 min at 45 °C and 1.5 min at 72 °C; 35 cycles
of 1 min at 94 °C, 1.5 min at 50 °C and 1 min at 72 °C and a
final cycle of 5 min at 72 °C. Each PCR product was subsequently
gel purified using the Qiaex II kit (Qiagen) and
sequenced in one direction on an ABI 377 automated sequencer
(Applied Biosystems) using the Big Dye v. 3 sequencing kit. All
sequences obtained in this study have been submitted to GenBank;
their accession numbers are provided in electronic
Appendices A–C, available on The Royal Society’s Publications
Web site.
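As an illustration only, the thermal regime and primer pair described above can be written down as data, which makes it easy to check the programmed hold time. The class and function names below are hypothetical, and the total ignores temperature ramping between steps, which depends on the thermocycler.

```python
from dataclasses import dataclass

# Primer sequences as given in the text (Folmer et al. 1994), written 5' to 3'.
PRIMERS = {
    "LCO1490": "GGTCAACAAATCATAAAGATATTGG",
    "HCO2198": "TAAACTTCAGGGTGACCAAAAAATCA",
}


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple  # each step is (temperature in degrees C, hold time in minutes)


# Thermal regime as described in the text.
REGIME = (
    Stage(1, ((94, 1.0),)),
    Stage(5, ((94, 1.0), (45, 1.5), (72, 1.5))),
    Stage(35, ((94, 1.0), (50, 1.5), (72, 1.0))),
    Stage(1, ((72, 5.0),)),
)


def total_hold_minutes(regime):
    """Sum of programmed hold times; ramping between temperatures is not included."""
    return sum(stage.cycles * sum(minutes for _, minutes in stage.steps)
               for stage in regime)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt")
    print("programmed hold time (min):", total_hold_minutes(REGIME))  # 148.5
```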
(b) COI profiles
We created three COI profiles: one for the seven dominant
phyla of animals, another for eight of the largest orders of insects
and the last for 200 closely allied species of lepidopterans. These
profiles were designed to provide an overview of COI diversity
within each taxonomic assemblage and were subsequently used
as the basis for identifications to the phylum, ordinal or species
level by determining the sequence congruence between each
‘unknown’ taxon and the species included in a particular profile.
The phylum profile included 100 COI sequences, all obtained
from GenBank (see electronic Appendix A available on The
Royal Society’s Publications Web site). To ensure broad taxonomic
coverage, each sequence was derived from a different
family and representatives were included from all available
classes. Ten sequences were obtained for each of the five
phyla (Annelida, Chordata, Echinodermata, Nematoda,
Platyhelminthes) that include 5000–50 000 species, while 25
sequences were collected for each of the phyla (Arthropoda,
Mollusca) with more than 100 000 species.
The ordinal profile was created by obtaining a COI sequence
from a single representative of each of 100 insect families (see
electronic Appendix B available on The Royal Society’s Publications
Web site). The four most diverse orders (more than
100 000 described species) of insects were selected for analysis,
together with four additional orders chosen randomly from
among the 15 insect orders (Gaston & Hudson 1994) with
medium diversity (1000–15 000 described species). Between ten
and 25 families were examined for each of the four most diverse
orders (Coleoptera, Diptera, Hymenoptera, Lepidoptera), while
four to ten families were examined for the other orders
(Blattaria, Ephemeroptera, Orthoptera, Plecoptera).
The species profile was based upon COI data for a single individual
from each of the 200 commonest lepidopteran species
from a site near Guelph, Ontario, Canada (electronic Appendix
C available on The Royal Society’s Publications Web site). This
profile examined members of just three allied superfamilies
(Geometroidea, Noctuoidea, Sphingoidea) to determine the
lower limits of COI divergence in an assemblage of closely
related species. The Noctuoidea included members of three
families (Arctiidae, Noctuidae, Notodontidae) while the others
included representatives from just a single family each
(Geometridae and Sphingidae, respectively).
(c) Test taxa
Additional sequences were collected to test the ability of each
profile to assign newly analysed species to a taxonomic category
(electronic Appendices A–C). COI sequences were obtained
from 55 ‘test’ taxa to assess the success of the phylum profile in
assigning newly analysed species to a phylum. These ‘test’ taxa
included five representatives from each of the five ‘small’ phyla,
and 15 representatives from both the Mollusca and the Arthropoda.
When possible, the ‘test’ taxa belonged to families that
were not included in the phylum profile. A similar approach was
employed to test the ability of the ordinal profile to classify
newly analysed insects to an order. Fifty new taxa were examined,
including between one and five representatives from each
small order and five to ten representatives from each of the four
large orders. When possible, the ‘test’ taxa belonged to families
or genera that were not included in the ordinal profile. A test
of the species profile required a slightly different approach, as
identifications were only possible for species represented in it. As
a result, sequences were obtained from another 150 individuals
belonging to the species included in this profile.
(d) Data analysis
Sequences were aligned in the SeqApp 1.9 sequence editor.
They were subsequently reduced to 669 bp for the phylum
analysis, 624 bp for the ordinal analysis and 617 bp for the
species-level analysis. Analyses at the ordinal and phylum levels
examined amino-acid divergences, using Poisson corrected p-distances
to reduce the impacts of homoplasy. For the species-level
analysis, nucleotide-sequence divergences were calculated
using the Kimura-two-parameter (K2P) model, the best metric
when distances are low (Nei & Kumar 2000) as in this study.
Neighbour-joining (NJ) analysis, implemented in MEGA2.1
(Kumar et al. 2001), was employed to both examine relationships
among taxa in the profiles and for the subsequent classification
of ‘test’ taxa because of its strong track record in the
analysis of large species assemblages (Kumar & Gadagkar 2000).
This approach has the additional advantage of generating results
much more quickly than alternatives. The NJ profiles for both
the orders and the phyla possessed 100 terminal nodes, each
representing a species from a different family, while the species
NJ profile had 200 nodes, each representing a different lepidopteran
species. A member of the primitive insect order Thysanura
(family Lepismatidae) was used as the outgroup for the insect
profile, while single members of three primitive lepidopteran
families were employed as the outgroup for the species profile.
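The two distance measures named above can be sketched with their standard textbook formulas: the Poisson correction, d = -ln(1 - p), for amino-acid sequences, and the Kimura two-parameter distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. This is a minimal illustration, not the MEGA2.1 implementation; the function names are hypothetical, sequences are assumed to be pre-aligned and of equal length, and sites with anything other than A, C, G or T are simply skipped in the nucleotide case.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def poisson_distance(protein_a, protein_b):
    """Poisson corrected p-distance for two aligned amino-acid sequences."""
    p = sum(a != b for a, b in zip(protein_a, protein_b)) / len(protein_a)
    return -math.log(1.0 - p)


def k2p_distance(dna_a, dna_b):
    """Kimura two-parameter distance for two aligned nucleotide sequences."""
    # Assumption: sites with gaps or ambiguity codes are ignored.
    pairs = [(a, b) for a, b in zip(dna_a.upper(), dna_b.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    transitions = sum(1 for a, b in pairs
                      if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES))
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in ten sites
    print(poisson_distance("MKTL", "MKSL"))          # one replacement in four sites
```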
Each of the three (phylum, order, species) NJ profiles was
subsequently used as a classification engine, by re-running the
analysis with the repeated addition of a single ‘test’ taxon to the
dataset. Following each analysis, the ‘test’ species was assigned
membership of the same taxonomic group as its nearest neighbouring
node. For example, in the ordinal analysis, a ‘test’ taxon
was identified as a member of the order Lepidoptera if it
grouped most closely with any of the 24 lepidopteran families
included in the profile. The success of classification was quantified
for both the phylum and the ordinal analyses by determining
the proportion of ‘test’ taxa assigned to the proper phylum/order.
In the case of species, a stricter criterion was employed. A ‘test’
taxon was recognized as being correctly identified only if its
sequence grouped most closely with the single representative of
its species in the profile.
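The assignment rule can be illustrated with a deliberately simplified sketch. The procedure described above re-runs the NJ analysis and reads off the nearest neighbouring node; the code below instead assigns each ‘test’ sequence to the group of the profile sequence at the smallest pairwise distance, which is an assumption that approximates, but is not identical to, tree-based placement. All names and the toy profile are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ (equal-length, pre-aligned input)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def assign_group(test_seq, profile, distance=p_distance):
    """Return (group, label) of the profile entry closest to test_seq."""
    label, group, _ = min(profile, key=lambda entry: distance(test_seq, entry[2]))
    return group, label


def success_rate(assignments):
    """Fraction of (assigned_group, true_group) pairs that agree."""
    pairs = list(assignments)
    return sum(assigned == true for assigned, true in pairs) / len(pairs)


# Hypothetical toy profile: (label, taxonomic group, aligned sequence).
TOY_PROFILE = [
    ("family_1", "Lepidoptera", "ACGTACGT"),
    ("family_2", "Diptera", "TTGTACCA"),
]

if __name__ == "__main__":
    print(assign_group("ACGTACGA", TOY_PROFILE))
```

For the species-level test, the stricter criterion corresponds to comparing the returned label, rather than the group, with the true species.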
Multidimensional scaling (MDS), implemented in Systat
8.0, was employed to provide a graphical summary of the species-level
results because of the very large number of taxa. MDS
explores similarity relationships in Euclidean space and has the
advantage of permitting genetically intermediate taxa to remain
spatially intermediate, rather than forcing them to cluster into a
pseudogroup as in hierarchical methods (Lessa 1990). In the
present case, a similarity matrix was constructed by treating
every position in the alignment as a separate character and
ambiguous nucleotides as missing characters. The sequence
information was coded using dummy variables (A = 1, G = 2,
C = 3, T = 4). However, as noted earlier, a NJ profile was also
constructed for the lepidopteran sequences using K2P distances
and this is provided in electronic Appendix D available on The
Royal Society’s Publications Web site.
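The coding step that precedes the MDS can be sketched as follows. The numeric codes are those given above; how Systat treated missing characters when computing distances is not stated, so skipping any position that is missing in either sequence is an assumption, as are the function names. The MDS ordination itself is not reproduced here.

```python
import math

# Dummy-variable coding from the text; any other symbol is treated as missing.
CODES = {"A": 1, "G": 2, "C": 3, "T": 4}


def encode(sequence):
    """Map each aligned position to its numeric code, or None if ambiguous."""
    return [CODES.get(base) for base in sequence.upper()]


def euclidean_distance(coded_a, coded_b):
    """Euclidean distance over positions scored in both sequences."""
    shared = [(a, b) for a, b in zip(coded_a, coded_b)
              if a is not None and b is not None]
    return math.sqrt(sum((a - b) ** 2 for a, b in shared))


def distance_matrix(sequences):
    """Symmetric matrix of pairwise Euclidean distances between coded sequences."""
    coded = [encode(seq) for seq in sequences]
    return [[euclidean_distance(x, y) for y in coded] for x in coded]


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    for row in distance_matrix(["AAT", "AGN", "TAT"]):
        print(row)
```

One property of this coding is that nucleotides are treated as ordered values, so an A/T mismatch contributes more to the distance than an A/G mismatch.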
3. RESULTS
(a) Taxon profiles
Each of the 100 species included in the phylum and
ordinal profiles possessed a different amino-acid sequence
at COI. The phylum profile showed good resolution of
the major taxonomic groups (figure 1). Monophyletic
assemblages were recovered for three phyla (Annelida,
Echinodermata, Platyhelminthes) and the chordate lin-
eages formed a cohesive group. Members of the Nema-
toda were separated into three groups, but each
corresponded to one of the three subclasses that comprise
this phylum. Twenty-three out of the 25 arthropods for-
med a monophyletic group, but the sole representatives of
two crustacean classes (Cephalocarida, Maxillopoda) fell
outside this group. Twelve out of the 25 molluscan lineages
formed a monophyletic assemblage allied to the
annelids, but the others were separated into groups that
showed marked genetic divergence. One group consisted
solely of cephalopods, a second was largely pulmonates
and the rest were bivalves. It is worth emphasizing that
these outlying COI sequences always showed considerable
amino-acid divergence from sequences possessed by other
taxonomic groups. As such, the rate acceleration in these
lineages generated novel COI amino-acid sequences
rather than secondary convergence on the amino-acid
arrays of other groups.
[Figure 1: NJ tree. Clade labels: Arthropoda; Chordata; Echinodermata; Mollusca (Cephalopoda); Annelida; Mollusca; Mollusca (Pulmonata); Nematoda (Rhabditoidea); Mollusca (Bivalvia); Nematoda (Spirurida); Platyhelminthes.]
Figure 1. NJ analysis of Poisson corrected p-distances based on the analysis of 223 amino acids of the COI gene in 100 taxa
belonging to seven animal phyla. The taxa in the grey boxes represent outliers. AR 24 and AR 25 are the sole representatives
of the arthropod classes Cephalocarida and Maxillopoda, respectively. ML 21 is a member of the molluscan class Bivalvia,
while NE 8 is the sole member of the nematode subclass Enoplia. Scale bar, 0.1.
The ordinal profile showed high cohesion of taxonomic
groups with seven out of the eight orders forming mono-
phyletic assemblages (figure 2). The sole exception was
the Coleoptera whose members were partitioned into
three groups. Two of these groups included 21 families
belonging to the very diverse suborder Polyphaga, while
the other group included four families belonging to the
suborder Adephaga. Among the four major orders, species
of Diptera and Lepidoptera showed much less variation
in their amino-acid sequences than did the Hymenoptera,
while the Coleoptera showed an intermediate level of
divergence (table 1).
[Figure 2: NJ tree of 100 insect families with a Thysanura outgroup. Clade labels: Ephemeroptera; Blattaria; Plecoptera; Orthoptera; Diptera; Coleoptera (Adephaga); Coleoptera (Polyphaga); Hymenoptera; Coleoptera (Polyphaga); Lepidoptera.]
Figure 2. NJ analysis of Poisson corrected p-distances based
on the analysis of 208 amino acids in 100 taxa belonging to
eight insect orders. Scale bar, 0.05.
Table 1. Mean Poisson corrected p-distances (d) for 208
amino acids of COI in 100 insect families belonging to eight
orders.
(n indicates the number of families analysed in each order. G/C
content is also reported.)
order            n    d      s.e.   G/C (%)
Hymenoptera      11   0.320  0.028  27.7
Coleoptera       25   0.125  0.015  35.6
Orthoptera       5    0.119  0.019  35.4
Blattaria        4    0.076  0.014  35.7
Diptera          15   0.055  0.011  34.1
Lepidoptera      24   0.054  0.009  31.0
Ephemeroptera    10   0.036  0.008  40.5
Plecoptera       6    0.031  0.008  39.5
[Figure 3: scatter plot. Axes: dimension 1 (horizontal), dimension 2 (vertical).]
Figure 3. Multidimensional scaling of Euclidean distances
among the COI genes from 200 lepidopteran species
belonging to three superfamilies: Geometroidea (stars);
Sphingoidea (triangles) and Noctuoidea (circles).
Each of the 200 lepidopterans included in the species
profile possessed a distinct COI sequence. Moreover, a
MDS plot showed that species belonging to each of the
three superfamilies fell into distinct clusters (figure 3), sig-
nalling their genetic divergence. A detailed inspection of
the NJ tree (electronic Appendix D) revealed further evi-
dence for the clustering of taxonomically allied species.
For example, 23 genera were represented by two species,
and these formed monophyletic pairs in 18 cases. Simi-
larly, five out of the six genera represented by three species
formed monophyletic assemblages.
Table 2. Percentage success in classifying species to membership
of a particular taxonomic group based upon sequence variation
at COI.
(n indicates the number of taxa that were classified using each
taxon ‘profile’.)
taxon               target group   n     % success
kingdom Animalia    7 phyla        55    96.4
class Hexapoda      8 orders       50    100
order Lepidoptera   200 species    150   100
(b) Testing taxonomic assignments
Fifty-three out of the 55 ‘test’ species (96.4%) were
assigned to the correct phylum in the analyses at this level
(table 2). The exceptions were a polychaete annelid that
grouped most closely with a mollusc and a bivalve that
grouped with one of the arthropod outliers. However, in
both cases, there was substantial sequence divergence
(13% and 25%, respectively) between the test taxon and
the lineage in the profile that was most similar to it. Identi-
fication success at the ordinal level was 100% as all 50
insect species were assigned to the correct order. More-
over, when a ‘test’ species belonged to a family rep-
resented in the ordinal profile, it typically grouped most
closely with it. Identification success at the species level
was also 100%, as each of the 150 ‘test’ individuals clustered
most closely with its conspecific in the profile. The
sequences in the species profile were subsequently merged
with those from the ‘test’ taxa to allow a more detailed
examination of the factors enabling successful classi-
fication (electronic Appendix E available on The Royal
Society’s Publications Web site). MDS analysis (figure 4)
showed that ‘test’ taxa were always either genetically
identical to or most closely associated with their conspe-
cific in the profile. Examination of the genetic distance
matrix quantified this fact, showing that divergences
between conspecific individuals were always small, the
family values averaging 0.25% (table 3). By contrast,
sequence divergences between species were much greater,
averaging 6.8% for congeneric taxa and higher for more
distantly related taxa (table 3). A few species pairs showed
lower values, but only four out of the 19 900 pairwise
comparisons showed divergences that were less than 3%.
Figure 4a shows one of these cases, involving two species
of Hypoprepia, but, even in these situations, there were no
shared sequences between taxa.
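The counts quoted in this section follow from simple arithmetic, sketched here as a check. The variable names are illustrative only.

```python
from math import comb

N_SPECIES = 200
PAIRWISE_COMPARISONS = comb(N_SPECIES, 2)   # every unordered pair of species
LOW_DIVERGENCE_PAIRS = 4                    # pairs below 3%, as reported in the text

# Phylum-level success: 53 of the 55 'test' species were assigned correctly.
PHYLUM_SUCCESS_PERCENT = round(100 * 53 / 55, 1)

if __name__ == "__main__":
    print("pairwise comparisons:", PAIRWISE_COMPARISONS)  # 19900
    print("share of pairs below 3% (%):",
          100 * LOW_DIVERGENCE_PAIRS / PAIRWISE_COMPARISONS)
    print("phylum-level success (%):", PHYLUM_SUCCESS_PERCENT)
```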
4. DISCUSSION
This study establishes the feasibility of developing a
COI-based identification system for animals-at-large.
PCR products were recovered from all species and there
was no evidence of the nuclear pseudogenes that have
complicated some studies employing degenerate COI pri-
mers (Williams & Knowlton 2001). Moreover, the align-
ment of COI sequences was straightforward, as indels
were uncommon, reinforcing the results of earlier work
showing the rarity of indels in this gene (Mardulyn &
Whitfield 1999). Aside from their ease of acquisition and
alignment, the COI sequences possessed, as expected, a
high level of diversity.
We demonstrated that differences in COI amino-acid
sequences were sufficient to enable the reliable assignment
of organisms to higher taxonomic categories. It is worth
emphasizing that most newly analysed taxa were placed in
the correct order or phylum despite the fact that our pro-
files were based on a tiny fraction of the member species.
For example, our ordinal profile, which was based on just
0.002% of the total species in these orders, led to 100%
identification success. The two misidentifications at the
phylum level were undoubtedly a consequence of the lim-
ited size and diversity of our phylum profile. The mis-
placed polychaete belonged to an order that was not in
the profile, while the misidentified mollusc belonged to a
subclass that was represented in the profile by just a single
species. Such misidentifications would not occur in pro-
files that more thoroughly surveyed COI diversity among
members of the target assemblage. The general success of
COI in recognizing relationships among taxa in these cases
is important because it signals that character convergence
and horizontal gene transfer (i.e. via retroviruses) have not
disrupted the recovery of expected taxon affinities. More-
over, it establishes that the information content of COI is
sufficient to enable the placement of organisms in the
deepest taxonomic ranks.
The gold standard for any taxonomic system is its ability
to deliver accurate species identifications. Our COI spec-
ies profile was 100% successful in identifying lepidopteran
species, and we expect similar results in other groups,
since the Lepidoptera are one of the most taxonomically
diverse orders of animals and they show low sequence
divergences. There is also reason to expect successful
diagnosis at other locales, as the species richness of lepi-
dopterans at the study site exceeds that which will be
encountered in regional surveys of most animal orders.
Higher diversities will be encountered for some orders in
tropical settings (Godfray et al. 1999), but COI diagnoses
should not fail at these sites unless species are unusually
young.
COI-based identification systems can also aid the initial
delineation of species. For example, inspection of the gen-
etic distance matrix for lepidopterans indicated that diver-
gence values between species are ordinarily greater than
3%. In fact, when this value was employed as a threshold
for species diagnosis, it led to the recognition of 196 out
of the 200 (98%) species recognized through prior mor-
phological study. The exceptions were four congeneric
species pairs that were genetically distinct but showed low
(0.6–2.0%) divergences, suggesting their recent origin.
The general ease of species diagnosis reveals one of the
great values of a DNA-based approach to identification.
Newly encountered species will ordinarily signal their
presence by their genetic divergence from known mem-
bers of the assemblage.
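The threshold approach described above can be sketched in a few lines. The example below computes the Kimura two-parameter (K2P) distance, the measure reported in table 3, between a query and each profile sequence, and flags the query as a candidate new species when its nearest match exceeds the 3% threshold. The function names, the dictionary layout of the profile and the decision to skip gapped or ambiguous sites are assumptions made for illustration only.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError if the sequences are saturated.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def diagnose(query, profile, threshold=0.03):
    """Return (nearest_species, distance, is_candidate_new_species).

    profile: dict mapping species name to aligned sequence.
    """
    nearest, dist = min(
        ((name, k2p_distance(query, seq)) for name, seq in profile.items()),
        key=lambda item: item[1],
    )
    return nearest, dist, dist > threshold
```

As the four low-divergence species pairs show, a fixed threshold of this kind will overlook recently diverged species, so a flag from `diagnose` is best treated as a prompt for closer study rather than as a final decision.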
The prospect of using a standard COI threshold to
guide species diagnosis in situations where prior taxo-
nomic work has been limited is appealing. It is, however,
important to validate this approach by determining the
thresholds that distinguish species in other geographical
regions and taxonomic groups. Thresholds will parti-
cularly need to be established for groups with differences
in traits, such as generation length or dispersal regime,
that are likely to alter rates of molecular evolution or the
extent of population subdivision. However, differences in
thresholds may be smaller than might be expected. For
example, different species of vertebrates ordinarily show
more than 2% sequence divergence at cytochrome b
(Avise & Walker 1999), a value close to the 3% COI
threshold adopted for lepidopterans in this study.
[Figure 4: three scatter plots, panels (a) to (c). Axes: dimension 1 (horizontal) and dimension 2 (vertical). Species labelled in the panels:
(a) Arctiidae: Ctenucha virginica, Cisseps fulvicollis, Hypoprepia miniata, Hypoprepia fucosa, Grammia virguncula, Spilosoma congrua, Grammia virgo, Grammia arge, Spilosoma virginica, Hyphantria cunea, Haploa confusa, Halysidota tessellaris, Euchaetes egle, Lophocampa maculata, Cycnia tenera, Cycnia oregonensis, Pyrrharctia isabella, Phragmatobia fuliginosa.
(b) Notodontidae: Furcula cinerea, Furcula modesta, Furcula borealis, Notodonta simplaria, Peridea basitriens, Datana ministra, Clostera albosigma, Clostera apicalis, Ellida caniplaga, Gluphisia lintneri, Symmerista leucitys, Odontosia elegans, Nadata gibbosa, Pheosia rimosa, Heterocampa biundata, Heterocampa guttivitta, Schizura unicornis, Lochmaeus manteo, Schizura badia, Heterocampa umbrata.
(c) Sphingidae: Pachysphinx modesta, Smerinthus cerisyi, Smerinthus jamaicensis, Sphinx canadensis, Paonias myops, Paonias excaecatus, Sphinx gordius, Sphecodina abbottii, Lapara bombycoides, Ceratomia undulosa, Deidamia inscripta.]
Figure 4. Multidimensional scaling of Euclidian distances among the COI genes from (a) 18 species of Arctiidae, (b) 20
species of Notodontidae and (c) 11 species of Sphingidae. Circles identify the single representatives of each species included in
the profile, while the crosses mark the position of ‘test’ individuals. Profile and test individuals from the same species always
grouped together.
The likely applicability of a COI identification system
to new animal groups and geographical settings suggests
the feasibility of creating an identification system for
animals-at-large. Certainly, existing primers enable recov-
ery of this gene from most, if not all, animal species and
its sequences are divergent enough to enable recognition
of all but the youngest species. It is, of course, impossible
for any mitochondrially based identification system to
resolve fully the complexity of life. Where species boundaries
are blurred by hybridization or introgression, supplemental
analyses of one or more nuclear genes will be
required. Similarly, when species have arisen through
polyploidization, determinations of genome size may be
needed. While protocols will be required to deal with such
complications, a COI-based identification system will
undoubtedly provide taxonomic resolution that exceeds
that which can be achieved through morphological stud-
ies. Moreover, the generation of COI profiles will provide
a partial solution to the problem of the thinning ranks of
morphological taxonomists by enabling a crystallization of
their knowledge before they leave the field. Also, since
COI sequences can be obtained from museum specimens
without their destruction, it will be possible to regain taxonomic
capability, albeit in a novel format, for groups that
currently lack an authority.
Table 3. Percentage nucleotide sequence divergence (K2P distances) at COI between members of five lepidopteran families at
three levels of taxonomic affinity.
(At the species level, n indicates the number of species for which two or more individuals were analysed. At a generic level, n
represents the number of genera with two or more species, while at the family level it indicates the total number of species that
were analysed.)
family          n    within species    n    within genus    n    within family
Arctiidae       13   0.33              4    7.0             18   10.0
Geometridae     30   0.23              10   9.1             61   12.5
Noctuidae       42   0.17              12   5.8             90   10.4
Notodontidae    14   0.36              4    5.9             20   12.4
Sphingidae      8    0.17              3    6.4             11   10.5
We believe that a COI database can be developed within
20 years for the 5–10 million animal species on the planet
(Hammond 1992; Novotny et al. 2002) for approximately
$1 billion, far less than that directed to other major science
initiatives such as the Human Genome Project or the
International Space Station. Moreover, initial efforts
could focus on species of economic, medical or academic
importance. Data acquisition is now simple enough for
individual laboratories to gather, in a single year, COI pro-
files for 1000 species, a number greater than that in many
major taxonomic groups on a continental scale. Once
completed, these profiles will be immediately cost-
effective in many taxonomic contexts, and innovations in
sequencing technology promise future reductions in the
cost of DNA-based identifications.
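The scale implied by these estimates can be checked with simple arithmetic. The sketch below takes the figures given above ($1 billion, 20 years, 5–10 million species, 1000 species per laboratory per year) and derives a cost per species and the number of laboratories that would need to work in parallel. It assumes, purely for illustration, that every laboratory sustains the stated rate for the whole period with no duplicated effort. The function name and dictionary keys are hypothetical.

```python
def project_scale(n_species, budget_usd=1_000_000_000, years=20,
                  species_per_lab_year=1000):
    """Back-of-envelope figures implied by the estimates in the text."""
    lab_years = n_species / species_per_lab_year
    return {
        "cost_per_species_usd": budget_usd / n_species,
        "lab_years": lab_years,
        "labs_working_in_parallel": lab_years / years,
    }


for n in (5_000_000, 10_000_000):
    print(n, project_scale(n))
```

Under these assumptions the budget corresponds to roughly $100–200 per species, and the work to some 250–500 laboratories operating concurrently for two decades.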
If advanced comprehensively, a COI database could
serve as the basis for a global bioidentification system
(GBS) for animals. Implementation on this scale will
require the establishment of a new genomics database.
While GenBank aims for comprehensive coverage of
genomic diversity, the GBS database would aim for com-
prehensive taxonomic coverage of just a single gene.
Through web-based delivery, this system could provide
easy access to taxonomic information, a particular benefit
to developing nations. Its adoption by an organization
such as the Global Biodiversity Information Facility or the
All Species Foundation would be an important step
towards ensuring the longevity lacking in many web-based
resources. Once established, this microgenomic identifi-
cation system will overcome the deficits of morphological
approaches to species discrimination: the bounds of intra-
specific diversity will be quantifiable, sibling species will
be recognizable, taxonomic decisions will be objective and
all life stages will be identifiable. Moreover, once com-
plete, the GBS will allow single laboratories to execute
taxon diagnoses across the full spectrum of animal life.
The creation of the GBS will be a substantial undertaking
and will require close alliances between molecular biol-
ogists and taxonomists. However, its assembly promises
both a revolution in access to basic biological information
and a newly detailed view of the origins of biological diver-
sity.
This work was supported by grants from NSERC and the Can-
ada Research Chairs Program to P.D.N.H. Teri Crease, Mel-
ania Cristescu, Derek Taylor, Jonathan Witt and two reviewers
provided helpful comments on earlier drafts of this manuscript.
The authors thank Win Bailey, Klaus Bolte, Steve Burian, Don
Klemm, Don Lafontaine, Steve Marshall, Christine Nalepa,
Jeff Webb and Jack Zloty for either providing specimens or
verifying taxonomic assignments. They also thank Lisa Schie-
man, Tyler Zemlak, Heather Cole and Angela Holliss for their
assistance with the DNA analyses.
REFERENCES
Allander, T., Emerson, S. U., Engle, R. E., Purcell, R. H. &
Bukh, J. 2001 A virus discovery method incorporating
DNase treatment and its application to the identification of
two bovine parvovirus species. Proc. Natl Acad. Sci. USA 98,
11 609–11 614.
Avise, J. C. & Walker, D. 1999 Species realities and numbers
in sexual vertebrates: perspectives from an asexually trans-
mitted genome. Proc. Natl Acad. Sci. USA 96, 992–995.
Brown, B., Emberson, R. M. & Paterson, A. M. 1999 Mitochondrial
COI and II provide useful markers for Wiseana
(Lepidoptera, Hepialidae) species identification. Bull. Entomol.
Res. 89, 287–294.
Bucklin, A., Guarnieri, M., Hill, R. S., Bentley, A. M. &
Kaartvedt, S. 1999 Taxonomic and systematic assessment
of planktonic copepods using mitochondrial COI sequence
variation and competitive, species-specific PCR. Hydrobiol-
ogy 401, 239–254.
Cox, A. J. & Hebert, P. D. N. 2001 Colonization, extinction
and phylogeographic patterning in a freshwater crustacean.
Mol. Ecol. 10, 371–386.
Doyle, J. J. & Gaut, B. S. 2000 Evolution of genes and taxa: a
primer. Plant Mol. Biol. 42, 1–6.
Folmer, O., Black, M., Hoeh, W., Lutz, R. & Vrijenhoek, R.
1994 DNA primers for amplification of mitochondrial cytochrome
c oxidase subunit I from diverse metazoan invertebrates.
Mol. Mar. Biol. Biotechnol. 3, 294–299.
Gaston, K. J. & Hudson, E. 1994 Regional patterns of diversity
and estimates of global insect species richness. Biodivers.
Conserv. 3, 493–500.
Godfray, H. C. J., Lewis, O. T. & Memmott, J. 1999 Studying
insect diversity in the tropics. Phil. Trans. R. Soc.
Lond. B 354, 1811–1824. (DOI 10.1098/rstb.1999.0523.)
Hamels, J., Gala, L., Dufour, S., Vannuffel, P., Zammatteo,
N. & Remacle, J. 2001 Consensus PCR and microarray for
diagnosis of the genus Staphylococcus, species, and methicil-
lin resistance. BioTechniques 31, 1364–1372.
Hammond, P. 1992 Species inventory. In Global biodiversity:
status of the earth’s living resources (ed. B. Groombridge), pp.
17–39. London: Chapman & Hall.
Hawksworth, D. L. & Kalin-Arroyo, M. T. 1995 Magnitude
and distribution of biodiversity. In Global biodiversity assess-
ment (ed. V. H. Heywood), pp. 107–191. Cambridge Uni-
versity Press.
Jarman, S. N. & Elliott, N. G. 2000 DNA evidence for morphological
and cryptic Cenozoic speciations in the Anaspididae,
‘living fossils’ from the Triassic. J. Evol. Biol. 13,
624–633.
Knowlton, N. 1993 Sibling species in the sea. A. Rev. Ecol.
Syst. 24, 189–216.
Knowlton, N. & Weigt, L. A. 1998 New dates and new rates
for divergence across the Isthmus of Panama. Proc. R. Soc.
Lond. B 265, 2257–2263. (DOI 10.1098/rspb.1998.0568.)
Kumar, S. & Gadagkar, S. R. 2000 Efficiency of the neigh-
bour-joining method in reconstructing deep and shallow
evolutionary relationships in large phylogenies. J. Mol. Evol.
51, 544–553.
Kumar, S., Tamura, K., Jacobsen, I. B. & Nei, M. 2001
MEGA2: molecular evolutionary genetics analysis software.
Tempe, AZ: Arizona State University.
Kurtzman, C. P. 1994 Molecular taxonomy of the yeasts. Yeast
10, 1727–1740.
Lessa, P. 1990 Multidimensional scaling of geographic genetic
structure. Syst. Zool. 39, 242–252.
Lynch, M. & Jarrell, P. E. 1993 A method for calibrating mol-
ecular clocks and its application to animal mitochondrial
DNA. Genetics 135, 1197–1208.
Mardulyn, P. & Whitfield, J. B. 1999 Phylogenetic signal in
the COI, 16S, and 28S genes for inferring relationships
among genera of Microgastrinae (Hymenoptera:
Braconidae): evidence of a high diversification rate in this
group of parasitoids. Mol. Phylogenet. Evol. 12, 282–294.
Nanney, D. L. 1982 Genes and phenes in Tetrahymena. Bioscience
32, 783–788.
Nei, M. & Kumar, S. 2000 Molecular evolution and phylogen-
etics. Oxford University Press.
Novotny, V., Baset, Y., Miller, S. E., Weiblen, G. D., Bremer,
B., Cizek, L. & Drezel, P. 2002 Low host specificity of her-
bivorous insects in a tropical forest. Nature 416, 841–845.
Pace, N. R. 1997 A molecular view of microbial diversity and
the biosphere. Science 276, 734–740.
Saccone, C., DeCarla, G., Gissi, C., Pesole, G. & Reynes, A.
1999 Evolutionary genomics in the Metazoa: the mitochon-
drial DNA as a model system. Gene 238, 195–210.
Simmons, R. B. & Weller, S. J. 2001 Utility and evolution of
cytochrome b in insects. Mol. Phylogenet. Evol. 20, 196–210.
Trewick, S. A. 2000 Mitochondrial DNA sequences support
allozyme evidence for cryptic radiation of New Zealand Per-
ipatoides (Onychophora). Mol. Ecol. 9, 269–282.
Vincent, S., Vian, J. M. & Carlotti, M. P. 2000 Partial
sequencing of the cytochrome oxidase-b subunit gene. I. A
tool for the identification of European species of blow flies
for post mortem interval estimation. J. Forensic Sci. 45,
820–823.
Wares, J. P. & Cunningham, C. W. 2001 Phylogeography and
historical ecology of the North Atlantic intertidal. Evolution
12, 2455–2469.
Williams, S. T. & Knowlton, N. 2001 Mitochondrial pseudo-
genes are pervasive and often insidious in the snapping
shrimp Alpheus. Mol. Biol. Evol. 18, 1484–1493.
Wilson, K. H. 1995 Molecular biology as a tool for taxonomy.
Clin. Infect. Dis. 20(Suppl.), 192–208.
Zhang, D.-X. & Hewitt, G. M. 1997 Assessment of the univer-
sality and utility of a set of conserved mitochondrial primers
in insects. Insect Mol. Biol. 6, 143–150.
As this paper exceeds the maximum length normally permitted, the
authors have agreed to contribute to production costs.
Visit http://www.pubs.royalsoc.ac.uk to see electronic appendices to
this paper.
... Because there are no consistent morphological features that distinguish between these two species, the identi cation issue can be resolved by employing appropriate DNA markers and methods of DNA barcoding (Hebert et al. 2003;Kress and Erickson 2007). The barcoding of representatives of the plant kingdom requires the use of rbcL and matK genes, as well as intergenic spacer trnH-psbA of chloroplast DNA (cpDNA) and sequence ITS1-5.8s-ITS2 of the ribosomal nuclear DNA cluster (nrDNA) ( An effective marker for phylogenetic studies of the tribe Tordylieae, which comprises the Heracleum genus, is the nrDNA ETS (external transcribed spacer) sequence, especially in combination with the ITS sequence (Logacheva et al. 2010). ...
Preprint
Full-text available
Heracleum mantegazzianum Sommier & Levier and Heracleum sosnowskyi Manden. are two species that belong to the giant invasive hogweed complex. H. mantegazzianum is predominantly found in Western European countries, while H. sosnowskyi is invasive in the European part of Russia and Eastern European countries. The taxonomy of the Heracleum genus is quite complex, and identifying these species requires extensive expertise. Surprisingly, although H. mantegazzianum and H. sosnowskyi are considered separate species, their morphological and ecological-physiological properties, as well as their ontogeny and population structure, exhibit remarkable similarities, making them ecological twins. The intentional introduction of this invasive species was initially conducted in the cities of Kirovsk city (Murmansk region, Russia) and Syktyvkar city (Komi Republic, Russia). Plant materials sourced from these two regions were subsequently distributed to all regions encompassing the modern hogweed invasion range across the former USSR countries. The objective of this study was to test the hypothesis that the plants initially introduced in Kirovsk and Syktyvkar actually belong to H. mantegazzianum . To accomplish this, herbarium material was collected, and DNA barcoding was performed on 16 samples of giant invasive hogweed from the vicinity of the cities of Kirovsk and Syktyvkar, as well as on 30 H. mantegazzianum samples collected within its native range in the Western Caucasus. The results of morphological identification combined with DNA barcoding demonstrate that H. mantegazzianum and the plants growing in Kirovsk and Syktyvkar belong to the same species – H. mantegazzianum , rather than H. sosnowskyi as previously believed.
... This limitation can be partially removed by the use of molecular methods when DNA markers can be isolated from specimens and compared with special reference libraries, for example, BOLD systems or GenBank. DNA barcoding is established as a modern approach to species identification using short DNA sequences which are characteristic for the species (Hebert et al. 2003;Schoch et al. 2012). The aim of this paper is to study species-poor lichen communities on shells of the Tendra Spit (northern Black Sea coast, Ukraine) using fungal DNA barcode data for the identification of dominant lichens from the genera Xanthoria and Xanthocarpia, which are suspected to constitute an intricate complex (Vondrák et al. 2011) or include so far unrecognized taxa. ...
Article
A new species, Xanthoria tendraensis (Teloschistales, Lecanoromycetes), is described from species-poor lichen communities on shell dunes of the Tendra Spit (northern Black Sea coast, Ukraine), based both on morphological and molecular data. This lichen is similar to X. ectaneoides but differs in having longer ascospores with thicker walls at the poles. A new lichen association, Xanthorietum tendraensis, is described from stable grey dunes. Its characteristic species are Lecania sylvestris s. lat., Polyozosia perpruinosa, Scythioria phlogina and X. tendraensis. The species composition of the new association is similar to communities of Aspicilion contortae (Verrucarietea nigrescentis). Six new sequences of nrITS region were obtained for Xanthoria tendraensis, Xanthocarpia fulva s. lat. and Trebouxia crenulata, which was identified as a photobiont of X. tendraensis.
... Cytochrome Oxidase 1 (CO1), which is a portion of the mitochondrial gene, was transcribed in leafminers and parasitoids populations by aiding Folmer primers LCOI490 (Forward) and HCO2198 (Reverse) (Hebert et al., 2003). Forward primer (5'-3') : GGTCAACAAATCATAAAGATATTGG, Reverse primer (3'-5') : TAAACTTCAGGGTAACCAAAAAATCA. ...
Article
Full-text available
The South American leafminer, Tuta absoluta is an exotic devastative pest on solanaceous vegetables, including tomatoes, which leads to a cent per cent economic loss in India. The molecular markers assist in assessing gene flow, migratory frequencies, and genetic variety, as well as helping to evaluate the genetic makeup and diversification of an exotic species population to indigenous ones. With this, the present study aimed to investigate the genetic divergence of T. absoluta in different districts of Tamil Nadu, India. The study depicted the examination of genetic divergence of T. absoluta by aiding amplified region of mitochondrial DNA encoding cytochrome oxidase I (COI) from the T. absoluta samples gathered from Coimbatore, Dharmapuri and Dindigul districts of Tamil Nadu. The findings showed that the phylogenetic tree constructed from all sequences of T. absoluta acquired from the NCBI (National Center for Biotechnology Information) and BOLD (The Barcode of Life Data System) databases exhibited 99 percent identity and aggregated together into a single clade. . 5Hence, the present study revealed the great genetic uniformity in T. absoluta populations in India and corroborates that most of the globe rely on the partial COI gene, evidenced by minimal nucleotide diversity.
... Cytochrome Oxidase 1 (CO1), which is a portion of the mitochondrial gene, was transcribed in leafminers and parasitoids populations by aiding Folmer primers LCOI490 (Forward) and HCO2198 (Reverse) (Hebert et al., 2003). Forward primer (5'-3') : GGTCAACAAATCATAAAGATATTGG, Reverse primer (3'-5') : TAAACTTCAGGGTAACCAAAAAATCA. ...
Article
Full-text available
Crop modelling can make it easier for researchers to comprehend and describe experimental results and pinpoint yield disparities. In this competition, the impact of pigeonpea growth and yield under various fertigation levels was examined using the Decision Support Systems for Agrotechnology Transfer 4.6 (DSSAT) and CROPGRO pigeonpea models. Under drip fertigated levels, the cultivars received various nutrient doses. The pigeonpea model developed by DSSAT-CROPGRO successfully simulated measured pigeonpea grain yield. The field trials took place in Coimbatore at the millet breeding facility of the Tamil Nadu Agricultural University. The study ran the GLUE coefficient estimator to estimate the cultivar coefficients until it had a good match between the predicted and observed seed yield. The accuracy of the model was measured by calculating its R-squared, RMSE, NRMSE, and Agreement percentage. According to model simulation and field measurements, drip fertigation at 125% RDF via WSF + Azophosmet and foliar spray of 1% PPFM resulted in the highest seed output of 1875 kg ha-1(V1F5) over both years. The increase in seed yield with drip fertigation at 125% RDF via WSF + Azophosmet and foliar spray of 1% PPFM (V1F5) was 8.0 - 11.0% when compared to drip fertigation at 100% RDF via WSF + Azophosmet and foliar spray of 1% PPFM (V1F4)12.9 - 16.1 % compared to drip fertigation at 100% RDF through WSF; and 68.0 - 74.3 % compared to conventional fertilizer. It was indicated that the DSSAT v.4.6 can be a helpful tool for determining and forecasting pigeonpea growth yield if it is appropriately calibrated. Simulation models substantially facilitated maximizing crop growth and generating management advice.
... Small ventral integument tissue samples were assembled in 96-well plates and shipped to the Centre for Biodiversity Genomics at the University of Guelph (Ontario, Canada) for processing. After the total genomic DNA was extracted using a CTABbased approach, the standard DNA barcode for animals [18] a 658bp fragment of mitochondrial gene cytochrome c oxidase subunit 1 (COI) -was amplified using the primers C_LepFolF-C_LepFolR [19]. Sequencing reactions were carried out with the same primer pair, and the products were subjected to clean-up with PureSeq-MP (Aline Biosciences, Woburn, USA) before Sanger sequencing in a DNA sequencer (ABI 3730XL). ...
Article
Full-text available
The genus Aporrectodea includes some of the most conspicuous earthworm species, but its taxonomic history is among the most complex within the family Lumbricidae. Molecular phylogenetic studies have produced some advances by assigning former Aporrectodea species to other monophyletic clades and by detecting species level lineages within the cosmopolitan caliginosa-trapezoides complex. However, little attention has been devoted to endemic taxa of Aporrectodea such as Ap. rubra, Ap. arverna, Ap. gogna, Ap. balisa, Ap. velox, Ap. giardi voconca and Ap. longa ripicola. These earthworms (and additional populations of Ap. longa and Ap. nocturna) were included in a molecular phylogenetic framework in order to reconstruct the ancestral range of the genus, as well as to help understand its diversification within its native range and to perform a systematic revision. Species delimitation, ancestral area reconstruction and Bayesian inference of the phylogenetic relationships were performed using a large gene sequence (COI) dataset and a narrower dataset composed of 5 mitochondrial and nuclear markers. Phylogenetic position and species delimitation indicated that Ap. giardi voconca and Ap. longa ripicola constitute species-level entities not closely related to Ap. giardi or Ap. longa, and they were thus redescribed as Aporrectodea voconca stat. nov. and Aporrectodea ripicola stat. nov. Ancestral area reconstruction enabled location of the origin of Aporrectodea in the Auvergne-Rhône-Alps, in Southeastern France. The study findings provide some insight into the evolution of functional traits in this ecologically successful genus. Ap. rubra and Ap. arverna (small, reddish, epigeic/epianecic) and Ap. gogna (very large, dark, anecic) were recovered as the earliest branching taxa, suggesting a complex evolution of functional traits within this genus.
... They also have low nucleotide substitution rates and demonstrate greater overall conservation concerning gene structure, content, and organization (Zhang et al. 2023;Waheed et al. 2023). The technique of 'DNA barcoding' was introduced by Hebert et al. (2003). Within this approach, DNA barcode sequences serve as potent tools facilitating the swift and extensive taxonomic identification of species (Gostel et al. 2022). ...
Article
Full-text available
In this study, DNA barcoding and phylogenetic analysis of five Prunus armeniaca L. genotypes (Cataloglu, Hacihaliloglu, Hasanbey, Hudayi, and Kabaasi) grown in Malatya/Türkiye were conducted using chloroplast DNA (cpDNA) matK and rbcL regions. The cpDNA matK region was amplified using matK472F and matK1248R primers, while the rbcL region was amplified with rbcLaF and rbcLaR primers. The matK and rbcL sequences were utilized to assess nucleotide ratios, genetic distance, and nucleotide diversity (π). The neighbor- joining (NJ) tree including other Prunus species from NCBI was created. In addition, physiochemical and 3-D analysis of matK and rbcL proteins were performed. As a result, π = 0.000549 for matK sequence and π = 0.002657 for rbcL sequence were determined. NJ (neighbor joining) phylogenetic trees formed with both matK and rbcL sequences were found to be compatible with each other. The matK and rbcL gene regions were found to be suitable for phylogenetic analysis.
... In contrast, DNA barcoding occupies an intermediary role, aiming for comprehensive species coverage while emphasizing their identi cation rather than relational aspects. DNA barcoding represents a relatively recent technique that has been developed to offer swift, accurate, and automated species identi cation by utilizing standardized DNA sequences as tags (Hebert et al. 2003;Taberlet 2007 In the present context, barcoding has evolved into a dependable technique for species identi cation (Vijayan and Tsou 2010;Singh et al. 2021). The fundamental principle underlying barcoding involves comparing sequence data from an unknown sample (the specimen under study) to a reference sequence obtained from a voucher specimen. ...
Preprint
Full-text available
Evolution of genus is accompanied by ecological diversification. The majority of species grow in open, sunny, rather dry sites in arid and moderately humid climates. However, Allium species have adapted for many other ecological niches. Classical approaches for the identification of Allium cultivars are based on morphological traits. The assessment of these traits is difficult and their evaluation can be subjective considering that most of these cultivars are closely related. Hence, this study of Internal Transcribed Spacer (ITS) sequencing and four barcoding regions, matK, rbcL, trnH-psbA, trnL and Inter Simple Sequence Repeats (ISSR) were researched in onion, Allium cepa L. (Alliaceae) collected from three different cultivation sites. The results established noticeable hereditary divergence among the three cultivars. In ITS and matK, BDUT 1453, BDUT 1454 and BDUT 1455 were independent of each other and formed three clusters. In rbcL, BDUT 1453 formed an independent cluster from the cluster of BDUT 1454 and BDUT 1455. But in trnH-psbA, BDUT 1454 formed an independent cluster and BDUT 1453 and BDUT 1455 were closely placed whereas trnL showed all the three forming a cluster wherein BDUT 1453 and BDUT 1454 were placed closely in a sub-cluster. In ISSR, BDUT 1454 and BDUT 1455 formed a single cluster and BDUT 1453 diverged from it. Even though the tested cultivars belong to the same species they showed genetic divergence among themselves.
... The BLAST analysis shows that the Wild B. burdigala sequence is very close to B. uberis with a similarity identity of up to 96.66%. According to Hebert et al. (2003), species with 97-100% similarity levels are identical and species with differences above 3% based on COI genes are a different species. Based on the result, we confirm that the sequence we obtained from Bikang Village, Bangka Island, Indonesia is Wild B. burdigala, however we cannot show genetic similarity between our sequence with th Wild B. burdigala because the Wild B. burdigala sequence is not yet in the NCBI GenBank database. ...
Article
Full-text available
Betta burdigala is an endemic Wild Betta which only known from Bangka Island, Indonesia. This species was listed as a Critically Endangered (CR) species based on the IUCN Red List of threatened species where the population of this species was significantly decreasing. We collect fivespecimens of B. burdigala from Bangka Island, Indonesia with 36-37 mm of Total Lenght (TL) from 7 to 13 April 2023 in Bikang River, Toboali District, South Bangka Regency, Bangka Island, Indonesia. In this research, we report the first record of the DNA Barcoding of B. burdigala based on the Cytochrome C oxidase Subunit I (COI) gene. We have registered the DNA Barcode of B. burdigala to the Genbank with the accession code OQ281707. DNA barcoding is a solution to provide accurate, fast and automatable identify species and species discovery. B. burdigala Based on the COI gene, B. burdigala and B. uberis have a close genetic distance and a DNA similarity of 96.66%, much higher than other bettas. Their genetic distance is 0.04, while a genetic distance between 0.010 and 0.099 is considered to be low and indicative of high similarity. According to the phylogenetic tree, these species are descended from a single, closely related ancestor on the same branch. Based on the COI gene, we assume that they are identical. Additionally, we advise conducting additional research using the mitochondrial DNA complex and in-depth morphological examination to confirm the additional of the study's findings.
Article
Full-text available
A specimen of Eptesicus furinalis was collected in the municipality of Cândido Mendes, in the state of Maranhão, Brazil. It was a non-lactating adult female, with dark chestnut dorsal coloration, yellow venter, hairless membranes, short and rounded ears, and a pointed tragus. The analysis of the DNA barcode of the COI mitochondrial gene revealed a 99.80 % similarity with the sequence of E. furinalis deposited in the BOLDSystems platform. The combined analysis of the morphological and molecular data confirmed the occurrence of E. furinalis in the state of Maranhão. This extends the known distribution of the species 676.1 km from the nearest recorded locality in the Ceará State.
Article
Full-text available
The Cuttlefish Sepia pharaonis, is an Indo–Pacific organism and is one of the most economic species in the Suez Canal fisheries. Despite its economic value, there is a shortage of taxonomical information on the species. A total of 50 specimens of cuttlefish were collected from the Suez Canal. The sampling was during the period from winter to autumn 2021. Samples were identified morphologically and genetically by using COI gene. The species showed a genetic variation between the different populations of the same species and a genetic variation from the other species of cephalopods. The molecular identification method showed a great reliability in the identification of the cuttlefish.
Article
Full-text available
A method for identifying the members of the endemic genus Wiseana Viette from New Zealand is described. Seven species have been described in the genus: W. cervinata (Walker), W. copularis (Meyrick), W. fuliginea (Butler), W. jocosa (Meyrick), W. mimica (Philpott), W. signata (Walker) and W. umbraculata (Guenée). No morphological characters have been identified to distinguish between the larvae of each species and adult females exhibit high levels of intra and interspecific morphological variation making identification difficult or impossible. Adult males can be distinguished by a combination of scale, antennal and genital characters, but this requires considerable taxonomic experience. Molecular markers were generated via amplification of the cytochrome oxidase subunit I and II (COI and II) of the mitochondrial DNA by the polymerase chain reaction (PCR). Amplified DNA was digested with restriction enzymes to give characteristic fragment patterns. Fourteen restriction enzymes were surveyed and a combination of four of these distinguish all Wiseana taxa except W. fuliginea and W. mimica.
Article
Recent glaciation covered the full extent of rocky intertidal habitat along the coasts of New England and the Canadian Maritimes. To test whether this glaciation in fact caused wholesale extinction of obligate rocky intertidal invertebrates, and thus required a recolonization from Europe, we compared American and European populations using allelic diversity and techniques adapted from coalescent theory. Mitochondrial DNA sequences were collected from amphi-Atlantic populations of three cold-temperate obligate rocky intertidal species (a barnacle, Semibalanus balanoides, and two gastropods, Nucella lapillus and Littorina obtusata) and three cold-temperate habitat generalist species (a seastar, Asterias rubens; a mussel, Mytilus edulis; and an isopod, Idotea balthica). For many of these species we were able to estimate the lineage-specific mutation rate based on trans-Arctic divergences between Pacific and Atlantic taxa. These data indicate that some obligate rocky intertidal taxa have colonized New England from European populations. However, the patterns of persistence in North America indicate that other life-history traits, including mechanisms of dispersal, may be more important for surviving dramatic environmental and climatic change.
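The rate calibration mentioned above can be sketched with the standard strict-clock relation, in which two lineages that split T million years ago accumulate a pairwise divergence d = 2μT. This is a simplified illustration rather than the coalescent-based procedure the authors used. The function names, the divergence value and the calibration date are all hypothetical placeholders.

```python
def lineage_rate(pairwise_divergence: float, split_time_myr: float) -> float:
    """Per-lineage substitution rate (per site per Myr) from a dated split.

    Assumes a strict clock: divergence accumulates along both lineages,
    hence the factor of 2.
    """
    return pairwise_divergence / (2.0 * split_time_myr)


def time_since_split(pairwise_divergence: float, rate_per_myr: float) -> float:
    """Time (Myr) needed to accumulate a pairwise divergence at a given per-lineage rate."""
    return pairwise_divergence / (2.0 * rate_per_myr)


# Hypothetical numbers: 7% Pacific-Atlantic divergence and an assumed 3.5 Myr split.
rate = lineage_rate(0.07, 3.5)
# Apply the calibrated rate to a hypothetical 2% amphi-Atlantic divergence.
print(rate, time_since_split(0.02, rate))
```

A small amphi-Atlantic divergence relative to the calibrated rate would point to a recent split, consistent with postglacial recolonization, while a large one would suggest persistence on both coasts.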
Article
The speciation history of Anaspides tasmaniae (Crustacea: Malacostraca) and its close relatives (family Anaspididae) was studied by phylogenetic and molecular clock analyses of mitochondrial DNA sequences. The phylogenetic analyses revealed that the Anaspides morphotype conceals at least three cryptic species belonging to different parts of its range. The occurrence of multiple cryptic phylogenetic species within one morphological type shows that substantial genetic evolution has occurred independently of morphological evolution. Molecular clock dating of the speciation events that generated both the cryptic and the morphological species of Anaspididae indicated continuous speciation within this group since the Palaeocene, approximately 55 million years ago. This relatively constant rate of recent morphological and cryptic speciation within the Anaspididae suggests that the speciation rate in this group does not correlate with its low extinction rate or morphological conservatism.
Chapter
The objective of this section is to explore how far global biodiversity may have been accounted for by taxonomic description, emphasising diversity at the species level. This is done with reference to the total number of species currently recognised (itself very imprecisely known) and the degree to which we can estimate the completeness of taxonomic knowledge.
Article
The Tetrahymena pyriformis complex is a large group of species of ciliated protozoa that are totally isolated from each other genetically, but that cannot be distinguished from each other phenotypically. In contrast, they are enormously diversified in their nucleic acids and proteins. The simplest interpretation of these observations is that the Tetrahymena phenotypic design was established long ago in a common ancestor, and this design has persisted unchanged, while its molecular components have become scrambled by microevolutionary processes.
Article
The term ‘yeast’ is often taken as a synonym for Saccharomyces cerevisiae, but the phylogenetic diversity of yeasts is illustrated by their assignment to two taxonomic classes of fungi, the ascomycetes and the basidiomycetes. Subdivision of taxa within their respective classes is usually made from comparisons of morphological and physiological features whose genetic basis is often unknown. Application of molecular comparisons to questions in yeast classification offers an unprecedented opportunity to re-evaluate current taxonomic schemes from the perspective of quantitative genetic differences. This review examines the impact of molecular comparisons, notably rRNA/rDNA sequence divergence, on the current phenotypically defined classification of yeasts. Principal findings include: 1) budding ascomycetous yeasts are monophyletic and represent a sister group to the filamentous ascomycetes, 2) fission yeasts are ancestral to budding and filamentous ascomycetes, 3) the molecular phylogeny of basidiomycetous yeasts is generally congruent with type of hyphal septum, presence or absence of teliospores in the sexual state, and occurrence of cellular xylose.
Article
Accurate taxonomic identification of species at all life stages is critical to understand and predict the processes that together determine marine community dynamics. However, zooplankton assemblages may include numerous sibling and congeneric species distinguished by subtle morphological characteristics. Molecular systematic databases, including DNA sequences of homologous gene regions for selected taxonomic groups, allow the design of rapid protocols to determine species' diversity and identify individuals. In this study, the DNA sequence of a 300 base-pair region of the mitochondrial cytochrome oxidase I (COI) gene was determined for eight species of three genera of calanoid copepods: Calanus finmarchicus, C. glacialis and C. helgolandicus; Neocalanus cristatus, N. flemingeri and N. plumchrus; and Pseudocalanus moultoni and P. newmani. The DNA sequences differed between congeneric species by 13–22% of the nucleotides; the protein sequences differed by zero to five amino acid substitutions. Both the DNA and amino acid sequences resolved the evolutionary relationships among congeneric species; relationships among the genera were not well-resolved by this region of mtCOI. Using the same conserved primers, the only amplification product for C. finmarchicus was an aberrant sequence (and putative pseudogene) which differed from the C. finmarchicus COI sequence by 36% of the nucleotides and 32 amino acid substitutions. Species-specific oligonucleotide primers were designed for Calanus spp. (which cannot be distinguished at larval stages) and Pseudocalanus spp. (which are difficult to distinguish even as adults). Individual copepods were identified using competitive, multiplexed species-specific polymerase chain reactions (PCR) in two studies of co-occurring sibling species. The first study confirmed the presence of three Calanus spp. in Oslofjord, Norway and found a predominance of C. helgolandicus. The second study determined patterns of distribution and abundance of Pseudocalanus spp. on Georges Bank in the NW Atlantic and showed that P. moultoni predominated in shallow and coastal waters, while P. newmani was more abundant in offshore regions flanking the Bank. Competitive, species-specific PCR is a useful tool for biological oceanographers. This simple, rapid, and inexpensive assay may be used to identify morphologically-similar individuals of any size and life stage, and to determine a species' presence or absence in pooled samples.
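The logic of a multiplexed species-specific PCR can be sketched in silico: a shared forward primer is paired with several species-specific reverse primers, and only the primer that matches the template yields a product, whose size then identifies the species. The sketch below assumes exact primer matches on a single-stranded template string. The primer sequences, template and function names are hypothetical and are not the published Calanus or Pseudocalanus primers.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def predicted_products(template: str, forward: str,
                       reverse_primers: dict[str, str]) -> dict[str, int]:
    """Map each reverse primer that anneals downstream to its product size in bp."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return {}  # the shared forward primer does not anneal: no product at all
    products = {}
    for name, primer in reverse_primers.items():
        # A reverse primer anneals where its reverse complement occurs on this strand.
        site = reverse_complement(primer)
        pos = template.find(site, start + len(forward))
        if pos != -1:
            products[name] = pos + len(site) - start
    return products


# Toy template: forward site, 4-bp spacer, species_a reverse site, 4-bp tail.
toy_template = "ATGCCGTA" + "AAAA" + "GGCTTAAC" + "TTTT"
toy_primers = {
    "species_a": reverse_complement("GGCTTAAC"),
    "species_b": "ACGTACGT",  # has no matching site on the toy template
}
print(predicted_products(toy_template, "ATGCCGTA", toy_primers))  # {'species_a': 20}
```

Real primer design also has to account for mismatch tolerance, annealing temperature and pseudogene co-amplification, none of which this exact-match sketch models.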