
The unholy trinity: Taxonomy, species delimitation and DNA barcoding

The Royal Society
Philosophical Transactions B

Abstract

Recent excitement over the development of an initiative to generate DNA sequences for all named species on the planet has in our opinion generated two major areas of contention as to how this 'DNA barcoding' initiative should proceed. It is critical that these two issues are clarified and resolved, before the use of DNA as a tool for taxonomy and species delimitation can be universalized. The first issue concerns how DNA data are to be used in the context of this initiative; this is the DNA barcode reader problem (or barcoder problem). Currently, many of the published studies under this initiative have used tree building methods and more precisely distance approaches to the construction of the trees that are used to place certain DNA sequences into a taxonomic context. The second problem involves the reaction of the taxonomic community to the directives of the 'DNA barcoding' initiative. This issue is extremely important in that the classical taxonomic approach and the DNA approach will need to be reconciled in order for the 'DNA barcoding' initiative to proceed with any kind of community acceptance. In fact, we feel that DNA barcoding is a misnomer. Our preference is for the title of the London meetings--Barcoding Life. In this paper we discuss these two concerns generated around the DNA barcoding initiative and attempt to present a phylogenetic systematic framework for an improved barcoder as well as a taxonomic framework for interweaving classical taxonomy with the goals of 'DNA barcoding'.
Rob DeSalle*, Mary G. Egan and Mark Siddall
Division of Invertebrate Zoology, American Museum of Natural History, 79th Street at Central Park West,
New York, NY 10024, USA
Keywords: DNA barcoding; taxonomy; species delimitation; muntjac; leeches; sturgeon
1. INTRODUCTION: BUILDING A BETTER DNA BARCODER
One major issue in incorporating molecular information into taxonomy that has yet to be discussed in detail in the commentaries on this subject is the best way to read the barcodes. There are two separate tasks to which DNA barcodes are currently being applied. The first is the use of DNA data to distinguish between species (equivalent to species identification or species diagnosis) and the second is the use of DNA data to discover new species (equivalent to species delimitation or species description). These two activities differ in the types and amount of data required. Below we highlight some of the issues that may limit the utility of current DNA barcoding endeavours (especially those used for species discovery) and suggest a framework for the development of a barcoder that addresses these issues.
(a) The barcoder engine: distances or characters?
A major issue that needs to be resolved is how to read the organismal barcode once it is generated. Most recently published approaches to DNA barcoding have utilized distance measures to make the inference as to species designation (Hebert et al. 2003a,b, 2004a,b). Distances are used in two major approaches; the first is a simple BLAST (Altschul et al. 1990) approach where a raw similarity score is used to determine the nearest neighbour to the query sequence. The second approach utilizes distances in tree building (Hebert et al. 2003a,b). We point out the following shortcomings with these approaches and further suggest that character based approaches are more appropriate for DNA barcoding both for theoretical and for practical reasons.
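To make the contrast between a distance-based reader and a character-based reader concrete, the following is a minimal sketch and not the authors' implementation. It assumes pre-aligned sequences of equal length, uses an uncorrected p-distance as a stand-in for a raw similarity score, and treats a site as diagnostic when one species is fixed for a state found in no other species. The function names, species labels and toy sequences are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy reference library: species label -> aligned barcode sequences.
reference: Dict[str, List[str]] = {
    "species_A": ["ACGTAC", "ACGTAT"],
    "species_B": ["ATGTGC", "ATGTGC"],
}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbour(query: str, ref: Dict[str, List[str]]) -> Tuple[str, float]:
    """Distance reader: always returns the species of the closest reference."""
    return min(
        ((sp, p_distance(query, seq)) for sp, seqs in ref.items() for seq in seqs),
        key=lambda pair: pair[1],
    )


def diagnostic_characters(ref: Dict[str, List[str]]) -> Dict[str, Dict[int, str]]:
    """Per species: sites fixed for a state that no other species shows."""
    length = len(next(iter(ref.values()))[0])
    diagnostics: Dict[str, Dict[int, str]] = {sp: {} for sp in ref}
    for site in range(length):
        states = {sp: {seq[site] for seq in seqs} for sp, seqs in ref.items()}
        for sp, own in states.items():
            if len(own) != 1:
                continue  # polymorphic within the species, so not diagnostic
            others = set().union(*(s for o, s in states.items() if o != sp))
            state = next(iter(own))
            if state not in others:
                diagnostics[sp][site] = state
    return diagnostics


def diagnose(query: str, ref: Dict[str, List[str]]) -> Optional[str]:
    """Character reader: names a species only if the query carries every
    diagnostic state of exactly one species; otherwise returns None."""
    matches = [
        sp
        for sp, chars in diagnostic_characters(ref).items()
        if chars and all(query[site] == state for site, state in chars.items())
    ]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    unknown = "ACGTCC"  # carries a state at site 4 seen in neither species
    print(nearest_neighbour(unknown, reference))
    print(diagnose(unknown, reference))
```

With this toy library, the distance reader still assigns the unknown query to its closest species, whereas the character reader declines to name one because the query lacks a complete set of diagnostic states.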
A major shortcoming of using distances in DNA barcoding is that all classical studies and taxonomic schemes that accomplish the same thing that barcodes are meant to accomplish are character based, making the union of classical taxonomy and DNA barcoding a difficult process if the use of distances is continued in barcoding studies (see below). This shortcoming is also related to the need for diagnostic characters that classical studies use to validate the existence of a species. A second shortcoming is that similarity scores often do not give the nearest neighbour as the closest relative (Koski & Golding 2001). Nevertheless, similarity scores will always give a nearest neighbour. Character based methods have the logical advantage that when diagnostic character data are lacking, they will fail to diagnose, allowing for a degree of hypothesis testing not available when using distances. A third shortcoming involves the lack of an objective set of criteria to delineate taxa when using distances. For example, a universal similarity cut-off to determine species status will simply not exist, because of the broad overlap of inter- and intra-specific distances (Goldstein et al. 2000). Researchers will have to constantly revise their similarity cut-offs from group to group. We suspect that
Phil. Trans. R. Soc. B (2005) 360, 1905–1916
doi:10.1098/rstb.2005.1722
Published online 14 September 2005
One contribution of 18 to a Theme Issue ‘DNA barcoding of life’.
*Author for correspondence (desalle@amnh.org).
© 2005 The Royal Society
... Obviously, the analysis of DNA sequences provides very powerful lines of taxonomic evidence, leading even to proposals of purely DNA-based taxonomy [27]. Indeed, although DNA barcoding was mostly conceived for specimen identification, its application was rapidly extended to also encompass species discovery and delimitation [28][29][30][31]. Organellar DNA sequences such as COI barcodes represent the majority of the molecular data produced so far by taxonomists, as they are easy to amplify and remarkably informative due to their high evolutionary rate. ...
... Considering a probabilistic approach to species, this basically means that the lower the genetic distance of a subset Fig. 6 Three different previously published workflows of integrative species delimitation. (a) The "taxonomic circle" (from DeSalle et al. [29]). This protocol represents the congruence approach proposed by these authors. ...
... The biogeographical context constitutes an important interpretative framework. Interestingly, many species delimitation approaches invoke allopatry, i.e., absence of co-occurrence, as a relevant criterion to consider two genetically divergent subsets as distinct species [16,29]. In contrast, along with Hillis et al. [83], we highlight that co-occurrence of two subsets of individuals without admixture in sympatry or parapatry provides the ultimate evidence for them representing two distinct species and thus highly valuable information in species delimitation. ...
Article
Over the past two decades, DNA barcoding has become the most popular exploration approach in molecular taxonomy, whether for identification, discovery, delimitation, or description of species. The present contribution focuses on the utility of DNA barcoding for taxonomic research activities related to species delimitation, emphasizing the following aspects: (1) To what extent DNA barcoding can be a valuable ally for fundamental taxonomic research, (2) its methodological and theoretical limitations, (3) the conceptual background and practical use of pairwise distances between DNA barcode sequences in taxonomy, and (4) the different ways in which DNA barcoding can be combined with complementary means of investigation within a broader integrative framework. In this chapter, we recall and discuss the key conceptual advances that have led to the so-called renaissance of taxonomy, elaborate a detailed glossary for the terms specific to this discipline (see Glossary in Chap. 35), and propose a newly designed step-by-step species delimitation protocol starting from DNA barcode data that includes steps from the preliminary elaboration of an optimal sampling strategy to the final decision-making process which potentially leads to nomenclatural changes.
... In biological groups where the traditional taxonomy is ambiguous or where speciation has occurred without changes in morphological features, molecular data is an important tool for delimiting species (Hebert et al. 2003a,b; Hebert et al. 2004; Barrett & Hebert 2005). Using an integrative approach that corroborates hypotheses generated from multiple approaches and under different lines of evidence is essential, with species limits expected to show congruency across all results (DeSalle et al. 2005; Carstens et al. 2013). ...
... These methods and approximations can differ amongst themselves and often show results that depend on the study group (Valdez-Mondragón 2020; Nolasco & Valdez-Mondragón 2022). Therefore, it is recommended to use more than one molecular method to define more precise species hypotheses, in combination with morphological evidence as part of an integrative approach (Dayrat 2005; DeSalle et al. 2005; Padial et al. 2010; Carstens et al. 2013). ...
Article
In modern systematics, different sources of evidence are commonly used for the discovery, identification, and delimitation of species, especially when morphology fails to delineate between species or in underestimated species complexes or cryptic species. In this study, morphological data and two DNA barcoding markers—cytochrome c oxidase subunit I (COI) and internal transcribed spacer 2 (ITS2)—were used to delimit species in the spider genus Loxosceles from North America. The molecular species delimitation analyses were carried out using three different methods under the corrected p-distance Neighbor-Joining (NJ) criteria: 1) Assemble Species by Automatic Partitioning (ASAP), 2) General Mixed Yule Coalescent model (GMYC), and 3) Bayesian Poisson Tree Processes (bPTP). The analyses incorporated 192 terminals corresponding to 43 putative species of Loxosceles, of which 15 are newly recognized herein, as putative new species, based on morphology and congruence between molecular methods with COI. The average intraspecific genetic distance (p-distance) was <2%, whereas the average interspecific genetic distance was 15.6%. The GMYC and bPTP molecular methods recovered 65-79 and 69 species respectively, overestimating the diversity in comparison with morphology, whereas the ASAP method delimited 60 species. The morphology of primary sexual structures (males palps and female seminal receptacles) was congruent with most of the molecular methods mainly with COI, showing that they are robust characters for identification at the species level. For species delimitation COI was more informative than ITS2. The diversity of Loxosceles species is still underestimated for North America, particularly in Mexico which holds the highest diversity of this genus worldwide.
... DNA taxonomy (to characterize species based on DNA sequences) has become well feasible and highly effective in the past two decades [1][2][3][4]. Meanwhile, the array and number of markers (Table 1) and approaches for species identification and delimitation have considerably diversified [5][6][7]. ...
Article
The use of DNA has helped to improve and speed up species identification and delimitation. However, it also provides new challenges to taxonomists. Incongruence of outcome from various markers and delimitation methods, bias from sampling and skewed species distribution, implemented models, and the choice of methods/priors may mislead results and also may, in conclusion, increase elements of subjectivity in species taxonomy. The lack of direct diagnostic outcome from most contemporary molecular delimitation approaches and the need for a reference to existing and best sampled trait reference systems reveal the need for refining the criteria of species diagnosis and diagnosability in the current framework of nomenclature codes and good practices to avoid nomenclatorial instability, parallel taxonomies, and consequently more and new taxonomic impediment.
... Because it is not always feasible to discriminate between species belonging to the same genus, larval taxonomy is complex and time-consuming (Burington, 2011; Ruiter et al., 2013). By permitting fast surveys of novel regions, in combination with traditional taxonomy, DNA barcoding and the use of sequence databases can help speed up this process (Ruiter et al., 2013; DeSalle, Egan and Siddall, 2005; Jinbo et al., 2011; Pauls et al., 2010; Zhou et al., 2007). The Barcode of Life Database (BOLD) contains DNA barcodes for approximately 260,000 species, including 4555 Trichoptera species, and enables species identification using the COI DNA gene component I (Mir et al., 2020). ...
Chapter
In India, the phylogenetic studies of Trichoptera at the species level are poorly known; however, they are important for studying the freshwater fauna and censorious use in biological indication. India records the lowest number of insect DNA barcodes in BOLD. This chapter is concentrating on the phylogenetic study of Trichoptera found in India. The Himalayas are representing the largest association of caddisflies. The "DNA Barcoding" approach is based on the sequencing of the systemized portion of the "mitochondrial (mt) COX subunit 1 (COI)" gene to enable fast and exact identification of a wide range of faunal samples, which are used to infer evolutionary relationships within Trichoptera. The MT-CO1 gene encodes a component called cytochrome c oxidase I (COX1), also known as mitochondrial cytochrome c oxidase I (MT-CO1). The gene sequences of interest are synthesized at designated barcoding centers, and the sequences are then submitted to the gene bank to infer evolutionary relationships.
Article
This chapter on the history of the DNA barcoding enterprise attempts to set the stage for the more scholarly contributions in this volume by addressing the following questions. How did the DNA barcoding enterprise begin? What were its goals, how did it develop, and to what degree are its goals being realized? We have taken a keen interest in the barcoding movement and its relationship to taxonomy, collections, and biodiversity informatics more broadly considered. This chapter integrates our two different perspectives on barcoding. DES was the Executive Secretary of the Consortium for the Barcode of Life from 2004 to 2017, with the mission to support the success of DNA barcoding without being directly involved in generating barcode data. RDMP viewed barcoding as an important entry into the landscape of biodiversity data, with many potential linkages to other components of that landscape. We also saw it as a critical step toward the era of international genomic research that was sure to follow. Like the Mercury Program that paved the way for lunar landings by the Apollo Program, we saw DNA barcoding as the proving grounds for the interdisciplinary and international cooperation that would be needed for success of whole-genome research.
Preprint
The mountains in the Atlantic Forest domain are environments that harbor a high biodiversity, including species adapted to colder climates that were probably influenced by the climatic variations of the Pleistocene. To understand the phylogeographic pattern and assess the taxonomic boundaries between two sister montane species, a genomic study of the butterflies Actinote mantiqueira and A. alalia (Nymphalidae: Acraeini) was conducted. Analyses based on the COI barcode region failed to recover any phylogenetic or genetic structure discriminating the two species or sampling localities. However, SNPs gathered using GBS provided a strong isolation pattern in all analyses (genetic distance, phylogenetic hypothesis, clustering analyses, and FST statistics) that is consistent with morphology, separating all individuals of A. alalia from all populations of A. mantiqueira. The three sampled mountain ranges where A. mantiqueira populations occur — Serra do Mar, Serra da Mantiqueira, and Poços de Caldas Plateau — were identified as three isolated clusters. Paleoclimate simulations indicate that both species’ distributions changed according to climatic oscillations in the Pleistocene period, with the two species potentially occurring in areas of lower altitude during glacial periods when compared to the interglacial periods (as the present). Besides, a potential path between their distribution through the Serra do Mar Mountain range was inferred. Therefore, the Pleistocene climatic fluctuation had a significant impact on the speciation process between A. alalia and A. mantiqueira, which was brought on by isolation at different mountain summits during interglacial periods, as shown by the modeled historical distribution and the observed genetic structure.
Article
Costa Rica is within the Mesoamerican biodiversity hotspot and has about 53 native species of small mammals. This high diversity, along with recent records of new species and indications of cryptic genetic diversity, suggest that application of the DNA barcoding approach would be worthwhile. Here we used 131 tissue samples of small mammals from multiple localities in Costa Rica and sequenced the complete mitochondrial cytochrome b (1140 bp). These samples represented 17 recognized species and two taxa of uncertain status. The new sequence data were supplemented with previously published data from INSDC. Our phylogenetic analyses are consistent with and extend upon recent revisions in Heteromys, Peromyscus and Reithrodontomys and suggest possible new cryptic forms within what are currently named Melanomys chrysomelas, Nyctomys sumichrasti and Proechimys semispinosus. The previously named "Heteromys sp" is indeed likely a new species requiring a full taxonomic description. Moreover, we found new localities for previously described species substantiating recent taxonomic surveys and field guides for the small mammals of Costa Rica. To confirm the presence of cryptic species and major genetic forms in Heteromys, Peromyscus, Reithrodontomys, Melanomys, Nyctomys and Proechimys there needs to be greater sampling, additional genetic markers, morphometrics and other studies. Scotinomys also shows interesting phylogenetic subdivision, requiring further investigation.
Article
In an age of species declines, delineating and discovering biodiversity is critical for both taxonomic accuracy and conservation. In recent years, there has been a movement away from using exclusively morphological characters to delineate and describe taxa and an increase in the use of molecular markers to describe diversity or through integrative taxonomy, which employs traditional morphological characters, as well as genetic or other data. Tiger beetles are charismatic, of conservation concern, and much work has been done on the morphological delineation of species and subspecies, but few of these taxa have been tested with genetic analyses. In this study, we tested morphologically based taxonomic hypotheses of polymorphic tiger beetles in the Eunota circumpicta (LaFerté-Sénectère, 1841) species complex using multilocus genomic and mtDNA analyses. We find multiple cryptic species within the previous taxonomic concept of Eunota circumpicta, some of which were historically recognized as subspecies. We found that the mtDNA and genomic datasets did not identify the same taxonomic units and that the mtDNA was most at odds with all other genetic and morphological patterns. Overall, we describe new cryptic diversity, which raises important conservation concerns, and provide a working example for testing species and subspecies validity despite discordant data.
Article
Although much biological research depends upon species diagnoses, taxonomic expertise is collapsing. We are convinced that the sole prospect for a sustainable identification capability lies in the construction of systems that employ DNA sequences as taxon 'barcodes'. We establish that the mitochondrial gene cytochrome c oxidase I (COI) can serve as the core of a global bioidentification system for animals. First, we demonstrate that COI profiles, derived from the low-density sampling of higher taxonomic categories, ordinarily assign newly analysed taxa to the appropriate phylum or order. Second, we demonstrate that species-level assignments can be obtained by creating comprehensive COI profiles. A model COI profile, based upon the analysis of a single individual from each of 200 closely allied species of lepidopterans, was 100% successful in correctly identifying subsequent specimens. When fully developed, a COI identification system will provide a reliable, cost-effective and accessible solution to the current problem of species identification. Its assembly will also generate important new insights into the diversification of life and the rules of molecular evolution.
Article
Data in this study confirm the validity of a species of muntjac, Muntiacus rooseveltorum, that has been controversial for 60 years. Diagnostic DNA characters are presented for each species examined including the M. rooseveltorum holotype. Three specimens of a recently collected small Laos barking deer have identical sequences to the type specimen of M. rooseveltorum. These DNA characters unambiguously diagnose the newly collected specimens as M. rooseveltorum. This study highlights the importance of continued field surveys in remote regions and the utility of diagnostic DNA characters in identifying species.
Article
Molecular species identification methods are an important component of CITES monitoring programs for trade in sturgeon and caviar. To date, obtaining molecular evidence for distinguishing caviar from four closely related Eurasian sturgeon species Acipenser baerii (Siberian sturgeon), A. gueldenstaedtii (osetra), A. persicus (Persian sturgeon), A. naccarii (Italian sturgeon) remains problematic. Using approximately 2.3 kb of mtDNA sequence data (cytochrome b, NADH5, control region), we find this to be attributable to the polyphyletic nature of these mitochondrial DNA markers in the Russian sturgeon, A. gueldenstaedtii. Two mitochondrial lineages are present within this species: one is phylogenetically affiliated with A. persicus and A. naccarii, while the other clusters with A. baerii. These findings have a direct impact on molecular testing of commercial caviar and demonstrate the necessity of using large sample sizes when constructing forensic databases. Furthermore, the results affect current taxonomic designations for these species as well as hypotheses concerning their evolutionary origins.
Article
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
Article
Character congruence, the principle of using all the relevant data, and character independence are important concepts in phylogenetic inference, because they relate directly to the evidence on which hypotheses are based. Taxonomic congruence, which is agreement among patterns of taxonomic relationships, is less important, because its connection to the underlying character evidence is indirect and often imperfect. Also, taxonomic congruence is difficult to justify, because of the arbitrariness involved in choosing a consensus method and index with which to estimate agreement. High levels of character congruence were observed among 89 biochemical and morphological synapomorphies scored on 10 species of Epicrates. Such agreement is consistent with the phylogenetic interpretation attached to the resulting hypothesis, which is a consensus of two equally parsimonious cladograms: (cenchria (angulifer (striatus ((chrysogaster, exsul) (inornatus, subflavus) (gracilis (fordii, monensis)))))). Relatively little (11.4%) of the character incongruence was due to the disparity between the biochemical and morphological data sets. Each of the clades in the consensus cladogram was confirmed by two or more unique and unreversed novelties, and six of the eight clades were corroborated by biochemical and morphological evidence. Such combinations of characters add confidence to the phylogenetic hypothesis, assuming the qualitatively different kinds of data are more likely to count as independent than are observations drawn from the same character system. Most of the incongruence occurred in the skeletal subset of characters, and much of that independent evolution seemed to be the result of paedomorphosis.