Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding.
ABSTRACT DNA barcoding should provide rapid, accurate and automatable species identifications by using a standardized DNA region as a tag. Based on sequences available in GenBank and sequences produced for this study, we evaluated the resolution power of the whole chloroplast trnL (UAA) intron (254-767 bp) and of a shorter fragment of this intron (the P6 loop, 10-143 bp) amplified with highly conserved primers. The main limitation of the whole trnL intron for DNA barcoding remains its relatively low resolution (67.3% of the species from GenBank unambiguously identified). The resolution of the P6 loop is lower (19.5% identified) but remains higher than those of existing alternative systems. The resolution is much higher in specific contexts such as species originating from a single ecosystem, or commonly eaten plants. Despite the relatively low resolution, the whole trnL intron and its P6 loop have many advantages: the primers are highly conserved, and the amplification system is very robust. The P6 loop can even be amplified when using highly degraded DNA from processed food or from permafrost samples, and has the potential to be extensively used in food industry, in forensic science, in diet analyses based on feces and in ancient DNA studies.
Alice Valentini1,4,5, Thierry Vermat6, Gérard Corthier7, Christian Brochmann8 and
1Laboratoire d’Ecologie Alpine, CNRS UMR 5553, Université Joseph Fourier, BP 53, 38041 Grenoble Cedex 9, France,
2Laboratoire Adaptation et Pathogénie des Microorganismes, CNRS UMR 5163, Université Joseph Fourier, BP 170,
38042 Grenoble Cedex 9, France, 3INRIA Rhône-Alpes, Hélix Project, 655 Avenue de l’Europe, 38334 Montbonnot
Cedex, France, 4Dipartimento di Ecologia e Sviluppo Economico Sostenibile, Università degli Studi della Tuscia, via S.
Giovanni Decollato 1, 01100 Viterbo, Italy, 5Department of Ecology and Natural Resource Management, Norwegian
University of Life Sciences, PO Box 5003, No-1432 Ås, Norway, 6Bioinformatics, GENOME Express, 11 Chemin des
Prés, 38944 Meylan, France, 7UR 910 Ecologie et Physiologie du Système Digestif, INRA Domaine de Vilvert, 78352
Jouy-en-Josas Cedex, France, 8National Centre for Biosystematics, Natural History Museum, University of Oslo, PO
Box 1172 Blindern, NO-0318 Oslo, Norway and 9Center for Ancient Genetics, Niels Bohr Institute & Biological
Institutes, University of Copenhagen, Juliane Maries vej 30, DK-2100 Copenhagen, Denmark
Received June 29, 2006; Revised September 21, 2006; Accepted October 16, 2006
INTRODUCTION
DNA barcoding is a relatively new concept (1,2), aiming
to provide rapid, accurate and automatable species identi-
fications by using a standardized DNA region as a tag (3).
As recently pointed out by Chase et al. (4), there are two
categories of potential DNA barcode users: taxonomists and
scientists in other fields (e.g. forensic science, biotechnology
and food industry, animal diet).
According to the current technology, the ideal DNA
barcoding system should meet the following criteria. First,
it should be sufficiently variable to discriminate among all
species, but conserved enough to be less variable within
than between species. Second, it should be standardized,
with the same DNA region used as far as possible for different
taxonomic groups. Third, the target DNA region should
contain enough phylogenetic information to easily assign
a species to its taxonomic group (genus, family, etc.). Fourth,
it should be extremely robust, with highly conserved priming
sites, and highly reliable DNA amplifications and sequencing.
This is particularly important when using environmental
DNA where each extract contains a mixture of many species
to be identified at the same time. Fifth, the target DNA region
should be short enough to allow amplification of degraded
DNA. Unfortunately, such an ideal DNA marker does not
exist. However, for different categories of users (i.e. taxonomists
versus scientists in other fields), the five criteria listed
above will not be equally important. For example, a high
level of variation with sufficient phylogenetic information
will be most important for taxonomists. In contrast, the levels
of standardization and robustness will be most important in
forensics or when analyzing processed food.
*To whom correspondence should be addressed. Tel: +33 476 51 45 24; Fax: +33 476 51 42 79; Email: firstname.lastname@example.org
© 2006 The Author(s).
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Published online 14 December 2006. Nucleic Acids Research, 2007, Vol. 35, No. 3, e14
So far, methodological papers published on DNA barcod-
ing have typically dealt with the most suitable region of the
genome according to the taxonomists’ point of view [e.g.
Ref. (5–7)]. In animals, the 5′ fragment of the mitochondrial
gene for the cytochrome oxidase subunit I (COI or COXI)
represents a good candidate [e.g. Ref. (5,8,9)]. However, there
is no consensus in the scientific community, and 16S rRNA,
another mitochondrial gene, or the nuclear ribosomal DNA
have also been proposed as useful barcoding markers (7,10).
In plants, the situation is much more difficult, because both
the mitochondrial and chloroplast genomes are evolving too
slowly to provide enough variation. For taxonomists, the cur-
rent strategy is to sequence several DNA regions (4), including
both nuclear and chloroplast fragments such as the internal
transcribed spacer (ITS) region of the 18S–5.8S–26S nuclear
ribosomal cistron (11) or the chloroplast trnH–psbA region (6).
In this study, we approach the plant DNA barcoding
problem in another way, by emphasizing the point of view
of scientists other than taxonomists, looking for standardized
and robust methodologies. For this purpose, we must find a
genome region as variable as possible, but bearing the possi-
bility of designing highly conserved PCR primers that amplify
a very short DNA region, of no more than 100–150 bp. Such a
short region should allow reliable amplifications of even highly
degraded DNA found in processed food or in fossil remains.
Up to now, when working with substrates such as ancient
DNA, the strategy has been to use primers based on the chloro-
plast rbcL gene (12), but this system only allows in most cases
the identification of families, not genera or species.
The chloroplast trnL (UAA) intron may represent a good
target region for our purpose. Its sequences have been widely
used for reconstructing phylogenies between closely related
species (13–15) or for identifying plant species (16,17).
Nevertheless, it is widely recognized that it does not represent
the most variable non-coding region of chloroplast DNA (18),
but it bears some unique advantages. Universal primers
for this region were designed about 15 years ago (19), and subsequently
extensively used, mainly in phylogenetic studies
among closely related genera and species (20). The evolution
of the trnL (UAA) intron has been thoroughly analyzed and is
well understood (21,22). Furthermore, this region is the only
Group I intron in chloroplast DNA (23,24). This means that it
has a conserved secondary structure (25,26) with alternation
of conserved and variable regions (22). As a consequence,
the alignment of diverse trnL intron sequences might allow
the design of new versatile primers embedded in conserved
regions and amplifying the short variable region in between.
More specifically, our objective in this paper is to evaluate
the power and the limitations of the chloroplast trnL (UAA)
intron for plant DNA barcoding, and to assess the possibility
for designing a new system allowing species identification
with highly degraded DNA.
MATERIALS AND METHODS
The power and the robustness of the trnL intron for DNA
barcoding were first evaluated with the data available in
GenBank. Then, they were evaluated on two specific datasets
by sequencing the whole intron for more than 100 plant
species originating from the same environment, and by com-
piling sequences of the main plants used in the food industry.
Finally, we tested the robustness of a new pair of internal
primers applied on different substrates supposed to contain
highly degraded DNA.
Figure 1 presents the location of the primers in the chloro-
plast trnL (UAA) gene, and Table 1 gives their sequences.
The primers c and d are from Taberlet et al. (19). This frag-
ment encompasses the entire trnL (UAA) intron plus a few
base pairs on each side belonging to the trnL (UAA) gene
itself. The primers g and h were designed for this study on
two highly conserved regions
sequences, either from GenBank or produced earlier in the
The Arctic plant dataset
We analyzed 123 arctic plant samples collected between 1998
and 2003, partly taken from herbarium specimens and partly
from field-collected, silica-dried leaf samples deposited at the
Natural History Museum in Oslo. Total DNA was extracted
from around 10 mg of dried leaf tissue with the DNeasy
96 Plant Kit (Qiagen), following the manufacturer’s protocol.
Double-stranded DNA amplifications were performed in volumes
of 25 µl containing 2.5 mM MgCl2, 200 µM of each
dNTP, 1 µM of each primer and 1 U of AmpliTaq Gold®
DNA polymerase (Applied Biosystems). The trnL (UAA)
intron was amplified with primers c and d (19). Following
an activation step of 10 min at 95°C for the enzyme (Applied
Biosystems specification), the PCR mixture underwent
35 cycles of 30 s at 95°C, 30 s at 50°C and 2 min at 72°C
on a GeneAmp PCR system 2720 (Applied Biosystems).
Figure 1. Position of the primers c, d, g and h on the chloroplast trnL (UAA)
gene. The P6 loop amplified with primer g and h is indicated in green.
Table 1. Sequences of the two universal primer pairs amplifying the trnL
Name Code Sequence 5′–3′
Length of the amplified fragment with primers c–d in tobacco: 456 bp. Length
the 3′-most base pairs in the published tobacco cpDNA sequence (23). Primers
(France patent no 2 876 378; April 14, 2006).
To remove excess primers and deoxynucleotide triphosphates
after amplification, PCR products were purified on QIAquick
PCR Purification Kit columns (Qiagen), according to the
manufacturer’s instructions. Sequencing was performed, on
both strands, using the BigDye® Terminator v1.1 Cycle
Sequencing Kit (Applied Biosystems) in volumes of 20 µl
containing 20 ng of purified DNA and 4 pmol of amplification
primer, according to the manufacturer’s specifications.
Sequencing reactions underwent 25 cycles of 30 s at 96°C,
30 s at 50°C and 4 min at 60°C. Excess dye terminators
were removed by a spin-column purification. Sequencing
reactions were electrophoresed for 45 min on an ABI
PRISM® 3100 Genetic Analyzer (Applied Biosystems) using
36 cm capillaries and POP-4™ polymer.
The Food dataset
Seventy-two sequences of the main plants used in the
food industry were retrieved from GenBank or sequenced
following the previous protocol. For this analysis, we
restricted our investigations to the short fragment amplified
with the g–h primer pair.
PCRs were simulated on the full plant division of GenBank
2005 (ftp://ftp.ncbi.nlm.nih.gov/genbank). This release corre-
sponds to 731 531 entries. The electronic PCR software
(ePCR) was specially developed for this study. It is based
on the agrep algorithm (27) that allows identifying occur-
rences of a small pattern (corresponding to a PCR primer)
on a large text (genomic sequence) with a fixed maximum
mismatch count. This strategy is more relevant than simple
blast queries, which are not suitable to identify similarity
on nucleic sequences when the query sequence (here
oligonucleotide sequence) is too short. Our ePCR software
allows specifying maximum mismatch count, minimum
and maximum length of the amplified region and takes care
to also retrieve taxonomic data from analyzed entries.
It works on GenBank, EMBL or FASTA-formatted sequence
files (in the latter case, taxonomic data must be encoded in
a special format on the title line). The ePCR software is avail-
able for academic users upon e-mail request to Eric Coissac
ePCR was performed on GenBank data, first with the c and d
primers, second with the g and h primers, third on a short
rbcL fragment with the h1aF and h2aR primers (12), and
finally with eight primer pairs found in Shaw et al. (18).
ePCR was also performed on the arctic plant dataset with the
c and d primers (after adding the c and d sequences on
each side of the sequenced PCR product), and with the g
and h primers.
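The principle of the electronic PCR described above can be illustrated with a minimal Python sketch. It is not the ePCR program developed for this study: it replaces the agrep algorithm with a plain sliding-window mismatch count, and the primers, template and function names (`find_sites`, `epcr`) are hypothetical toy values chosen only for illustration. As in the text, the user sets a maximum mismatch count and the minimum and maximum length of the amplified region (here measured excluding primers).

```python
# Toy in-silico PCR. Assumption: simple Hamming-distance matching stands in
# for the agrep-based search used by the real ePCR software.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(primer, seq, max_mismatch):
    """Start positions where primer matches seq with <= max_mismatch mismatches."""
    sites = []
    for i in range(len(seq) - len(primer) + 1):
        window = seq[i:i + len(primer)]
        mismatches = sum(a != b for a, b in zip(primer, window))
        if mismatches <= max_mismatch:
            sites.append(i)
    return sites


def epcr(seq, fwd, rev, max_mismatch=1, min_len=1, max_len=1000):
    """Return the regions lying between a forward and a reverse priming site."""
    rev_site = revcomp(rev)  # the reverse primer anneals to the opposite strand
    amplicons = []
    for i in find_sites(fwd, seq, max_mismatch):
        start = i + len(fwd)
        for j in find_sites(rev_site, seq, max_mismatch):
            if min_len <= j - start <= max_len:
                amplicons.append(seq[start:j])
    return amplicons


# Hypothetical primers and template; the forward site carries one mismatch.
FWD = "GGGCAATC"
REV = "CCATTGAG"
TEMPLATE = "AAAA" + "GGGCTATC" + "TTTTACGTTTT" + revcomp(REV) + "AAAA"

print(epcr(TEMPLATE, FWD, REV, max_mismatch=1))
print(epcr(TEMPLATE, FWD, REV, max_mismatch=0))
```

With one mismatch allowed the toy template is "amplified"; with no mismatch allowed it is not, which mirrors the role of the maximum mismatch count in the procedure.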
Next, amplicon databases constructed by the ePCR soft-
ware were analyzed to extract taxonomic specificities of the
amplified sequences. This analysis used the taxonomic classi-
fication provided by NCBI to assess taxonomic relationships
between sequences. The main goal of this analysis was to
determine the proportion of the species, genera and families
unambiguously identified by the sequences amplified via
ePCR. A taxon (species, genus or family) was defined as
‘unambiguously identified’ if all the sequences associated
with this taxon are not found in any other taxa. To limit the
influence of the taxonomic coverage of the GenBank data-
base, we discarded genera represented by only one species
and families represented by only one genus. The same mea-
sure of specificity was applied to the arctic plant dataset
described above. We also assessed the intraspecific variation
of the whole trnL intron and of the short P6 loop fragment by
extracting, from the GenBank amplicon database constructed
by the ePCR software, all the species represented by more
than one entry.
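The definition of an "unambiguously identified" taxon can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the record format (taxon name, amplicon sequence), the function name `unambiguous_taxa` and the toy sequences are hypothetical, and the filtering of genera with a single species is omitted.

```python
from collections import defaultdict


def unambiguous_taxa(records):
    """Return (fraction, names) of taxa whose sequences occur in no other taxon.

    records: iterable of (taxon, sequence) pairs, one per amplicon.
    """
    taxa_by_seq = defaultdict(set)   # sequence -> taxa carrying it
    seqs_by_taxon = defaultdict(set)  # taxon -> its sequences
    for taxon, seq in records:
        taxa_by_seq[seq].add(taxon)
        seqs_by_taxon[taxon].add(seq)

    identified = sorted(
        taxon
        for taxon, seqs in seqs_by_taxon.items()
        if all(taxa_by_seq[s] == {taxon} for s in seqs)
    )
    return len(identified) / len(seqs_by_taxon), identified


# Toy data: the two Salix species share one sequence, so neither is
# unambiguously identified under the definition above.
RECORDS = [
    ("Salix alba", "AAA"),
    ("Salix alba", "AAT"),
    ("Salix caprea", "AAT"),
    ("Betula nana", "CCC"),
    ("Poa alpina", "GGG"),
]

fraction, names = unambiguous_taxa(RECORDS)
print(fraction, names)
```

The same function applies to genera or families by passing the genus or family name as the first element of each record.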
The universality of the four primers c, d, g and h was exam-
ined by comparing their sequences with homologous
sequences, either from GenBank (for primers c, d, g and h)
or produced in this study (for primers g and h).
Robustness of the system for biotechnological
To illustrate the possibility of using the g–h primer pair in
biotechnology, we retrieved from GenBank some sequences
corresponding to common plant species frequently used in
food industry. To demonstrate the robustness of the system
using the g and h primers, we tried to amplify this fragment
in several highly degraded templates, such as processed food
(four samples: brown sugar from sugar cane, cooked potatoes,
cooked pasta and lyophilized potage), human feces (two
samples) and permafrost samples (four samples). Appropriate
criteria for the retrieval of highly degraded DNA were
followed (28). This included DNA extraction and PCR
setup in dedicated and isolated ancient DNA facilities in
Grenoble and Copenhagen, and the use of multiple extraction
and PCR blank controls. Importantly, the permafrost sample
had been drilled while spiking the drilling apparatus with a recognizable
bacterial vector (pCR4-TOPO; Stratagene) to test
for contamination during drilling and handling. After arrival
(frozen) in the laboratory, ~2–3 cm of the core surfaces was
removed. The outer scrape and the interior core material were
subjected to DNA extractions followed by 40 cycles of PCR
using vector-specific primers T3/T7. No vector contaminants
were detected in the inner core extracts used for the plant
DNA studies. For processed food, total DNA was extracted
from 50 mg of dried material using the DNeasy Tissue
Kit (Qiagen) following the manufacturer’s instructions. The
DNA extract was recovered in a volume of 200 µl. Total
DNA was extracted according to Godon et al. (29) and to
Willerslev et al. (30) for the human feces and the permafrost
sample, respectively. DNA amplifications were carried out
using the primers g and h in a final volume of 25 µl, using
2.5 µl of DNA extract as template. The amplification mixture
contained 1 U of AmpliTaq® Gold DNA Polymerase
(Applied Biosystems), 10 mM Tris–HCl, 50 mM KCl,
2 mM of MgCl2, 0.2 mM of each dNTP, 1 µM of each
primer (for some experiments, the g primer was labeled
with the HEX fluorochrome, or the h primer was labeled
with the FAM fluorochrome), and 200 µg/ml of BSA
(Roche). After 10 min at 95°C (Taq activation), the PCR
cycles were as follows: 35 cycles of 30 s at 95°C, 30 s at
55°C and 30 s at 72°C, except for the sugar extract for
which we performed 50 cycles, and for the amplifications
with the fluorescent g primer for which we removed the
elongation time in order to reduce the +A artefact (31,32).
PCR products obtained with the fluorescent g or h primers
were electrophoresed for 35 min on an ABI PRISM®
3100 Genetic Analyzer (Applied Biosystems) using 36 cm
capillaries and POP-4™ polymer. PCR products obtained
with non-fluorescent primers were either directly sequenced,
or cloned (except for the permafrost samples) if the sequences
obtained with direct sequencing were not readable (i.e. a
mixture of different sequences).
RESULTS
The three datasets
Via the ePCR with primers c and d we retrieved 1308
sequences from GenBank, corresponding to 706 species,
366 genera and 119 families (excluding all sequences with
at least one ambiguous nucleotide, and excluding genera
with a single species and families with a single genus).
With primers g and h, we retrieved 18200 sequences,
corresponding to 11404 species, 4215 genera and 410 fami-
lies. These 18200 sequences give a good evaluation of the
number of chloroplast trnL (UAA) intron sequences in Gen-
Bank. The much lower number obtained for the c–d ePCR is
simply due to the fact that the recorded sequences do not con-
tain the primer sequences, and thus are not ‘amplified’ via our
ePCR approach. The arctic plant dataset produced for this
study consists of 132 species, 58 genera and 28 families
(GenBank accession nos DQ860511–DQ860642). The food
dataset analyzed for primers g and h, consists of 72 species,
64 genera and 37 families retrieved from GenBank, or pro-
duced for this study (GenBank accession numbers of species
sequenced for this study: EF010967–EF010973).
For all datasets, the length of the sequences amplified with
c and d varies from 254 to 767 bp, and the length of the P6
loop amplified with g and h varies from 10 bp in Cuscuta
indecora to 143 bp in Schoenoplectus littoralis.
Universality of primer sites
Table 1 presents the sequences of the two primer pairs c–d
and g–h. Figure 2 shows the exact positions of the four
primers used in the secondary structure RNAs produced by
both the trnL (UAA) exon and the trnL (UAA) intron.
Primers g and h are located on highly conserved catalytic
parts of the intron, leading to the amplification of the short
Figure 2. Positions of the primers c and d on the secondary structure of the trnL (UAA) exon (A) and of the primers g and h on the secondary structure of the trnL
(UAA) intron (B) for Nymphaea odorata [modified from Ref. (33)]. Highly conserved elements of the catalytic core (P, Q, R1, R2 and S) are located in grey
boxes. The P6 loop, amplified with primers g and h, is identified by green letters. The 3′ ends of each of the four primers c, d, g and h are marked out by an arrow
and their positions are identified by red letters.
Table 2 shows the variation at the priming sites. Only
sequence variants with a frequency of more than 0.005
were listed. Primers c and d are highly conserved among
land plants, from Angiosperms to Bryophytes. Even in
some algae, this primer pair has the potential to produce
PCR products. The very large number of trnL (UAA) intron
sequences retrieved, as well as those produced for this study,
allowed an extensive evaluation of the universality of primers
g and h. These new primers are highly conserved in Angiosperms
and Gymnosperms.
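The tabulation of priming-site variants with a frequency above 0.005 can be sketched as below. This is only an illustration of the counting step, assuming the priming-site sequences have already been extracted from the aligned entries; the function name `frequent_variants` and the toy site sequences are hypothetical and are not the data behind Table 2.

```python
from collections import Counter


def frequent_variants(site_sequences, min_freq=0.005):
    """Map each priming-site variant to its frequency, keeping those above min_freq."""
    counts = Counter(site_sequences)
    total = len(site_sequences)
    return {
        variant: n / total
        for variant, n in counts.most_common()
        if n / total > min_freq
    }


# Toy data: 1000 hypothetical priming sites, three variants.
SITES = ["GGGCAATCC"] * 990 + ["GGGTAATCC"] * 8 + ["GGGCAATCA"] * 2

for variant, freq in frequent_variants(SITES).items():
    print(variant, round(freq, 3))
```

In this toy set the third variant (2 of 1000 entries) falls below the threshold and is not listed.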
Proportions of species, genera and families identified
Table 3 shows the percentage of species, genera and families
properly identified using the primer pairs c–d and g–h in both
the GenBank and arctic plant datasets, and the primer pair
h1aF–h2aR (12). Globally, on the GenBank dataset, the
entire trnL (UAA) intron and the P6 loop amplified with pri-
mers g and h allow the identification of 67.3 and 19.5% of
the species without taking into account single species within
a genus, respectively. However, these values are probably
underestimates, because of the possibility of misidentification
in GenBank (i.e. a wrong species assignment, either by mis-
identification of the specimen, by problems of synonymy
or by PCR contamination). The ePCR using other primer
pairs found in Shaw et al. (18), which amplify psbB-psbH,
rpoB-trnC (GCA), rpS16 intron, trnD (GUC)-trnT (GGU),
trnH (GUG)-psbA and trnS (UGA)-trnfM (CAU), never
retrieved more than 100 sequences, and were not taken
into account. Table 4 illustrates the sequence variation of
g-h amplicons for commonly eaten plant species.
Among all the amplicons retrieved from GenBank by using
the ePCR software, the percentage of species represented by
more than a single entry was 11% for the whole trnL intron
and 14% for the P6 loop. This subset of sequences allowed us to
estimate the lower and upper limits of the intraspecific variability.
The lower limit was estimated assuming no variation
in species represented by a single entry in GenBank, and the
upper limit by taking into account only species represented by
more than one entry in GenBank. The intraspecific variability
lies between 5.9 and 55.0% for the whole intron, and 3.4 and
24.1% for the P6 loop. However, the upper values certainly
represent a large overestimation of the real values, because
a single entry in GenBank might correspond to many
analyzed individuals from the same species. Furthermore,
for the P6 loop, the intraspecific polymorphism does not com-
promise the species identification in 85 cases out of 481.
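The two limits can be read as the same count of variable species divided by two different denominators: all species for the lower limit, and only species with more than one entry for the upper limit. Under this reading, which is an interpretation of the text, the lower limit is roughly the upper limit multiplied by the proportion of multi-entry species (0.11 × 55.0% ≈ 6.0% for the whole intron and 0.14 × 24.1% ≈ 3.4% for the P6 loop, close to the reported 5.9 and 3.4% given rounding). The sketch below uses hypothetical toy records and a hypothetical function name.

```python
from collections import defaultdict


def variability_bounds(records):
    """Return (lower, upper) limits of intraspecific variability.

    records: iterable of (species, sequence) pairs, one per database entry.
    A species counts as variable if its entries carry more than one sequence.
    """
    entries = defaultdict(int)
    sequences = defaultdict(set)
    for species, seq in records:
        entries[species] += 1
        sequences[species].add(seq)

    n_species = len(entries)
    n_multi = sum(1 for n in entries.values() if n > 1)
    n_variable = sum(1 for seqs in sequences.values() if len(seqs) > 1)

    lower = n_variable / n_species  # single-entry species assumed invariant
    upper = n_variable / n_multi if n_multi else 0.0  # multi-entry species only
    return lower, upper


# Toy data: four species, two with several entries, one of them variable.
RECORDS = [
    ("species A", "AAA"),
    ("species A", "AAT"),
    ("species B", "CCC"),
    ("species B", "CCC"),
    ("species C", "GGG"),
    ("species D", "TTT"),
]

print(variability_bounds(RECORDS))
```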
Robustness of the system using the g and h primers
We obtained PCR products with 35 cycles for all the samples
analyzed, except for the sugar sample, for which 50 cycles
were necessary. After electrophoresis of the fluorescent
PCR products, some samples gave a single peak (data not
shown; sugar, cooked potatoes, cooked pasta) while all the
other samples gave a multi-peak profile. The sequences
obtained after direct sequencing for the three samples that
gave a single peak correspond to sugarcane (Saccharum
officinarum), potato (Solanum tuberosum) and wheat (Triticum
vulgare). Figure 3 illustrates the multi-peak profiles
obtained after electrophoresis of the fluorescent PCR products
for a permafrost sample more than 20 000 years old, and for a
human fecal sample. The PCR products of the lyophilized
potage and of the human feces were cloned and sequenced.
Table 5 shows the sequences obtained after cloning the PCR
product obtained from the lyophilized potage. Twenty-three
clones were sequenced, and three species were unambiguously
identified: leek (Allium porrum), potato (S. tuberosum)
and onion (Allium cepa). The same approach was used
for the human feces, and the plant species identified are
banana (Musa acuminata), lettuce (Lactuca sativa) and
cacao (Theobroma cacao).
Table 2. Sequence variation of priming sites for primers c, d, g and h
% Species Acc. no.
Only variants at a frequency higher than 0.005 are indicated. A total of 1014 and 14 145 GenBank entries were used for the primer pairs c–d and g–h,
respectively. %: percentage of sequence variants found in GenBank. Species: Example of species corresponding to the sequence variant. Acc. no.: accession
number in GenBank.
DISCUSSION
DNA barcoding concerns two categories of scientists:
taxonomists and scientists in fields other than taxonomy (4).
The goal of this paper was to evaluate the potential use of the
chloroplast DNA trnL (UAA) intron for plant DNA barcod-
ing in areas other than taxonomy. We will first discuss the
drawbacks of this molecular marker, and then its advantages.
The main drawback, and perhaps the only one, though an
extremely important one, is the relatively low resolution of
the trnL (UAA) intron compared with several other noncoding
chloroplast regions. This has already been pointed out in
several studies (6,18). It is clear that the trnL intron is not
the best choice for characterizing plant species or for
phylogenetic studies among closely related species. This
drawback is even more pronounced with the very short P6 loop
(amplified with primers g and h). On the same subset of
species, however, the short P6 loop performs significantly
better than the alternative system used to date for analyzing
highly degraded DNA [rbcL fragment amplified
with h1aF and h2aR (12)]. Finally, even if the proportion
of species unambiguously identified with the P6 loop seems
low (around 20%), usually only closely related species are

Table 3. Percentages of species, genera and families identified using the chloroplast trnL (UAA) intron, the P6 loop of this intron and comparison with another

cpDNA gene and dataset                                                Length         No. of species/      Species  Genus   Family
                                                                      variation (a)  genera/families (b)  (%)      (%)     (%)
trnL (UAA) intron (primers c and d), GenBank dataset                  [...]          [...]                [...]    [...]   [...]
trnL (UAA) intron (primers c and d), Arctic plant dataset             355–653        103/47/24            85.44    100.00  100.00
P6 loop (primers g and h), GenBank dataset                            10–143         11 404/4225/310      19.48    41.40   79.35
P6 loop (primers g and h), Arctic plant dataset                       22–83          106/48/25            47.17    89.58   100.00
P6 loop (primers g and h), Food dataset                               22–65          72/64/37             77.78    87.50   100.00
P6 loop (primers g and h), subset of the GenBank dataset (c)          [...]          [...]                [...]    [...]   [...]
rbcL (primers h1aF and h2aR) (12), subset of the GenBank dataset (c)  91–98          1524/1525/244        15.09    37.51   68.03

Note that these estimates were made by taking into account genera with more than two species for the species identification, families with more than two genera for
genus identification, and orders with more than two families for family identification.
(a) Length in base pairs excluding primers.
(b) Excluding families with a single genus, genera with a single species and species alone in a genus, except for the food dataset.
(c) Based on species in common between the g–h and the h1aF–h2aR datasets.

Table 4. Example of P6 loop [trnL (UAA)] sequences of commonly eaten plant species amplified with primers g and h
Common name    Scientific name    P6 loop sequence amplified with primers g and h

It is interesting to note that the relatively low resolution of
the trnL (UAA) intron is logically linked to a lower
intraspecific variation, compared with other noncoding regions of
chloroplast DNA (18). Nevertheless, even the short P6 loop
can present some intraspecific variation, due in 21.2% of
the cases to the presence of a T (or A) stretch more than 10 bp long.
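Sequences that carry such a homopolymer stretch can be flagged with a regular expression. The sketch below is an illustration under stated assumptions rather than part of the original analysis: the function name `long_homopolymers` is hypothetical, the default threshold of 11 bases follows the "more than 10 bp" wording above, and the example sequence is invented.

```python
import re


def long_homopolymers(seq, bases="AT", min_len=11):
    """Return (base, start, length) for every run of a single base taken from
    `bases` that is at least min_len long (min_len=11 means more than 10 bp)."""
    # One captured base followed by at least (min_len - 1) repeats of it.
    pattern = re.compile(r"([%s])\1{%d,}" % (re.escape(bases), min_len - 1))
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


# Invented example: a 12-base T run (flagged) and a 10-base A run (not flagged).
example = "GGC" + "T" * 12 + "ACG" + "A" * 10 + "CC"
runs = long_homopolymers(example)  # [('T', 3, 12)]
```

Flagging such sequences before comparison is one possible way to separate variation caused by these stretches from substitutions elsewhere in the P6 loop.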
However, the strong drawback posed by the relatively low
resolution is compensated by several advantages. First, the
primers used to amplify both the entire region (c and d)
and the P6 loop (g and h) are extremely well conserved
(Table 2), from Bryophytes to Angiosperms for the c–d
primer pair and from Gymnosperms to Angiosperms for the
g–h pair. The primers g and h are much more conserved
than the primers h1aF and h2aR (12), which target a
protein-coding sequence and therefore contain many more
variable positions. This advantage is particularly important
when amplifying multiple species within the same PCR. Second,
the number of trnL (UAA) intron sequences available in
databases is already very high, by far the highest among
noncoding chloroplast DNA sequences, allowing in many cases
the identification of the species or the genus. Finally, the
robustness of both systems (the entire intron and the P6
loop) also represents an important advantage. This last
advantage might be linked to the two previous ones: the
robustness mainly comes from the primer universality, and a
robust system will encourage scientists to use this region,
increasing the number of sequences in databases.
In some situations, the relatively low resolution
of the trnL intron can be largely compensated by the
possibilities of standardization. In many applications, the
number of possible plant species is restricted, reducing the
impact of the relatively low resolution. In our arctic plant
dataset, the proportion of species unambiguously identified
among 123 is close to 50% for the P6 loop, and close to 85% for
the entire intron. In the same way, commonly eaten plant
species are few and taxonomically diverse, and can be
identified in most cases. Even the short P6 loop allows the
identification of the three commonly eaten species of the
genus Solanum (potato, tomato and eggplant), which differ by
a single mutation (see Table 4). However, the P6 loop does not
allow the identification of the different cultivars of the
same species [specifically, of Brassica oleracea (Brussels
sprouts, kohlrabi, broccoli, etc.) or of Phaseolus vulgaris
(different cultivated varieties)]. In addition, the P6 loop cannot distinguish
most of the species of the genus Prunus (apricot, peach,

Figure 3. Example of multi-peak profiles obtained after capillary electrophoresis of the fluorescent PCR products obtained using the g and h primers.
(A) Permafrost sample drilled from Main River Ice Bluff (N.E. Siberia, 64.06N, 171.11E), between 21 050 and 25 440 years old (uncalibrated 14C years, based on
AMS dating of plant macrofossils from the section); g fluorescent primer; each peak represents at least one arctic plant species. (B) Human feces sample; h
fluorescent primer; three of the four main peaks have been identified after cloning and sequencing: peak 1, not identified; peak 2, banana (Musa acuminata); peak
3, lettuce (Lactuca sativa); and peak 4, cacao (Theobroma cacao).

Table 5. Sequences obtained after cloning the PCR product from the lyophilized potage
Sequence obtained 5′–3′
Note that onion and leek belong to the same genus Allium, and that their
sequences differ by a single substitution.

To conclude, the trnL (UAA) intron, despite its relatively
low resolution, provides a unique opportunity for plant DNA
barcoding in the biotechnology area, because of the
universality of the c–d and g–h primers, the robustness of
the amplification process and the possibility of developing
highly standardized procedures. Furthermore, the
low intraspecific variation represents an important advantage
if the amplicons are detected by hybridization. Even the short
P6 loop makes it possible to gather valuable information for
plant identification and will undoubtedly become the marker of
choice for highly degraded template DNA. The P6 loop
has the potential to be extensively used in the food industry,
in forensic science, in diet studies based on feces, and in
permafrost analyses for reconstructing past plant communities.
ACKNOWLEDGEMENTS
This study has been financially supported by an ECLIPSE II
grant (CNRS). We thank Dietmar Quandt for help with
Figure 2, and Jean-Pierre Furet for extracting the DNA from
human fecal samples. E.W. thanks Andrei Sher and
James Haile for help with sample collection, Tina Brand
for assistance with the lab work, and The Wellcome Trust, UK and
the National Science Foundation, DK for financial support.
F.P. is supported by the French ‘Institut National de la
Recherche Agronomique’. C.B. thanks Reidar Elven and
Hanne H. Grundt for help with the arctic plant sample
collection and the Research Council of Norway (grant
146 515/420) for funding. Funding to pay the Open Access
publication charges for this article was provided by CNRS.
Conflict of interest statement. None declared.
REFERENCES
1. Floyd,R., Abebe,E., Papert,A. and Blaxter,M. (2002) Molecular
barcodes for soil nematode identification. Mol. Ecol., 11, 839–850.
2. Hebert,P.D.N., Cywinska,A., Ball,S.L. and de Waard,J.R. (2003)
Biological identification through DNA barcodes. Proc. R. Soc.
Lond., B. Biol. Sci., 270, 313–321.
3. Hebert,P.D.N. and Gregory,T.R. (2005) The promise of DNA
barcoding for taxonomy. Syst. Biol., 54, 852–859.
4. Chase,M.W., Salamin,N., Wilkinson,M., Dunwell,J.M.,
Kesanakurthi,R.P., Haidar,N. and Savolainen,V. (2005) Land plants
and DNA barcodes: short-term and long-term goals. Philos. Trans. R.
Soc. B Biol. Sci., 360, 1889–1895.
5. Hebert,P.D.N., Ratnasingham,S. and de Waard,J.R. (2003) Barcoding
animal life: cytochrome c oxidase subunit 1 divergences among closely
related species. Proc. R. Soc. Lond. B Biol. Sci., 270, S96–S99.
6. Kress,W.J., Wurdack,K.J., Zimmer,E.A., Weigt,L.A. and Janzen,D.H.
(2005) Use of DNA barcodes to identify flowering plants. Proc. Natl
Acad. Sci. USA, 102, 8369–8374.
7. Vences,M., Thomas,M., van der Meijden,A., Chiari,Y. and Vieites,D.
(2005) Comparative performance of the 16S rRNA gene in DNA
barcoding of amphibians. Front. Zool., 2, 5.
8. Hebert,P.D.N., Penton,E.H., Burns,J.M., Janzen,D.H. and
Hallwachs,W. (2004) Ten species in one: DNA barcoding reveals
cryptic species in the neotropical skipper butterfly Astraptes fulgerator.
Proc. Natl Acad. Sci. USA, 101, 14812–14817.
9. Hebert,P.D.N., Stoeckle,M.Y., Zemlak,T.S. and Francis,C.M. (2004)
Identification of birds through DNA barcodes. PLoS Biol., 2, e312.
10. Tautz,D., Arctander,P., Minelli,A., Thomas,R.H. and Vogler,A.P.
(2003) A plea for DNA taxonomy. Trends Ecol. Evol., 18, 70–74.
11. Álvarez,I. and Wendel,J.F. (2003) Ribosomal ITS sequences and plant
phylogenetic inference. Mol. Phylogenet. Evol., 29, 417–434.
12. Poinar,H.N., Hofreiter,M., Spaulding,W.G., Martin,P.S.,
Stankiewicz,B.A., Bland,H., Evershed,R.P., Possnert,G. and Pääbo,S.
(1998) Molecular coproscopy: Dung and diet of the extinct ground
sloth Nothrotheriops shastensis. Science, 281, 402–406.
13. Scharaschklin,T. and Doyle,J.A. (2005) Phylogeny and historical
biogeography of Anaxagorea (Annonaceae) using morphology and
noncoding chloroplast sequence data. Syst. Bot., 30, 712–735.
14. McDade,L.A., Daniel,T.F., Kiel,C.A. and Vollesen,K. (2005)
Phylogenetic relationships among Acantheae (Acanthaceae): major
lineages present contrasting patterns of molecular evolution and
morphological differentiation. Syst. Bot., 30, 834–862.
15. Chen,S.Y., Xia,T., Wang,Y.J., Liu,J.Q. and Chen,S.L. (2005)
Molecular systematics and biogeography of Crawfurdia, Metagentiana
and Tripterospermum (Gentianaceae) based on nuclear ribosomal and
plastid DNA sequences. Ann. Bot., 96, 413–424.
16. Ronning,S.B., Rudi,K., Berdal,K.G. and Holst-Jensen,A. (2005)
Differentiation of important and closely related cereal plant species
(Poaceae) in food by hybridization to an oligonucleotide array.
J. Agric. Food Chem., 53, 8874–8880.
17. Ward,J., Peakall,R., Gilmore,S.R. and Robertson,J. (2005) A molecular
identification system for grasses: a novel technology for forensic
botany. Forensic Sci. Int., 152, 121–131.
18. Shaw,J., Lickey,E.B., Beck,J.T., Farmer,S.B., Liu,W., Miller,J.,
Siripun,K.C., Winder,C.T., Schilling,E.E. and Small,R.L. (2005) The
tortoise and the hare II: relative utility of 21 noncoding chloroplast
DNA sequences for phylogenetic analysis. Am. J. Bot., 92, 142–166.
19. Taberlet,P., Gielly,L., Pautou,G. and Bouvet,J. (1991) Universal
primers for amplification of three noncoding regions of chloroplast
DNA. Plant Mol. Biol., 17, 1105–1109.
20. Gielly,L. and Taberlet,P. (1996) A phylogeny of the European gentians
inferred from chloroplast trnL (UAA) intron sequences. Bot. J. Linn.
Soc., 120, 57–75.
21. Quandt,D. and Stech,M. (2005) Molecular evolution of the trnL (UAA)
intron in bryophytes. Mol. Phylogenet. Evol., 36, 429–443.
22. Quandt,D., Müller,K., Stech,M., Frahm,J.P., Frey,W., Hilu,K.W. and
Borsch,T. (2004) Molecular evolution of the chloroplast trnL-F region
in land plants. Monogr. Syst. Bot. Missouri Botanic Garden, 98, 13–37.
23. Shinozaki,K., Ohme,M., Tanaka,M., Wakasugi,T., Hayashida,N.,
Matsubayashi,T., Zaita,N., Chunwongse,J., Obokata,J.,
Yamaguchi-Shinozaki,K. et al. (1986) The complete nucleotide
sequence of tobacco chloroplast genome: its gene organization and
expression. EMBO J., 5, 2043–2049.
24. Palmer,J.D. (1991) Plastid chromosomes: structure and evolution.
Cell Cult. Som. Cell Genet. Plants, 7A, 5–53.
25. Michel,F., Jacquier,A. and Dujon,B. (1982) Comparison of fungal
mitochondrial introns reveals extensive homologies in RNA secondary
structure. Biochimie, 64, 867–881.
26. Davies,R.W., Waring,R.B., Ray,J.A., Brown,T.A. and Scazzocchio,C.
(1982) Making ends meet—a model for RNA splicing in fungal
mitochondria. Nature, 300, 719–724.
27. Wu,S. and Manber,U. (1992) Agrep-a fast approximate pattern-
matching tool. In Proceedings of the USENIX Winter 1992 Technical
Conference, USENIX Association, Berkeley, CA, pp. 153–162.
28. Willerslev,E. and Cooper,A. (2005) Ancient DNA. Proc. R. Soc. Lond.
B, 272, 3–16.
29. Godon,J.J., Zumstein,E., Dabert,P., Habouzit,F. and Moletta,R. (1997)
Molecular microbial diversity of an anaerobic digestor as determined
by small-subunit rDNA sequence analysis. Appl. Environ. Microbiol.,
30. Willerslev,E., Hansen,A.J., Binladen,J., Brand,T.B., Gilbert,M.T.P.,
Shapiro,B., Bunce,M., Wiuf,C., Gilichinsky,D.A. and Cooper,A. (2003)
Diverse plant and animal genetic records from Holocene and
Pleistocene sediments. Science, 300, 791–795.
31. Brownstein,M.J., Carpten,J.D. and Smith,J.R. (1996) Modulation of
non-templated nucleotide addition by Taq polymerase: primer
modification that facilitate genotyping. BioTechniques, 20, 1004–1010.
32. Magnuson,V.L., Ally,D.S., Nylund,S.J., Karanjawala,Z.E.,
Rayman,J.B., Knapp,J.I., Lowe,A.L., Ghosh,S. and Collins,F.S. (1996)
Substrate nucleotide-determinated non-templated addition of adenine
by Taq DNA polymerase: implications for PCR-based genotyping and
cloning. BioTechniques, 21, 700–709.
33. Borsch,T., Hilu,K.W., Quandt,D., Wilde,V., Neinhuis,C. and
Barthlott,W. (2003) Noncoding plastid trnT-trnF sequences reveal a
well resolved phylogeny of basal angiosperms. J. Evol. Biol.,