Feasibility of using dried plant specimens for DNA barcoding. A case study of the Juncaceae
Danka Do¹, Lenka Záveská Drábková¹,²
¹ Department of Taxonomy, Institute of Botany, Academy of Sciences of the Czech Republic, Průhonice, Czech Republic
² Laboratory of Pollen Biology, Institute of Experimental Botany, Academy of Sciences of the Czech Republic, Prague, Czech Republic
Corresponding Author: Lenka Záveská Drábková
Email address: lenka.zaveska.drabkova@gmail.com
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2293v1 | CC BY 4.0 Open Access | rec: 15 Jul 2016, publ: 15 Jul 2016
Feasibility of using dried plant specimens for DNA barcoding. A case study of the Juncaceae

DANKA DO¹ and LENKA ZÁVESKÁ DRÁBKOVÁ¹,²

¹ Department of Taxonomy, Institute of Botany, Academy of Sciences of the Czech Republic, Zámek 1, CZ-252 43 Průhonice, Czech Republic
² Laboratory of Pollen Biology, Institute of Experimental Botany, Academy of Sciences of the Czech Republic, Rozvojová 263, 165 02 Prague 6 - Lysolaje, Czech Republic
lenka.zaveska.drabkova@gmail.com

Running title: Are dried samples appropriate for DNA barcoding?

ABSTRACT
Background. Phylogenetic and barcoding studies usually use fresh plant parts as the source of DNA, and such investigations have successfully amplified DNA from many different regions. However, dried samples are often needed, because fresh material of rare or precious plants is frequently inaccessible for genetic analyses or barcoding studies. Difficulties in obtaining amplifiable DNA have been one of the major pitfalls slowing down the use of herbarium specimens for DNA analyses.
Methods. The present study highlights the crucial issues faced when herbarium and fresh plants are compared for barcoding purposes. We analyzed the performance of samples from herbarium specimens of different ages and from fresh plants in PCR and sequencing of seven regions (cpDNA: rbcL, rpoC1, trnL-F intergenic spacer, trnL intron, psbA-trnH; mtDNA: atp1; nrDNA: ITS1-5.8S-ITS2) with a combination of twenty-eight primers.
Conclusions. We show that herbarium specimens can be used successfully for both phylogenetic and barcoding purposes. Working with dried herbarium specimens is more complicated than working with fresh samples, but it can lead to amplification and sequencing success in almost all cases when appropriate internal primers are designed or optimization methods are applied. Both approaches serve this aim: using the set of universal primers recommended by CBOL and designing specific primers for a particular group of interest. In the Juncaceae, we found only a limited detrimental effect of specimen age and amplicon length on amplification success in most of the tested regions.

Keywords: DNA barcoding, dry plants, herbarium specimens, Juncaceae

INTRODUCTION
A traditional way of conserving specimens of higher plants is the preparation of herbarium sheets, which are stored in botanical collections. There are approximately 3,400 herbarium collections throughout the world (Thiers, continuously updated). They provide comparative material essential for studies in taxonomy, phylogenetics, systematics, anatomy, morphology, conservation biology, biodiversity, ecology and many other fields. Moreover, they represent a veritable gold mine of information for comparative DNA studies.
Barcoding studies, potentially applicable in many areas of research and practice, may allow rapid identification of plant material that cannot be identified otherwise. This may be useful not only for basic plant identification in research, but also in forensic applications, including cases where the composition of food supplements must be verified. In all these cases, often only dried plant material is available for genetic analyses. Therefore, reliable techniques that allow dried plant material to be used for DNA analyses are desirable.
There are several prerequisites for establishing a DNA region as a barcode. Most importantly, the region must contain a unique identifier, be short enough to be sequenced in one reaction, and contain invariant regions for primer development (e.g., Chase et al., 2007; Sass et al., 2007). The requirement for short sequences may favor dried plant tissues, as their DNA is fragmented into low molecular weight pieces. However, this type of DNA has several pitfalls. For example, the DNA may be broken within the invariant region intended for primer binding. In that case, the region is less suitable and the PCR conditions should be optimized. The factors affecting the quality and usefulness of DNA from herbarium samples, and consequently the efficiency of DNA analyses, are insufficiently known. As analyses of dried plant material are relatively scarce in comparison with fresh material, systematic studies elucidating these factors are needed for DNA barcoding.
Many previous studies have concentrated on testing different extraction/isolation techniques to obtain DNA of high yield and quality from herbarium samples (e.g., Rogers, 1994; Erkens et al., 2008; Andreasen et al., 2009; reviewed in Záveská Drábková, 2014; Choi et al., 2015). However, many pre-analytical factors may affect the quality and quantity of isolated DNA and, subsequently, the results of further applications. The way the plant material is dried may be an important parameter, as drying has been suggested to be as rapid as possible to obtain high-quality DNA (e.g., Záveská Drábková, 2014). Other factors may be associated with the conservation of the material after drying, for example the application of fungicides and insecticides.
Herbarium material is usually very old. Protocols and guidelines for DNA extraction and sequencing from plant herbarium specimens can be traced back to the 1980s. However, obtaining DNA sequences from herbarium specimens can be far from routine even in recent years. As DNA of sufficient quality is essential for the success of the whole molecular study, optimization of extraction methods and of PCR and sequencing protocols is often needed. Unfortunately, the required modifications may differ among taxonomic groups (Hollingsworth et al., 2011).
Elucidation of the above-mentioned factors is necessary for establishing DNA barcoding techniques based on dried, conserved plant material. This will be of considerable importance for assessing how realistic DNA barcoding of conserved plant material is. In the present study, we provide a systematic investigation of PCR amplification, optimization and sequencing efficiency in herbarium specimens of different ages and in fresh samples across the model monocot family Juncaceae. We tested twenty-eight primers for seven regions (rbcL, rpoC1, trnL-F intergenic spacer, trnL intron, psbA-trnH, atp1 and ITS1-5.8S-ITS2), most of them universal within the plant kingdom. We also applied different PCR optimization tests based on varying temperatures and cycling protocols, including PCR additives. The main aim of the study was to address these questions: (1) Is DNA from old herbarium specimens, which is usually degraded (as checked on a gel), useful for DNA barcoding in spite of the problematic methodology? (2) Is successful DNA amplification correlated with the age of the specimen, and does it differ between fresh and dried samples? (3) Does amplification success depend on the length of the amplicon? (4) Can the universal primers recommended by the CBOL Plant Working Group (CBOL PWG) be used for herbarium specimens?

MATERIAL AND METHODS
Plant sampling
The family Juncaceae comprises eight genera and about 450 species. In this study, we sampled most of the representatives of all genera, subgenera and sections. The list of 140 taxa used in the analyses (including outgroups) is shown in supporting information Table S1. For more details about localities and vouchers, see also Drábková et al., 2003, 2004, 2006; Záveská Drábková and Vlček, 2009, 2010; Záveská Drábková, 2010. The herbarium specimens analyzed in this study (see Table 1) came from the AAU, BM, C, E, K, KRAM, NY, PRA, RSA-POM and UPS collections (for acronyms, see Index Herbariorum at http://sci.web.nybg.org/science2/IndexHerbariorum.asp). Most of the specimens were 20 to 50 years old; the oldest sample was 144 years old (mean age = 29 years).

Molecular methods and sequence analysis
We used both fresh and dried plant material in the present study. The majority of the samples came from herbarium specimens and were extracted using modified CTAB extraction and the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to Drábková et al. (2002). Fresh samples were extracted with the same kit, without modifications, according to the manufacturer's protocol. DNA was extracted from at least 0.1 g of dried samples or 1 g of fresh samples. Double-stranded copies of the chloroplast rbcL, rpoC1, trnL-trnF, trnL and psbA-trnH regions, the mitochondrial atp1 region and the nuclear (ITS1-5.8S-ITS2) regions were amplified from total DNA using the set of primers described in Table 2. Three other chloroplast regions recommended by CBOL PWG (rpoB, psbK-psbI and atpF-atpH) were tested as potential DNA barcodes (Do and Záveská Drábková, in prep.). Occasionally, a sample is not amplified successfully at the first attempt. To obtain a high-quality PCR product, reamplification with modified PCR conditions is often needed. This step is not described in most publications, or is only briefly mentioned (e.g., Le Clerc-Blain, 2010). This study therefore aims to show the important role of PCR optimization. Initial amplifications were performed with the following program: initial denaturation at 95 °C for 15 min; 30–40 cycles of denaturation at 95 °C for 1 min, annealing at 50 °C for 1 min and extension at 72 °C for 3 min; and a final extension at 72 °C for 10 min. During optimization, annealing temperatures from 45 to 66 °C and extension temperatures from 68 to 72 °C were tested. Step durations were also optimized, ranging from 30 s to 1 min for annealing and from 1 to 3 min for extension.
Many PCR additives and enhancing agents are used to increase the yield, specificity and consistency of reactions. We used DMSO (dimethyl sulfoxide) and betaine (N,N,N-trimethylglycine) in our optimization steps. However, we did not use any DNA reconstruction (Xu et al., 2015). Sequences were obtained directly, without cloning, on a CEQ™ 2000XL automated sequencer (Beckman Coulter).

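The initial cycling program and the tested optimization ranges can be written down as data, which makes it easier to record which variant was used for each reamplification. The sketch below is only an illustration, built around a hypothetical `pcr_program()` helper that is not part of the authors' workflow. It expands a program into individual steps and rejects values outside the ranges tested in this study. Ramp times are ignored, so the computed duration is a lower bound on the real run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def pcr_program(cycles: int = 30,
                anneal_c: float = 50.0, anneal_s: int = 60,
                extend_c: float = 72.0, extend_s: int = 180) -> list[Step]:
    """Expand the initial program (defaults) or an optimized variant into steps.

    Hypothetical helper: the limits mirror the ranges tested in the study
    (30-40 cycles, annealing 45-66 C for 30-60 s, extension 68-72 C for 60-180 s).
    """
    checks = [
        (30 <= cycles <= 40, "cycles must be 30-40"),
        (45 <= anneal_c <= 66, "annealing must be 45-66 C"),
        (30 <= anneal_s <= 60, "annealing time must be 30-60 s"),
        (68 <= extend_c <= 72, "extension must be 68-72 C"),
        (60 <= extend_s <= 180, "extension time must be 60-180 s"),
    ]
    for ok, message in checks:
        if not ok:
            raise ValueError(message)

    steps = [Step("initial denaturation", 95.0, 15 * 60)]
    for _ in range(cycles):
        steps.append(Step("denaturation", 95.0, 60))
        steps.append(Step("annealing", anneal_c, anneal_s))
        steps.append(Step("extension", extend_c, extend_s))
    steps.append(Step("final extension", 72.0, 10 * 60))
    return steps


def hold_minutes(steps: list[Step]) -> float:
    """Sum of hold times in minutes (ramping between temperatures ignored)."""
    return sum(step.seconds for step in steps) / 60


if __name__ == "__main__":
    initial = pcr_program()
    optimized = pcr_program(cycles=35, anneal_c=55.0, anneal_s=30, extend_s=60)
    print(len(initial), hold_minutes(initial))
    print(hold_minutes(optimized))
```

With the initial settings and 30 cycles, the holds add up to 175 min. The optimized example (35 cycles, 30 s annealing at 55 °C, 1 min extension) adds up to 112.5 min. The 55 °C value is arbitrary and lies within the tested range.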
Bioinformatic analyses
Assembly and alignment of sequences. DNA sequences were assembled in GeneSkipper (EMBL Heidelberg). Sequences were aligned in MAFFT 6.611b (Katoh et al., 2002) using the default settings and the Fast Fourier Transform algorithm, followed by manual gap adjustment in BioEdit version 7.0.9.0 (Hall, 1999) to improve the alignments. All sequences have been deposited in GenBank (accession numbers in Table S1).

Evaluation of PCR / sequencing success of barcoding markers
Two characteristics were used to evaluate PCR success. The first is the age of the specimen, considered in a broad sense. It includes both the time since the plant was collected and the impact of the various preservation treatments against different types of pests applied over that time. Older specimens have usually received more of these treatments, each of which can damage DNA. We therefore use the simplified term “age of the specimen”, which in reality means “time of storage and preservation”. Second, the length of the amplicon was evaluated in association with PCR success and the age of the specimen. For each region, we quantified PCR amplification success, defined as a successful PCR reaction followed by successful sequencing (Table 3, Figure S1). To attempt to predict PCR success in relation to specimen age, we defined seven age categories for the herbarium specimens: 1: 0 years (fresh); 2: 1–20; 3: 21–40; 4: 41–60; 5: 61–80; 6: 81–100; 7: 101–144 years. All statistical analyses (linear regression, Mann-Whitney nonparametric test, ANOVA) were conducted in Statistica (version 12; StatSoft Inc., 2013), with the significance level set at P = 0.05.

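The seven age categories can be assigned mechanically. The sketch below assumes ages in whole years and treats category 6 as 81–100 years, so that no age falls into two classes. The function name `age_category` is hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds (years) of categories 2-6; category 7 covers 101-144.
UPPER_BOUNDS = [20, 40, 60, 80, 100]
MAX_AGE = 144  # oldest specimen in this study


def age_category(age_years: int) -> int:
    """Map specimen age (time of storage and preservation) to categories 1-7."""
    if not 0 <= age_years <= MAX_AGE:
        raise ValueError(f"age must be between 0 and {MAX_AGE} years")
    if age_years == 0:
        return 1  # fresh material
    # bisect_left places an age equal to a bound into that bound's category
    return bisect_left(UPPER_BOUNDS, age_years) + 2


if __name__ == "__main__":
    for age in (0, 20, 21, 29, 100, 144):
        print(age, age_category(age))
```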
Tree-based analyses. Phylogenetic analyses were carried out to evaluate whether the gene trees of each potential barcode are robust enough to identify and discriminate genera and species. We performed these analyses separately on each matrix for 140 species and then ran a total evidence analysis for 95 representatives. Initially, phylogenetic analyses were performed under the maximum parsimony (MP) criterion. To increase the likelihood of exploring all possible islands of the shortest trees, the program NONA (Goloboff, 1999) was used under the shell of WinClada 1.00.08 (Nixon, 2002). The parsimony ratchet procedure, which searches tree space by perturbing character weights in some iterations of the search (Nixon, 1999), was run with 1000 replicates, holding 50 trees in each replicate and sampling 75 characters. The ambiguity setting was amb=. A strict consensus tree was constructed. The resulting cladograms were then submitted to the commands “hard collapse unsupported nodes in all trees” and “keep best only”. No a priori weighting was applied to the characters.

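A strict consensus tree retains only the clades found in every one of the input most parsimonious trees. The sketch below applies that rule to hypothetical taxa A–E. Each tree is represented as a set of clades, and each clade is a `frozenset` of terminal names. It is not the NONA/WinClada implementation, and it does not perform the subsequent collapsing of unsupported nodes.

```python
def clade(*taxa: str) -> frozenset[str]:
    """A clade is represented by the set of terminal taxa it contains."""
    return frozenset(taxa)


def strict_consensus(trees: list[set[frozenset[str]]]) -> set[frozenset[str]]:
    """Keep only the clades that occur in every input tree."""
    if not trees:
        raise ValueError("need at least one tree")
    return set.intersection(*trees)


if __name__ == "__main__":
    # two hypothetical equally parsimonious trees for taxa A-E
    tree_1 = {clade("A", "B"), clade("A", "B", "C"), clade("D", "E")}  # (((A,B),C),(D,E))
    tree_2 = {clade("A", "B"), clade("C", "D", "E"), clade("D", "E")}  # ((A,B),(C,(D,E)))
    for c in sorted(strict_consensus([tree_1, tree_2]), key=sorted):
        print(sorted(c))
```

The conflicting groupings of C, (A,B,C) versus (C,D,E), are lost, so C becomes part of a polytomy in the consensus.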
RESULTS
DNA extraction from herbarium specimens
The DNA yield and quality were sufficient to obtain acceptable PCR products (ca. 6.0–100.0 ng/µL; a single band without smear, or partly degraded DNA, on the gel). The study includes 135 species, of which 70% are represented by herbarium specimens (95 taxa) and 30% by fresh plants (40 taxa). In total, 463 sequences were obtained from seven regions. Most of the herbarium specimens yielded amplifiable DNA for all tested regions.

Success of PCR amplification (and sequencing) and length of the region

A comparison of herbarium and fresh samples across all regions showed that 73% of fresh samples and 61% of herbarium samples were amplified at the first attempt (Fig. 1; excluding rbcL, which was amplified per partes and without optimization).

Genes: rbcL, rpoC1, atp1
The rbcL gene has a special position in our analyses. We decided to avoid optimization techniques for it and instead divided the region, about 1,400 bp in total length, into eight separate regions of 199 to 1,352 bp. After per partes amplification and sequencing, we obtained 736 to 1,381 bp in total for each taxon. Most of the regions were amplified in the first experiment, with only a few exceptions. Results for each region are shown in Fig. 2.
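Because the rbcL sequence of each taxon was assembled from overlapping per partes fragments, the total length obtained depends on how the successfully sequenced fragments overlap. The sketch below merges fragment intervals and reports the covered length. Fragments are treated as half-open intervals on a common reference, and the coordinates are hypothetical, not the primer positions from Table 2.

```python
def merge_fragments(fragments: list[tuple[int, int]]) -> tuple[int, list[tuple[int, int]]]:
    """Merge half-open (start, end) intervals; return (covered bp, merged blocks)."""
    merged: list[list[int]] = []
    for start, end in sorted(fragments):
        if start >= end:
            raise ValueError(f"empty fragment: {(start, end)}")
        if merged and start <= merged[-1][1]:
            # overlaps or abuts the previous block: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    blocks = [(s, e) for s, e in merged]
    return sum(e - s for s, e in blocks), blocks


if __name__ == "__main__":
    # hypothetical fragments; the gap at 800-900 stands for a fragment that failed
    covered, blocks = merge_fragments([(300, 800), (0, 400), (900, 1300)])
    print(covered, blocks)
```

In this example, the two overlapping fragments merge into one block of 800 bp. Together with the separate 400 bp block, they give 1,200 bp of covered sequence.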
Complete rpoC1 and atp1 sequences were 550 bp and 1,110 bp long, respectively. This size variation did not depend on gene length but on the reliability of sequencing at the 5′ and 3′ ends. For rpoC1, 70% of herbarium samples and 88.8% of fresh samples were successfully amplified, i.e., yielded a high-quality PCR product at the first attempt (Fig. 3a). The atp1 region performed worst: 44.8% of herbarium samples and 48.4% of fresh samples were successfully amplified in the first experiment (Fig. 3e). It is also the longest of all tested regions (1,110 bp), except for rbcL, for which internal primers were designed. Individual PCR success ranged from 20% to 100% for rpoC1 and from 17% to 100% for atp1, with most of the samples amplified at the first attempt or after one optimization (not shown).

Non-coding regions: trnL-F, trnL, psbA-trnH, ITS1-5.8S-ITS2
Complete non-coding sequences showed that, in the Juncaceae, the length of the trnL-F region varied from 147 to 540 bp for the trnL-F intergenic spacer and from 332 to 654 bp for the trnL intron (for details see Drábková et al., 2004). The psbA-trnH region varied from 244 to 706 bp, and the ITS1-5.8S-ITS2 region from 247 to 657 bp (Table 3). This size variation mainly reflected the varying number and length of indels.
In total, the trnL-F intergenic spacer was successfully amplified in 42.4% of dried samples and in 56.8% of fresh samples. The remaining samples needed one to five optimization steps (Fig. 3b). However, the fresh samples also needed very similar optimization, depending on the variability of the region in each specimen (Fig. 7). The same pattern was found for the ITS1-5.8S-ITS2 region (53.6% of dried samples and 85.7% of fresh samples; Fig. 3f), but in this case all fresh samples were amplified in the first (18 samples) or second (3 samples) experiment. We observed better outcomes for the trnL intron, where 59.6% of herbarium samples were amplified in the first experiment (Fig. 3c). The most promising region is psbA-trnH, where 75.8% of herbarium samples were successfully amplified in the first PCR reaction (Fig. 3d). Most of the remaining samples (twelve samples) were amplified at the second attempt. This region was also the most successful for fresh samples (90.3%).
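The first-attempt percentages above are simple proportions based on the number of PCR attempts each sample needed. As a check, the ITS1-5.8S-ITS2 fresh samples (18 amplified at the first attempt, 3 at the second) give 18/21 ≈ 85.7%, the value reported above. The sketch below assumes that attempts are recorded per sample as integers (1 = first attempt). The function names are hypothetical.

```python
from collections import Counter


def first_attempt_rate(attempts: list[int]) -> float:
    """Percentage of samples amplified at the first PCR attempt."""
    if not attempts or min(attempts) < 1:
        raise ValueError("need at least one sample; attempts start at 1")
    return 100 * attempts.count(1) / len(attempts)


def attempts_distribution(attempts: list[int]) -> dict[int, int]:
    """Number of samples per attempt count (1 = first attempt, 2 = second, ...)."""
    return dict(sorted(Counter(attempts).items()))


if __name__ == "__main__":
    its_fresh = [1] * 18 + [2] * 3  # ITS1-5.8S-ITS2, fresh samples
    print(round(first_attempt_rate(its_fresh), 1))
    print(attempts_distribution(its_fresh))
```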
Individual success of PCR amplification and product sequencing ranged from 14% (one sample needed five optimization steps) to 100% (first attempt successful) for trnL-F (Fig. 3b), from 11% to 100% for trnL and from 14% to 100% for psbA-trnH (not shown).
The PCR success and the number of optimization steps for the 95 plants selected for the total evidence phylogenetic analysis are depicted in Fig. 6.
In most cases, bi-directional sequencing was also successful after a successful PCR reaction, except for a few samples of the rpoC1 and ITS regions. In these cases we obtained unreadable chromatograms, but sequencing succeeded after additives were included in the PCR reaction.

Effect of the age of herbarium specimens
We tested whether PCR success is affected by the age of the herbarium specimens (range 0 to 144 years), applying linear regression analysis (Fig. 4) and the Mann-Whitney nonparametric test. Of the six tested regions, the Mann-Whitney test showed five where PCR success did not depend on the age of the herbarium specimens (rpoC1: Z = 0.7366, P = 0.4614; trnL-F: Z = -1.1082, P = 0.2678; trnL: Z = 0.4366, P = 0.6624; atp1: Z = -0.0444, P = 0.9646; ITS1-5.8S-ITS2: Z = 0.00, P = 1.0000). Only one region showed an important effect of specimen age (psbA-trnH: Z = 2.2187, P = 0.0265).
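For readers who want to reproduce this kind of test outside Statistica, the sketch below computes the Mann-Whitney U statistic and its normal-approximation Z in plain Python. It applies a tie correction but no continuity correction. The grouping it assumes is our own illustration, not a documented detail of the analysis: specimen ages of samples amplified at the first attempt versus those that needed optimization. The ages in the example are hypothetical. Statistica may report a continuity-corrected or differently signed Z, so results need not match the values above exactly. With SciPy installed, `scipy.stats.mannwhitneyu` is an alternative.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (U for x, Z, two-sided P) using the normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need observations")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # tie correction term: sum of (t^3 - t) over groups of tied values
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal
    return u1, z, p


if __name__ == "__main__":
    # hypothetical specimen ages (years)
    first_attempt = [12, 25, 31, 40, 18, 55, 22]
    needed_optimization = [35, 61, 28, 90, 47]
    u, z, p = mann_whitney(first_attempt, needed_optimization)
    print(f"U = {u}, Z = {z:.4f}, P = {p:.4f}")
```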
One-way ANOVA performed separately for each region indicated no significant effect of specimen age in four regions (rpoC1: F = 1.5519, df = 42, P = 0.0601; trnL-F: F = 1.5524, df = 19, P = 0.1237; trnL: F = 1.0778, df = 18, P = 0.4089; atp1: F = 0.4729, df = 14, P = 0.9347). In contrast, two regions, psbA-trnH and ITS1-5.8S-ITS2, showed significant detrimental effects of specimen age on PCR success (psbA-trnH: F = 2.679, df = 34, P = 0.0008; ITS1-5.8S-ITS2: F = 3.7068, df = 29, P = 0.0003).
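The one-way ANOVA compares the variation in PCR success between age categories with the variation within them. The plain-Python sketch below computes the F statistic. It assumes that PCR success has been scored as one number per sample and grouped by age category; the example values are hypothetical. The P-value requires the F distribution, which the standard library lacks. With SciPy, `scipy.stats.f.sf(F, df_between, df_within)` gives it, and `scipy.stats.f_oneway` runs the whole test. The manuscript reports a single df per test, whereas this sketch returns both degrees of freedom.

```python
from statistics import fmean


def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each inner list holds the PCR-success scores of one age category
    (hypothetical scoring); empty categories are skipped.
    """
    groups = [list(g) for g in groups if g]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two non-empty groups and more samples than groups")
    means = [fmean(g) for g in groups]
    grand_mean = fmean(v for g in groups for v in g)
    # variation of category means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # variation of samples around their own category mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("no variation within groups; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # hypothetical first-attempt success (1) or failure (0), grouped by age category
    by_category = [[1, 1, 1, 0], [1, 0, 1, 1, 0], [0, 1, 0], [0, 0, 1]]
    f_stat, df_b, df_w = one_way_anova(by_category)
    print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w})")
```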
For three non-coding regions (trnL-F, trnL, psbA-trnH), in which size variation strongly reflected the number of indels, we investigated the relationship between the age of successfully amplified specimens and the length of the region. The results are depicted in Fig. 5. The strongest correlation between specimen age and region length was found only for the trnL intron (P = 0.0685), although it did not reach the 0.05 significance level.

Phylogenetic analyses
We analysed all seven regions separately and then combined the data sets into a total evidence matrix (Fig. 6; separate analyses not shown). The neighbour-joining tree of the combined data sets was highly sensitive to missing data. It showed spurious distances between a few pairs of taxa for which several regions were missing.
The strict consensus tree (Fig. 6) of 7,256 bp (rbcL, rpoC1, trnL-F intergenic spacer, trnL intron, psbA-trnH, atp1 and ITS1-5.8S-ITS2) and 98 taxa is based on 1,290 most parsimonious trees with a length of 6,576 steps, CI = 60 and RI = 82. Most of the taxa used in this analysis were successfully amplified from herbarium specimens (61 taxa, shown in orange in Fig. 6).

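For reference, the consistency index (CI) and retention index (RI) reported above are conventionally defined as shown below. Summed over characters, m is the minimum possible number of steps, s is the number of steps on the tree (the tree length) and g is the maximum possible number of steps on a completely unresolved tree. The values above appear to be given as percentages (CI = 60 corresponds to 0.60). Whether WinClada computed them over all characters or only informative ones is not stated here.

```latex
\mathrm{CI} = \frac{m}{s}, \qquad \mathrm{RI} = \frac{g - s}{g - m}
```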
DISCUSSION
Barcoding projects explore specimens that usually come from different sources. Some are collected in the field, while others originate from the immense collections housed in botanical gardens, natural history museums or seed banks, to name only a few. Furthermore, dried herbarium sheets represent the common way of conserving specimens of higher plants, traditionally stored in botanical museums throughout the world. The way plants are dried is an important parameter for DNA extraction and further applications. Moreover, this plant material is usually considerably old and contains fragmented DNA (e.g., Särkinen et al., 2012; Drábková et al., 2002).
Many different regions of the plastid, nuclear and mitochondrial genomes previously used for phylogenetic purposes have been satisfactorily applied to DNA barcoding of some plant groups (e.g., Hollingsworth et al., 2009). However, modifications are required and may differ among taxonomic groups (Hollingsworth et al., 2011). Thus, more attention should be paid to the usefulness of these regions in many different plant families and for the identification of dried specimens of unknown plants.

Proper examination of plant material – the most crucial point of all molecular work
DNA barcoding has a broad range of uses, from forensic purposes and assessment of the content of various food supplements, to understanding plant community structure, to commercial applications. The use of herbarium material illustrates the substantial value of the world's herbaria. However, proper determination and revision of each individual plant specimen is necessary to avoid the risk of mixed plants on one herbarium sheet (e.g., Kristiansen et al., 2004; Drábková and Vlček, 2007). Proper examination of plant material before sequence data are published in the GenBank database is crucial, especially for barcoding studies, where BLAST searches are frequently used.
Another challenge lies in the markers themselves. For instance, contamination by other species is more frequent in nuclear regions (such as the widely used ITS) amplified from herbarium specimens. Fungal and algal contaminants have repeatedly been detected in herbarium samples (e.g., Kuzmina et al., 2012; de Vere et al., 2012).

Usefulness of herbarium specimens for DNA barcoding
The DNA obtained from herbarium sheets of the Juncaceae showed that all tested regions from the different cell compartments are useful for DNA barcoding. The comparison of herbarium and fresh samples across all regions showed the expected difference between them. Fresh samples performed about 12 percentage points better in the first amplification experiment (73% vs. 61% amplified at the first attempt), and correspondingly about 12% more herbarium samples than fresh samples needed additional optimization steps. If we assess the usefulness of herbarium specimens in the way described by Hollingsworth et al. (2009), the amplification of most regions is far from routine. As shown above, and in agreement with de Vere et al. (2012), herbarium specimens require more amplification attempts with more primer combinations. Thus, working with herbarium material is neither easy nor fast: it is time consuming, more expensive and requires more experienced researchers. Nevertheless, once the methodological difficulties are resolved, the regions obtained from herbarium samples have the same length and sequence as those from fresh plants. The phylogenetic analysis, in which most of the samples were obtained from ancient DNA, confirmed the usefulness of herbarium specimens for phylogenetic and barcoding purposes. In contrast to the rbcL findings of de Vere et al. (2012) across different plant species, we found that correct sequences were obtained for all regions used in our study from both herbarium material and freshly collected specimens. Greater barcoding primer universality and ease of amplification therefore most likely do not play a big role in the Juncaceae.
Ancient DNA often needs to be treated differently from fresh DNA. DNA amplification is frequently divided into more than one step with several different primer sets (e.g., the twelve-primer set for rbcL; Fig. S1a), and optimizing the reaction conditions is also difficult. Analysing short DNA fragments separately is not problematic by itself, but short regions usually lack strong discriminatory power. Thus, we usually need to amplify the whole region step by step. In many cases, PCR optimization can also solve the problem satisfactorily. It should be emphasized that quite a lot of fresh samples in our study were also amplified more than once to achieve a successful PCR (Figs. 2, 3 and 6). Optimization of PCR reactions is usually necessary for novel plant samples (i.e., a new plant order, family or genus).
The preservation of samples, especially during long-term herbarium storage, remains the most challenging issue to consider. In general, old air-dried material that has not been treated with chemical preservatives has the best chance of yielding useful DNA for barcoding purposes. It has been described many times that, to preserve DNA well, plants must be dried as fast as possible at a mild temperature. High temperatures (60 to 70 °C) cause cells to rupture quickly, concomitantly releasing nucleases, reactive oxygen species (ROS) and enzymes that bring about necrosis and DNA fragmentation (Staats et al., 2011). Extraction results depend on how the plant material is prepared, how many times the collection is treated against insects, and the chemicals or procedures used. For instance, DNA has been found to be seriously degraded in leaves that were microwaved, boiled in water or immersed in chemical solutions (e.g., Tailor, 1994). The regular herbarium treatment used to keep specimens free of pests is another important factor to take into account. Fumigation methods may cause changes during sample storage, often making it difficult to be sure about DNA quality. Unfortunately, we usually have no data on the timing and kind of treatment applied to each specimen in a herbarium collection. Therefore, the only way to determine whether a specimen is usable is to analyze it.

Appropriate length for successful DNA amplification and sequencing from herbarium specimens
The current plant barcodes recommended by the CBOL PWG (CBOL, 2009) are longer than 650 bp (e.g., rbcL, matK). However, obtaining fragments of this length is problematic for many herbarium samples. In our study, the fragments most easily amplified from herbarium specimens were shorter than 500 bp, in agreement with Särkinen et al. (2012; see Table 3). The best-performing region was psbA-trnH, with lengths ranging from 392 to 706 bp in our study. Moreover, specimens with the shortest fragments of this region were usually amplified at the first attempt. However, length is not the only important factor: the low yield of template DNA from herbarium samples also limits the successful amplification of longer fragments.
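A minimal sketch of the length criterion, assuming that the upper bound of each fragment-size range in Table 3 is the relevant amplicon length: it transcribes Table 3, splits the primer sets at 500 bp and compares mean herbarium PCR success in the two groups. RH1S/J1352R is omitted because Table 3 gives no herbarium value for it. The names `TABLE3`, `split_by_length` and `mean_success` are illustrative, not part of the authors' analysis.

```python
# Fragment sizes (bp) and herbarium PCR success (%) transcribed from Table 3.
# RH1S/J1352R is left out: Table 3 reports no herbarium value for it.
TABLE3 = [
    # (region, primer set, min size, max size, herbarium success %)
    ("rbcL", "RH1S/Hv362R", 362, 362, 80),
    ("rbcL", "Hv234/J556R", 322, 322, 87),
    ("rbcL", "Hv522/Cym821", 299, 299, 78),
    ("rbcL", "Ce622/Cym821", 199, 199, 100),
    ("rbcL", "Ce622/J949R", 327, 327, 85),
    ("rbcL", "Hv890/Hv1204", 314, 314, 100),
    ("rbcL", "Hv890/J1352R", 462, 462, 73),
    ("rpoC1", "rpoC1_2f/rpoC1_2r", 530, 530, 70),
    ("trnL-trnF", "trnL_c/trnL_d", 332, 654, 74),
    ("trnL-trnF", "trnL_e/trnL_f", 147, 540, 71),
    ("psbA-trnH", "psbA_3f/trnH_f05", 244, 706, 76),
    ("atp1", "atpAF1.5/atpAB1.5", 1110, 1110, 69),
    ("ITS1-5.8S-ITS2", "ITS4i/ITS5i", 247, 657, 70),
]


def short_enough(max_size, threshold=500):
    """True if even the longest expected amplicon is below the threshold (bp)."""
    return max_size < threshold


def split_by_length(rows, threshold=500):
    """Split primer sets into (short, long) lists of (region, primers, success)."""
    short, long_ = [], []
    for region, primers, _min_size, max_size, success in rows:
        target = short if short_enough(max_size, threshold) else long_
        target.append((region, primers, success))
    return short, long_


def mean_success(rows):
    """Mean herbarium PCR success (%) over (region, primers, success) tuples."""
    return sum(success for _, _, success in rows) / len(rows)


if __name__ == "__main__":
    short, long_ = split_by_length(TABLE3)
    print(f"< 500 bp: {len(short)} sets, mean {mean_success(short):.1f}%")
    print(f">= 500 bp: {len(long_)} sets, mean {mean_success(long_):.1f}%")
```

With these numbers the seven short sets average about 86% herbarium success against about 72% for the six longer ones. All of the short sets are internal rbcL primers, however, so this simple split cannot separate amplicon length from locus, which fits the point above that length is not the only factor.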
The most promising regions are rpoC1, psbA-trnH and the trnL primer combinations (Table 3). Compared with designing a new set of primers to amplify a longer region step by step, using "universal" primers necessitates more optimization steps. However, most of the fragments were successfully amplified in the first experiment (Fig. 3).

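The step-by-step approach can be illustrated with the rbcL primer sets from Fig. 2. Assuming that the number in each primer name is its approximate position in rbcL (with RH1S treated as position 0), the expected amplicon length is the reverse position minus the forward position, which reproduces every length listed in Fig. 2; merging the overlapping intervals then shows whether the short amplicons jointly cover the 1,352 bp fragment. The position convention and the helper names are assumptions made for this sketch.

```python
import re

# rbcL primer pairs from Fig. 2 (the full-length RH1S/J1352R pair excluded).
# Assumption: the digits in each primer name give its approximate position
# in rbcL; "RH1S" is treated as the start of the gene (position 0).
PAIRS = [
    ("RH1S", "Hv362R"), ("Hv234", "J556R"), ("Hv522", "Cym821"),
    ("Ce622", "Cym821"), ("Ce622", "J949R"), ("Hv890", "Hv1204RS"),
    ("Hv890", "J1352R"),
]


def position(name):
    """Hypothetical helper: read the numeric position out of a primer name."""
    if name == "RH1S":
        return 0
    return int(re.search(r"\d+", name).group())


def amplicon(pair):
    """Return the (start, end) interval spanned by a forward/reverse pair."""
    fwd, rev = pair
    return position(fwd), position(rev)


def merge(intervals):
    """Merge overlapping or abutting [start, end] intervals (sort and sweep)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


if __name__ == "__main__":
    spans = [amplicon(p) for p in PAIRS]
    for (fwd, rev), (s, e) in zip(PAIRS, spans):
        print(f"{fwd}-{rev}: {e - s} bp")
    print("covered:", merge(spans))  # a single interval means no gaps
```

Under this assumption all seven internal amplicons are shorter than 500 bp and merge into one interval from 0 to 1,352, so the full fragment can be assembled from overlapping short amplicons. Abutting intervals count as contiguous here; a real tiling design would also need enough overlap for reliable sequence assembly.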
Prediction of PCR success for herbarium specimens: reality or not?
DNA analyses have been common procedures in plant studies for more than 30 years. However, a reliable predictor of high-yield, good-quality DNA extraction and of PCR success is still missing (Erkens et al., 2008). Herbarium DNA is usually degraded into low-molecular-weight fragments, because specimen preparation induces strong metabolic and cellular stress responses and ultimately cell death, resulting in irreparably damaged DNA (Savolainen et al., 1995). We agree with Erkens et al. (2008) that uncritical use of indicators such as leaf color or age is not advisable; such indicators should be considered only as supporting factors. Nevertheless, we tested for a general correlation between the age of the herbarium specimen and PCR (and sequencing) success for chloroplast, mitochondrial and nuclear regions. The results are mixed: for a few regions (psbA-trnH and ITS1-5.8S-ITS2) we identified a statistically significant negative association with specimen age, whereas for the others (rpoC1, trnL-F, trnL and atp1) no effect of age was confirmed.
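One common way to test such an association is logistic regression of a binary PCR outcome on specimen age. The text does not state which test was used, so the following is only an illustration: it fits a one-predictor logistic model by Newton–Raphson on made-up toy data (not data from this study) and reports the slope, whose sign indicates whether older specimens tend to fail more often. The function name `fit_logistic` is hypothetical.

```python
import math


def fit_logistic(ages, outcomes, iterations=25):
    """Fit P(success) = 1 / (1 + exp(-(b0 + b1 * age))) by Newton-Raphson.

    ages: specimen ages in years; outcomes: 1 = PCR success, 0 = failure.
    Returns (b0, b1) on the original age scale. Assumes the data are not
    perfectly separable (otherwise the maximum-likelihood estimate diverges).
    """
    # Centre the ages to keep the 2x2 information matrix well conditioned.
    mean_age = sum(ages) / len(ages)
    xs = [a - mean_age for a in ages]
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # observed information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Convert the intercept back from centred to raw ages.
    return b0 - b1 * mean_age, b1


if __name__ == "__main__":
    # Toy data invented for illustration only (NOT the study's results).
    ages = [5, 10, 15, 20, 30, 40, 50, 60, 80, 100, 120, 150]
    pcr_ok = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
    b0, b1 = fit_logistic(ages, pcr_ok)
    print(f"slope per year: {b1:.4f}; odds ratio per decade: {math.exp(10 * b1):.2f}")
```

A significance test (for example a Wald or likelihood-ratio test on the slope) would still be needed before calling such an association significant; it is left out to keep the sketch short.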
For future studies, it would be of interest to determine in detail why some regions appear more prone to lower PCR amplification success rates in both dried and fresh samples. Unfortunately, as mentioned above, we could not find any convincing effect of specimen age on PCR amplification success, because the significant detrimental effect of age was found in regions with relatively high PCR amplification success rates in both fresh and dried samples (e.g., psbA-trnH). Conversely, the worst-performing region, atp1 (44.8% versus 48.4% success in herbarium and fresh samples, respectively), did not show any negative effect of age on PCR success.
Therefore, based on our findings, PCR success depends more on the locus to be amplified, the quality of the primers (optimally with no mismatches to the target sequence) and the variability of the region in a particular plant group. We can also tentatively suggest that some regions (e.g., psbA-trnH) may be more suitable candidates for barcoding purposes, owing to their overall better PCR performance despite a potential negative effect of specimen age in comparison with fresh samples, as observed in our study. More research will be needed to elucidate the impact of the particular treatments applied in the preparation, processing and long-term storage of dried samples of other plant groups, and to establish the most relevant barcode regions generally applicable across the plant kingdom.

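Primer quality in the sense used above (ideally no mismatches with the target) can be checked by comparing a primer with the template site where it should bind and listing positions where the template base is not allowed by the primer's IUPAC code. This sketch assumes the binding site is already known, written 5'->3' on the same strand as the primer, ungapped and of equal length; the template fragment in the usage example is invented, and the helper names are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def clean(seq):
    """Drop the spaces used in Table 2 and upper-case the sequence."""
    return seq.replace(" ", "").upper()


def mismatches(primer, site):
    """0-based positions where the template base is not allowed by the primer."""
    primer, site = clean(primer), clean(site)
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    return [i for i, (p, t) in enumerate(zip(primer, site)) if t not in IUPAC[p]]


def near_three_prime(positions, length, window=5):
    """Subset of mismatch positions within `window` bases of the 3' end."""
    return [i for i in positions if i >= length - window]


if __name__ == "__main__":
    primer = "ATT GTG GTT GTT TGR GCA CT"   # atpAB1.5 from Table 2
    site = "ATTGTGGTTGTTTGAGCACC"          # hypothetical template fragment
    hits = mismatches(primer, site)
    print(hits)                                        # → [19]
    print(near_three_prime(hits, len(clean(primer))))  # → [19]
```

Mismatches near the 3' end of a primer generally impair extension more than those near the 5' end, which is why they are flagged separately; the five-base window is an arbitrary illustrative choice.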
CONCLUSION
The results of our study confirm that both fresh and herbarium samples can be used successfully for DNA barcoding in the Juncaceae. Compared with fresh samples, working with dried herbarium specimens is more complicated, but may lead to amplification and sequencing success in almost all cases when appropriate internal primers are designed or optimization methods are used. To address the potential pitfalls, we recommend the following steps for handling herbarium specimens for DNA barcoding:

1) Careful examination of the herbarium specimen. Specimens with brown or black leaves (often treated with alcohol or dried carelessly) or covered by fungi should be avoided. Caution is needed, because leaf color differs among families owing to the chemical compounds in the plant tissues. Old age by itself may not be an obstacle to successful PCR amplification.
2) Optimization techniques may be necessary. PCR conditions for the universal primers designed by the CBOL PWG can be optimized following the general PCR scheme used successfully for the Juncaceae (Fig. 7).
3) Design of new internal primer sets to amplify shorter regions. This may be faster when analyzing a specific group of samples, e.g., at the family or genus level.
4) The choice between steps 2) and 3) depends on the region under investigation. Designing new internal primers may work well for genes; in non-coding regions, however, this is more challenging.
5) Correct identification of the comparative dried plant material is sometimes essential. For example, by re-examining two vouchers of Oxychloë andina and five new accessions, our previous studies showed that two sequences in GenBank were erroneous: one was a chimeric sequence and the other most likely a contaminant (e.g., Kristiansen et al., 2004; Drábková and Vlček, 2007).

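The recommendations above can be read as a simple decision procedure. The sketch below encodes steps 1) to 4) only; the `Specimen` fields, the function name `plan_specimen` and the action labels are hypothetical, and real decisions involve judgement (for instance family-specific leaf colour) that a rule cannot fully capture.

```python
from dataclasses import dataclass


@dataclass
class Specimen:
    # Hypothetical fields summarising a visual check of the herbarium sheet.
    leaves_brown_or_black: bool
    fungal_growth: bool
    family_normally_dark: bool = False  # some families dry dark naturally


def plan_specimen(spec: Specimen, region_is_coding: bool) -> list[str]:
    """Return an ordered list of suggested actions (illustrative only)."""
    # Step 1: visual screening. Specimen age alone is not a reason to reject.
    if spec.fungal_growth:
        return ["avoid specimen"]
    if spec.leaves_brown_or_black and not spec.family_normally_dark:
        return ["avoid specimen"]
    # Step 2: start from universal (CBOL PWG) primers and optimise the PCR.
    actions = ["optimise PCR with universal primers"]
    # Steps 3-4: internal primers are most practical for coding regions.
    if region_is_coding:
        actions.append("design internal primers for shorter amplicons")
    else:
        actions.append("consider internal primers (harder in non-coding regions)")
    return actions


if __name__ == "__main__":
    sheet = Specimen(leaves_brown_or_black=False, fungal_growth=False)
    print(plan_specimen(sheet, region_is_coding=True))
```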
ACKNOWLEDGEMENT
We are grateful to the herbarium keepers for the opportunity to study and sample plant material in their collections, and to Čestmír Vlček for help with sequence editing.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
The study was carried out in the laboratories of the Center for Integrated Genomics, Institute of Molecular Genetics ASCR, and the Institute of Botany ASCR, and was supported in part by grants obtained by LZD: SYNTHESYS DK-TAF 1295, SYNTHESYS GB-TAF 2052, and GAČR 206/07/P147, P506/11/0774 and 16-14649S. The study was also supported by the long-term research development project of the Institute of Botany ASCR, no. RVO 67985939.
Competing Interests
The authors declare that they have no competing interests.
Author contributions
Lenka Záveská Drábková designed the study, sampled specimens, performed molecular analyses of most of the data, analyzed the data, and wrote the manuscript.
Danka Do was responsible for molecular analyses of the rpoC1 and psbA-trnH regions and helped with data arrangement and writing the manuscript.
Data Deposition
Data sequences: The 463 sequences used in our analyses are listed in Supporting Information Table S1 and are available in GenBank.
Supplementary Information
Supplemental information for this article can be found online…

References

Andreasen K, Manktelow M, Razafimandimbison SG. 2009. Successful DNA amplification of a more than 200-year-old herbarium specimen: recovering genetic material from the Linnaean era. Taxon 58:959–962.
Chase MW, Cowan RS, Hollingsworth PM et al. 2007. A proposal for a standardized protocol to barcode all land plants. Taxon 56:296–299.
CBOL Plant Working Group. 2009. A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America 106(31):12794–12797.
Choi JH, Lee HJ, Shipunov A. 2015. All that is gold does not glitter? Age, taxonomy, and ancient plant DNA quality. PeerJ 3:e1087. DOI 10.7717/peerj.1087.
Daugbjerg N, Moestrup Ø, Arctander P. 1994. Phylogeny of the genus Pyramimonas (Prasinophyceae) inferred from the rbcL gene. Journal of Phycology 30:991–999.
de Vere N, Rich TCG, Ford CR, Trinder SA, Long C, et al. 2012. DNA barcoding the native flowering plants and conifers of Wales. PLoS ONE 7(6):e37945.
Do D, Záveská Drábková L. 2016. DNA barcoding of the Juncaceae: Evaluating of ten candidate regions. In prep.
Drábková L, Kirschner J, Vlček Č. 2002. Historical herbarium specimens in molecular taxonomy of the Juncaceae: a comparison of DNA extraction and amplification protocols. Plant Molecular Biology Reporter 20:161–175.
Drábková L, Kirschner J, Seberg O, Petersen G, Vlček Č. 2003. Phylogeny of the Juncaceae based on rbcL sequences, with special emphasis on Luzula DC. and Juncus L. Plant Systematics and Evolution 240:133–147.
Drábková L, Kirschner J, Vlček Č, Pačes V. 2004. TrnL-trnF intergenic spacer and trnL intron define clades within Luzula and Juncus (Juncaceae). Journal of Molecular Evolution 59:1–10.
Drábková L, Kirschner J, Vlček Č. 2006. Phylogenetic relationships within Luzula DC. and Juncus L. (Juncaceae): a comparison of phylogenetic signals of trnL-trnF intergenic spacer, trnL intron and rbcL plastome sequence data. Cladistics 22:132–143.
Drábková L, Vlček Č. 2007. The phylogenetic position of Oxychloë (Juncaceae): evidence from one nuclear, three plastid regions and morphology. Taxon 56:95–102.
Erkens RHJ, Cross H, Maas JW, Hoenselaar K, Chatrou LW. 2008. Assessment of age and greenness of herbarium specimens as predictors for successful extraction and amplification of DNA. Blumea 53:407–428.
Goloboff PA. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15:415–428.
Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41:95–98.
Hollingsworth PM et al. 2009. A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America 106:12794–12797.
Hollingsworth PM, Graham SW, Little DP. 2011. Choosing and using a plant DNA barcode. PLoS ONE 6:e19254.
Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30:3059–3066.
Kristiansen K, Cilieborg M, Drábková L, Jørgensen T, Petersen G, Seberg O. 2004. DNA taxonomy – the riddle of Oxychloë. Systematic Botany 30:284–289.
Kuzmina ML, Johnson KL, Barron HR, Hebert PDN. 2012. Identification of the vascular plants of Churchill, Manitoba, using a DNA barcode library. BMC Ecology 12:25.
Le Clerc-Blain J, Staar RJ, Bull RD, Saarela JM. 2010. A regional approach to plant DNA barcoding provides high species resolution of sedges (Carex and Kobresia, Cyperaceae) in the Canadian Arctic Archipelago. Molecular Ecology Resources 10:69–91.
Nixon KC. 1999. The parsimony ratchet, a new method for rapid phylogenetic analysis. Cladistics 15:407–414.
Nixon KC. 2002. WinClada ver. 1.00.08. Published by the author, Ithaca, NY.
Petersen G, Seberg O. 2003. Phylogenetic analyses of the diploid species of Hordeum (Poaceae) and a revised classification of the genus. Systematic Botany 28:293–306.
Roalson EH, Columbus J, Friar EA. 2001. Phylogenetic relationships in Cariceae (Cyperaceae) based on ITS (nrDNA) and trnL-F region sequences: assessment of subgeneric and sectional relationships in Carex with emphasis on section Acrocystis. Systematic Botany 26:318–341.
Rogers SO. 1994. Phylogenetic and taxonomic information from herbarium and mummified DNA. In: Adams RP et al. (eds) Conservation of plant genes II: utilization of ancient and modern DNA. Missouri Botanical Garden Monographs, vol. 48. Missouri Botanical Garden Press, St. Louis.
Sass C, Little D, Stevenson DW, Specht CD. 2007. DNA barcoding in the Cycadales: testing the potential of proposed barcoding markers for species identification of cycads. PLoS ONE 2(11):e1154.
Savolainen V, Cuénoud P, Spichiger R, Martinez MDP, Crèvecoeur M et al. 1995. The use of herbarium specimens in DNA phylogenetics: evaluation and improvement. Plant Systematics and Evolution 197:87–98.
Särkinen T, Staats M, Richardson JE, Cowan RS, Bakker FT. 2012. How to open the treasure chest? Optimising DNA extraction from herbarium specimens. PLoS ONE 7:e43808.
Staats M, Cuenca A, Richardson JE, Vrielink-van Ginkel R, Petersen G, et al. 2011. DNA damage in plant herbarium tissue. PLoS ONE 6:e28448.
Taberlet P, Gielly L, Bouvet J. 1991. Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Molecular Biology 17:1105–1109.
Taylor JW, Swann EC. 1994. Dried samples: soft tissues, DNA from herbarium specimens. In: Herrmann B, Hummel S (eds) Ancient DNA. Springer-Verlag.
Thiers B. [continuously updated]. Index Herbariorum: A global directory of public herbaria and associated staff. New York Botanical Garden's Virtual Herbarium. http://sweetgum.nybg.org/ih/
Xu C, Dong W, Shi S, Cheng T, Li C, Liu Y, Wu P, Wu H, Gao P, Zhou S. 2015. Accelerating plant DNA barcode reference library construction using herbarium specimens: improved experimental techniques. Molecular Ecology Resources. DOI 10.1111/1755-0998.12413.
Záveská Drábková L. 2014. DNA extraction from herbarium specimens. In: Besse P (ed.) Molecular Plant Taxonomy: Methods and Protocols. Methods in Molecular Biology, vol. 1115, pp. 67–84. Springer Science+Business Media, New York. DOI 10.1007/978-1-62703-767-9_4.
Záveská Drábková L, Vlček Č. 2010. Molecular phylogeny of the genus Luzula DC. (Juncaceae, Monocotyledones) based on plastome and nuclear ribosomal regions: a case of incongruence, incomplete lineage sorting and hybridisation. Molecular Phylogenetics and Evolution 57:536–551.
Záveská Drábková L. 2010. Phylogenetic relationships within Juncaceae: evidence from all three genomic compartments with notes to the morphology. In: Seberg O, Petersen G, Barfod AS, Davis JI (eds) Diversity, phylogeny, and evolution in the monocotyledons, pp. 389–416.
Záveská Drábková L, Vlček Č. 2009. DNA variation within Juncaceae: comparison of impact of organelles regions on phylogeny. Plant Systematics and Evolution 278:169–186.

Tables

Table 1 List of herbarium specimens with the year of collection, year of DNA extraction and herbarium source (sorted by collection year). Specimens with unknown collection year (nine samples) were not included.

Table 2 Regions and primers used for testing DNA quality through PCR amplification.

Table 3 Summary of the fragment sizes and PCR success for each primer set of seven regions.

Figures

Figure 1 Overall comparison of PCR success of herbarium and fresh samples for six regions (rpoC1, trnL-F intergenic spacer, trnL intron, psbA-trnH intergenic spacer, atp1 and ITS1-5.8S-ITS2).

Figure 2 PCR success of herbarium and fresh samples of rbcL for each primer set: RH1S-J1352R, 1352 bp; RH1S-Hv362R, 362 bp; Hv234-J556R, 322 bp; Hv522-Cym821, 299 bp; Ce622-Cym821, 199 bp; Ce622-J949R, 327 bp; Hv890-Hv1204RS, 314 bp; Hv890-J1352R, 462 bp. Numbers above columns indicate the percentage of all herbarium or fresh samples.

Figure 3 PCR success and number of optimization steps for herbarium versus fresh samples in six candidate DNA barcode regions: chloroplast A) rpoC1, B) trnL-F intergenic spacer, C) trnL intron and D) psbA-trnH intergenic spacer; mitochondrial E) atp1; and nuclear F) ITS1-5.8S-ITS2. Numbers above columns indicate the percentage of all herbarium or fresh samples.

Figure 4 Effect of specimen age on PCR success. A) rpoC1, B) trnL-F, C) trnL, D) psbA-trnH, E) atp1 and F) ITS1-5.8S-ITS2.

Figure 5 Effect of amplicon length on amplification success for specimens of different ages. A) psbA-trnH, B) trnL and trnL-F.

Figure 6 Strict consensus tree of the Juncaceae (98 taxa) computed from 1,290 most parsimonious trees based on seven regions (7,256 bp). L = 6,576, CI = 60, RI = 82. Numbers above branches are bootstrap values > 50%. Species shown in orange originate from herbarium specimens. The left side shows PCR success for six regions (number of PCR optimizations): rpoC1, trnL-F, trnL, psbA-trnH, atp1 and ITS1-5.8S-ITS2. A blank space indicates that only the rbcL region was included for that species.

Figure 7 Diagram of the main PCR optimization steps recommended for herbarium specimens, as used in this study.

577
578 Supplemental data
579
580 Additional Supporting Information may be found in the online version of this article:
581
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2293v1 | CC BY 4.0 Open Access | rec: 15 Jul 2016, publ: 15 Jul 2016
582 Figure S1 Organization of used primers of the rbcL, rpoC1, trnL-F, psbA-trnH, atp1, ITS1-
583 5.8S-ITS2, rpoB, psbK-I and atpF-H regions. Arrows indicate orientation and approximate
584 position of primer sites.
585
586 Table S1 Genbank accession numbers for the regions used in this study (rbcL, rpoC1, trnL-F,
587 psbA-trnH, atp1 and ITS1-5.8S-ITS2).
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2293v1 | CC BY 4.0 Open Access | rec: 15 Jul 2016, publ: 15 Jul 2016
Table 1 List of herbarium specimens with the year of sample collection, DNA extraction and herbarium source (sorted by collection year).

Species | Collection year* | Year of DNA extraction | Herbarium source (acronym)
Luzula capitata | 1861 | 2005 | BM
Luzula seubertii | 1892 | 2001 | BM
Luzula abyssinica | 1909 | 2005 | BM
Luzula luzulina | 1910 | 2002 | BM
Juncus nevadensis | 1915 | 2001 | AAU
Luzula hawaiiensis | 1920 | 2002 | AAU
Juncus repens | 1935 | 2005 | BM
Luzula caespitosa | 1935 | 2005 | BM
Juncus hallii | 1936 | 2001 | C
Luzula johnstonii | 1936 | 2001 | BM
Juncus imbricatus | 1937 | 2001 | BM
Juncus interior | 1937 | 2005 | AAU
Juncus drummondii | 1942 | 2005 | AAU
Luzula effusa | 1949 | 2001 | BM
Luzula echinata | 1949 | 2005 | BM
Juncus pauciflorus | 1951 | 2001 | BM
Luzula lutescens | 1952 | 2005 | UPS
Luzula kjellmaniana | 1952 | 2001 | BM
Luzula acuminata | 1954 | 2001 | BM
Juncus longistylis | 1954 | 2005 | KRAM
Juncus uncialis | 1954 | 2003 | RSA-POM
Luzula modesta | 1957 | 2002 | UPS
Luzula flaccida | 1957 | 2005 | UPS
Rostkovia magellanica | 1957 | 2002 | BM
Luzula traversii | 1958 | 2001 | UPS
Luzula racemosa | 1958 | 2001 | BM
Luzula rufa | 1959 | 2001 | UPS
Luzula acutiflora | 1959 | 2005 | UPS
Marsippospermum grandiflorum | 1959 | 2002 | BM
Luzula pediformis | 1961 | 2002 | BM
Luzula divaricata | 1961 | 2001 | BM
Luzula subcongesta | 1962 | 2001 | K
Luzula comosa var. laxa | 1966 | 2005 | PRA
Juncus dichotomus | 1968 | 2001 | BM
Juncus covillei var. obtusatus | 1970 | 2005 | PRA
Juncus coriaceus | 1972 | 2005 | AAU
Luzula pedemontana | 1973 | 2001 | BM
Juncus vaginatus | 1975 | 2001 | K
Juncus acutus | 1976 | 2005 | KRAM
Luzula alpinopilosa | 1977 | 2004 | AAU
Luzula pindica | 1978 | 2005 | BM
Juncus kraussii | 1978 | 2001 | K
Juncus bengalensis | 1979 | 2001 | K
Luzula piperi | 1979 | 2001 | BM
Juncus polycephalus | 1980 | 2005 | AAU
Luzula atlantica | 1981 | 2001 | E
Juncus oxycarpus | 1982 | 2001 | K
Juncus brevicaudatus | 1983 | 2001 | BM
Juncus parryi | 1983 | 2001 | BM
Juncus dudleyi | 1983 | 2005 | PRA
Oxychloë haumaniana | 1983 | 2001 | AAU
Juncus amplifolius | 1984 | 2005 | BM
Juncus himalensis | 1984 | 2001 | K
Juncus nodosus | 1984 | 2005 | PRA
Distichia acicularis | 1984 | 2005 | NY
Luzula lutea | 1985 | 2001 | BM
Luzula parviflora | 1985 | 2001 | PRA
Juncus debilis | 1986 | 2005 | BM
Luzula confusa | 1987 | 2005 | BM
Juncus capillaceus | 1988 | 2001 | AAU
Luzula congesta | 1989 | 2002 | AAU
Juncus regelii | 1991 | 2003 | NY
Juncus torreyi | 1993 | 2005 | NY
Luzula novae-cambriae | 1994 | 2001 | K
Luzula pallescens | 1994 | 2001 | PRA
Juncus cooperi | 1994 | 2003 | RSA-POM
Juncus potaninii | 1994 | 2003 | PRA
Juncus allioides | 1994 | 2003 | PRA
Oxychloë castellanosii | 1995 | 2001 | AAU
Luzula mendocina | 1995 | 2002 | C
Juncus prominens | 1995 | 2003 | NY
Luzula meridionalis | 1997 | 2001 | K
Juncus pelocarpus | 1997 | 2005 | PRA
Luzula ulophylla | 1998 | 2001 | PRA
Luzula decipiens | 1998 | 2001 | PRA
Oreojuncus monanthos | 1998 | 2003 | PRA
Juncus jacquinii | 1998 | 2004 | PRA
Luzula bomiensis | 1999 | 2005 | PRA
Oxychloë andina | 1999 | 2001 | AAU
Oxychloë bisexualis | 1999 | 2001 | AAU
Luzula hitchcockii | 1999 | 2005 | KRA
Juncus scirpoides | 1999 | 2005 | NY
Juncus thomsonii | 1999 | 2005 | PRA
Juncus castaneus | 2000 | 2001 | PRA
Luzula wahlenbergii | 2000 | 2001 | PRA
Juncus sikkimensis | 2000 | 2003 | PRA
Juncus przewalskii | 2000 | 2003 | PRA
Juncus subsecundus | 2001 | 2003 | PRA
Juncus subulatus | 2002 | 2005 | PRA
Juncus alpinoarticulatus | 2002 | 2005 | PRA
Juncus gracilicaulis | 2002 | 2003 | PRA
Juncus minimus | 2002 | 2003 | PRA
Juncus mexicanus | 2003 | 2003 | PRA
Juncus turkestanicus | 2003 | 2005 | PRA
Juncus membranaceus | 2004 | 2005 | PRA
Table 2 Regions and primers used for testing DNA quality through PCR amplification.

Candidate barcode | Primer | Sequence (5'-3') | Reference
rbcL | RH 1S | ATG TCA CCA CAA ACA GAA ACT | Petersen & Seberg, 2003
rbcL | Hv 234 | CGT TAC AAA GGA CGA TGC | Petersen & Seberg, 2003
rbcL | Hv 362 R | TGA ACC CAA ATA CGT TAC CCA | Petersen & Seberg, 2003
rbcL | Hv 522 | TAA ACC AAA ATT GGG ATT ATC CGC | Petersen & Seberg, 2003
rbcL | J 556 R | ACA TTC ATA AAC TGC TCT ACC | Drábková et al., 2003
rbcL | Ce 622 | TCA CAA CCA TTT ATG CGT TG | Daugbjerg et al., 1994
rbcL | Cym 821R | AAA CCA CCA GTT AGG TAG TC | Daugbjerg et al., 1994
rbcL | Hv 890 | TGC ATG CAG TTA TTG ATA GAC | Petersen & Seberg, 2003
rbcL | J 949 R | CTC CAC CAG ACA TAC GTA ATG C | Drábková et al., 2003
rbcL | J 1352 R | GCA GCA GCT AGT TCA GCA CTC C | Drábková et al., 2003
rpoC1 | 2f | GGC AAA GAG GGA AGA TTT CG | CBOL Plant Working Group, 2009
rpoC1 | 4r | CCA TAA GCA TAT CTT GAG TTG G | CBOL Plant Working Group, 2009
rpoB | 2f | ATG CAA CGT CAA GCA GTT CC | CBOL Plant Working Group, 2009
rpoB | 3r | CCG TAT GTG AAA AGA AGT ATA | CBOL Plant Working Group, 2009
trnL-F | trnL_c | CGA AAT CGG TAG ACG CTA CG | Taberlet et al., 1991
trnL-F | trnL_d | GGG GAT AGA GGG ACT TGA AC | Taberlet et al., 1991
trnL-F | trnL_e | GGT TCA AGT CCC TCT ATC CC | Taberlet et al., 1991
trnL-F | trnF_f | ATT TGA ACT GGT GAC ACG AG | Taberlet et al., 1991
psbA-trnH | 3'f | GTT ATG CAT GAA CGT AAT GCT C | CBOL Plant Working Group, 2009
psbA-trnH | f05 | CGC GCA TGG TGG ATT CAC AAT CC | CBOL Plant Working Group, 2009
psbK-I | 3f | TTA GCC TTT GTT TGG CAA G | CBOL Plant Working Group, 2009
psbK-I | r | AGA GTT TGA GAG TAA GCA T | CBOL Plant Working Group, 2009
atpF-H | f | ACT CGC ACA CAC TCC CTT TCC | CBOL Plant Working Group, 2009
atpF-H | r | GCT TTT ATG GAA GCT TTA ACA AT | CBOL Plant Working Group, 2009
atp1 | atpAF1.5 | AGT TGG AGA TGG GAT TGC AC | Petersen et al., 2006
atp1 | atpAB1.5 | ATT GTG GTT GTT TGR GCA CT | Petersen et al., 2006
ITS | ITS4i | GGT AGT CCC GCC TGA CCT GG | Roalson et al., 2001
ITS | ITS5i | AGG TGA CCT GCG GAA GGA TCA TT | Roalson et al., 2001
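Choosing an annealing temperature is a routine part of the PCR optimization discussed in the text. As a first rough guide only, the sketch below computes GC content and the Wallace-rule melting temperature, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), for primers written as in Table 2; degenerate IUPAC positions (such as R in atpAB1.5) yield a minimum to maximum range. The Wallace rule is a coarse approximation meant for short oligonucleotides, nearest-neighbour methods are more accurate, and the annealing temperatures actually used by the authors are not reproduced here. The helper names are illustrative.

```python
# IUPAC codes mapped to the bases they stand for.
BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Wallace rule: each A/T contributes 2 °C, each G/C contributes 4 °C.
STEP = {"A": 2, "T": 2, "G": 4, "C": 4}


def wallace_tm(primer):
    """Return (min_tm, max_tm) in °C; ambiguity codes give a range."""
    seq = primer.replace(" ", "").upper()
    low = sum(min(STEP[b] for b in BASES[c]) for c in seq)
    high = sum(max(STEP[b] for b in BASES[c]) for c in seq)
    return low, high


def gc_percent(primer):
    """GC content (%), counting G, C and S (= G or C) positions."""
    seq = primer.replace(" ", "").upper()
    return 100.0 * sum(c in "GCS" for c in seq) / len(seq)


if __name__ == "__main__":
    for name, seq in [("RH 1S", "ATG TCA CCA CAA ACA GAA ACT"),
                      ("atpAB1.5", "ATT GTG GTT GTT TGR GCA CT")]:
        low, high = wallace_tm(seq)
        print(f"{name}: GC {gc_percent(seq):.1f}%, Tm {low}-{high} °C")
```

For RH 1S this gives a GC content of about 38% and a Wallace Tm of 58 °C; for atpAB1.5 the single R position widens the estimate to 56 to 58 °C.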
Table 3 Summary of the fragment sizes and PCR success for each primer set of seven regions.

DNA region | Primer set (5'-3') | Fragment size (bp) | PCR success, total (%) | PCR success, herbarium specimens (%) | PCR success, fresh material (%)
rbcL | RH1S/J1352R | 1352 | 84 | *1 | 100
rbcL | RH1S/Hv362R | 362 | 82 | 80 | *2
rbcL | Hv234/J556R | 322 | 84 | 87 | *3
rbcL | Hv522/Cym821 | 299 | 78 | 78 | -
rbcL | Ce622/Cym821 | 199 | 100 | 100 | -
rbcL | Ce622/J949R | 327 | 85 | 85 | -
rbcL | Hv890/Hv1204 | 314 | 100 | 100 | -
rbcL | Hv890/J1352R | 462 | 73 | 73 | -
rpoC1 | rpoC1_2f/rpoC1_2r | 530 | 75 | 70 | 89
trnL-trnF | trnL_c/trnL_d | 332-654 | 80 | 74 | 84
trnL-trnF | trnL_e/trnL_f | 147-540 | 73 | 71 | 75
psbA-trnH | psbA_3f/trnH_f05 | 244-706 | 81 | 76 | 90
atp1 | atpAF1.5/atpAB1.5 | 1110 | 68 | 69 | 68
ITS1-5.8S-ITS2 | ITS4i/ITS5i | 247-657 | 78 | 70 | 95

*1 Only eight selected herbarium samples were tested for this primer set; *2 only three selected fresh samples were tested for this primer set; *3 only one selected fresh sample was tested for this primer set.
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2293v1 | CC BY 4.0 Open Access | rec: 15 Jul 2016, publ: 15 Jul 2016
Article
Full-text available
It is commonly difficult to extract and amplify DNA from herbarium samples as they are old and preserved using different compounds. In addition, such samples are subjected to the accumulation of intrinsically produced plant substances over long periods (up to hundreds of years). DNA extraction from desert flora may pause added difficulties as many contain high levels of secondary metabolites. Herbarium samples from the Biology Department (UAE University) plant collection and fresh plant samples, collected from around Al-Ain (UAE), were used in this study. The three barcode loci for the coding genes matK, rbcL and rpoC1-were amplified. Our results showed that T. terresteris, H. robustum,T. pentandrus and Z. qatarense were amplified using all three primers for both fresh and herbaium samples. Both fresh and herbarium samples of C. comosum, however, were not amplified at all, using the three primers. Herbarium samples from A. javanica, C. imbricatum, T. aucherana and Z. simplex were not amplified with any of the three primers. For fresh samples 90, 90 and 80% of the samples were amplified using matK, rbcL and rpoC1, respectively. In short, fresh samples were significantly better amplified than those from herbarium sources, using the three primers. Both fresh and herbarium samples from one species (C. comosum), however, were not successfully amplified. It is also concluded that the rbcL regions showed real potentials to distinguish the UAE species under investigation into the appropriate family and genus.
Article
Full-text available
More than 600 herbarium samples from four distantly related groups of flowering plants were used for DNA extraction and subsequent measurements of DNA purity and concentration. We did not find any significant relation between DNA purity and the age of the sample. However, DNA yields were different between plant groups studied. We believe that there there should be no reservations about "old" samples if the goal is to extract more DNA of better purity. We argue that the older herbarium samples are the mine for the future DNA studies, and have the value not less than the "fresh" specimens.
Article
With approximately 2,000 species, Carex is the largest genus in the Cyperaceae and is one of the most widespread genera in the world. Relationships within Carex and among the genera of the Cariceae (Carex, Cymophyllus, Kobresia, Schoenoxiphium, and Uncinia) are unclear. For this reason, a molecular phylogenetic study employing nrDNA ITS and cpDNA trnT-L-F spacer sequences was undertaken. In addition to creating hypotheses of relationship for the Cariceae and testing classifications of this tribe, a primary goal of this study was to assess relationships within Carex section Acrocystis and identify a monophyletic group for more detailed study. These analyses suggest that Cymophyllus, Kobresia, Schoenoxiphium, and Uncinia are nested within Carex. Three primary clades are suggested: a Carex subgenus Vignea clade, a clade including Carex subgenus Primocarex (for the most part) and the other genera of Cariceae, and a clade predominantly comprising Carex subgenera Carex and Indocarex. A large part of Carex section Acrocystis forms a monophyletic group, but several Eurasian species are more closely related to other groups than to this core clade. Assessment of chromosome number variation across the Cariceae clade suggests that the ancestor of the Cariceae had a moderate to high chromosome number. In addition, these analyses suggest that the sister group of the Cariceae is a clade including Scirpus sensu stricto, Amphiscirpus, and Dulichium.
Article
Background. Because arctic plant communities are highly vulnerable to climate change, tracking shifts in their composition requires rapid, accurate identifications, often of specimens that lack diagnostic floral characters. The present study examines the role that DNA barcoding can play in aiding floristic evaluations in the Arctic by testing the effectiveness of the core plant barcode regions (rbcL, matK) and a supplementary ribosomal DNA marker (ITS2) for a well-studied flora near Churchill, Manitoba.
Results. This investigation examined 900 specimens representing 312 of the 354 vascular plant species known from Churchill. Sequencing success was high for rbcL: 95% for fresh specimens and 85% for herbarium samples (mean age 20 years). ITS2 worked equally well for fresh and herbarium material (89% and 88%). However, sequencing success was lower for matK (76% of fresh and 45% of herbarium samples) despite two rounds of PCR amplification, reflecting less effective primer binding and sensitivity to DNA degradation. A species was considered taxonomically resolved if its members showed at least one diagnostic difference from every other taxon in the study and formed a monophyletic clade. The highest species resolution (69%) was obtained by combining information from all three markers. The joint sequence information for rbcL and matK distinguished 54% of 286 species, while rbcL and ITS2 distinguished 63% of 285 species. Discrimination of species within Salix, which constituted 8% of the flora, was particularly problematic. Despite incomplete resolution, the barcode results revealed 22 misidentified herbarium specimens and enabled the identification of field specimens that were otherwise too immature to identify. Although seven cases of ITS2 paralogy were noted in the families Cyperaceae, Juncaceae and Juncaginaceae, this internal transcribed spacer played an important role in resolving congeneric plant species at Churchill.
Conclusions. Our results provide a fast and cost-effective solution for creating a comprehensive, effective DNA barcode reference library for a local flora.
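The resolution criterion above has two parts: at least one diagnostic difference from every other taxon, and monophyly. A minimal Python sketch of the first part only is given here, assuming that a "diagnostic difference" is an alignment position at which two species share no nucleotide state, and treating gaps and `N` as missing data. The function names are hypothetical, and the monophyly check, which requires a tree, is omitted.

```python
from itertools import combinations

MISSING = {"-", "N"}  # assumption: gaps and N carry no information


def diagnostic_sites(seqs_a, seqs_b):
    """Positions where the two groups of aligned sequences share no character state."""
    sites = []
    for i in range(len(seqs_a[0])):
        states_a = {s[i] for s in seqs_a} - MISSING
        states_b = {s[i] for s in seqs_b} - MISSING
        if states_a and states_b and states_a.isdisjoint(states_b):
            sites.append(i)
    return sites


def resolved_species(groups):
    """Species having at least one diagnostic site against every other species.

    `groups` maps a species name to a list of aligned sequences of its members.
    """
    resolved = set(groups)
    for a, b in combinations(groups, 2):
        if not diagnostic_sites(groups[a], groups[b]):
            # No diagnostic site: neither species can be told apart from the other.
            resolved.discard(a)
            resolved.discard(b)
    return resolved


groups = {
    "sp1": ["ACGT", "ACGT"],
    "sp2": ["ACGA", "ACGA"],
    "sp3": ["ACGA", "TCGA"],
}
print(resolved_species(groups))  # {'sp1'}
```

In this toy alignment, sp2 and sp3 overlap at every position, so only sp1 counts as resolved.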
Article
Cladistic analysis of rbcL nucleotide sequences was applied to 58 taxa representing most subgenera and sections of Luzula and Juncus, chosen to reflect the morphological and geographical diversity of both genera. Representatives of all other genera of the Juncaceae and two taxa from the Cyperaceae were also included. Phylogenetic trees were constructed using parsimony, with Prionium serratum as the outgroup. The dataset has 190 parsimony-informative sites. The analysis yielded more than 332,400 equally parsimonious trees (length 620, CI = 0.47, RI = 0.82). A jackknife analysis revealed several well-supported clades. Luzula is monophyletic, whereas Juncus is non-monophyletic. Each of the generally accepted subgenera of Juncus, subg. Juncus and subg. Agathryon, forms a clade, but their circumscription differs from traditional views. The subgenera recognized in Luzula remain largely unresolved. One well-supported clade comprises representatives of five genera distributed in the Southern Hemisphere: Juncus capensis and J. lomatophyllus (both from section Graminifolii), Rostkovia, Distichia, Marsippospermum, and Patosia.
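The CI and RI quoted above are standard tree statistics: with m the minimum conceivable number of steps, s the number of steps observed on the tree and g the number of steps required on a star tree, CI = m/s and RI = (g - s)/(g - m). The sketch below computes both for unordered characters, counting steps with Fitch parsimony on a rooted binary tree written as nested tuples. The toy matrix and function names are invented for illustration, and programs differ in conventions (for example, whether uninformative characters are excluded), so this is not necessarily how the values above were obtained.

```python
from collections import Counter


def fitch_steps(tree, column):
    """Fitch step count for one unordered character.

    `tree` is nested tuples of taxon names; `column` maps taxon -> state.
    """
    def visit(node):
        if isinstance(node, str):
            return {column[node]}, 0
        (a, ca), (b, cb) = visit(node[0]), visit(node[1])
        common = a & b
        # An empty intersection forces one extra change at this node.
        return (common, ca + cb) if common else (a | b, ca + cb + 1)
    return visit(tree)[1]


def ci_ri(tree, matrix):
    """Ensemble consistency index (m/s) and retention index ((g - s)/(g - m)).

    Assumes s > 0 and g > m, i.e. at least one variable, informative character.
    """
    taxa = list(matrix)
    m = s = g = 0
    for i in range(len(matrix[taxa[0]])):
        column = {t: matrix[t][i] for t in taxa}
        counts = Counter(column.values())
        m += len(counts) - 1                   # minimum conceivable steps
        g += len(taxa) - max(counts.values())  # steps on a star tree
        s += fitch_steps(tree, column)         # steps observed on this tree
    return m / s, (g - s) / (g - m)


matrix = {"A": "AAG", "B": "AAT", "C": "CTG", "D": "CTT"}
tree = (("A", "B"), ("C", "D"))
print(ci_ri(tree, matrix))  # CI = 3/4, RI = 2/3
```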
Article
The use of molecular data for phylogenetic studies in Juncaceae/Cyperaceae, beginning in 1993, has left the taxonomic position of the Andean genus Oxychloë Phil. uncertain. Based on rbcL data, Duvall & al. (1993), Plunkett & al. (1995) and Muasya (1998, 2000) suggested that Oxychloë was placed inside Cyperaceae or as its sister group. However, Oxychloë has many typically juncaceous morphological features (e.g., spiro- or orthostichous leaves, many ovules, prominent tepals) that differentiate it from Cyperaceae. The paper reveals relationships among the genera Oxychloë, Patosia, Distichia, Marsippospermum and Rostkovia based on both morphological and molecular data (cpDNA: rbcL, trnL intron and trnL-trnF intergenic spacer, matK; nrDNA: ITS1 & 2). Oxychloë is undoubtedly a member of Juncaceae based on both sources of data. Oxychloë andina and O. bisexualis are closely related and form a sister group to O. haumaniana and O. castellanosii in the strict consensus of the molecular data, but not in the morphological data, in which their position remains unresolved. Close relationships of Patosia with Distichia and of Rostkovia with Marsippospermum were confirmed.
Article
Recently, advocates of DNA taxonomy have complained that there is inadequate control of the taxonomy in databases such as GenBank. This is correct, but the uncertainty may extend to the sequences themselves. The present study shows that as long as vouchers are available neither problem is fatal, but if no voucher exists, bad sequences and bad taxonomy may be linked forever. Previous phylogenetic analyses of rbcL sequences have indicated that the small Southern Hemisphere genus Oxychloë (Juncaceae) is, surprisingly, either embedded within or sister to the Cyperaceae. This conflicts with both traditional and current morphological data. By studying five new accessions, representing four species of Oxychloë, and re-examining the two vouchers of O. andina that were used in previous phylogenies, it has been possible to show that these two sequences are erroneous. One is a chimeric sequence and the other is most likely “a contaminant.”
Article
All diploid species of Hordeum have been included in phylogenetic analyses of four molecular data sets supposedly from three different linkage groups. Two data sets stem from the nuclear genome: partial DMC1 (disrupted meiotic cDNA1) sequences (chromosome 3 in H. vulgare) and partial EF-G (elongation factor G) sequences (chromosome 2 in H. vulgare). The other two data sets are RFLP data and sequence data, rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase), from the plastid genome. Incongruence length difference tests show that the two nuclear and the two plastid data sets, respectively, are congruent, whereas the nuclear data and the plastid data are incongruent. The greatest incongruence is caused by the two Eurasian subspecies of H. marinum. The nuclear data support monophyly of the species, but the plastid data group one of the subspecies among the American taxa. The explanation is most likely lineage sorting. Based on a combined analysis of all data sets a new infrageneric classification of Hordeum including only monophyletic groups is presented. One new taxon, Hordeum L. sect. Sibirica (Nevski) G. Petersen & Seberg, stat. nov., is proposed. The distribution of selected morphological characters is discussed.
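The incongruence length difference (ILD) test mentioned above compares the length of the best tree for the combined data with the summed lengths of the best trees for each partition, D = L(combined) - L(part 1) - L(part 2), and obtains a p-value by randomly repartitioning the characters into sets of the original sizes. A minimal Python sketch follows, assuming data sets small enough that every rooted tree can be enumerated exhaustively; real analyses rely on heuristic tree searches, and all helper names here are hypothetical.

```python
import random


def attach(tree, leaf):
    """All trees obtained by adding `leaf` as sister to some node of `tree`."""
    out = [(tree, leaf)]
    if not isinstance(tree, str):
        out += [(t, tree[1]) for t in attach(tree[0], leaf)]
        out += [(tree[0], t) for t in attach(tree[1], leaf)]
    return out


def all_trees(taxa):
    """Every rooted binary tree on `taxa`; feasible only for a handful of taxa."""
    if len(taxa) == 1:
        return [taxa[0]]
    return [t for tree in all_trees(taxa[:-1]) for t in attach(tree, taxa[-1])]


def fitch(tree, column):
    """(state set, Fitch steps) for one character; `column` maps taxon -> state."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    (a, ca), (b, cb) = fitch(tree[0], column), fitch(tree[1], column)
    return (a & b, ca + cb) if a & b else (a | b, ca + cb + 1)


def best_length(trees, characters):
    """Length of the most parsimonious tree among `trees`."""
    return min(sum(fitch(t, c)[1] for c in characters) for t in trees)


def ild_test(taxa, part1, part2, replicates=99, seed=0):
    """Observed ILD and permutation p-value; parts are lists of taxon -> state dicts."""
    trees = all_trees(taxa)

    def ild(p1, p2):
        return best_length(trees, p1 + p2) - best_length(trees, p1) - best_length(trees, p2)

    observed = ild(part1, part2)
    pooled = part1 + part2
    rng = random.Random(seed)
    extreme = 1  # the observed partition counts as one replicate
    for _ in range(replicates):
        rng.shuffle(pooled)
        if ild(pooled[:len(part1)], pooled[len(part1):]) >= observed:
            extreme += 1
    return observed, extreme / (replicates + 1)


taxa = ["A", "B", "C", "D"]
ab = {"A": 0, "B": 0, "C": 1, "D": 1}  # supports the split AB|CD
ac = {"A": 0, "B": 1, "C": 0, "D": 1}  # supports the split AC|BD
print(ild_test(taxa, [ab, ab], [ac, ac]))
```

Because the combined optimum can never be shorter than the sum of the partition optima, D is never negative; partitions that favour the same tree give D = 0.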
Article
The Parsimony Ratchet is presented as a new method for the analysis of large data sets. The method can be easily implemented with existing phylogenetic software by generating batch command files. Such an approach has been implemented in the programs DADA (Nixon, 1998) and Winclada (Nixon, 1999). The Parsimony Ratchet has also been implemented in the most recent versions of NONA (Goloboff, 1998). These implementations of the ratchet use the following steps: (1) Generate a starting tree (e.g., a “Wagner” tree, followed or not by some level of branch swapping). (2) Randomly select a subset of characters, each of which is given additional weight (e.g., add 1 to the weight of each selected character). (3) Perform branch swapping (e.g., “branch-breaking” or TBR) on the current tree using the reweighted matrix, keeping only one (or a few) trees. (4) Reset all character weights to the “original” weights (typically, equal weights). (5) Perform branch swapping (e.g., branch-breaking or TBR) on the current tree (from step 3), holding one (or a few) trees. (6) Return to step 2. Steps 2–6 constitute one iteration, and typically 50–200 or more iterations are performed. The number of characters to be sampled for reweighting in step 2 is determined by the user; I have found that between 5 and 25% of the characters provide good results in most cases. The performance of the ratchet for large data sets is outstanding, and the results of analyses of the 500-taxon seed plant rbcL data set (Chase et al., 1993) are presented here. A separate analysis of a three-gene data set for 567 taxa will be presented elsewhere (Soltis et al., in preparation), demonstrating the same extraordinary power. With the 500-taxon data set, shortest trees are typically found within 22 h (four runs of 200 iterations) on a 200-MHz Pentium Pro.
These analyses indicate efficiency increases of 20×–80× over “traditional methods” such as varying taxon order randomly and holding few trees, followed by more complete analyses of the best trees found, and thousands of times faster than nonstrategic searches with PAUP. Because the ratchet samples many tree islands with fewer trees from each island, it provides much more accurate estimates of the “true” consensus than collecting many trees from few islands. With the ratchet, Goloboff's NONA, and existing computer hardware, data sets that were previously intractable or required months or years of analysis with PAUP* can now be adequately analyzed in a few hours or days.
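The six ratchet steps can be illustrated with a small Python sketch. It assumes a rooted binary tree stored as nested tuples and Fitch parsimony for scoring. As a stand-in for TBR it uses a simplified branch swap that prunes a single leaf and regrafts it anywhere, and the starting tree is supplied by the caller rather than built by a Wagner procedure. Function names and defaults (50 iterations, 15% of characters reweighted, within the 5 to 25% range quoted above) are illustrative choices, not those of any published implementation.

```python
import random


def leaves(tree):
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def site_lengths(tree, matrix):
    """Unweighted Fitch length of every character; matrix maps taxon -> state string."""
    def visit(node):
        if isinstance(node, str):
            return [{c} for c in matrix[node]], [0] * len(matrix[node])
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        sets, costs = [], []
        for a, b, ca, cb in zip(ls, rs, lc, rc):
            common = a & b
            sets.append(common if common else a | b)
            costs.append(ca + cb + (0 if common else 1))
        return sets, costs
    return visit(tree)[1]


def score(tree, matrix, weights):
    return sum(w * c for w, c in zip(weights, site_lengths(tree, matrix)))


def remove_leaf(tree, leaf):
    """Prune `leaf`; its parent node collapses into the sibling subtree."""
    if isinstance(tree, str):
        return None if tree == leaf else tree
    left, right = remove_leaf(tree[0], leaf), remove_leaf(tree[1], leaf)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def insertions(tree, leaf):
    """Every tree obtained by regrafting `leaf` as sister to some node of `tree`."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        for sub in insertions(tree[0], leaf):
            yield (sub, tree[1])
        for sub in insertions(tree[1], leaf):
            yield (tree[0], sub)


def neighbours(tree):
    for leaf in leaves(tree):
        yield from insertions(remove_leaf(tree, leaf), leaf)


def hill_climb(tree, matrix, weights):
    """Branch swapping that keeps a single tree: take the first improvement until none remains."""
    best, best_score = tree, score(tree, matrix, weights)
    improved = True
    while improved:
        improved = False
        for cand in neighbours(best):
            s = score(cand, matrix, weights)
            if s < best_score:
                best, best_score, improved = cand, s, True
                break
    return best, best_score


def ratchet(start, matrix, iterations=50, fraction=0.15, seed=0):
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    equal = [1] * n_chars
    # Step 1: the supplied starting tree, branch-swapped under equal weights.
    current, current_score = hill_climb(start, matrix, equal)
    best, best_score = current, current_score
    for _ in range(iterations):
        # Step 2: add 1 to the weight of a random subset of characters.
        chosen = set(rng.sample(range(n_chars), max(1, int(fraction * n_chars))))
        perturbed = [2 if i in chosen else 1 for i in range(n_chars)]
        # Step 3: branch swapping on the reweighted matrix.
        current, _ = hill_climb(current, matrix, perturbed)
        # Steps 4-5: restore equal weights and swap again from that tree.
        current, current_score = hill_climb(current, matrix, equal)
        if current_score < best_score:
            best, best_score = current, current_score
    return best, best_score  # step 6 is the loop itself


matrix = {"A": "0000", "B": "1000", "C": "1100", "D": "1110", "E": "1111"}
start = (("A", "E"), ("B", ("C", "D")))
print(ratchet(start, matrix, iterations=10))
```

The reweighting in step 2 changes which trees look best, so the equal-weight search in step 5 starts from a different point on each iteration; this is what lets the method leave a tree island without keeping many trees.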
Article
New methods for parsimony analysis of large data sets are presented: sectorial searches, tree-drifting, and tree-fusing. For Chase et al.'s 500-taxon data set, these methods (on a 266-MHz Pentium II) find a shortest tree in less than 10 min (i.e., over 15,000 times faster than PAUP and 1000 times faster than PAUP*). A complete parsimony analysis requires hitting the minimum length several times independently, but not necessarily finding all “islands”; for Chase et al.'s data set, this can be done in 4 to 6 h. The new methods also perform well in the other cases analyzed (which range from 170 to 854 taxa).
Article
A multiple sequence alignment program, MAFFT, has been developed. Its CPU time is drastically reduced compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of the volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well both in reducing CPU time and in increasing the accuracy of alignments, even for sequences with large insertions or extensions and for distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performance of FFT-NS-2 and FFT-NS-i was compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced compared with CLUSTALW, with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE when the number of input sequences exceeds 60, without sacrificing accuracy.
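The FFT step can be sketched as follows, assuming the correlation c(k) = Σ v1(n)·v2(n + k) between the numeric property profiles of two sequences, summed over the properties. Peaks in c(k) suggest offsets k at which homologous segments may lie. The property table below is hypothetical (four residues with invented values) and is not MAFFT's normalised volume and polarity table; the later step in which MAFFT examines segments around each peak is not shown.

```python
import numpy as np

# Hypothetical two-property table standing in for volume and polarity values.
PROPERTIES = {
    "A": (1.0, 0.5),
    "C": (-1.0, 0.5),
    "D": (2.0, -1.0),
    "G": (0.0, 0.0),
}


def encode(seq):
    """Map a sequence to an array of shape (len(seq), number of properties)."""
    return np.array([PROPERTIES[r] for r in seq], dtype=float)


def fft_correlation(seq1, seq2):
    """c(k) = sum_n v1(n) . v2(n + k), summed over properties, for every lag k."""
    v1, v2 = encode(seq1), encode(seq2)
    m, n = len(seq1), len(seq2)
    size = m + n - 1  # zero-padding so the circular correlation does not wrap
    total = np.zeros(size)
    for p in range(v1.shape[1]):
        f1 = np.fft.rfft(v1[:, p], size)
        f2 = np.fft.rfft(v2[:, p], size)
        total += np.fft.irfft(np.conj(f1) * f2, size)
    # Indices 0..n-1 hold lags 0..n-1; the remaining indices hold negative lags.
    return {(k if k < n else k - size): total[k] for k in range(size)}


def best_offsets(seq1, seq2, top=1):
    """Lags with the highest correlation, i.e. candidate homologous offsets."""
    corr = fft_correlation(seq1, seq2)
    return sorted(corr, key=corr.get, reverse=True)[:top]


print(best_offsets("ACD", "GGACDG"))  # [2]: "ACD" starts at position 2 of the second sequence
```

Padding both profiles to m + n - 1 points makes the circular FFT correlation equal to the linear one, so all lags are obtained in O(L log L) time rather than the O(mn) of direct summation.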