
Abstract

GenBank, the public repository for nucleotide and protein sequences, is a critical resource for molecular biology, evolutionary biology, and ecology. While some attention has been drawn to sequence errors (1), common annotation errors also reduce the value of this database. In fact, for
www.sciencemag.org/cgi/content/full/319/5870/1616a/DC1
Supporting Online Material for
Preserving Accuracy in GenBank
M. I. Bidartondo et al.
E-mail: m.bidartondo@imperial.ac.uk
Published 21 March, Science 319, 1616 (2008)
DOI: 10.1126/science.319.5870.1616a
This PDF file includes:
Full author list
Thomas D. Bruns,1 Meredith Blackwell,2 Ivan Edwards,3 Andy F. S. Taylor,4 Thomas Horton,5 Ning Zhang,6 Urmas Kõljalg,7 Georgiana May,8 Thomas W. Kuyper,9 James D. Bever,10 Gregory Gilbert,11 John W. Taylor,12 Todd Z. DeSantis,13 Anne Pringle,14 James Borneman,15 Greg Thorn,16 Mary Berbee,17 Gregory M. Mueller,18 Gary L. Andersen,19 Else C. Vellinga,20 Sara Branco,21 Ian Anderson,22 Ian A. Dickie,23 Peter Avis,24 Sari Timonen,25 Rasmus Kjøller,26 D. J. Lodge,27 Richard M. Bateman,28 Andy Purvis,29 Pedro W. Crous,30 Christine Hawkes,31 Tim Barraclough,32 Austin Burt,33 R. H. Nilsson,34 Karl-Henrik Larsson,35 Ian Alexander,36 Jean-Marc Moncalvo,37 Jean Berube,38 Joseph Spatafora,39 H. Thorsten Lumbsch,40 Jaime E. Blair,41 Sung-Oui Suh,42 Donald H. Pfister,43 Manfred Binder,44 Eric W. Boehm,45 Linda Kohn,46 Juan L. Mata,47 Paul Dyer,48 Gi-Ho Sung,49 Bryn Dentinger,50 Emory G. Simmons,51 Richard E. Baird,52 Thomas J. Volk,53 Brian A. Perry,54 Richard W. Kerrigan,55 Jinx Campbell,56 Jeewon Rajesh,57 Don R. Reynolds,58 David Geiser,59 Richard A. Humber,60 Natasha Hausmann,61 Tim Szaro,62 Jason Stajich,63 Allen Gathman,64 Kabir G. Peay,65 Terry Henkel,66 Clare H. Robinson,67 Patricia J. Pukkila,68 Nhu H. Nguyen,69 Christopher Villalta,70 Peter Kennedy,71 Sarah Bergemann,72 M. Catherine Aime,73 Frank Kauff,74 Andrea Porras-Alfaro,75 Cecile Gueidan,76 Andreas Beck,77 Birgitte Andersen,78 Stephen Marek,79 Jo A. Crouch,80 Julia Kerrigan,81 Jean Beagle Ristaino,82 Kathie T. Hodge,83 Gretchen Kuldau,84 Gary J. Samuels,85 Huzefa A. Raja,86 Hermann Voglmayr,87 Monique Gardes,88 David P. Janos,89 Jack D. Rogers,90 Paul Cannon,91 Sandra W. Woolfolk,92 H. C. Kistler,93 Michael A. Castellano,94 Sandra L. Maldonado-Ramírez,95 Paul M. Kirk,96 James J. Farrar,97 Todd Osmundson,98 Randolph S. Currah,99 Vladimir Vujanovic,100 Weidong Chen,101 Richard P. Korf,102 Zahi K. Atallah,103 Ken J. Harrison,104 Josep Guarro,105 Scott T. Bates,106 Pierluigi (Enrico) Bonello,107 Paul Bridge,108 Wiley Schell,109 Walter Rossi,110 Jan Stenlid,111 Jens C. Frisvad,112 R. M. Miller,113 Scott E. Baker,114 Heather E. Hallen,115 Jeffrey E. Janso,116 Andrew W. Wilson,117 Kenneth E. Conway,118 Louise Egerton-Warburton,119 Zheng Wang,120 Darin Eastburn,121 Wellcome W. Hong Ho,122 Scott Kroken,123 Marc Stadler,124 Gillian Turgeon,125 Robert W. Lichtwardt,126 Elwin L. Stewart,127 Mats Wedin,128 De-Wei Li,129 Janice Y. Uchida,130 Ari Jumpponen,131 Ron J. Deckert,132 Henry J. Beker,133 Scott O. Rogers,134 Jianping Xu,135 Peter Johnston,136 R.A. Shoemaker,137 Miao Liu,138 G. Marques,139 Brett Summerell,140 Serge Sokolski,141 Ulf Thrane,142 Paul Widden,143 Johann N. Bruhn,144 Virginia Bianchinotti,145 Dorothy Tuthill,146 Timothy J. Baroni,147 George Barron,148 Kentaro Hosaka,149 Kelsea Jewell,150 Meike Piepenbring,151 Raymond Sullivan,152 Gareth W. Griffith,153 S. G. Bradley,154 Takayuki Aoki,155 Wendy T. Yoder,156 Yu-Ming Ju,157 Shannon M. Berch,158 Matt Trappe,159 Weijun Duan,160 Gregory Bonito,161 Ruth A. Taber,162 Gilberto Coelho,163 Gerald Bills,164 Austen Ganley,165 Reinhard Agerer,166 László Nagy,167 Barbara A. Roy,168 Thomas Læssøe,169 Nils Hallenberg,170 Hans-Volker Tichy,171 Joost Stalpers,172 Ewald Langer,173 Markus Scholler,174 Dirk Krueger,175 Giovanni Pacioni,176 Reinhold Pöder,177 Taina Pennanen,178 Marina Capelari,179 Karen Nakasone,180 J.P. Tewari,181 Andrew N. Miller,182 Cony Decock,183 Sabine Huhndorf,184 Mark Wach,185 Helen S. Vishniac,186 David S. Yohalem,187 Matthew E. Smith,188 Anthony E. Glenn,189 Martin Spiering,190 Daniel L. Lindner,191 Conrad Schoch,192 Scott A. Redhead,193 Kelly Ivors,194 Steven N. Jeffers,195 József Geml,196 Florence Okafor,197 Frederick W. Spiegel,198 Damon Dewsbury,199 Juliet Carroll,200 Terri M. Porter,201 Catherine Pashley,202 Steven E. Carpenter,203 Gloria Abad,204 Kerstin Voigt,205 Brett Arenz,206 Andrew S. Methven,207 Shannon Schechter,208 Paula Vance,209 Dan Mahoney,210 Seogchan Kang,211 John P. Rheeder,212 James Mehl,213 Matthew Greif,214 George Ndzi Ngala,215 Joe Ammirati,216 Masako Kawasaki,217 Yuan Gwo-Fang,218 Tadahiko Matsumoto,219 David Smith,220 Gina Koenig,221 Daniel Luoma,222 Tom May,223 Marco Leonardi,224 Lynne Sigler,225 D. L. Taylor,226 Cara Gibson,227 Thomas Sharpton,228 David L. Hawksworth,229 Jose Carmine Dianese,230 Steven A. Trudell,231 Barbara Paulus,232 Mahajabeen Padamsee,233 Philippe Callac,234 Nelson Lima,235 Merlin White,236 C. Barreau,237 Juncai M. A.,238 Bart Buyck,239 Richard K. Rabeler,240 Mark R. Liles,241 Dwayne Estes,242 Richard Carter,243 J. M. Herr Jr.,244 Gregory Chandler,245 Jennifer Kerekes,246 Jennifer Cruse-Sanders,247 R. Galán Márquez,248 Egon Horak,249 Michael Fitzsimons,250 Heidi Döring,251 Su Yao,252 Nicole Hynson,253 Martin Ryberg,254 A. E. Arnold,255 Karen Hughes,256.
1 University of California, Berkeley, 94720, USA.
2 Louisiana State University, Baton Rouge, 70803, USA.
3 University of Michigan, Ann Arbor, 48109, USA.
4 Swedish University of Agricultural Sciences, Uppsala, 75007, Sweden.
5 SUNY-ESF, Ithaca, 13210, USA.
6 Cornell University, Ithaca, 14853, USA.
7 University of Tartu, Tartu, 51005, Estonia.
8 University of Minnesota, Twin Cities, 55108, USA.
9 Wageningen University, Wageningen, 6708, Netherlands.
10 Indiana University, Bloomington, 47405, USA.
11 University of California, Santa Cruz, 95064, USA.
12 University of California, Berkeley, 94720, USA.
13 Lawrence Berkeley National Laboratory, Berkeley, 94720, USA.
14 Harvard University, Cambridge, 02138, USA.
15 University of California, Riverside, 92521, USA.
16 University of Western Ontario, London, N6A 5B8, Canada.
17 University of British Columbia, Vancouver, V6T 1Z4, Canada.
18 Field Museum of Natural History, Chicago, 60605, USA.
19 Lawrence Berkeley National Laboratory, Berkeley, 94720, USA.
20 University of California, Berkeley, 94720, USA.
21 University of Chicago, Chicago, 60637, USA.
22 University of Western Sydney, Penrith South, NSW 1797, Australia.
23 Landcare Research, Lincoln, 7640, New Zealand.
24 Indiana U. Northwest and The Field Museum, Gary, 46408, USA.
25 University of Helsinki, Helsinki, 14, Finland.
26 Biological Institute U. of Copenhagen, Copenhagen, 1353, Denmark.
27 USDA Forest Service, Luquillo, 931, USA.
28 Royal Botanic Gardens, Kew, TW9 3DS, England.
29 Imperial College London, London, SW7 2AZ, England.
30 CBS Fungal Biodiversity Centre, Utrecht, 3584, Netherlands.
31 University of Texas at Austin, Austin, 78712, USA.
32 Imperial College London, London, SW7 2AZ, England.
33 Imperial College London, London, SW7 2AZ, England.
34 Göteborg University, Göteborg, 40530, Sweden.
35 Göteborg University, Göteborg, 40530, Sweden.
36 University of Aberdeen, Aberdeen, AB24 3UU, Scotland.
37 Royal Ontario Museum & U. of Toronto, Toronto, M5S 2C6, Canada.
38 Canadian Forest Service, Québec, G1V 4C7, Canada.
39 Oregon State University, Corvallis, 97331, USA.
40 Field Museum of Natural History, Chicago, 60605, USA.
41 Amherst College, Amherst, 01002, USA.
42 American Type Culture Collection, Manassas, 20110, USA.
43 Harvard University, Cambridge, 02138, USA.
44 Clark University, Worcester, 01610, USA.
45 Kean University, Union, 07083, USA.
46 University of Toronto, Toronto, L5L 1C6, Canada.
47 University of South Alabama, Mobile, 36688, USA.
48 University of Nottingham, Nottingham, NG7 2RD, England.
49 Oregon State University, Corvallis, 97331, USA.
50 Royal Ontario Museum/University of Toronto, Toronto, M5S 2C6, Canada.
51 Wabash College, Crawfordsville, 47933, USA.
52 Mississippi State University, Mississippi State, 39762, USA.
53 University of Wisconsin, La Crosse, 54601, USA.
54 San Francisco State University, San Francisco, 94132, USA.
55 Sylvan Research, Kittanning, 16201, USA.
56 University of Southern Mississippi, Hattiesburg, 39406, USA.
57 University of Hong Kong, Hong Kong.
58 University of California Herbarium, Berkeley, 94720, USA.
59 Penn State University, University Park, 16802, USA.
60 USDA-ARS Biological IPM Research, Ithaca, 14850, USA.
61 University of California, Berkeley, 94720, USA.
62 University of California, Berkeley, 94720, USA.
63 University of California, Berkeley, 94720, USA.
64 Southeast Missouri St. U., Cape Girardeau, 63701, USA.
65 University of California, Berkeley, 94720, USA.
66 Humboldt State University, Humboldt, 95521, USA.
67 University of Manchester, Manchester, M13 9PL, England.
68 University of North Carolina, Chapel Hill, 27514, USA.
69 University of California, Berkeley, 94720, USA.
70 University of California, Berkeley, 94720, USA.
71 Lewis and Clark College, Portland, 97219, USA.
72 Middle Tennessee State University, Murfreesboro, 37129, USA.
73 Louisiana State U. Agricultural Center, Baton Rouge, 70803, USA.
74 University of Kaiserslautern, Kaiserslautern, 67653, Germany.
75 University of New Mexico, Albuquerque, 87131, USA.
76 Duke University, Durham, 27708, USA.
77 Botanische Staatssammlung Muenchen, Munich, 80638, Germany.
78 Technical University of Denmark, Lyngby, 2800, Denmark.
79 Oklahoma State University, Stillwater, 74078, USA.
80 Rutgers University, New Brunswick, 08901, USA.
81 Clemson University, Clemson, 29634, USA.
82 North Carolina State University, Raleigh, 27695, USA.
83 Cornell University, Ithaca, 14853, USA.
84 Penn State University, University Park, 16802, USA.
85 USDA Systematic Mycology & Microbiology, Beltsville, 10300, USA.
86 University of Illinois, Urbana-Champaign, 61820, USA.
87 University of Vienna, Vienna, 1030, Austria.
88 University Paul Sabatier - Toulouse 3, Toulouse, 31062, France.
89 University of Miami, Oxford, 45056, USA.
90 Washington State University, Pullman, 99164, USA.
91 CABI, Egham, TW20 9TY, England.
92 Mississippi State University, Mississippi State, 39762, USA.
93 University of Minnesota, Saint Paul, 55108, USA.
94 USDA Forest Service, Corvallis, 97331, USA.
95 University of Puerto Rico, Mayaguez, 681, USA.
96 CABI, Egham, TW20 9TY, England.
97 California State University, Fresno, 93740, USA.
98 Columbia U. & New York Botanical Garden, New York, 10458, USA.
99 University of Alberta, Edmonton, T6G 2R3, Canada.
100 University of Saskatchewan, Saskatoon, S7N 5C9, Canada.
101 USDA-ARS, Washington State University, Pullman, 99164, USA.
102 Cornell University, Ithaca, 14853, USA.
103 University of Wisconsin-Madison, Madison, 53706, USA.
104 Canadian Forest Service, Fredericton, E3B 5P7, Canada.
105 Rovira i Virgili University, Reus, 43201, Spain.
106 Arizona State University, Tempe, 85287, USA.
107 Ohio State University, Columbus, 43210, USA.
108 British Antarctic Survey, Cambridge, CB3 0ET, England.
109 Duke University, Durham, 27708, USA.
110 Università dell'Aquila, L'Aquila, 67040, Italy.
111 Swedish University of Agricultural Sciences, Uppsala, 75007, Sweden.
112 Technical University of Denmark, Lyngby, 2800, Denmark.
113 Argonne National Laboratory, Argonne, 60439, USA.
114 Pacific Northwest National Laboratory, Richland, 99352, USA.
115 Michigan State University, East Lansing, 48824, USA.
116 Wyeth Research, Pearl River, 10965, USA.
117 Clark University, Worcester, 01610, USA.
118 Oklahoma State University, Stillwater, 74078, USA.
119 Chicago Botanic Garden, Chicago, 60022, USA.
120 Yale University, New Haven, 06520, USA.
121 University of Illinois, Urbana-Champaign, 61820, USA.
122 MAF Biosecurity New Zealand, Wellington, New Zealand.
123 University of Arizona, Tucson, 85721, USA.
124 U. of Bayreuth & InterMed Discovery GmbH, Dortmund, 44227, Germany.
125 Cornell University, Ithaca, 14853, USA.
126 University of Kansas, Lawrence, 66045, USA.
127 Penn State University, University Park, 16802, USA.
128 Swedish Museum of Natural History, Stockholm, 10405, Sweden.
129 Connecticut Agricultural Experiment Station, New Haven, 06511, USA.
130 University of Hawaii, Honolulu, 96822, USA.
131 Kansas State University, Manhattan, 66506, USA.
132 Weber State University, Ogden, 84408, USA.
133 Royal Holloway University of London, London, TW20 0EX, England.
134 Bowling Green State University, Bowling Green, 43403, USA.
135 McMaster University, Hamilton, L8S 4L8, Canada.
136 Landcare Research, Auckland, 1072, New Zealand.
137 Agriculture Canada (Emeritus), Ottawa, K1A 0C6, Canada.
138 Eastern Cereal and Oilseed Research Center, Ottawa, K1A 0C6, Canada.
139 University of Tras-os-Montes e Alto Douro, Vila Real, 5001801, Portugal.
140 National Herbarium of New South Wales, Sydney, 2000, Australia.
141 Université Laval, Québec, G1K 7P4, Canada.
142 Technical University of Denmark, Lyngby, 2800, Denmark.
143 Concordia University, Montréal, H3G 1M8, Canada.
144 University of Missouri, Columbia, 65211, USA.
145 Universidad Nacional del Sur, Bahia Blanca, B8000, Argentina.
146 University of Wyoming, Laramie, 82071, USA.
147 State University of New York, Cortland, 13045, USA.
148 University of Guelph, Guelph, N1G 2W1, Canada.
149 Field Museum of Natural History, Chicago, 60605, USA.
150 Seattle Children's Hospital Research Institute, Seattle, 98101, USA.
151 J.W. Goethe Universitaet, Frankfurt am Main, 60325, Germany.
152 Rutgers University, New Brunswick, 08901, USA.
153 Aberystwyth University, Aberystwyth, SY23 3DA, Wales.
154 Penn State College of Medicine, Hershey, 17033, USA.
155 National Institute of Agrobiological Sciences, Tsukuba, 3058602, Japan.
156 Novozymes Inc., Davis, 95618, USA.
157 Academia Sinica, Taipei, 115, Taiwan.
158 Ministry of Forests and Range, Victoria, V8W 9C4, Canada.
159 Oregon State University, Corvallis, 97331, USA.
160 Chinese Academy of Sciences, Beijing, 100864, China.
161 Duke University, Durham, 27708, USA.
162 Texas A & M University (Emeritus), College Station, 77843, USA.
163 Universidade Federal, Santa Maria, 97105, Brazil.
164 Merck Sharp & Dohme de España S.A., Madrid, 28027, Spain.
165 Massey University, Albany, New Zealand.
166 University of Munich, Munich, 80638, Germany.
167 University of Szeged, Szeged, 6726, Hungary.
168 University of Oregon, Eugene, 97403, USA.
169 Copenhagen University, Copenhagen, 1353, Denmark.
170 Göteborg University, Göteborg, 40530, Sweden.
171 LUFA-ITL GmbH, Kiel, 24107, Germany.
172 CBS Fungal Biodiversity Centre, Utrecht, 3584, Netherlands.
173 University Kassel, Kassel, 34109, Germany.
174 Naturkundemuseum, Karlsruhe, 76133, Germany.
175 Umweltforschungszentrum, Halle, 4318, Germany.
176 University of L'Aquila, L'Aquila, 67040, Italy.
177 University of Innsbruck, Innsbruck, 6020, Austria.
178 Finnish Forest Research Institute, Helsinki, 170, Finland.
179 Instituto de Botânica, São Paulo, 4301, Brazil.
180 Northern Research Station USDA Forest Service, St. Paul, 55108, USA.
181 University of Alberta, Edmonton, T6G 2R3, Canada.
182 Illinois Natural History Survey, Champaign, 61820, USA.
183 MUCL, Louvain-la-Neuve, 1348, Belgium.
184 Field Museum of Natural History, Chicago, 60605, USA.
185 Sylvan Research, Kittanning, 16201, USA.
186 Oklahoma State University, Stillwater, 74078, USA.
187 East Malling Research, East Malling, ME19 6BJ, England.
188 Harvard University, Cambridge, 02138, USA.
189 USDA Toxicology & Mycotoxin Research, College Station, 77843, USA.
190 University of Dublin, Trinity College, Dublin, Ireland.
191 USDA-Forest Service, Madison, 53726, USA.
192 Oregon State University, Corvallis, 97331, USA.
193 National Mycological Herbarium, Ottawa, K1A 0C6, Canada.
194 North Carolina State University, Fletcher, 28732, USA.
195 Clemson University, Clemson, 29634, USA.
196 University of Alaska, Fairbanks, 99709, USA.
197 Alabama A&M University, Normal, 35762, USA.
198 University of Arkansas, Fayetteville, 72701, USA.
199 University of Toronto, Toronto, L5L 1C6, Canada.
200 Cornell University, Ithaca, 14853, USA.
201 University of Toronto, Toronto, L5L 1C6, Canada.
202 University of Leicester, Leicester, LE1 7RH, England.
203 Abbey Lane Laboratory, Philomath, 97370, USA.
204 USDA-APHIS-PPQ-PHP-PSPI-NIS, Beltsville, 10300, USA.
205 Fungal Reference Centre University Jena, Jena, 7745, Germany.
206 University of Minnesota, St. Paul, 55108, USA.
207 Eastern Illinois University, Charleston, 61920, USA.
208 University of California, Berkeley, 94720, USA.
209 Microbiology Specialists Inc., Houston, 77054, USA.
210 Private Mycological Research, Lower Hutt, New Zealand.
211 Penn State University, University Park, 16802, USA.
212 Medical Research Council, Tygerberg, 7505, South Africa.
213 FABI University of Pretoria, Pretoria, 0002, South Africa.
214 Field Museum of Natural History, Chicago, 60605, USA.
215 Bamenda University, Bamenda, Cameroon.
216 University of Washington, Seattle, 98195, USA.
217 Kanazawa Medical University, Kanazawa, 9200293, Japan.
218 Bioresource Collection & Research Center FIRDI, Hsinchu, 300, Taiwan.
219 Juntendo Univ. and Kurume Univ., Tokyo, 1130033, Japan.
220 World Federation for Culture Collections, Egham, TW20 9TY, England.
221 Roche Molecular Systems, Alameda, 94501, USA.
222 Oregon State University, Corvallis, 97331, USA.
223 Royal Botanic Gardens, Melbourne, VIC3004, Australia.
224 Università degli Studi di L'Aquila, L'Aquila, 67040, Italy.
225 University of Alberta, Edmonton, T6G 2R3, Canada.
226 University of Alaska, Fairbanks, 99709, USA.
227 University of Arizona, Tucson, 85721, USA.
228 University of California, Berkeley, 94720, USA.
229 Natural History Museum, Madrid, 28006, Spain.
230 Universidade de Brasilia, Brasilia, 70910, Brazil.
231 University of Washington, Seattle, 98195, USA.
232 Landcare Research, Auckland, 1072, New Zealand.
233 University of Minnesota, Saint Paul, 55108, USA.
234 INRA, Bordeaux, 33140, France.
235 Micoteca da Universidade do Minho, Braga, 4710, Portugal.
236 Boise State University, Boise, 83725, USA.
237 CNRS/INRA, Bordeaux, 33140, France.
238 Chinese Academy of Sciences, Beijing, 100864, China.
239 National Museum of Natural History, Paris, 75005, France.
240 University of Michigan, Ann Arbor, 48109, USA.
241 Auburn University, Auburn, 36849, USA.
242 Austin Peay State University, Clarksville, 37044, USA.
243 Valdosta State University, Valdosta, 31698, USA.
244 University of South Carolina, Columbia, 29208, USA.
245 University of North Carolina, Wilmington, 28403, USA.
246 University of California, Berkeley, 94720, USA.
247 Salem College Herbarium, Salem, 27101, USA.
248 Alcalá University, Madrid, 28801, Spain.
249 Zurich Herbarium, Zurich, 8008, Switzerland.
250 University of Chicago, Chicago, 60637, USA.
251 Royal Botanic Gardens, Kew, TW9 3DS, England.
252 China Center of Industrial Culture Collection, Beijing, 100027, China.
253 University of California, Berkeley, 94720, USA.
254 Göteborg University, Göteborg, 40530, Sweden.
255 University of Arizona, Tucson, 85721, USA.
256 University of Tennessee, Knoxville, 37996, USA.
... Taxonomy Browser (NCBI) (Federhen, 2012), which uses BLAST+ (Camacho et al., 2009), was then used to verify the correct taxonomic names (those entered by the researchers who deposited the sequences to NCBI) for each of the public sequences. We did this to remove sequences that were deposited to NCBI under an incorrect genus-level designation (Bruns et al., 2008). The feature for geographic region (such as "country") was also extracted for each sequence and any sequences without this corresponding metadatum were excluded. ...
... Both the ITS1 and ITS2 marker regions failed to resolve these taxa with any certainty. However, the other possibility for this observation is not because of poor resolution from the markers, but because of a misapplication of names deposited into GenBank (Bruns et al., 2008). In other instances, the ITS regions are also too variable and may lead to an overestimation of individuals in other species complexes (Dunne et al., 2002). ...
Armillaria is a globally distributed fungal genus most notably composed of economically important plant pathogens that are found predominantly in forest and agronomic systems. The genus sensu lato has more recently received attention for its role in woody plant decomposition and in mycorrhizal symbiosis with specific plants. Previous phylogenetic analyses suggest that around 50 species are recognized globally. Despite this previous work, no studies have analyzed the global species richness and distribution of the genus using data derived from fungal community sequencing datasets or barcoding initiatives. To assess the global diversity and species richness of Armillaria, we mined publicly available sequencing datasets derived from numerous primer regions for the ribosomal operon, as well as ITS sequences deposited in GenBank, and clustered them akin to metabarcoding studies. Our estimates reveal that species richness ranges from 50 to 60 species, depending on whether the ITS1 or ITS2 marker is used. Eastern Asia represents the biogeographic region with the highest species richness. We also assess the overlap of species across geographic regions and propose some hypotheses regarding the drivers of variability in species diversity and richness between different biogeographic regions.
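The record-filtering step described in the first excerpt above (dropping sequences whose genus label fails verification, and sequences lacking a geographic metadatum) can be illustrated with a minimal Python sketch. The `PublicSequence` class, its field names, and the accession values are hypothetical and made up for illustration; this is not the cited authors' pipeline, and it assumes the independent genus check has already been run elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PublicSequence:
    """Hypothetical, simplified view of one public sequence record."""
    accession: str
    deposited_genus: str            # genus name entered by the depositor
    verified_genus: Optional[str]   # genus from an independent check, if any
    country: Optional[str]          # geographic metadatum, if present


def keep_sequence(rec: PublicSequence) -> bool:
    """Keep a record only if it has a location and its genus label was confirmed."""
    if rec.country is None or not rec.country.strip():
        return False  # no geographic metadatum
    if rec.verified_genus is None:
        return False  # genus label could not be checked
    deposited = rec.deposited_genus.strip().lower()
    verified = rec.verified_genus.strip().lower()
    return deposited == verified


def filter_sequences(records: List[PublicSequence]) -> List[PublicSequence]:
    """Return the records that pass both checks, in their original order."""
    return [rec for rec in records if keep_sequence(rec)]


# Toy records (invented accessions, for illustration only).
records = [
    PublicSequence("ACC0001", "Armillaria", "Armillaria", "Japan"),
    PublicSequence("ACC0002", "Armillaria", "Desarmillaria", "Sweden"),
    PublicSequence("ACC0003", "Armillaria", "Armillaria", None),
]
kept = filter_sequences(records)
print([rec.accession for rec in kept])
```

Comparing names case-insensitively after trimming whitespace is a design choice of this sketch; a real workflow would also need to handle synonyms, which simple string matching does not.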
... Consensus sequences were manually checked for the insertion, deletion and repetition regions to ensure that the sequence difference did not expand the divergence or reduce the identity score. Consensus sequences of each sequence group were compared (using BLASTn) to the NCBI Nucleotide database to identify species, and were further compared to the voucher sequences and primers used in diagnostic PCR [49,57,66-68] in order to avoid referencing improperly presented or erroneous sequences submitted to GenBank [69,70]. The keywords "(species name) and ITS2/COII" were used to search the ITS2 or COII sequences of the 13 Anopheles species deposited in GenBank. ...
Background: To develop an effective malaria vector intervention method in forested international border regions within the Greater Mekong Subregion (GMS), more in-depth studies should be conducted on local Anopheles species composition and bionomic features. There is a paucity of comprehensive surveys of biodiversity integrating morphological and molecular species identification conducted within the border of Laos and Cambodia.
Methods: A total of 2394 adult mosquitoes were trapped in the Cambodia–Laos border region. We first performed morphological identification of Anopheles mosquitoes and subsequently performed molecular identification using 412 ribosomal DNA–internal transcribed spacer 2 (rDNA-ITS2) and 391 mitochondrial DNA–cytochrome c oxidase subunit 2 (mtDNA-COII) sequences. The molecular and morphological identification results were compared, and phylogenetic analysis of rDNA-ITS2 and mtDNA-COII was conducted for the sequence divergence among species.
Results: Thirteen distinct species of Anopheles were molecularly identified in a 26,415 km² border region in Siem Pang (Cambodia) and Pathoomphone (Laos). According to the comparisons of morphological and molecular identity, the interpretation of local species composition for dominant species in the Cambodia–Laos border (An. dirus, An. maculatus, An. philippinensis, An. kochi and An. sinensis) achieved the highest accuracy of morphological identification, from 98.37 to 100%. In contrast, the other species which were molecularly identified were less frequently identified correctly (0–58.3%) by morphological methods. The average rDNA-ITS2 and mtDNA-COII interspecific divergence was respectively 318 times and 15 times higher than their average intraspecific divergence. The barcoding gap ranged from 0.042 to 0.193 for rDNA-ITS2, and from 0.033 to 0.047 for mtDNA-COII.
Conclusions: The Cambodia–Laos border hosts a high diversity of Anopheles species. The morphological identification of Anopheles species provides higher accuracy for dominant species than for other species. Molecular methods combined with morphological analysis to determine species composition, population dynamics and bionomic characteristics can facilitate a better understanding of the factors driving malaria transmission and the effects of interventions, and can aid in achieving the goal of eliminating malaria.
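The abstract above reports a barcoding gap without stating how it was computed. As a rough illustration only, the Python sketch below assumes one common definition: the smallest interspecific distance minus the largest intraspecific distance, with distance taken as the uncorrected p-distance between aligned sequences. The function names and the toy sequences are invented for this example, and the cited study may have used a different distance model or gap definition.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Sites where either sequence has a gap character ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def barcoding_gap(seqs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Compute the gap from (species_label, aligned_sequence) pairs.

    Assumed definition: min interspecific distance - max intraspecific distance.
    """
    intra: List[float] = []
    inter: List[float] = []
    for (sp1, s1), (sp2, s2) in combinations(seqs, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need at least one within-species and one between-species pair")
    return {
        "max_intra": max(intra),
        "min_inter": min(inter),
        "gap": min(inter) - max(intra),
    }


# Toy alignment (invented sequences, two species with two sequences each).
toy = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "TCGAACGGAC"),
    ("species_B", "TCGAACGGAC"),
]
result = barcoding_gap(toy)
print(result)
```

A positive gap means every between-species pair is more divergent than every within-species pair in the sample; a gap at or below zero signals overlap, which is where mislabelled reference sequences tend to show up.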
... Although it has been suggested that the internal transcribed spacer (ITS) region of the rRNA gene should be adopted as the universal fungal marker (Schoch et al. 2012; Lindahl et al. 2013), for the AMF in Glomeromycota this region is not optimal for two major reasons (Stockinger et al. 2010; Schoch et al. 2012). First, the sequence matching approach used for ITS sequences with other fungi is of limited utility for AMF because of the poor representation and poor curation of AMF sequences in ITS sequence databases (Bidartondo 2008; Stockinger et al. 2010). This database problem cannot be easily addressed because a high proportion of AMF encountered in environmental samples are undescribed. ...
Arbuscular mycorrhizal fungi (AMF; Glomeromycota) are difficult to culture; therefore, establishing a robust amplicon-based approach to taxa identification is imperative to describe AMF diversity. Further, due to low and biased sampling of AMF taxa, molecular databases do not represent the breadth of AMF diversity, making database matching approaches suboptimal. Therefore, a full description of AMF diversity requires a tool to determine sequence-based placement in the Glomeromycota clade. Nonetheless, commonly used gene regions, including the SSU and ITS, do not enable reliable phylogenetic placement. Here, we present an improved database and pipeline for the phylogenetic determination of AMF using amplicons from the large subunit (LSU) rRNA gene. We improve our database and backbone tree by including additional outgroup sequences. We also improve an existing bioinformatics pipeline by aligning forward and reverse reads separately, using a universal alignment for all tree building, and implementing a BLAST screening prior to tree building to remove non-homologous sequences. Finally, we present a script to extract AMF belonging to 11 major families as well as an amplicon sequencing variant (ASV) version of our pipeline. We test the utility of the pipeline by testing the placement of known AMF, known non-AMF, and Acaulospora sp. spore sequences. This work represents the most comprehensive database and pipeline for phylogenetic placement of AMF LSU amplicon sequences within the Glomeromycota clade.
... Taxonomic variants, such as synonyms and alternate representations designating the same taxon, are an additional source of mismatches (e.g., Magallana gigas and its alternate representation, but widely used name, Crassostrea gigas, Salvi et al. 2014, Bayne et al. 2017, Backeljau 2018). These types of inaccuracies and limitations are customarily shared and experienced by biodiversity databases (Bidartondo 2008, Patterson et al. 2010, Meiklejohn et al. 2019). Data discordances due to operational errors are also known to arise during collection, sampling and laboratory procedures, such as specimen and/or tissue sample mislabeling, cross-contamination, or non-targeted PCR amplification (Buhay 2009, Siddall et al. 2009, Evans and Paulay 2012). ...
The accuracy of specimen identification through DNA barcoding and metabarcoding relies on reference libraries containing records with reliable taxonomy and sequence quality. The considerable growth in barcode data requires stringent data curation, especially in taxonomically difficult groups such as marine invertebrates. A major effort in curating marine barcode data in the Barcode of Life Data Systems (BOLD) was undertaken during the 8th International Barcode of Life Conference (Trondheim, Norway, 2019). Major taxonomic groups (crustaceans, echinoderms, molluscs, and polychaetes) were reviewed to identify those which had disagreement between Linnaean names and Barcode Index Numbers (BINs). The records with disagreement were annotated with four tags: a) MIS-ID (misidentified, mislabeled, or contaminated records), b) AMBIG (ambiguous records unresolved with the existing data), c) COMPLEX (species names occurring in multiple BINs), and d) SHARE (barcodes shared between species). A total of 83,712 specimen records corresponding to 7,576 species were reviewed and 39% of the species were tagged (7% MIS-ID, 17% AMBIG, 14% COMPLEX, and 1% SHARE). High percentages (>50%) of AMBIG tags were recorded in gastropods, whereas COMPLEX tags dominated in crustaceans and polychaetes. The high proportion of tagged species reflects either flaws in the barcoding workflow (e.g., misidentification, cross-contamination) or taxonomic difficulties (e.g., synonyms, undescribed species). Although data curation is essential for barcode applications, such manual attempts to examine large datasets are unsustainable and automated solutions are extremely desirable.
... If the now mislabelled query is then deposited in GenBank, the error is compounded and perpetuated. Unfortunately, reference sequences in GenBank can only be deleted or renamed by the original author(s), making it hard to correct errors (Bidartondo et al., 2008). Consequently, some query sequences may be incorrectly identified with high confidence, or the opposite: assigned a scientific name with low confidence (because of a conflicting match with an incorrect name) when it is actually the correct identification (Lücking et al., 2020). ...
Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an environmental metabarcoding dataset of partial large subunit rRNA sequences of Basidiomycota and basal fungi. We employed the simple matching method using the QIIME 2 classifier and the RDP Classifier in conjunction with the latest releases of the SILVA (138.1, 2020) and RDP (11, 2014) reference datasets and then compared the results with a manual phylogenetic binning approach. Of the 71 query sequences tested, 21 and 42% were misidentified using QIIME 2 and the RDP Classifier, respectively. Of these simple matching misidentifications, more than half resulted from the underrepresentation of various groups of fungi in the SILVA and RDP reference datasets. More comprehensive reference datasets with fewer misidentified sequences will increase the accuracy of simple matching identifications. However, we argue that the phylogenetic binning approach is a better alternative to simple matching since, in addition to better accuracy, it provides evolutionary information about query sequences.
... Even though broad-scale analyses suggest that these data are generally reliable (Leray et al. 2019), errors in the sequence itself (e.g. wrong nucleotide, or more complex errors like insertions, deletions, inversions, duplications or pseudogene sequences) and taxonomic mislabeling can occur in public sequence databases, especially for organisms which are difficult to identify based on morphology (Bridge et al. 2003, Bidartondo 2008, Valkiūnas et al. 2008, Mioduchowska et al. 2018). While the first type of error will affect within-species sequence similarity negatively, sometimes substantially, the effect of the second type is more diffuse. ...
Preprint
Clustering approaches are pivotal to handle the many sequence variants obtained in DNA metabarcoding datasets, therefore they have become a key step of metabarcoding analysis pipelines. Clustering often relies on a sequence similarity threshold to gather sequences in Molecular Operational Taxonomic Units (MOTUs) that ideally each represent a homogeneous taxonomic entity, e.g. a species or a genus. However, the choice of the clustering threshold is rarely justified, and its impact on MOTU over-splitting or over-merging even less tested. Here, we evaluated clustering threshold values for several metabarcoding markers under different criteria: limitation of MOTU over-merging, limitation of MOTU over-splitting, and trade-off between over-merging and over-splitting. We extracted sequences from a public database for eight markers, ranging from generalist markers targeting Bacteria or Eukaryota, to more specific markers targeting a class or a subclass (e.g. Insecta, Oligochaeta). Based on the distributions of pairwise sequence similarities within species and within genera and on the rates of over-splitting and over-merging across different clustering thresholds, we were able to propose threshold values minimizing the risk of over-splitting, that of over-merging, or offering a trade-off between the two risks. For generalist markers, high similarity thresholds (0.96-0.99) are generally appropriate, while more specific markers require lower values (0.85-0.96). These results do not support the use of a fixed clustering threshold (e.g. 0.97). Instead, we advocate a careful examination of the most appropriate threshold based on the research objectives, the potential costs of over-splitting and over-merging, and the features of the studied markers.
Article
The advancement in high-throughput sequencing (HTS) technology allows the detection of pathogens without the need for isolation or template amplification. Plant regulatory agencies worldwide are adopting HTS as a pre-screening tool for plant pathogens in imported plant germplasm. The technique is a multipronged process, and often the bioinformatic analysis complicates detection. Previously we developed E-probe Diagnostic Nucleic acid Analysis (EDNA), a bioinformatic tool that detects pathogens in HTS data. EDNA uses custom databases of signature nucleic acid sequences (e-probes) to reduce computational effort and subjectivity when determining pathogen presence in a sample. E-probes of Pythium ultimum (Trow) and Phytophthora ramorum (Werres, De Cock & Man in’t Veld) were previously validated only using simulated HTS data. However, HTS samples generated from infected hosts or pure culture may vary in pathogen concentration, sequencing bias, and data quality, suggesting that each pathosystem requires further validation. Here we used metagenomic and genomic HTS data generated from infected hosts and pure culture, respectively, to further validate and curate e-probes of Py. ultimum and Ph. ramorum. E-probe length was found to be a determinant of diagnostic sensitivity and specificity; 80-nucleotide e-probes increased the diagnostic specificity to 100%. Curating e-probes to increase specificity affected diagnostic sensitivity only for 80-nucleotide Py. ultimum e-probes. Comparing e-probes with alternative databases and bioinformatic tools in their speed and ability to find Py. ultimum and Ph. ramorum demonstrated that while pathogen sequence reads were detected by other methods, they were less specific and slower when compared with e-probes.
Article
Sexual reproduction is the basic way to form high genetic diversity and it is beneficial in evolution and speciation of fungi. The global diversity of teleomorphic species in Ascomycota has not been estimated. This paper estimates the species number for sexual ascomycetes based on five different estimation approaches, viz. by numbers of described fungi, by fungus:substrate ratio, by ecological distribution, by meta-DNA barcoding or culture-independent studies and by previous estimates of species in Ascomycota. The assumptions were made with the currently most accepted, “2.2–3.8 million” species estimate and results of previous studies concluding that 90% of the described ascomycetes reproduce sexually. The Catalogue of Life, Species Fungorum and published research were used for data procurement. The average value of teleomorphic species in Ascomycota from all methods is 1.86 million, ranging from 1.37 to 2.56 million. However, only around 83,000 teleomorphic species have been described in Ascomycota and deposited in data repositories. The ratio between described teleomorphic ascomycetes to predicted teleomorphic ascomycetes is 1:22. Therefore, where are the undiscovered teleomorphic ascomycetes? The undescribed species are no doubt to be found in biodiversity hot spots, poorly-studied areas and species complexes. Other poorly studied niches include extremophiles, lichenicolous fungi, human pathogens, marine fungi, and fungicolous fungi. Undescribed species are present in unexamined collections in specimen repositories or incompletely described earlier species. Nomenclatural issues, such as the use of separate names for teleomorph and anamorphs, synonyms, conspecific names, illegitimate and invalid names also affect the number of described species. Interspecies introgression results in new species, while species numbers are reduced by extinctions.
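The 1:22 ratio quoted in the abstract above can be checked directly from its own figures. The following is a minimal sketch; the variable and function names are illustrative and not taken from the cited paper, and the low and high ratios are derived here from the quoted range rather than reported by the authors.

```python
# Figures quoted in the abstract above (species counts).
described = 83_000            # described teleomorphic ascomycetes ("around 83,000")
predicted_mean = 1_860_000    # average estimate across the five approaches
predicted_low = 1_370_000     # lower end of the estimated range
predicted_high = 2_560_000    # upper end of the estimated range


def predicted_per_described(described_count, predicted_count):
    """Return how many predicted species exist per described species."""
    return predicted_count / described_count


mean_ratio = predicted_per_described(described, predicted_mean)
low_ratio = predicted_per_described(described, predicted_low)
high_ratio = predicted_per_described(described, predicted_high)

print(f"mean estimate: about 1:{mean_ratio:.1f}")
print(f"range: about 1:{low_ratio:.1f} to 1:{high_ratio:.1f}")
```

Rounded to the nearest whole number, the mean estimate reproduces the 1:22 ratio given in the abstract.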
Article
Positive interactions between non-native species can accelerate their invasion rate and exacerbate their impacts. This has been shown for non-native mammals that disperse invasive ectomycorrhizal fungi (EMF), in turn facilitating the invasion of non-native tree species. Mammal-mediated dispersion is assumed to be the main mechanism of EMF long distance dispersal, being particularly critical for truffle-like EMF species. We asked whether the absence of non-native mammals is an obstacle for Pinaceae invasion given the lack of invasive EMF being dispersed. We studied EMF species colonization and Pseudotsuga menziesii (Douglas-fir) trees’ growth in soil from mainland sites where non-native mammals are highly abundant, and lake islets in which they have been historically absent. Contrary to what we expected, we found invasive EMF, including truffle-like species, in sites where invasive mammals have been historically absent. Douglas-fir trees grew equally well and had the same EMF colonization in soil from mainland and islets. Alternative mechanisms of EMF dispersal, such as saltation, bird dispersal, or human dispersal, can be involved in their arrival to native stands. The presence of invasive EMF makes native sites vulnerable to Pinaceae invasion, even in the absence of mammalian dispersers.
Article
Wildlife forensic analyses are frequently concerned with taxonomic identification, and very often employ amplification and Sanger sequencing of informative regions of the genome to achieve this. The material submitted to wildlife forensic laboratories for taxonomic identification spans a wide scope, from plant and animal parts in trade to assemblages of incidental biota at crime scenes. As these analyses take place within the context of legal proceedings, the wildlife forensic community is subject to unique requirements and considerations. These requirements and considerations are quite different from those of human forensic DNA, and have driven standardization in this field. While there has been extensive debate over the use of DNA-based methods for taxonomic identification of a wide variety of biota in research settings, there has been little discussion on the issues associated with this approach in the high scrutiny environment of forensic science. This review outlines: key procedural and biological factors that may impact the accuracy of interpretation and reporting of taxonomic identifications; resulting conventions employed by the wildlife forensics community; and implications for the use of emergent DNA sequencing technologies in taxonomic identification of wildlife in casework.
Article
DNA sequences are increasingly seen as one of the primary information sources for species identification in many organism groups. Such approaches, popularly known as barcoding, are underpinned by the assumption that the reference databases used for comparison are sufficiently complete and feature correctly and informatively annotated entries. The present study uses a large set of fungal DNA sequences from the inclusive International Nucleotide Sequence Database to show that the taxon sampling of fungi is far from complete, that about 20% of the entries may be incorrectly identified to species level, and that the majority of entries lack descriptive and up-to-date annotations. The problems with taxonomic reliability and insufficient annotations in public DNA repositories form a tangible obstacle to sequence-based species identification, and it is manifest that the greatest challenges to biological barcoding will be of taxonomical, rather than technical, nature.
Article
Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. Now, the functional annotation in these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein has been acquired. Thus the possibility of chains of misannotation arises, a process we term ‘error percolation’. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. By exploring the consequences of the model for annotation quality it is evident that this iterative approach leads to a systematic deterioration of database quality. Contact: WRG: wally.gilks@mrc-bsu.cam.ac.uk; BA and CAO: audit@ebi.ac.uk; ouzounis@ebi.ac.uk
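The idea of misannotation chains described in the abstract above can be illustrated with a toy calculation. This is a simplified sketch under stated assumptions, not the model developed by Gilks et al.: it assumes each new entry copies its annotation from a uniformly chosen existing entry, and that each copy introduces a fresh error with a fixed, hypothetical probability `eps`. All names and parameter values are illustrative.

```python
def fraction_correct_over_time(n_initial, c_initial, eps, n_added):
    """Track the expected fraction of correctly annotated entries.

    n_initial: starting number of database entries
    c_initial: starting fraction of correct annotations (0..1)
    eps:       assumed probability that a copied annotation is wrong
               even when its source is correct (hypothetical parameter)
    n_added:   number of entries annotated by similarity and added
    """
    n = n_initial
    correct = n_initial * c_initial  # expected count of correct entries
    history = [correct / n]
    for _ in range(n_added):
        # A new entry is correct only if its source is correct
        # and the transfer itself introduces no error.
        p_new_correct = (correct / n) * (1.0 - eps)
        correct += p_new_correct
        n += 1
        history.append(correct / n)
    return history


history = fraction_correct_over_time(
    n_initial=1000, c_initial=0.95, eps=0.05, n_added=10_000
)
print(f"start: {history[0]:.3f}, end: {history[-1]:.3f}")
```

Under these assumptions the expected fraction of correct entries shrinks by a factor of 1 - eps/(n + 1) with each added entry, so it can only decline as long as new annotations are inferred from existing ones, which is the qualitative behaviour the abstract describes.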
Article
Sequencing mitochondrial DNA (mtDNA) is now a routine laboratory procedure. Most journals insist that published sequences be submitted to data bases such as GenBank, where they are publicly available. But quality control of the raw data often depends solely on the original scientists. So just how reliable are the sequences in the data bases? According to a new paper by Forster in the Annals of Human Genetics, more than half of all published human mtDNA studies contain mistakes, a level so high that geneticists could be drawing incorrect conclusions in population and evolutionary studies. Much greater controls are needed, both from journals and from individual scientists. Fortunately, some new methods for detecting errors using phylogenetic networks have recently been proposed. How effective these are remains to be tested.
J. D. Harris, Trends Ecol. Evol. 18, 317 (2003).
R. H. Nilsson et al., PLoS ONE 1, e59 (2006).
W. R. Gilks et al., Bioinformatics 18, 1641 (2002).
Eastern Cereal and Oilseed Research Center, Ottawa, K1A 0C6, Canada. 139 University of Tras-os-Montes e Alto Douro, Vila Real, 5001801, Portugal.
Thorsten Lumbsch, 40 Jaime E. Blair, 41 Sung-Oui Suh, 42 Donald H. Pfister, 43 Manfred Binder, 44 Eric W. Boehm, 45 Linda Kohn, 46 Juan L. Mata, 47 Paul Dyer, 48 Gi-Ho Sung, 49 Bryn Dentinger, 50 Emory G. Simmons, 51 Richard E. Baird, 52 Thomas J.
Peter Kennedy, 71 Sarah Bergemann, 72 M. Catherine Aime, 73 Frank Kauff, 74 Andrea Porras-Alfaro, 75 Cecile Gueidan, 76 Andreas Beck, 77 Birgitte Andersen, 78 Stephen Marek, 79 Jo A. Crouch, 80 Julia Kerrigan, 81 Jean Beagle Ristaino, 82 Kathie T.