
Abstract

GenBank, the public repository for nucleotide and protein sequences, is a critical resource for molecular biology, evolutionary biology, and ecology. While some attention has been drawn to sequence errors (1), common annotation errors also reduce the value of this database. In fact, for
www.sciencemag.org/cgi/content/full/319/5870/1616a/DC1
Supporting Online Material for
Preserving Accuracy in GenBank
M. I. Bidartondo et al.
E-mail: m.bidartondo@imperial.ac.uk
Published 21 March, Science 319, 1616 (2008)
DOI: 10.1126/science.319.5870.1616a
This PDF file includes:
Full author list
Thomas D. Bruns,1 Meredith Blackwell,2 Ivan Edwards,3 Andy F. S. Taylor,4 Thomas Horton,5 Ning Zhang,6 Urmas Kõljalg,7 Georgiana May,8
Thomas W. Kuyper,9 James D. Bever,10 Gregory Gilbert,11 John W. Taylor,12 Todd Z. DeSantis,13 Anne Pringle,14 James Borneman,15 Greg Thorn,16
Mary Berbee,17 Gregory M. Mueller,18 Gary L. Andersen,19 Else C. Vellinga,20 Sara Branco,21 Ian Anderson,22 Ian A. Dickie,23 Peter Avis,24
Sari Timonen,25 Rasmus Kjøller,26 D. J. Lodge,27 Richard M. Bateman,28 Andy Purvis,29 Pedro W. Crous,30 Christine Hawkes,31 Tim Barraclough,32
Austin Burt,33 R. H. Nilsson,34 Karl-Henrik Larsson,35 Ian Alexander,36 Jean-Marc Moncalvo,37 Jean Berube,38 Joseph Spatafora,39 H. Thorsten Lumbsch,40
Jaime E. Blair,41 Sung-Oui Suh,42 Donald H. Pfister,43 Manfred Binder,44 Eric W. Boehm,45 Linda Kohn,46 Juan L. Mata,47 Paul Dyer,48
Gi-Ho Sung,49 Bryn Dentinger,50 Emory G. Simmons,51 Richard E. Baird,52 Thomas J. Volk,53 Brian A. Perry,54 Richard W. Kerrigan,55 Jinx Campbell,56
Jeewon Rajesh,57 Don R. Reynolds,58 David Geiser,59 Richard A. Humber,60 Natasha Hausmann,61 Tim Szaro,62 Jason Stajich,63 Allen Gathman,64
Kabir G. Peay,65 Terry Henkel,66 Clare H. Robinson,67 Patricia J. Pukkila,68 Nhu H. Nguyen,69 Christopher Villalta,70 Peter Kennedy,71 Sarah Bergemann,72
M. Catherine Aime,73 Frank Kauff,74 Andrea Porras-Alfaro,75 Cecile Gueidan,76 Andreas Beck,77 Birgitte Andersen,78 Stephen Marek,79 Jo A. Crouch,80
Julia Kerrigan,81 Jean Beagle Ristaino,82 Kathie T. Hodge,83 Gretchen Kuldau,84 Gary J. Samuels,85 Huzefa A. Raja,86 Hermann Voglmayr,87 Monique Gardes,88
David P. Janos,89 Jack D. Rogers,90 Paul Cannon,91 Sandra W. Woolfolk,92 H. C. Kistler,93 Michael A. Castellano,94 Sandra L. Maldonado-Ramírez,95 Paul M. Kirk,96
James J. Farrar,97 Todd Osmundson,98 Randolph S. Currah,99 Vladimir Vujanovic,100 Weidong Chen,101 Richard P. Korf,102 Zahi K. Atallah,103 Ken J. Harrison,104
Josep Guarro,105 Scott T. Bates,106 Pierluigi (Enrico) Bonello,107 Paul Bridge,108 Wiley Schell,109 Walter Rossi,110 Jan Stenlid,111 Jens C. Frisvad,112
R. M. Miller,113 Scott E. Baker,114 Heather E. Hallen,115 Jeffrey E. Janso,116 Andrew W. Wilson,117 Kenneth E. Conway,118 Louise Egerton-Warburton,119 Zheng Wang,120
Darin Eastburn,121 Wellcome W. Hong Ho,122 Scott Kroken,123 Marc Stadler,124 Gillian Turgeon,125 Robert W. Lichtwardt,126 Elwin L. Stewart,127 Mats Wedin,128
De-Wei Li,129 Janice Y. Uchida,130 Ari Jumpponen,131 Ron J. Deckert,132 Henry J. Beker,133 Scott O. Rogers,134 Jianping Xu,135 Peter Johnston,136
R.A. Shoemaker,137 Miao Liu,138 G. Marques,139 Brett Summerell,140 Serge Sokolski,141 Ulf Thrane,142 Paul Widden,143 Johann N. Bruhn,144
Virginia Bianchinotti,145 Dorothy Tuthill,146 Timothy J. Baroni,147 George Barron,148 Kentaro Hosaka,149 Kelsea Jewell,150 Meike Piepenbring,151 Raymond Sullivan,152
Gareth W. Griffith,153 S. G. Bradley,154 Takayuki Aoki,155 Wendy T. Yoder,156 Yu-Ming Ju,157 Shannon M. Berch,158 Matt Trappe,159 Weijun Duan,160
Gregory Bonito,161 Ruth A. Taber,162 Gilberto Coelho,163 Gerald Bills,164 Austen Ganley,165 Reinhard Agerer,166 László Nagy,167 Barbara A. Roy,168
Thomas Læssøe,169 Nils Hallenberg,170 Hans-Volker Tichy,171 Joost Stalpers,172 Ewald Langer,173 Markus Scholler,174 Dirk Krueger,175 Giovanni Pacioni,176
Reinhold Pöder,177 Taina Pennanen,178 Marina Capelari,179 Karen Nakasone,180 J.P. Tewari,181 Andrew N. Miller,182 Cony Decock,183 Sabine Huhndorf,184
Mark Wach,185 Helen S. Vishniac,186 David S. Yohalem,187 Matthew E. Smith,188 Anthony E. Glenn,189 Martin Spiering,190 Daniel L. Lindner,191 Conrad Schoch,192
Scott A. Redhead,193 Kelly Ivors,194 Steven N. Jeffers,195 József Geml,196 Florence Okafor,197 Frederick W. Spiegel,198 Damon Dewsbury,199 Juliet Carroll,200
Terri M. Porter,201 Catherine Pashley,202 Steven E. Carpenter,203 Gloria Abad,204 Kerstin Voigt,205 Brett Arenz,206 Andrew S. Methven,207 Shannon Schechter,208
Paula Vance,209 Dan Mahoney,210 Seogchan Kang,211 John P. Rheeder,212 James Mehl,213 Matthew Greif,214 George Ndzi Ngala,215 Joe Ammirati,216
Masako Kawasaki,217 Yuan Gwo-Fang,218 Tadahiko Matsumoto,219 David Smith,220 Gina Koenig,221 Daniel Luoma,222 Tom May,223 Marco Leonardi,224
Lynne Sigler,225 D. L. Taylor,226 Cara Gibson,227 Thomas Sharpton,228 David L. Hawksworth,229 Jose Carmine Dianese,230 Steven A. Trudell,231 Barbara Paulus,232
Mahajabeen Padamsee,233 Philippe Callac,234 Nelson Lima,235 Merlin White,236 C. Barreau,237 Juncai M. A.,238 Bart Buyck,239 Richard K. Rabeler,240
Mark R. Liles,241 Dwayne Estes,242 Richard Carter,243 J. M. Herr Jr.,244 Gregory Chandler,245 Jennifer Kerekes,246 Jennifer Cruse-Sanders,247 R. Galán Márquez,248
Egon Horak,249 Michael Fitzsimons,250 Heidi Döring,251 Su Yao,252 Nicole Hynson,253 Martin Ryberg,254 A. E. Arnold,255 Karen Hughes,256.
1 University of California, Berkeley, 94720, USA.
2 Louisiana State University, Baton Rouge, 70803, USA.
3 University of Michigan, Ann Arbor, 48109, USA.
4 Swedish University of Agricultural Sciences, Uppsala, 75007, Sweden.
5 SUNY-ESF, Ithaca, 13210, USA.
6 Cornell University, Ithaca, 14853, USA.
7 University of Tartu, Tartu, 51005, Estonia.
8 University of Minnesota, Twin Cities, 55108, USA.
9 Wageningen University, Wageningen, 6708, Netherlands.
10 Indiana University, Bloomington, 47405, USA.
11 University of California, Santa Cruz, 95064, USA.
12 University of California, Berkeley, 94720, USA.
13 Lawrence Berkeley National Laboratory, Berkeley, 94720, USA.
14 Harvard University, Cambridge, 2138, USA.
15 University of California, Riverside, 92521, USA.
16 University of Western Ontario, London, N6A 5B8, Canada.
17 University of British Columbia, Vancouver, V6T 1Z4, Canada.
18 Field Museum of Natural History, Chicago, 60605, USA.
19 Lawrence Berkeley National Laboratory, Berkeley, 94720, USA.
20 University of California, Berkeley, 94720, USA.
21 University of Chicago, Chicago, 60637, USA.
22 University of Western Sydney, Penrith South, NSW 1797, Australia.
23 Landcare Research, Lincoln, 7640, New Zealand.
24 Indiana U. Northwest and The Field Museum, Gary, 46408, USA.
25 University of Helsinki, Helsinki, 14, Finland.
26 Biological Institute U. of Copenhagen, Copenhagen, 1353, Denmark.
27 USDA Forest Service, Luquillo, 931, USA.
28 Royal Botanic Gardens, Kew, TW9 3DS, England.
29 Imperial College London, London, SW7 2AZ, England.
30 CBS Fungal Biodiversity Centre, Utrecht, 3584, Netherlands.
31 University of Texas at Austin, Austin, 78712, USA.
32 Imperial College London, London, SW7 2AZ, England.
33 Imperial College London, London, SW7 2AZ, England.
34 Göteborg University, Göteborg, 40530, Sweden.
35 Göteborg University, Göteborg, 40530, Sweden.
36 University of Aberdeen, Aberdeen, AB24 3UU, Scotland.
37 Royal Ontario Museum & U. of Toronto, Toronto, M5S 2C6, Canada.
38 Canadian Forest Service, Québec, G1V 4C7, Canada.
39 Oregon State University, Corvallis, 97331, USA.
40 Field Museum of Natural History, Chicago, 60605, USA.
41 Amherst College, Amherst, 1002, USA.
42 American Type Culture Collection, Manassas, 20110, USA.
43 Harvard University, Cambridge, 2138, USA.
44 Clark University, Worcester, 1610, USA.
45 Kean University, Union, 7083, USA.
46 University of Toronto, Toronto, L5L 1C6, Canada.
47 University of South Alabama, Mobile, 36688, USA.
48 University of Nottingham, Nottingham, NG7 2RD, England.
49 Oregon State University, Corvallis, 97331, USA.
50 Royal Ontario Museum/University of Toronto, Toronto, M5S 2C6, Canada.
51 Wabash College, Crawfordsville, 47933, USA.
52 Mississippi State University, Mississippi State, 39762, USA.
53 University of Wisconsin, La Crosse, 54601, USA.
54 San Francisco State University, San Francisco, 94132, USA.
55 Sylvan Research, Kittanning, 16201, USA.
56 University of Southern Mississippi, Hattiesburg, 39406, USA.
57 University of Hong Kong, Hong Kong.
58 University of California Herbarium, Berkeley, 94720, USA.
59 Penn State University, University Park, 16802, USA.
60 USDA-ARS Biological IPM Research, Ithaca, 14850, USA.
61 University of California, Berkeley, 94720, USA.
62 University of California, Berkeley, 94720, USA.
63 University of California, Berkeley, 94720, USA.
64 Southeast Missouri St. U., Cape Girardeau, 63701, USA.
65 University of California, Berkeley, 94720, USA.
66 Humboldt State University, Humboldt, 95521, USA.
67 University of Manchester, Manchester, M13 9PL, England.
68 University of North Carolina, Chapel Hill, 27514, USA.
69 University of California, Berkeley, 94720, USA.
70 University of California, Berkeley, 94720, USA.
71 Lewis and Clark College, Portland, 97219, USA.
72 Middle Tennessee State University, Murfreesboro, 37129, USA.
73 Louisiana State U. Agricultural Center, Baton Rouge, 70803, USA.
74 University of Kaiserslautern, Kaiserslautern, 67653, Germany.
75 University of New Mexico, Albuquerque, 87131, USA.
76 Duke University, Durham, 27708, USA.
77 Botanische Staatssammlung Muenchen, Munich, 80638, Germany.
78 Technical University of Denmark, Lyngby, 2800, Denmark.
79 Oklahoma State University, Stillwater, 74078, USA.
80 Rutgers University, New Brunswick, 8901, USA.
81 Clemson University, Clemson, 29634, USA.
82 North Carolina State University, Raleigh, 27695, USA.
83 Cornell University, Ithaca, 14853, USA.
84 Penn State University, University Park, 16802, USA.
85 USDA Systematic Mycology & Microbiology, Beltsville, 10300, USA.
86 University of Illinois, Urbana-Champaign, 61820, USA.
87 University of Vienna, Vienna, 1030, Austria.
88 University Paul Sabatier - Toulouse 3, Toulouse, 31062, France.
89 University of Miami, Oxford, 45056, USA.
90 Washington State University, Pullman, 99164, USA.
91 CABI, Egham, TW20 9TY, England.
92 Mississippi State University, Mississippi State, 39762, USA.
93 University of Minnesota, Saint Paul, 55108, USA.
94 USDA Forest Service, Corvallis, 97331, USA.
95 University of Puerto Rico, Mayaguez, 681, USA.
96 CABI, Egham, TW20 9TY, England.
97 California State University, Fresno, 93740, USA.
98 Columbia U. & New York Botanical Garden, New York, 10458, USA.
99 University of Alberta, Edmonton, T6G 2R3, Canada.
100 University of Saskatchewan, Saskatoon, S7N 5C9, Canada.
101 USDA-ARS, Washington State University, Pullman, 99164, USA.
102 Cornell University, Ithaca, 14853, USA.
103 University of Wisconsin-Madison, Madison, 53706, USA.
104 Canadian Forest Service, Fredericton, E3B 5P7, Canada.
105 Rovira i Virgili University, Reus, 43201, Spain.
106 Arizona State University, Tempe, 85287, USA.
107 Ohio State University, Columbus, 43210, USA.
108 British Antarctic Survey, Cambridge, CB3 0ET, England.
109 Duke University, Durham, 27708, USA.
110 Università dell'Aquila, L'Aquila, 67040, Italy.
111 Swedish University of Agricultural Sciences, Uppsala, 75007, Sweden.
112 Technical University of Denmark, Lyngby, 2800, Denmark.
113 Argonne National Laboratory, Argonne, 60439, USA.
114 Pacific Northwest National Laboratory, Richland, 99352, USA.
115 Michigan State University, East Lansing, 48824, USA.
116 Wyeth Research, Pearl River, 10965, USA.
117 Clark University, Worcester, 1610, USA.
118 Oklahoma State University, Stillwater, 74078, USA.
119 Chicago Botanic Garden, Chicago, 60022, USA.
120 Yale University, New Haven, 6520, USA.
121 University of Illinois, Urbana-Champaign, 61820, USA.
122 MAF Biosecurity New Zealand, Wellington, New Zealand.
123 University of Arizona, Tucson, 85721, USA.
124 U. of Bayreuth & InterMed Discovery GmbH, Dortmund, 44227, Germany.
125 Cornell University, Ithaca, 14853, USA.
126 University of Kansas, Lawrence, 66045, USA.
127 Penn State University, University Park, 16802, USA.
128 Swedish Museum of Natural History, Stockholm, 10405, Sweden.
129 Connecticut Agricultural Experiment Station, New Haven, 6511, USA.
130 University of Hawaii, Honolulu, 96822, USA.
131 Kansas State University, Manhattan, 66506, USA.
132 Weber State University, Ogden, 84408, USA.
133 Royal Holloway University of London, London, TW20 0EX, England.
134 Bowling Green State University, Bowling Green, 43403, USA.
135 McMaster University, Hamilton, L8S 4L8, Canada.
136 Landcare Research, Auckland, 1072, New Zealand.
137 Agriculture Canada (Emeritus), Ottawa, K1A 0C6, Canada.
138 Eastern Cereal and Oilseed Research Center, Ottawa, K1A 0C6, Canada.
139 University of Tras-os-Montes e Alto Douro, Vila Real, 5001801, Portugal.
140 National Herbarium of New South Wales, Sydney, 2000, Australia.
141 Université Laval, Québec, G1K 7P4, Canada.
142 Technical University of Denmark, Lyngby, 2800, Denmark.
143 Concordia University, Montréal, H3G 1M8, Canada.
144 University of Missouri, Columbia, 65211, USA.
145 Universidad Nacional del Sur, Bahia Blanca, B8000, Argentina.
146 University of Wyoming, Laramie, 82071, USA.
147 State University of New York, Cortland, 13045, USA.
148 University of Guelph, Guelph, N1G 2W1, Canada.
149 Field Museum of Natural History, Chicago, 60605, USA.
150 Seattle Children's Hospital Research Institute, Seattle, 98101, USA.
151 J.W. Goethe Universitaet, Frankfurt am Main, 60325, Germany.
152 Rutgers University, New Brunswick, 8901, USA.
153 Aberystwyth University, Aberystwyth, SY23 3DA, Wales.
154 Penn State College of Medicine, Hershey, 17033, USA.
155 National Institute of Agrobiological Sciences, Tsukuba, 3058602, Japan.
156 Novozymes Inc., Davis, 95618, USA.
157 Academia Sinica, Taipei, 115, Taiwan.
158 Ministry of Forests and Range, Victoria, V8W 9C4, Canada.
159 Oregon State University, Corvallis, 97331, USA.
160 Chinese Academy of Sciences, Beijing, 100864, China.
161 Duke University, Durham, 27708, USA.
162 Texas A & M University (Emeritus), College Station, 77843, USA.
163 Universidade Federal, Santa Maria, 97105, Brazil.
164 Merck Sharp & Dohme de España S.A., Madrid, 28027, Spain.
165 Massey University, Albany, New Zealand.
166 University of Munich, Munich, 80638, Germany.
167 University of Szeged, Szeged, 6726, Hungary.
168 University of Oregon, Eugene, 97403, USA.
169 Copenhagen University, Copenhagen, 1353, Denmark.
170 Göteborg University, Göteborg, 40530, Sweden.
171 LUFA-ITL GmbH, Kiel, 24107, Germany.
172 CBS Fungal Biodiversity Centre, Utrecht, 3584, Netherlands.
173 University Kassel, Kassel, 34109, Germany.
174 Naturkundemuseum, Karlsruhe, 76133, Germany.
175 Umweltforschungszentrum, Halle, 4318, Germany.
176 University of L'Aquila, L'Aquila, 67040, Italy.
177 University of Innsbruck, Innsbruck, 6020, Austria.
178 Finnish Forest Research Institute, Helsinki, 170, Finland.
179 Instituto de Botânica, São Paulo, 4301, Brazil.
180 Northern Research Station USDA Forest Service, St. Paul, 55108, USA.
181 University of Alberta, Edmonton, T6G 2R3, Canada.
182 Illinois Natural History Survey, Champaign, 61820, USA.
183 MUCL, Louvain-la-Neuve, 1348, Belgium.
184 Field Museum of Natural History, Chicago, 60605, USA.
185 Sylvan Research, Kittanning, 16201, USA.
186 Oklahoma State University, Stillwater, 74078, USA.
187 East Malling Research, East Malling, ME19 6BJ, England.
188 Harvard University, Cambridge, 2138, USA.
189 USDA Toxicology & Mycotoxin Research, College Station, 77843, USA.
190 University of Dublin, Trinity College, Dublin, Ireland.
191 USDA-Forest Service, Madison, 53726, USA.
192 Oregon State University, Corvallis, 97331, USA.
193 National Mycological Herbarium, Ottawa, K1A 0C6, Canada.
194 North Carolina State University, Fletcher, 28732, USA.
195 Clemson University, Clemson, 29634, USA.
196 University of Alaska, Fairbanks, 99709, USA.
197 Alabama A&M University, Normal, 35762, USA.
198 University of Arkansas, Fayetteville, 72701, USA.
199 University of Toronto, Toronto, L5L 1C6, Canada.
200 Cornell University, Ithaca, 14853, USA.
201 University of Toronto, Toronto, L5L 1C6, Canada.
202 University of Leicester, Leicester, LE1 7RH, England.
203 Abbey Lane Laboratory, Philomath, 97370, USA.
204 USDA-APHIS-PPQ-PHP-PSPI-NIS, Beltsville, 10300, USA.
205 Fungal Reference Centre University Jena, Jena, 7745, Germany.
206 University of Minnesota, St. Paul, 55108, USA.
207 Eastern Illinois University, Charleston, 61920, USA.
208 University of California, Berkeley, 94720, USA.
209 Microbiology Specialists Inc., Houston, 77054, USA.
210 Private Mycological Research, Lower Hutt, New Zealand.
211 Penn State University, University Park, 16802, USA.
212 Medical Research Council, Tygerberg, 7505, South Africa.
213 FABI University of Pretoria, Pretoria, 0002, South Africa.
214 Field Museum of Natural History, Chicago, 60605, USA.
215 Bamenda University, Bamenda, Cameroon.
216 University of Washington, Seattle, 98195, USA.
217 Kanazawa Medical University, Kanazawa, 9200293, Japan.
218 Bioresource Collection & Research Center FIRDI, Hsinchu, 300, Taiwan.
219 Juntendo Univ. and Kurume Univ., Tokyo, 1130033, Japan.
220 World Federation for Culture Collections, Egham, TW20 9TY, England.
221 Roche Molecular Systems, Alameda, 94501, USA.
222 Oregon State University, Corvallis, 97331, USA.
223 Royal Botanic Gardens, Melbourne, VIC3004, Australia.
224 Università degli Studi di L'Aquila, L'Aquila, 67040, Italy.
225 University of Alberta, Edmonton, T6G 2R3, Canada.
226 University of Alaska, Fairbanks, 99709, USA.
227 University of Arizona, Tucson, 85721, USA.
228 University of California, Berkeley, 94720, USA.
229 Natural History Museum, Madrid, 28006, Spain.
230 Universidade de Brasilia, Brasilia, 70910, Brazil.
231 University of Washington, Seattle, 98195, USA.
232 Landcare Research, Auckland, 1072, New Zealand.
233 University of Minnesota, Saint Paul, 55108, USA.
234 INRA, Bordeaux, 33140, France.
235 Micoteca da Universidade do Minho, Braga, 4710, Portugal.
236 Boise State University, Boise, 83725, USA.
237 CNRS/INRA, Bordeaux, 33140, France.
238 Chinese Academy of Sciences, Beijing, 100864, China.
239 National Museum of Natural History, Paris, 75005, France.
240 University of Michigan, Ann Arbor, 48109, USA.
241 Auburn University, Auburn, 36849, USA.
242 Austin Peay State University, Clarksville, 37044, USA.
243 Valdosta State University, Valdosta, 31698, USA.
244 University of South Carolina, Columbia, 29208, USA.
245 University of North Carolina, Wilmington, 28403, USA.
246 University of California, Berkeley, 94720, USA.
247 Salem College Herbarium, Salem, 27101, USA.
248 Alcalá University, Madrid, 28801, Spain.
249 Zurich Herbarium, Zurich, 8008, Switzerland.
250 University of Chicago, Chicago, 60637, USA.
251 Royal Botanic Gardens, Kew, TW9 3DS, England.
252 China Center of Industrial Culture Collection, Beijing, 100027, China.
253 University of California, Berkeley, 94720, USA.
254 Göteborg University, Göteborg, 40530, Sweden.
255 University of Arizona, Tucson, 85721, USA.
256 University of Tennessee, Knoxville, 37996, USA.
... We followed up by using BLAST distance trees to examine putative relationships among matches, and top sequence hits were examined in more detail, including but not limited to location of any publications utilizing those sequences and macromorphological comparison of our collections with images or other reference data (e.g., distribution, phylogenies). Comparison to primary literature is essential since GenBank does not permit non-author annotation (Bidartondo et al. 2008) and many fungal sequences are misidentified (Hofstetter et al. 2019). Current nomenclature was determined using Index Fungorum and Mycobank, except where contradicted by Jaklitsch et al. (2016) or agaric.us ...
... These methods are already being experimentally employed across biological taxonomy (Sun et al. 2017; Bambil et al. 2020; Mahmudul Hassan and Kumar Maji 2021; Høye et al. 2021), including fungi (Picek et al. 2022; Bartlett et al. 2022), and are poised to offer insights otherwise unattainable by existing taxonomic expertise. It is important to regard such innovations as individual components of the complete taxonomic toolkit, as overreliance on new and groundbreaking tools can have demonstrably deleterious effects, as has occurred with DNA sequencing (Bidartondo et al. 2008; Hofstetter et al. 2019). ...
Background: Globally, many undescribed fungal taxa reside in the hyperdiverse, yet undersampled, tropics. These species are under increasing threat from habitat destruction by expanding extractive industry, in addition to global climate change and other threats. Reserva Los Cedros is a primary cloud forest reserve of ~5256 ha, and is among the last unlogged watersheds on the western slope of the Ecuadorian Andes. No major fungal survey has been done there, presenting an opportunity to document fungi in primary forest in an underrepresented habitat and location. Above-ground surveys from 2008 to 2019 resulted in 1760 vouchered collections, cataloged and deposited at QCNE in Ecuador, mostly Agaricales sensu lato and Xylariales. We document diversity using a combination of ITS barcode sequencing and digital photography, and share the information via public repositories (GenBank & iNaturalist). Results: Preliminary identifications indicate the presence of at least 727 unique fungal species within the Reserve, representing 4 phyla, 17 classes, 40 orders, 101 families, and 229 genera. Two taxa at Los Cedros have recently been recommended to the IUCN Fungal Red List Initiative (Thamnomyces chocöensis Læssøe and "Lactocollybia" aurantiaca Singer), and we add occurrence data for two others already under consideration (Hygrocybe aphylla Læssøe & Boertm. and Lamelloporus americanus Ryvarden). Conclusions: Plants and animals are known to exhibit exceptionally high diversity and endemism in the Chocó bioregion, and the fungi do as well. Our collections contribute to understanding this important driver of biodiversity in the Neotropics, as well as illustrating the importance and utility of such data to conservation efforts.
Summary (Resumen): Background: Worldwide, many undescribed fungal taxa reside in the hyperdiverse tropics, which remain undersampled. These species are increasingly threatened by habitat destruction from the expanding extractive industry, in addition to global climate change and other threats. Los Cedros is a primary cloud forest reserve of ~5256 ha and is among the last unexploited watersheds on the western slope of the Ecuadorian Andes. No study of fungal diversity has previously been carried out at the site, which offers an opportunity to document fungi in primary forest, in an underrepresented habitat and location. The present study compiles information gathered between 2008 and 2019 by sampling material on all substrates, reporting 1760 collections cataloged and deposited in the fungarium of QCNE in Ecuador, mostly Agaricales sensu lato and Xylariales; diversity is also documented through ITS barcode sequencing and digital photography, and the information is available in public digital repositories (GenBank and iNaturalist). Results: Preliminary identification indicates the presence of at least 727 unique fungal species within the Reserve, representing 4 phyla, 17 classes, 40 orders, 101 families, and 229 genera. Two taxa at Los Cedros were recently recommended to the IUCN Fungal Red List Initiative (Thamnomyces chocöensis Læssøe and "Lactocollybia" aurantiaca Singer), and we add occurrence data for two others that were already under consideration (Hygrocybe aphylla Læssøe & Boertm. and Lamelloporus americanus Ryvarden). Conclusions: Plants and animals are known to exhibit exceptionally high diversity and endemism in the Chocó bioregion, and fungi are no exception. Our collections contribute to understanding this important driver of biodiversity in the Neotropics and illustrate the importance and utility of such data for conservation efforts.
... The public National Center for Biotechnology Information Nucleotide database (NCBI-nt, including the GenBank database) is the largest sequence repository and is widely used in eDNA metabarcoding studies (... Hajibabaei 2018b, 2020). However, the presence of mislabeled specimens, the large variation in quality of sequences available, and gaps in species coverage (i.e., unrepresented species) result in erroneous species identification when directly comparing unknown sequences to NCBI-nt (Bidartondo 2008; Mioduchowska et al. 2018; Leray et al. 2019). The Barcode of Life Data Systems (BOLD) is another sequence repository specific to the most common barcode regions, including the cytochrome c oxidase I (COI) gene, which is the widely used gene region for animal DNA barcoding (Ratnasingham and Hebert 2007; Porter and Hajibabaei 2018b). ...
... All undetected species will be reviewed prior to the next release of GSL-rl. With NCBI-nt, we also observed under-classification of the sea star genus Leptasterias due to sequence mislabeling, which has been shown previously (Bidartondo 2008; Mioduchowska et al. 2018). The under-classification is due to two misidentified sequences: one for Leptasterias littoralis identified as the sea star Asterias forbesi and the other for Leptasterias polaris identified as the butterfly Polyommatus fulgens. ...
Biodiversity assessments relying on DNA have increased rapidly over the last decade. However, the reliability of taxonomic assignments in metabarcoding studies is variable and affected by the reference databases and the assignment methods used. Species level assignments are usually considered as reliable using regional libraries but unreliable using public repositories. In this study, we aimed to test this assumption for metazoan species detected in the Gulf of St. Lawrence in the Northwest Atlantic. We first created a regional library (GSL-rl) by data mining COI barcode sequences from BOLD, and included a reliability ranking system for species assignments. We then estimated 1) the accuracy and precision of the public repository NCBI-nt for species assignments using sequences from the regional library and 2) compared the detection and reliability of species assignments of a metabarcoding dataset using either NCBI-nt or the regional library and popular assignment methods. With NCBI-nt and sequences from the regional library, the BLAST-LCA (least common ancestor) method was the most precise method for species assignments, but the accuracy was higher with the BLAST-TopHit method (>80% over all taxa, between 70% and 90% amongst taxonomic groups). With the metabarcoding dataset, the reliability of species assignments was greater using GSL-rl compared to NCBI-nt. However, we also observed that the total number of reliable species assignments could be maximized using both GSL-rl and NCBI-nt with different optimized assignment methods. The use of a two-step approach for species assignments, i.e., using a regional library and a public repository, could improve the reliability and the number of detected species in metabarcoding studies.
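The difference between the two assignment strategies named in this abstract can be illustrated with a minimal sketch. It assumes each BLAST hit has already been reduced to a score and a lineage listed from broad to specific rank; the `Hit` type, the function names, and the scores are hypothetical and are not taken from the study's pipeline.

```python
from typing import List, Tuple

# A hit is (score, lineage); the lineage runs from broad to specific rank.
# Both the type and the example scores below are illustrative assumptions.
Hit = Tuple[float, List[str]]


def top_hit(hits: List[Hit]) -> List[str]:
    """Return the lineage of the single best-scoring hit (hits must be non-empty)."""
    return max(hits, key=lambda h: h[0])[1]


def lowest_common_ancestor(hits: List[Hit]) -> List[str]:
    """Return the longest lineage prefix shared by every hit."""
    shared: List[str] = []
    for ranks in zip(*(lineage for _, lineage in hits)):
        if len(set(ranks)) != 1:
            break  # hits disagree at this rank, so stop here
        shared.append(ranks[0])
    return shared


# Hypothetical hits for one query sequence.
hits = [
    (500.0, ["Animalia", "Echinodermata", "Asteroidea", "Leptasterias", "Leptasterias polaris"]),
    (498.0, ["Animalia", "Echinodermata", "Asteroidea", "Leptasterias", "Leptasterias littoralis"]),
    (495.0, ["Animalia", "Echinodermata", "Asteroidea", "Asterias", "Asterias forbesi"]),
]

print(top_hit(hits)[-1])             # species of the best-scoring hit
print(lowest_common_ancestor(hits))  # ranks on which all hits agree
```

In this toy case the top-hit strategy commits to a species, while the common-ancestor strategy stops at the class because one hit belongs to a different genus. A single mislabeled reference among the hits therefore pushes an LCA assignment to a coarser rank, which is one way a mislabeled record can lead to under-classification.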
... The most common problem compromising unsupervised species-level identifications encountered in our survey was the numerous incorrectly labelled sequences present in publicly available databases (Kvist, 2013; Mutanen et al., 2016). These problematic records include wrong identification of the sequenced specimen or simply labelling errors (Bidartondo et al., 2008; Pentinsaari et al., 2020) and explained 19 out of 78 discordant species IDs of the queried mammal haplotypes. This common source of error compromised, for instance, all identifications of haplotypes from two bat species easily identified by external characters (the common noctule Nyctalus noctula and the serotine bat Eptesicus serotinus), because one published complete mitochondrial genome clearly came from a mislabelled (or wrongly identified) individual (GenBank #MT584130). ...
We surveyed mitochondrial DNA (COI barcode, CytB and 16S) variation of all wild mammals of Switzerland and tested the efficiency of automated species-level identification. Mean success is only 70% reliable IDs, with most failures due to labelling errors or misidentifications in the publicly available databases. Several very divergent lineages meet in the alpine region. The most extreme example includes two cryptic species within Muscardinus avellanarius, the nominal form and M. speciosus, a Western European species. New curated reference data are provided in order to minimize further identification errors with DNA barcodes.
... This database is also widely used in food safety to perform correct specimen identification due to the many available sequences. Despite this, it has been criticized as being susceptible to problems such as incorrect species identification and missing information [170], especially with fish and seafood [171]. ...
The recent increase in international fish trade leads to the need for improving the traceability of fishery products. In relation to this, consistent monitoring of the production chain focusing on technological developments, handling, processing and distribution via global networks is necessary. Molecular barcoding has therefore been suggested as the gold standard in seafood species traceability and labelling. This review describes the DNA barcoding methodology for preventing food fraud and adulteration in fish. In particular, attention has been focused on the application of molecular techniques to determine the identity and authenticity of fish products, to discriminate the presence of different species in processed seafood and to characterize raw materials undergoing food industry processes. In this regard, we herein present a large number of studies performed in different countries, showing the most reliable DNA barcodes for species identification based on both mitochondrial (COI, cytb, 16S rDNA and 12S rDNA) and nuclear genes. Results are discussed considering the advantages and disadvantages of the different techniques in relation to different scientific issues. Special regard has been dedicated to a dual approach referring to both the consumer’s health and the conservation of threatened species, with a special focus on the feasibility of the different genetic and genomic approaches in relation to both scientific objectives and permissible costs to obtain reliable traceability.
... These advantages notwithstanding, sequence data do not always form a straightforward component of mycological research. Fungal ITS sequences with incorrect taxonomic annotation accumulate in the public sequence databases, and entries that are chimeric or that feature poorly read regions are similar concerns (Bidartondo et al. 2008; Ryberg et al. 2009; Nilsson et al. 2010). Another problematic aspect of publicly available fungal ITS sequences is that a portion of them are incorrectly deposited in the reverse complementary orientation (i.e., backward and with all purines and pyrimidines transposed, e.g., CTAGG instead of the correct CCTAG). ...
Reverse complementary DNA sequences (sequences that are inadvertently cast backward and in which all purines and pyrimidines are transposed) are not uncommon in sequence databases, where they may introduce noise into sequence-based research. We show that about 1% of the public fungal ITS sequences, the most commonly sequenced genetic marker in mycology, are reverse complementary, and we introduce an open source software solution to automate their detection and reorientation. The MacOSX/Linux/UNIX software operates on public or private datasets of any size, although some 50 base pairs of the 5.8S gene of the ITS region are needed for the analysis.
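The reorientation step itself is simple to state in code. The sketch below only shows the reverse complement operation on unambiguous bases; it is not the software described in the abstract and makes no attempt at the detection step, which the abstract says relies on the 5.8S gene. The function name is an illustrative choice.

```python
# Complement table for unambiguous DNA bases; IUPAC ambiguity codes are
# deliberately left untouched in this simplified sketch.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("CCTAG"))  # CTAGG
```

Applying the function twice returns the original sequence, so a record deposited in the wrong orientation can be restored by applying it once.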
... DNA sequencing institutes throughout the international scientific community procure strains from various BRCs, extract and sequence the DNA, and then upload single genes or whole genome assemblies to public databases, such as GenBank [2], which assigns an identifier for each assembly received. Because this is a decentralized international activity, there has been persistent uncertainty about which data belong to each strain [3,21]. A prime example of the need for unification can be seen in a strain isolated from a healthy Japanese male in 2011 [37]. ...
Article
Motivation: Microbial metagenomic profiling software and databases are advancing rapidly for development of novel disease biomarkers and therapeutics yet three problems impede analyses: 1) the conflation of “genome assembly” and “strain” in reference databases; 2) difficulty connecting DNA biomarkers to a procurable strain for laboratory experimentation; and 3) absence of a comprehensive and unified strain-resolved reference database for integrating both shotgun metagenomics and 16S rRNA gene data. Results: We demarcated 681,087 strains, the largest collection of its kind, by filtering public data into a knowledge graph of vertices representing contiguous DNA sequences, genome assemblies, strain monikers and bio-resource center (BRC) catalog numbers then adding inter-vertex edges only for synonyms or direct derivatives. Surprisingly, for 10,043 important strains, we found replicate RefSeq genome assemblies obstructing interpretation of database searches. We organized each strain into eight taxonomic ranks with bootstrap confidence inversely correlated with genome assembly contamination. The StrainSelect database is suited for applications where a taxonomic, functional or procurement reference is needed for shotgun or amplicon metagenomics since 636,568 strains have at least one 16S rRNA gene, 245,005 have at least one annotated genome assembly, and 36,671 are procurable from at least one BRC. The database overcomes all three aforementioned problems since it disambiguates strains from assemblies, locates strains at BRCs, and unifies a taxonomic reference for both 16S rRNA and shotgun metagenomics. Availability: The StrainSelect database is available in igraph and tabular vertex-edge formats compatible with Neo4J. Dereplicated MinHash and fasta databases are distributed for sourmash and usearch pipelines at http://strainselect.secondgenome.com. Contact: todd.desantis@gmail.com. Supplementary information: Supplementary data are available online.
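One way to picture the vertex-and-edge organization described above is to treat every connected group of vertices as a single strain. The sketch below does so with a plain depth-first traversal. It rests on assumptions: the abstract does not state that strains are defined as connected components, and all identifiers (`assembly:A1`, `strain:X`, `catalog:BRC-001`, and so on) are hypothetical placeholders rather than real database entries.

```python
from collections import defaultdict


def group_strains(edges):
    """Group vertices into connected components.

    Each edge is a pair of vertex labels linked as synonyms or direct
    derivatives. Treating one component as one strain is an assumption
    made for illustration.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Hypothetical example: two assemblies and a catalog number point at
# strain X, while strain Y has a single assembly.
edges = [
    ("assembly:A1", "strain:X"),
    ("strain:X", "catalog:BRC-001"),
    ("assembly:A2", "strain:X"),
    ("assembly:B1", "strain:Y"),
]
components = group_strains(edges)
```

In this toy input, the component containing `strain:X` holds two assembly vertices, which mirrors the replicate-assembly situation the abstract reports for some strains.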
Chapter
Fungal identification has become more urgent than ever due to the increasing number of fungal species that are found to be pathogenic for humans. While the major fungal pathogens probably do not exceed 50 commonly encountered species, a growing immunosuppressed patient population and increasingly aggressive medical therapies predispose patients to a broader spectrum of fungi capable of causing disease than ever before. While clinical microbiology laboratories can identify the common fungi using classical methods such as biochemistry and morphology, the rarer fungi need more complex identification methods. These methods are drawn from the field of molecular biology. Fungi offer unique problems that must be addressed before existing molecular methods can be applied to clinical specimens. Fortunately, molecular mycology is accelerating in the rate of diagnostic assay development as several molecular assays have been FDA approved, and many more have been commercialized and are now available. These assays can be focused on single species, major pathogens from one genus, and, in some cases, pan fungal. Fungal molecular diagnostics has advanced from PCR to whole-genome sequencing, and many assays are incorporating emerging technologies. While still lagging behind bacterial and viral diagnostics, the increasing number of commercial and approved fungal diagnostic assays will be welcome additions to the clinical microbiology laboratory.
Article
In the world trade of medicinal plants, the naming of plants is fundamental to understanding which species are acceptable for therapeutic use. There are a variety of nomenclatural systems that are used, inclusive of common names, Latinized binomials, Galenic or pharmaceutical names, and pharmacopeial definitions. Latinized binomials are the primary system used for naming wild plants, but these alone do not adequately define medicinal plant parts. Each system has its specific applications, advantages, and disadvantages. The topic of medicinal plant nomenclature is discussed broadly by underscoring when and how varying nomenclatural systems should be used. It is emphasized that pharmacopeial definitions represent the only naming system that integrates plant identity, relevant plant parts, and the specific quality metrics to which a material must comply, thus affording the most appropriate identification method available for medicinal plant materials.
Article
In this study, we determined the complete mitochondrial genome of the invasive insect species Melanoplus differentialis captured in Korea. The complete mitochondrial genome of M. differentialis is 15,625 bp long and comprises 13 protein‐coding genes, two ribosomal RNA genes, and 22 transfer RNAs, with a GC ratio of 25.2%. In total, 353 SNPs and 11 INDEL regions (total length 67 bp) were found against the previously sequenced M. differentialis mitochondrial genome recorded as public genome data. The number of interspecific variations was greater than the number of intraspecific variations in this insect. Phylogenetic tree analysis showed that the mitochondrial genome clustered the Melanoplus clade with two previously reported Melanoplus sequences. However, the sequences were not divided at the species‐level clade possibly as a consequence of misidentification caused by an error in the public database. Our results extend the molecular database status of Melanoplus by providing a novel complete mitochondrial genome sequence for M. differentialis that could serve as reference for further molecular studies.
Article
DNA sequences are increasingly seen as one of the primary information sources for species identification in many organism groups. Such approaches, popularly known as barcoding, are underpinned by the assumption that the reference databases used for comparison are sufficiently complete and feature correctly and informatively annotated entries. The present study uses a large set of fungal DNA sequences from the inclusive International Nucleotide Sequence Database to show that the taxon sampling of fungi is far from complete, that about 20% of the entries may be incorrectly identified to species level, and that the majority of entries lack descriptive and up-to-date annotations. The problems with taxonomic reliability and insufficient annotations in public DNA repositories form a tangible obstacle to sequence-based species identification, and it is manifest that the greatest challenges to biological barcoding will be of taxonomical, rather than technical, nature.
Article
Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. Now, the functional annotation in these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein has been acquired. Thus the possibility of chains of misannotation arises, a process we term ‘error percolation’. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. By exploring the consequences of the model for annotation quality it is evident that this iterative approach leads to a systematic deterioration of database quality. Contact: WRG: wally.gilks@mrc-bsu.cam.ac.uk; BA and CAO: audit@ebi.ac.uk; ouzounis@ebi.ac.uk
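The idea of misannotation chains can be illustrated with a toy expected-value recurrence. This is not the model from the cited paper, whose details the abstract does not give. The sketch assumes that the database starts with only correct entries, that every new entry copies its annotation from a uniformly chosen existing entry, and that each copy independently goes wrong with probability `transfer_error`. All names and parameters are hypothetical.

```python
def expected_correct_fraction(initial_entries, added_entries, transfer_error):
    """Expected fraction of correct annotations after each added entry.

    Toy assumptions (not taken from the cited paper): all initial entries
    are correct, and a new entry is correct only if its randomly chosen
    source is correct AND the transfer itself introduces no error.
    """
    n = initial_entries
    fraction = 1.0
    history = [fraction]
    for _ in range(added_entries):
        # Expected correct entries: existing ones plus the expected
        # contribution of the single new, copied annotation.
        correct = fraction * n + fraction * (1.0 - transfer_error)
        n += 1
        fraction = correct / n
        history.append(fraction)
    return history


if __name__ == "__main__":
    trace = expected_correct_fraction(100, 10_000, 0.05)
    print(f"start: {trace[0]:.3f}, end: {trace[-1]:.3f}")
```

Under these assumptions the expected correct fraction falls with every added entry whenever `transfer_error` is positive, because errors already in the database are copied forward along with fresh ones.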
Article
Sequencing mitochondrial DNA (mtDNA) is now a routine laboratory procedure. Most journals insist that published sequences be submitted to databases such as GenBank, where they are publicly available. But quality control of the raw data often depends solely on the original scientists. So just how reliable are the sequences in the databases? According to a new paper by Forster in the Annals of Human Genetics, more than half of all published human mtDNA studies contain mistakes, a level so high that geneticists could be drawing incorrect conclusions in population and evolutionary studies. Much greater controls are needed, both from journals and from individual scientists. Fortunately, some new methods for detecting errors using phylogenetic networks have recently been proposed. How effective these are remains to be tested.
J. D. Harris, Trends Ecol. Evol. 18, 317 (2003).
R. H. Nilsson et al., PLoS ONE 1, e59 (2006).
W. R. Gilks et al., Bioinformatics 18, 1641 (2002).
Eastern Cereal and Oilseed Research Center, Ottawa, K1A 0C6, Canada. 139 University of Tras-os-Montes e Alto Douro, Vila Real, 5001801, Portugal.