I would like to add to Richard Christen's answer, regarding his sentence: "Because not other gene can be found in the public databases with sequences covering so many clades and species." We are talking about bacteria here, and the sequence in question is 16S rRNA, but is rRNA rigorous enough for the taxonomic classification of eukaryotic species? If eukaryotes are the concern, I would go for the cytochrome oxidase 1 (COI) gene, since it has been sequenced for many species and is well represented in databases such as NCBI GenBank and BOLD. Moreover, compared with COI (where the cutoff ranges between 1% and 3%), is the percentage cutoff for assigning specimens to species not relatively arbitrary when rRNA is used as the marker? Hence I am wondering which of these genes (COI or rRNA) is superior in terms of representation in the databases and accuracy of taxonomic classification.
For Bacteria, the SSU rRNA gene sequences are the gold standard.
For Archaea, the SSU rRNA gene sequences are the gold standard.
For both, when a new strain is isolated, the SSU rRNA sequence is required for validation of a name; see http://www.bacterio.net.
For unicellular Eukaryota (protists), the SSU rRNA gene sequences are the gold standard (see http://ssu-rrna.org).
For Fungi, Animals and Viridiplantae, SSU rRNA gene sequences often provide little resolution. Other genes are often used, but there are no universal primers that amplify every clade (as is almost the case for Prokaryotes and Protists, although Foraminifera are one exception). The exception is the ITS region in Fungi, but ITS sequences carry a very poor phylogenetic signal, even though they allow identification to the species level or better (I can give details if needed).
- For Bacteria, the SSU rRNA gene sequences always allow a taxonomic assignment down to the genus level, but not always down to the species level. Different species may have exactly the same sequence (two mycobacteria, for example), and for some groups such as Enterobacteriaceae the situation may be problematic when the entire sequence is not available (as in NGS analyses). In these cases, it is best to use a housekeeping gene (tuf, rpoB, ...).
- When a bacterial genome contains several rRNA operons, their sequences may differ, and some may be misleading for taxonomy; see PubMed ID 8742634.
- The definitive identification of Bacteria down to the species level or better is now based on a combination of several housekeeping genes (see for example PubMed ID 24409173). The best combination of genes may depend on the genus analyzed, and remember that you need to use different primers for different genera!
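To make the last points concrete, here is a minimal Python sketch of why two strains with identical SSU rRNA sequences can still be separated by housekeeping genes. Everything in it is an assumption for illustration only: the toy sequences are invented (they are not real 16S, rpoB or tuf sequences), the helper names `percent_identity` and `concatenate` are hypothetical, and the sequences are assumed to be already aligned. A real analysis would use a proper multiple alignment and a gene set chosen for the genus in question.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def concatenate(genes: dict, order: list) -> str:
    """Join the aligned genes of one strain in a fixed order."""
    return "".join(genes[name] for name in order)


# Invented toy alignments for two hypothetical strains (not real data).
strain_a = {"16S": "ACGTACGTAC", "rpoB": "ATGGCCTTAA", "tuf": "GGCTTAACGT"}
strain_b = {"16S": "ACGTACGTAC", "rpoB": "ATGACCTTGA", "tuf": "GGCTAAACGT"}

# The SSU rRNA fragments are identical, so they cannot separate the strains.
id_16s = percent_identity(strain_a["16S"], strain_b["16S"])

# The concatenated housekeeping genes do show differences.
housekeeping = ["rpoB", "tuf"]
id_concat = percent_identity(
    concatenate(strain_a, housekeeping),
    concatenate(strain_b, housekeeping),
)

print(f"16S identity: {id_16s:.1f}%")
print(f"rpoB+tuf identity: {id_concat:.1f}%")
```

Whatever identity cutoff is then applied to call a species is a separate decision; the sketch only shows where the discriminating signal comes from.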