Nuclear mitochondrial pseudogenes, or "numts", are nonfunctional copies of mitochondrial genes that have been translocated to the nuclear genome. Numts have been used to study differences in mutation rates between the nuclear and mitochondrial genomes, but they have also been identified as a source of error in phylogenetic studies and DNA-based species identification (i.e., DNA barcoding). In this study, a suspected numt discovered during a survey of mitochondrial cytochrome c oxidase I (COI) diversity in North American birds was targeted and sequenced from tyrant flycatchers (family: Tyrannidae). In total, the numt was found in five taxa representing two genera. Substitution rates were compared between COI and numt sequences. None of the numt sequences harboured stop codons or frameshift mutations, but phylogenetic analysis revealed that they had accumulated more amino acid substitutions than the mitochondrial COI sequences. Mitochondrial COI appeared to be preferentially amplified in most cases, but methods for numt detection are discussed for cases like this one, where sequences lack obvious features for identification. Because of its persistence across a broad taxonomic lineage, this numt could form a valuable model system for studying evolution in numts. The full size of the numt and its location within the nuclear genome are yet to be determined.
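The "obvious features" mentioned in the abstract above, in-frame stop codons in particular, can be screened for with a few lines of code. The following is a minimal sketch and not the authors' method: the function name `in_frame_stops` is hypothetical, the input is assumed to be an ungapped nucleotide string, and the stop codon set is that of the vertebrate mitochondrial genetic code (NCBI translation table 2).

```python
# Stop codons of the vertebrate mitochondrial genetic code
# (NCBI translation table 2). TGA encodes tryptophan in this code.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def in_frame_stops(seq, frame=0):
    """Return (offset, codon) for every stop codon in the given reading frame.

    Assumes `seq` is an ungapped nucleotide string; `frame` is 0, 1 or 2.
    A trailing partial codon is ignored.
    """
    seq = seq.upper()
    hits = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in VERT_MITO_STOPS:
            hits.append((i, codon))
    return hits
```

An empty result does not show that a sequence is mitochondrial in origin: as the abstract notes, the numt sequences in this study had neither stop codons nor frameshifts, so a screen like this can only flag the easy cases.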
"However, cases with full open reading frames are described, including some that differ minimally from the mitochondrial sequence. To date, eight avian COI pseudogene sequences with open reading frames are reported. When applied to the frequency matrix generated in this study, these contained 7–10 nucleotide and amino acid VLFs, strengthening the observation that pseudogenes can be identified by the presence of multiple VLFs."
ABSTRACT: The accuracy of DNA barcode databases is critical for research and practical applications. Here we apply a frequency matrix to assess sequencing errors in a very large set of avian BARCODEs. Using 11,000 sequences from 2,700 bird species, we show most avian cytochrome c oxidase I (COI) nucleotide and amino acid sequences vary within a narrow range. Except for third codon positions, nearly all (96%) sites were highly conserved or limited to two nucleotides or two amino acids. A large number of positions had very low frequency variants present in single individuals of a species; these were strongly concentrated at the ends of the barcode segment, consistent with sequencing error. In addition, a small fraction (0.1%) of BARCODEs had multiple very low frequency variants shared among individuals of a species; these were found to represent overlooked cryptic pseudogenes lacking stop codons. The calculated upper limit of sequencing error was 8×10⁻⁵ errors/nucleotide, which was relatively high for direct Sanger sequencing of amplified DNA, but unlikely to compromise species identification. Our results confirm the high quality of the avian BARCODE database and demonstrate significant quality improvement in avian COI records deposited in GenBank over the past decade. This approach has potential application for genetic database quality control, discovery of cryptic pseudogenes, and studies of low-level genetic variation.
PLoS ONE 08/2012; 7(8):e43992. DOI:10.1371/journal.pone.0043992 · 3.23 Impact Factor
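The frequency-matrix idea in the abstract above can be illustrated roughly as follows. This is a sketch under stated assumptions, not the published procedure: the function names `frequency_matrix` and `count_vlfs` are hypothetical, the sequences are assumed to be pre-aligned to equal length, and the `max_freq` cutoff for a "very low frequency" variant is an arbitrary placeholder because the abstract does not state one.

```python
from collections import Counter


def frequency_matrix(aligned):
    """Build one Counter of symbols per alignment column.

    Assumes `aligned` is a non-empty list of equal-length strings
    (nucleotides or amino acids).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [Counter(s[i] for s in aligned) for i in range(length)]


def count_vlfs(seq, matrix, total, max_freq=0.001):
    """Count positions where `seq` carries a very low frequency variant.

    `total` is the number of sequences used to build `matrix`.
    `max_freq` is a hypothetical cutoff chosen for illustration only.
    """
    return sum(
        1
        for i, symbol in enumerate(seq)
        if matrix[i][symbol] / total <= max_freq
    )
```

Following the abstract's reasoning, a record with a single such variant seen in one individual is more consistent with sequencing error, whereas several variants shared among individuals of a species would mark the record as a candidate cryptic pseudogene worth closer inspection.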
ABSTRACT: As of February 2011, COI DNA barcode sequences (a 648-bp segment of the 5' end of the mitochondrial gene cytochrome c oxidase I, the standard DNA barcode for animals) have been collected from over 23,000 avian specimens representing 3,800 species, more than one-third of the world's avifauna. Here, we detail the methodology for obtaining DNA barcodes from birds, covering the entire process from field collection to data analysis. We emphasize key aspects of the process and describe in more detail those that are particularly relevant in the case of birds. We provide elemental information about collection of specimens, detailed protocols for DNA extraction and PCR, and basic aspects of sequencing methodology. In particular, we highlight the primer pairs and thermal cycling profiles associated with successful amplification and sequencing from a broad range of avian species. Finally, we succinctly review the methodology for data analysis, including the detection of errors (such as contamination, misidentifications, or amplification of pseudogenes), assessment of species resolution, detection of divergent intraspecific lineages, and identification of unknown specimens.