A cryptic, intergeneric cytochrome c oxidase I pseudogene in tyrant flycatchers (family: Tyrannidae)

Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada.
Genome (Impact Factor: 1.42). 12/2010; 53(12):1103-9. DOI: 10.1139/G10-085
Source: PubMed


Nuclear mitochondrial pseudogenes, or "numts", are nonfunctional copies of mitochondrial genes that have been translocated to the nuclear genome. Numts have been used to study differences in mutation rates between the nuclear and mitochondrial genomes, but have also been implicated as troublesome for phylogenetic studies and DNA-based species identification (i.e., DNA barcoding). In this study, a suspected numt discovered during a study of mitochondrial cytochrome c oxidase I (COI) diversity in North American birds was targeted and sequenced from tyrant flycatchers (family: Tyrannidae). In total, the numt was found in five taxa representing two genera. Substitution rates were compared between COI and numt sequences. None of the numt sequences harboured stop codons or frameshift mutations, but phylogenetic analysis revealed they had accumulated more amino acid substitutions than the mitochondrial COI sequences. Mitochondrial COI appeared to be preferentially amplified in most cases, but methods for numt detection are discussed for cases like this where sequences lack obvious features for identification. Because of its persistence across a broad taxonomic lineage, this numt could form a valuable model system for studying evolution in numts. The full size of the numt and its location within the nuclear genome are yet to be determined.
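
The "obvious features" mentioned in the abstract, in-frame stop codons and frameshifts, can be screened for with a few lines of code. The following Python sketch is illustrative only and is not the authors' method: the function name `screen_for_numt_signs`, the `frame` parameter, and the use of `-` for alignment gaps are assumptions made for this example. It uses the vertebrate mitochondrial genetic code, in which TAA, TAG, AGA and AGG terminate translation and TGA encodes tryptophan.

```python
import re

# Stop codons under the vertebrate mitochondrial genetic code
# (NCBI translation table 2). TGA is tryptophan in this code.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def screen_for_numt_signs(seq, frame=0):
    """Report obvious pseudogene signals in one aligned COI read.

    seq   : nucleotide string aligned to a reference; '-' marks gaps
            (hypothetical input convention for this sketch)
    frame : offset (0, 1 or 2) of the first complete codon in the
            ungapped read, assumed to be known from the alignment
    """
    # Ignore leading/trailing gaps, which only reflect read length.
    seq = seq.upper().strip("-")

    # An internal gap run whose length is not a multiple of three
    # shifts the reading frame. Only deletions relative to the
    # reference are visible this way; insertions would require
    # inspecting the reference row of the alignment as well.
    gap_runs = [len(run) for run in re.findall(r"-+", seq)]
    frameshift = any(n % 3 != 0 for n in gap_runs)

    # Split the ungapped read into codons and look for stops.
    ungapped = seq.replace("-", "")[frame:]
    codons = [ungapped[i:i + 3] for i in range(0, len(ungapped) - 2, 3)]
    stops = [i for i, codon in enumerate(codons) if codon in VERT_MITO_STOPS]

    return {"frameshift": frameshift, "stop_codon_positions": stops}
```

A read that passes this screen is not necessarily mitochondrial: the numt sequences described above contained neither stop codons nor frameshifts, so a check of this kind would not have flagged them.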

  • Source
    • "However, cases with full open reading frames are described, including some that differ minimally from the mitochondrial sequence [30]. To date, eight avian COI pseudogene sequences with open reading frames are reported [32], [36]. When applied to the frequency matrix generated in this study, these contained 7–10 nucleotide and amino acid VLFs, strengthening the observation that pseudogenes can be identified by the presence of multiple VLFs."
    ABSTRACT: The accuracy of DNA barcode databases is critical for research and practical applications. Here we apply a frequency matrix to assess sequencing errors in a very large set of avian BARCODEs. Using 11,000 sequences from 2,700 bird species, we show most avian cytochrome c oxidase I (COI) nucleotide and amino acid sequences vary within a narrow range. Except for third codon positions, nearly all (96%) sites were highly conserved or limited to two nucleotides or two amino acids. A large number of positions had very low frequency variants present in single individuals of a species; these were strongly concentrated at the ends of the barcode segment, consistent with sequencing error. In addition, a small fraction (0.1%) of BARCODEs had multiple very low frequency variants shared among individuals of a species; these were found to represent overlooked cryptic pseudogenes lacking stop codons. The calculated upper limit of sequencing error was 8 × 10⁻⁵ errors/nucleotide, which was relatively high for direct Sanger sequencing of amplified DNA, but unlikely to compromise species identification. Our results confirm the high quality of the avian BARCODE database and demonstrate significant quality improvement in avian COI records deposited in GenBank over the past decade. This approach has potential application for genetic database quality control, discovery of cryptic pseudogenes, and studies of low-level genetic variation.
    Full-text · Article · Aug 2012 · PLoS ONE
  • Source
    ABSTRACT: As of February 2011, COI DNA barcode sequences (a 648-bp segment of the 5' end of the mitochondrial gene cytochrome c oxidase I, the standard DNA barcode for animals) have been collected from over 23,000 avian specimens representing 3,800 species, more than one-third of the world's avifauna. Here, we detail the methodology for obtaining DNA barcodes from birds, covering the entire process from field collection to data analysis. We emphasize key aspects of the process and describe in more detail those that are particularly relevant in the case of birds. We provide elemental information about collection of specimens, detailed protocols for DNA extraction and PCR, and basic aspects of sequencing methodology. In particular, we highlight the primer pairs and thermal cycling profiles associated with successful amplification and sequencing from a broad range of avian species. Finally, we succinctly review the methodology for data analysis, including the detection of errors (such as contamination, misidentifications, or amplification of pseudogenes), assessment of species resolution, detection of divergent intraspecific lineages, and identification of unknown specimens.
    Full-text · Article · Jun 2012 · Methods in molecular biology (Clifton, N.J.)
  • Source
    ABSTRACT: Flycatchers in the genus Empidonax are among the most difficult avian taxonomic groups to identify to species. Observers often rely on calls or songs in the field or detailed morphometrics in the hand to identify species. In January and February 2013, we twice captured an Empidonax flycatcher at the Virginia Zoo in Norfolk, Virginia. After being unable to identify the flycatcher to species level using morphometrics and photographs, we extracted DNA from two tail feathers collected during the second encounter to identify the individual genetically. Comparison of cytochrome c oxidase I (COI) with reference sequences in the Barcode of Life Database (BOLD) suggested that the specimen had a >99.8% probability of placement as a Dusky Flycatcher (Empidonax oberholseri). Additional comparisons of NADH dehydrogenase subunit 2 (ND2) to reference sequences in GenBank, however, suggested that the specimen was a Pine Flycatcher (Empidonax affinis), a species not represented in BOLD and confined geographically to a small area in Mexico and Guatemala. After analyzing both COI and ND2 from additional vouchered specimens, the bird caught in Virginia was determined to be a Dusky Flycatcher. We also suspect that some of the sequences in GenBank might derive from incorrectly identified specimens or otherwise could represent overlooked pseudogenes. Because the putative identification, based on GenBank sequences, would have represented the first record of Pine Flycatcher from the United States, our results reinforce the need for carefully vetted and taxonomically comprehensive molecular databases to allow definitive conclusions about sample identity. Further molecular phylogeographic review of this genus is warranted to resolve haplotype ambiguities. 
Challenges in the morphological and molecular identification of flycatchers of the genus Empidonax: a case study with Empidonax oberholseri. Empidonax flycatchers are among the most difficult groups of birds to identify taxonomically to species level. To identify these birds, observers rely mainly on songs or calls, or on details that can be observed with the bird in the hand. In January and February 2013, we twice captured an Empidonax at the Virginia Zoo, Virginia. We could not identify the species using photographs or morphometric traits, so we extracted DNA from two tail feathers collected at the second capture to try to identify the individual genetically. Comparison of cytochrome c oxidase I (COI) with reference sequences in the Barcode of Life Database (BOLD) suggested that the specimen had a >99.8% probability of being an Empidonax oberholseri. However, a comparison of NADH dehydrogenase subunit 2 (ND2) with GenBank sequences suggested that the bird was Empidonax affinis, a species not represented in BOLD and geographically confined to a small area of Mexico and Guatemala. After analyzing COI and ND2 from additional specimens, the bird captured in Virginia was determined to be E. oberholseri. We suspect that the GenBank sequence had been obtained from a misidentified individual or otherwise represented an overlooked pseudogene. Because the putative identification, based on the GenBank sequence, would have represented the first record of E. affinis in the United States, our results support the need for carefully vetted molecular databases to allow conclusive identification of samples. A molecular phylogeographic review of Empidonax is needed to resolve haplotype ambiguities.
    Full-text · Article · Jan 2016 · Journal of Field Ornithology
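
The frequency-matrix screen described in the PLoS ONE abstract above can be illustrated with a minimal Python sketch. This is not the published implementation: the function name `count_vlfs`, the default `threshold` value, and the assumption that all sequences are already aligned to equal length are hypothetical choices made for this example. The sketch builds a per-column table of nucleotide counts and then counts, for each sequence, how many of its bases fall below the frequency threshold.

```python
from collections import Counter


def count_vlfs(aligned, threshold=0.001):
    """Count very-low-frequency variants (VLFs) in each aligned sequence.

    aligned   : list of equal-length nucleotide strings (assumed to be
                pre-aligned; no alignment is performed here)
    threshold : frequency below which a base at a column counts as a
                VLF (illustrative default, not taken from the source)
    """
    n = len(aligned)
    length = len(aligned[0])

    # Frequency matrix: one Counter of observed bases per column.
    matrix = [Counter(seq[i] for seq in aligned) for i in range(length)]

    counts = []
    for seq in aligned:
        vlf = sum(
            1 for i, base in enumerate(seq)
            if matrix[i][base] / n < threshold
        )
        counts.append(vlf)
    return counts
```

For example, `count_vlfs(["ACGT"] * 9 + ["TCGA"], threshold=0.2)` returns zero for the nine identical sequences and two for the divergent one. The sketch only tallies VLFs per sequence; deciding whether they are shared among individuals of a species (suggesting a pseudogene) or confined to one individual (suggesting sequencing error), as the abstract distinguishes, would need species labels and a further grouping step.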