True single-molecule DNA sequencing of a Pleistocene horse bone

Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen University, Copenhagen DK-1350, Denmark.
Genome Research (Impact Factor: 14.63). 07/2011; 21(10):1705-19. DOI: 10.1101/gr.122747.111
Source: PubMed


Second-generation sequencing platforms have revolutionized the field of ancient DNA, opening access to complete genomes of past individuals and extinct species. However, these platforms are dependent on library construction and amplification steps that may result in sequences that do not reflect the original DNA template composition. This is particularly true for ancient DNA, where templates have undergone extensive damage post-mortem. Here, we report the results of the first "true single molecule sequencing" of ancient DNA. We generated 115.9 Mb and 76.9 Mb of DNA sequences from a permafrost-preserved Pleistocene horse bone using the Helicos HeliScope and Illumina GAIIx platforms, respectively. We find that the percentage of endogenous DNA sequences derived from the horse is higher among the Helicos data than Illumina data. This result indicates that the molecular biology tools used to generate sequencing libraries of ancient DNA molecules, as required for second-generation sequencing, introduce biases into the data that reduce the efficiency of the sequencing process and limit our ability to fully explore the molecular complexity of ancient DNA extracts. We demonstrate that simple modifications to the standard Helicos DNA template preparation protocol further increase the proportion of horse DNA for this sample by threefold. Comparison of Helicos-specific biases and sequence errors in modern DNA with those in ancient DNA also reveals extensive cytosine deamination damage at the 3' ends of ancient templates, indicating the presence of 3'-sequence overhangs. Our results suggest that paleogenomes could be sequenced in an unprecedented manner by combining current second- and third-generation sequencing approaches.
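The 3'-end cytosine deamination signal described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `c_to_t_rate_from_3prime`, the gap-free (reference, read) input format, and the toy sequences are all assumptions made for illustration. For each distance from the 3' end, it reports the fraction of reference cytosines that are read as thymine.

```python
from collections import defaultdict


def c_to_t_rate_from_3prime(pairs, max_dist=5):
    """Return {distance_from_3prime_end: C->T rate} for gap-free (ref, read) pairs.

    Hypothetical helper: assumes each read is already aligned to its
    reference segment without gaps, so both strings have equal length.
    """
    c_sites = defaultdict(int)  # reference C seen at this distance
    c_to_t = defaultdict(int)   # of those, positions where the read shows T
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("gap-free alignment assumed: lengths must match")
        n = len(ref)
        for dist in range(min(max_dist, n)):
            i = n - 1 - dist  # index counted back from the 3' end
            if ref[i] == "C":
                c_sites[dist] += 1
                if read[i] == "T":
                    c_to_t[dist] += 1
    # Distances with no reference C are omitted rather than reported as 0.
    return {d: c_to_t[d] / c_sites[d] for d in sorted(c_sites)}


# Toy data (invented): (reference segment, read) pairs.
toy_pairs = [
    ("ACGTCC", "ACGTCT"),
    ("GGCACC", "GGCATT"),
    ("TTCGAC", "TTCGAC"),
]
rates = c_to_t_rate_from_3prime(toy_pairs, max_dist=3)
for dist, rate in rates.items():
    print(f"distance {dist} from 3' end: C->T rate {rate:.2f}")
```

On real data, a rate that is highest at distance 0 and falls quickly toward the read interior would be consistent with the damage pattern reported here; separating damage from platform-specific sequencing errors would require the modern-DNA comparison the abstract mentions.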



Available from: John F Thompson
    • "This leads to an excess of C-to-T or G-to-A mismatches, depending on the strand being sequenced. This is most significant in the ends of the reads and decreases rapidly towards the center [15]. "
    ABSTRACT: Modern DNA sequencing methods produce vast amounts of data that often require mapping to a reference genome. Most existing programs use the number of mismatches between the read and the genome as a measure of quality. This approach is without a statistical foundation and can for some data types result in many wrongly mapped reads. Here we present a probabilistic mapping method based on position-specific scoring matrices, which can take into account not only the quality scores of the reads but also user-specified models of evolution and data-specific biases. We show how evolution, data-specific biases, and sequencing errors are naturally dealt with probabilistically. Our method achieves better results than Bowtie and BWA on simulated and real ancient and PAR-CLIP reads, as well as on simulated reads from the AT-rich organism P. falciparum, when modeling the biases of these data. For simulated Illumina reads, the method has consistently higher sensitivity for both single-end and paired-end data. We also show that our probabilistic approach can limit the problem of random matches from short reads of contamination and that it improves the mapping of real reads from one organism (D. melanogaster) to a related genome (D. simulans). The presented work is an implementation of a novel approach to short read mapping where quality scores, prior mismatch probabilities and mapping qualities are handled in a statistically sound manner. The resulting implementation provides not only a tool for biologists working with low-quality and/or biased sequencing data but also a demonstration of the feasibility of using a probability-based alignment method on real and simulated data sets.
    Full-text · Article · Apr 2014 · BMC Bioinformatics
    • "A total of 12 channels of tSMS was performed on a Helicos HeliScope sequencer at Helicos BioSciences (Cambridge, MA). Eight μl per channel of each bulk and sheared bulk were prepared for tSMS according to the standard protocol for ancient samples [32]. Two channels of each bulk and one channel of each sheared bulk were sequenced. "
    ABSTRACT: Identification of historic pathogens is challenging since false positives and negatives are a serious risk. Environmental non-pathogenic contaminants are ubiquitous. Furthermore, public genetic databases contain limited information regarding these species. High-throughput sequencing may help reliably detect and identify historic pathogens. We shotgun-sequenced eight 16th-century Mixtec individuals from the site of Teposcolula Yucundaa (Oaxaca, Mexico) who are reported to have died from the huey cocoliztli ('Great Pestilence' in Nahuatl), an unknown disease that decimated native Mexican populations during the Spanish colonial period, in order to identify the pathogen. Comparison of these sequences with those deriving from the surrounding soil and from four precontact individuals from the site found a wide variety of contaminant organisms that confounded analyses. Without the comparative sequence data from the precontact individuals and soil, false positives for Yersinia pestis and rickettsiosis could have been reported. False positives and negatives remain problematic in ancient DNA analyses despite the application of high-throughput sequencing. Our results suggest that several studies claiming the discovery of ancient pathogens may need further verification. Additionally, true single molecule sequencing's short read lengths, inability to sequence through DNA lesions, and limited ancient-DNA-specific technical development hinder its application to palaeopathology.
    Full-text · Article · Feb 2014 · BMC Research Notes
    • "That AT-overhang ligation is biased against DNA templates with 5′-dT has major consequences for ancient DNA research and museomics. We know now that most ancient DNA templates contain overhanging ends [4], [16], [21]. At such sites, cytosine residues show increased rates of deamination into uracils [17], [18], a chemical analogue to thymines. "
    ABSTRACT: Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products, resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries, and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs is biased against DNA templates starting with thymine residues, in contrast to blunt-end adapter ligation. We observe the same bias on fresh DNA extracts sheared on Bioruptor, Covaris and nebulizers. This contradicts previous reports suggesting that this bias could originate from the methods used for shearing DNA. This also suggests that AT-overhang adapter ligation efficiency is affected in a sequence-dependent manner and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared following AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines and therefore deaminated cytosines. This results in particular nucleotide misincorporation damage patterns, deviating from the signature generally expected for authenticating ancient sequence data. Consequently, we show that models adequate for estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries.
    Full-text · Article · Oct 2013 · PLoS ONE