Sequencing the unsequenceable: Expanded CGG-repeat alleles of the fragile X gene

Biochemistry and Molecular Medicine, University of California, Davis, School of Medicine
Genome Research (Impact Factor: 14.63). 10/2012; 23(1). DOI: 10.1101/gr.141705.112
Source: PubMed


The human fragile X mental retardation 1 (FMR1) gene contains a (CGG)n trinucleotide repeat in its 5' untranslated region (5'UTR). Expansions of this repeat result in a number of clinical disorders with distinct molecular pathologies, including fragile X syndrome (FXS; full mutation range, >200 CGG repeats) and fragile X-associated tremor/ataxia syndrome (FXTAS; premutation range, 55-200 repeats). Study of these diseases has been limited by an inability to sequence expanded CGG repeats, particularly in the full mutation range, with existing DNA sequencing technologies. Single-molecule real-time (SMRT) sequencing provides an approach to sequencing that is fundamentally different from other "next-generation" sequencing platforms, and is well suited for long, repetitive DNA sequences. We report the first sequence data for expanded CGG-repeat FMR1 alleles in the full mutation range that reveal the confounding effects of CGG-repeat tracts on both cloning and PCR. A unique feature of SMRT sequencing is its ability to yield real-time information on the rates of nucleoside addition by the tethered DNA polymerase; for the CGG-repeat alleles, we find a strand-specific effect of CGG-repeat DNA on the inter-pulse distance. This kinetic signature reveals a novel aspect of the repeat element; namely, that the particular G bias within the CGG/CCG-repeat element influences polymerase activity in a manner that extends beyond simple nearest-neighbor effects. These observations provide a baseline for future kinetic studies of repeat elements, as well as for studies of epigenetic and other chemical modifications thereof.
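
The repeat ranges quoted in the abstract (premutation, 55-200 repeats; full mutation, >200 repeats) can be illustrated with a minimal sketch. It is not the authors' analysis pipeline. The function names `longest_cgg_run` and `classify_allele` are hypothetical, the label used for alleles under 55 repeats is a placeholder because the abstract does not name that range, and counting only the longest uninterrupted CGG tract is a simplifying assumption (any interrupting triplet ends the run).

```python
import re


def longest_cgg_run(seq: str) -> int:
    """Return the repeat count of the longest uninterrupted CGG tract.

    Simplifying assumption: any non-CGG triplet ends the tract, so an
    interrupted repeat is reported as its longest pure segment.
    """
    runs = re.findall(r"(?:CGG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_allele(repeats: int) -> str:
    """Map a CGG repeat count onto the ranges given in the abstract."""
    if repeats > 200:
        return "full mutation"      # >200 repeats (FXS range)
    if repeats >= 55:
        return "premutation"        # 55-200 repeats (FXTAS range)
    # Placeholder label: the abstract does not name this range.
    return "below premutation range"


if __name__ == "__main__":
    # Hypothetical read: 60 CGG units, one interrupting triplet, 10 more units.
    read = "AT" + "CGG" * 60 + "AGG" + "CGG" * 10
    n = longest_cgg_run(read)
    print(n, classify_allele(n))
```

In practice a repeat count would come from a full-length read spanning the tract, which is the situation long-read platforms such as SMRT sequencing are described as addressing.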



Available from: Erick W Loomis, May 14, 2014
  • Source
    • "Novel sequencing platforms, such as SMRT (Single Molecule Real Time sequencing) PacBio RS family [21] by Pacific Biosciences, generate reads with a mean length of 8,500 bases and longest reads exceeding 30 Kbp. From metagenomic studies to genome-based personalised patient care, longer reads are mandatory to solve structural complexities in nucleotide sequences that are analysed in heterogeneous assays including de novo genome assembly [22], haplotype phasing [23], transcriptome analysis [24], and structural and copy number analysis [25]. In [26], a review of alignment algorithms, by introducing their practical applications on different types of experimental data, is proposed."
    ABSTRACT: In this paper we advocate high-level programming methodology for Next Generation Sequencers (NGS) alignment tools for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools against their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from all complex aspects of parallel programming, such as synchronisation protocols and task scheduling, gaining more possibility for seamless performance tuning. In this work we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets.
    Full-text · Article · Jun 2014 · BioMed Research International
  • Source
    • "For analysis of large gene inserts mediated by HDR, we have overcome this obstacle using embedded primers that distinguish between targeted and WT allele sequences while producing similar PCR amplicon sizes (Voit et al., 2014). Simultaneous measurement of amplicons with different lengths has also been achieved by adding a size standard ladder to the SMRT sequencing reaction, and a similar strategy could be used for quantification of large gene additions or NHEJ-mediated integrations of the donor template (Loomis et al., 2013). "
    ABSTRACT: Targeted genome editing with engineered nucleases has transformed the ability to introduce precise sequence modifications at almost any site within the genome. A major obstacle to probing the efficiency and consequences of genome editing is that no existing method enables the frequency of different editing events to be simultaneously measured across a cell population at any endogenous genomic locus. We have developed a method for quantifying individual genome-editing outcomes at any site of interest with single-molecule real-time (SMRT) DNA sequencing. We show that this approach can be applied at various loci using multiple engineered nuclease platforms, including transcription-activator-like effector nucleases (TALENs), RNA-guided endonucleases (CRISPR/Cas9), and zinc finger nucleases (ZFNs), and in different cell lines to identify conditions and strategies in which the desired engineering outcome has occurred. This approach offers a technique for studying double-strand break repair, facilitates the evaluation of gene-editing technologies, and permits sensitive quantification of editing outcomes in almost every experimental system used.
    Full-text · Article · Mar 2014 · Cell Reports
  • Source
    • "Promising avenues for overcoming this technical limitation of short-read sequencing include the subcloning of individual subtelomeres, allowing independent sequencing and assembly, and the use of emerging sequencing technologies that produce much longer reads (Bashir et al. 2012; Loomis et al. 2013). We find systematic trends in the types of genes that tend to be affected by certain types of potentially functional variation."
    ABSTRACT: The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.
    Full-text · Article · Jan 2014 · Molecular Biology and Evolution