Paired-end sequencing of Fosmid libraries by Illumina

Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02141, USA.
Genome Research (Impact Factor: 14.63). 07/2012; 22(11). DOI: 10.1101/gr.138925.112
Source: PubMed


Eliminating the bacterial cloning step has been a major factor in the vastly improved efficiency of massively parallel sequencing approaches. However, this also has made it a technical challenge to produce the modern equivalent of the Fosmid- or BAC-end sequences that were crucial for assembling and analyzing complex genomes during the Sanger-based sequencing era. To close this technology gap, we developed Fosill, a method for converting Fosmids to Illumina-compatible jumping libraries. We constructed Fosmid libraries in vectors with Illumina primer sequences and specific nicking sites flanking the cloning site. Our family of pFosill vectors allows multiplex Fosmid cloning of end-tagged genomic fragments without physical size selection and is compatible with standard and multiplex paired-end Illumina sequencing. To excise the bulk of each cloned insert, we introduced two nicks in the vector, translated them into the inserts, and cleaved them. Recircularization of the vector via coligation of insert termini followed by inverse PCR generates a jumping library for paired-end sequencing with 101-base reads. The yield of unique Fosmid-sized jumps is sufficiently high, and the background of short, incorrectly spaced and chimeric artifacts sufficiently low, to enable applications such as mapping of structural variation and scaffolding of de novo assemblies. We demonstrate the power of Fosill to map genome rearrangements in a cancer cell line and identify three fusion genes that were corroborated by RNA-seq data. Our Fosill-powered assembly of the mouse genome has an N50 scaffold length of 17.0 Mb, rivaling the connectivity (16.9 Mb) of the Sanger sequencing-based draft assembly.
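
The N50 scaffold length quoted above is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The following is a minimal sketch of that standard definition in Python; the function name `n50` and the toy lengths are illustrative assumptions, not code or data from the Fosill study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2                  # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() needs at least one length")


# Toy scaffold lengths (hypothetical, for illustration only).
scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(scaffolds))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

Because the metric depends only on the sorted length distribution, it measures connectivity rather than correctness, which is why two assemblies built with different technologies can be compared on it directly.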

Available from: Aaron M Berlin, Mar 11, 2014
  • Source
    • "The founder animals originated from the colony of J.U. Jarvis, at the University of Cape Town, South Africa. The Heterocephalus glaber assembly, HetGla_female_1.0, was constructed from 180 bp paired end fragment libraries (45× coverage), 3 kb jumping libraries (42× coverage), 6–14 kb sheared jumping libraries (2× coverage) and 40 kb FOSILLs (Williams et al., 2012) (1× coverage). All libraries were sequenced by Hi-Seq Illumina machines, producing 101 bp paired-end reads."
    ABSTRACT: Motivation: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. Results: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat’s extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (, featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species. Availability and implementation: The Naked Mole Rat Genome Resource is freely available online at This resource is open source and the source code is available at Contact:
    Full-text · Article · Aug 2014 · Bioinformatics
  • Source
    • "Several approaches are available for providing scaffolding capabilities. These include the generation of mate-paired reads from variable lengths of inserts (Boetzer et al., 2011; Gao et al., 2011; Gritsenko et al., 2012; Williams et al., 2012; Hunt et al., 2014; Kajitani et al., 2014; Zimin et al., 2014) or using transcript sequences (Mortazavi et al., 2010). Mate-paired reads can be generated from Illumina sequencing using libraries of various sizes, by using Fosmid libraries (Williams et al., 2012) or bacterial artificial chromosome (BAC) libraries (Xu et al., 2007; Liu et al., 2009). "
    ABSTRACT: Rapid advances in next-generation sequencing technologies have allowed whole genome sequencing of many species. However, with the current sequencing technologies, whole genome sequence assemblies often fall short in one of four quality measures: accuracy, contiguity, connectivity, and completeness. In particular, small-sized contigs and scaffolds limit the applicability of whole genome sequences for genetic analysis. To enhance the quality of whole genome sequence assemblies, particularly the scaffolding capabilities, additional genomic resources are required. Among these, sequences derived from known physical locations offer great power for scaffolding. In this mini-review, we will describe the principles, procedures and applications of physical-map-derived sequences, with the focus on physical map contig-specific sequences.
    Full-text · Article · Jul 2014 · Frontiers in Genetics
  • Source
    • "Bacteriophage lambda packaging restricts the fragment length to ~40 Kbp. Fosmid ends can produce mate-pair (jump) libraries that facilitate the assembly of shotgun genome sequences in the absence of large-scale bacterial cloning [3, 4]. Another application of Fosmids is in obtaining material for genome-scale sequencing via a massive Fosmid-based approach in which the inserts are completely sequenced. "
    ABSTRACT: Background: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. Results: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS. Conclusions: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: accessible via software used for running the assembly process is available at
    Full-text · Article · Jun 2014 · BMC Genomics