Long-range massively parallel mate pair sequencing detects distinct mutations and similar patterns of structural mutability in two breast cancer cell lines

Graduate Program in Structural and Computational Biology and Molecular Biophysics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
Cancer Genetics (Impact Factor: 2.98). 08/2011; 204(8):447-57. DOI: 10.1016/j.cancergen.2011.07.009
Source: PubMed


Cancer genomes frequently undergo genomic instability, resulting in the accumulation of chromosomal rearrangements. To date, one of the main challenges has been to identify these rearrangements confidently and accurately using short-read massively parallel sequencing. We were able to improve cancer rearrangement detection by combining two distinct massively parallel sequencing strategies: fosmid-sized (36 kb on average) and standard 5 kb mate pair libraries. We applied this combined strategy to map rearrangements in two breast cancer cell lines, MCF7 and HCC1954. We detected and validated a total of 91 somatic rearrangements in MCF7 and 25 in HCC1954, including genomic alterations corresponding to previously reported transcript aberrations in these two cell lines. Each of the genomes contains two types of breakpoints: clustered and dispersed. In both cell lines, the dispersed breakpoints show enrichment for low copy repeats, while the clustered breakpoints associate with high copy number amplifications. Comparing the two genomes, we observed highly similar structural mutational spectra affecting different sets of genes, pointing to similar histories of genomic instability against the background of very different gene network perturbations.



Available from: Cristian Coarfa, Oct 02, 2015
  • Source
    • "Cloning-free methods for linking reads together over long stretches of DNA range from 2 to 20 kb with increasing need for input DNA for increasing insert sizes. Cloning-based libraries such as fosmid libraries can further increase the insert size [17]. By incorporating a known distance between two sequence reads during construction of the library, that information can be utilized when deciphering the structure of a known genome, ordering the contigs built from assembly, to span or aid mapping in low complexity regions."
    ABSTRACT: Here we demonstrate the use of short-read massive sequencing systems to in effect achieve longer read lengths through hierarchical molecular tagging. We show how indexed and PCR-amplified targeted libraries are degraded, sub-sampled and arrested at timed intervals to achieve pools of differing average length, each of which is indexed with a new tag. By this process, indices of sample origin, molecular origin, and degree of degradation are incorporated in order to achieve a nested hierarchical structure, later to be utilized in the data processing to order the reads over a longer distance than the sequencing system originally allows. With this protocol we show how continuous regions beyond 3000 bp can be decoded by an Illumina sequencing system, and we illustrate the potential applications by calling variants of the lambda genome, analysing TP53 in cancer cell lines, and targeting a variable canine mitochondrial region.
    Scientific Reports 03/2013; 3:1186. DOI:10.1038/srep01186 · 5.58 Impact Factor
  • Source
    • "In ZR-75-30, using structural analysis, we found half of the six expressed fusions detected by Robinson DR et al. [15], while, using cDNA sequencing, they found three of the nine we detected—both figures suggest the true total might be around 18. This is consistent with recent, probably incomplete, figures from other cell lines: 20 expressed fusions have been verified in MCF7, with several more predicted computationally [6,13,15,40]; 43 have been found in BT474 and 13 in SKBR3 [13]. "
    ABSTRACT: Background: It has recently emerged that common epithelial cancers such as breast cancers have fusion genes like those in leukaemias. In a representative breast cancer cell line, ZR-75-30, we searched for fusion genes by analysing genome rearrangements. Results: We first analysed rearrangements of the ZR-75-30 genome, to around 10 kb resolution, by molecular cytogenetic approaches, combining array painting and array CGH. We then compared this map with genomic junctions determined by paired-end sequencing. Most of the breakpoints found by array painting and array CGH were identified in the paired-end sequencing: 55% of the unamplified breakpoints and 97% of the amplified breakpoints (as these are represented by more sequence reads). From this analysis we identified 9 expressed fusion genes: APPBP2-PHF20L1, BCAS3-HOXB9, COL14A1-SKAP1, TAOK1-PCGF2, TIAM1-NRIP1, TIMM23-ARHGAP32, TRPS1-LASP1, USP32-CCDC49 and ZMYM4-OPRD1. We also determined the genomic junctions of a further three expressed fusion genes that had been described by others, BCAS3-ERBB2, DDX5-DEPDC6/DEPTOR and PLEC1-ENPP2. Of this total of 12 expressed fusion genes, 9 were in the coamplification. Due to the sensitivity of the technologies used, we estimate these 12 fusion genes to be around two-thirds of the true total. Many of the fusions seem likely to be driver mutations. For example, PHF20L1, BCAS3, TAOK1, PCGF2, and TRPS1 are fused in other breast cancers. HOXB9 and PHF20L1 are members of gene families that are fused in other neoplasms. Several of the other genes are relevant to cancer: in addition to ERBB2, SKAP1 is an adaptor for Src, DEPTOR regulates the mTOR pathway and NRIP1 is an estrogen-receptor coregulator. Conclusions: This is the first structural analysis of a breast cancer genome that combines classical molecular cytogenetic approaches with sequencing. Paired-end sequencing was able to detect almost all breakpoints where there was adequate read depth. It supports the view that gene breakage and gene fusion are important classes of mutation in breast cancer, with a typical breast cancer expressing many fusion genes.
    BMC Genomics 12/2012; 13(1):719. DOI:10.1186/1471-2164-13-719 · 3.99 Impact Factor
  • Source
    • "In this respect, Fosills are similar to Fosmid "diTags" (Hampton et al. 2011). The principal advantage of our method is that it allows much longer sequencing reads (up to 2 × 101 bases in the current study, but even longer reads are possible), whereas the EcoP15I digest strictly limits the diTags to 2 × 26 bases."
    ABSTRACT: Eliminating the bacterial cloning step has been a major factor in the vastly improved efficiency of massively parallel sequencing approaches. However, this also has made it a technical challenge to produce the modern equivalent of the Fosmid- or BAC-end sequences that were crucial for assembling and analyzing complex genomes during the Sanger-based sequencing era. To close this technology gap, we developed Fosill, a method for converting Fosmids to Illumina-compatible jumping libraries. We constructed Fosmid libraries in vectors with Illumina primer sequences and specific nicking sites flanking the cloning site. Our family of pFosill vectors allows multiplex Fosmid cloning of end-tagged genomic fragments without physical size selection and is compatible with standard and multiplex paired-end Illumina sequencing. To excise the bulk of each cloned insert, we introduced two nicks in the vector, translated them into the inserts, and cleaved them. Recircularization of the vector via coligation of insert termini followed by inverse PCR generates a jumping library for paired-end sequencing with 101-base reads. The yield of unique Fosmid-sized jumps is sufficiently high, and the background of short, incorrectly spaced and chimeric artifacts sufficiently low, to enable applications such as mapping of structural variation and scaffolding of de novo assemblies. We demonstrate the power of Fosill to map genome rearrangements in a cancer cell line and identify three fusion genes that were corroborated by RNA-seq data. Our Fosill-powered assembly of the mouse genome has an N50 scaffold length of 17.0 Mb, rivaling the connectivity (16.9 Mb) of the Sanger-sequencing based draft assembly.
    Genome Research 07/2012; 22(11). DOI:10.1101/gr.138925.112 · 14.63 Impact Factor
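
The ZR-75-30 excerpt above (BMC Genomics 2012) suggests a true total of about 18 expressed fusions, given that one study found 6, the other found 9, and 3 were found by both. The excerpt does not say how that figure was derived. One reading, assumed here, is a two-sample capture-recapture (Lincoln-Petersen) estimate. The sketch below works through that arithmetic; the function name `lincoln_petersen` and the variable names are illustrative and do not come from either paper.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Two-sample capture-recapture estimate of a total population size.

    Assumption (not stated in the source): the two surveys sample the
    same population independently, so the overlap fraction reflects
    each survey's sensitivity.
    """
    if n_both <= 0:
        raise ValueError("the two surveys must share at least one item")
    return n_first * n_second / n_both


# Counts quoted in the excerpt: 6 expressed fusions from cDNA sequencing,
# 9 from structural analysis, 3 detected by both approaches.
estimated_total = lincoln_petersen(6, 9, 3)

# Distinct fusions known from either approach (union of the two sets).
known_union = 6 + 9 - 3

print(f"estimated total: {estimated_total:.0f}")
print(f"known fusions: {known_union}")
print(f"fraction found: {known_union / estimated_total:.2f}")
```

Under this assumption the 12 known fusions come to about two-thirds of the estimated total, which agrees with the estimate given in the same paper's abstract.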