ABSTRACT: Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10-20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.
PLoS ONE 01/2012; 7(9):e46152. · 3.53 Impact Factor
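The physical-coverage claim in the abstract above can be illustrated with a small worked calculation. This is only a sketch under stated assumptions: physical coverage is taken as (number of mapped pairs × insert size) / genome size, and the genome size and pair count below are hypothetical round numbers, not values reported in the paper.

```python
def physical_coverage(n_pairs: int, insert_size_bp: int, genome_size_bp: int) -> float:
    """Mean number of inserts (fragments) spanning a given genomic position."""
    return n_pairs * insert_size_bp / genome_size_bp


# Assumption: approximate human genome size, rounded for illustration.
GENOME_SIZE_BP = 3_000_000_000
# Assumption: hypothetical number of uniquely mapped tag pairs, identical
# for both libraries to model "the same sequencing effort".
N_PAIRS = 30_000_000

short_insert = physical_coverage(N_PAIRS, 1_000, GENOME_SIZE_BP)   # 1 kb inserts
large_insert = physical_coverage(N_PAIRS, 10_000, GENOME_SIZE_BP)  # 10 kb inserts

print(f"1 kb library:  {short_insert:.0f}x physical coverage")
print(f"10 kb library: {large_insert:.0f}x physical coverage")
```

Under these assumptions, the same number of sequenced pairs yields ten times the physical coverage with 10 kb inserts, because each pair spans ten times as much of the genome.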
ABSTRACT: Somatic genome rearrangements are thought to play important roles in cancer development. We optimized a long-span paired-end-tag (PET) sequencing approach using 10-Kb genomic DNA inserts to study human genome structural variations (SVs). The use of a 10-Kb insert size allows the identification of breakpoints within repetitive or homology-containing regions of a few kilobases in size and results in a higher physical coverage compared with small insert libraries with the same sequencing effort. We have applied this approach to comprehensively characterize the SVs of 15 cancer and two noncancer genomes and used a filtering approach to strongly enrich for somatic SVs in the cancer genomes. Our analyses revealed that most inversions, deletions, and insertions are germ-line SVs, whereas tandem duplications, unpaired inversions, interchromosomal translocations, and complex rearrangements are over-represented among somatic rearrangements in cancer genomes. We demonstrate that the quantitative and connective nature of DNA-PET data is precise in delineating the genealogy of complex rearrangement events, we observe signatures that are compatible with breakage-fusion-bridge cycles, and we discover that large duplications are among the initial rearrangements that trigger genome instability for extensive amplification in epithelial cancers.
Genome Research 04/2011; 21(5):665-75. · 14.40 Impact Factor
ABSTRACT: Mammalian genomes are viewed as functional organizations that orchestrate spatial and temporal gene regulation. CTCF, the most characterized insulator-binding protein, has been implicated as a key genome organizer. However, little is known about CTCF-associated higher-order chromatin structures at a global scale. Here we applied chromatin interaction analysis by paired-end tag (ChIA-PET) sequencing to elucidate the CTCF-chromatin interactome in pluripotent cells. From this analysis, we identified 1,480 cis- and 336 trans-interacting loci with high reproducibility and precision. Associating these chromatin interaction loci with their underlying epigenetic states, promoter activities, enhancer binding and nuclear lamina occupancy, we uncovered five distinct chromatin domains that suggest potential new models of CTCF function in chromatin organization and transcriptional control. Specifically, CTCF interactions demarcate chromatin-nuclear membrane attachments and influence proper gene expression through extensive cross-talk between promoters and regulatory elements. This highly complex nuclear organization offers insights toward the unifying principles that govern genome plasticity and function.
ABSTRACT: MBF (or DSC1) is known to regulate transcription of a set of G(1)/S-phase genes encoding proteins involved in regulation of DNA replication. Previous studies have shown that MBF binds not only the promoter of G(1)/S-phase genes, but also the constitutive genes; however, it was unclear if the MBF bindings at the G(1)/S-phase and constitutive genes were mechanistically distinguishable. Here, we report a chromatin immunoprecipitation-microarray (ChIP-chip) analysis of MBF binding in the Schizosaccharomyces pombe genome using high-resolution genome tiling microarrays. ChIP-chip analysis indicates that the majority of the MBF occupancies are located at the intragenic regions. Deconvolution analysis using Rpb1 ChIP-chip results distinguishes the Cdc10 bindings at the Rpb1-poor loci (promoters) from those at the Rpb1-rich loci (intragenic sequences). Importantly, Res1 binding at the Rpb1-poor loci, but not at the Rpb1-rich loci, is dependent on the Cdc10 function, suggesting a distinct binding mechanism. Most Cdc10 promoter bindings at the Rpb1-poor loci are associated with the G(1)/S-phase genes. While Res1 or Res2 is found at both the Cdc10 promoter and intragenic binding sites, Rep2 appears to be absent at the Cdc10 promoter binding sites but present at the intragenic sites. Time course ChIP-chip analysis demonstrates that Rep2 is temporally accumulated at the coding region of the MBF target genes, resembling the RNAP-II occupancies. Taken together, our results show that deconvolution analysis of Cdc10 occupancies refines the functional subset of genomic binding sites. We propose that the MBF activator Rep2 plays a role in mediating the cell cycle-specific transcription through the recruitment of RNAP-II to the MBF-bound G(1)/S-phase genes.