On the assessment of statistical significance of three-dimensional colocalization of sets of genomic elements

Department of Biostatistics, University of Washington, Seattle, WA 98109, USA.
Nucleic Acids Research (Impact Factor: 9.11). 01/2012; 40(9):3849-55. DOI: 10.1093/nar/gks012
Source: PubMed


A growing body of experimental evidence supports the hypothesis that the 3D structure of chromatin in the nucleus is closely linked to important functional processes, including DNA replication and gene regulation. In support of this hypothesis, several research groups have examined sets of functionally associated genomic loci, with the aim of determining whether those loci are statistically significantly colocalized. This work presents a critical assessment of two previously reported analyses, both of which used genome-wide DNA-DNA interaction data from the yeast Saccharomyces cerevisiae, and both of which rely upon a simple notion of the statistical significance of colocalization. We show that these previous analyses rely upon a faulty assumption, and we propose a correct non-parametric resampling approach to the same problem. Applying this approach to the same data set does not support the hypothesis that transcriptionally coregulated genes tend to colocalize, but strongly supports the colocalization of centromeres, and provides some evidence of colocalization of origins of early DNA replication, chromosomal breakpoints and transfer RNAs.
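The abstract names a non-parametric resampling approach without giving its details. As a rough illustration only, the sketch below assumes a symmetric matrix of contact scores, uses the mean pairwise contact score within a set of loci as the test statistic, and builds the null distribution from randomly drawn sets of the same size. The function names, the statistic, and the toy data are all assumptions made for this example, not the authors' procedure.

```python
import random
from itertools import combinations


def mean_pairwise_contact(contact, loci):
    """Average contact score over all unordered pairs of loci in the set."""
    pairs = list(combinations(loci, 2))
    return sum(contact[i][j] for i, j in pairs) / len(pairs)


def resampling_p_value(contact, query, n_resamples=1000, seed=0):
    """One-sided empirical p-value for colocalization of `query`.

    The null distribution comes from random locus sets of the same size,
    drawn without replacement from all loci (a hypothetical null model).
    """
    rng = random.Random(seed)
    all_loci = range(len(contact))
    observed = mean_pairwise_contact(contact, query)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(all_loci, len(query))
        if mean_pairwise_contact(contact, sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Toy data: 30 loci with random background contacts; loci 0-4 are given
# strongly elevated contacts with one another.
n_loci = 30
toy_rng = random.Random(1)
contact = [[0.0] * n_loci for _ in range(n_loci)]
for i, j in combinations(range(n_loci), 2):
    score = toy_rng.random()
    if i < 5 and j < 5:
        score += 5.0
    contact[i][j] = contact[j][i] = score

p_clustered = resampling_p_value(contact, [0, 1, 2, 3, 4])
p_background = resampling_p_value(contact, [6, 11, 17, 22, 28])
```

Because the statistic is recomputed on whole resampled sets, the test does not need to treat individual pairwise interactions as independent of one another. A real analysis would likely need a null model that also respects genomic constraints such as chromosome membership, which this sketch ignores.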

    • "Due to the complex structure of chromatin conformation capture data, finding suited explicit null distributions is generally not possible (Paulsen et al., 2013; Witten and Noble, 2012), and even randomization of the data through MC is difficult. Therefore, we consistently perform permutations on the query track only."
    ABSTRACT: Recently developed methods that couple next-generation sequencing with chromosome conformation capture-based techniques, such as Hi-C and ChIA-PET, allow for characterization of genome-wide chromatin 3D structure. Understanding the organization of chromatin in three dimensions is a crucial next step in the unraveling of global gene regulation, and methods for analyzing such data are needed. We have developed HiBrowse, a user-friendly web-tool consisting of a range of hypothesis-based and descriptive statistics, using realistic assumptions in null-models. HiBrowse is supported by all major browsers, and is freely available at is implemented in Python, and source code is available for download by following instructions on the main site. Supplementary data are available at Bioinformatics online.
    Bioinformatics 02/2014; 30(11). DOI:10.1093/bioinformatics/btu082 · 4.98 Impact Factor
    • "Furthermore, statistical methods such as ours provide a systematic way to compare chromatin architecture sets to one another, facilitating, for example, analysis of changes in chromatin organization during development or between healthy and cancer cells. Finally, another direction in which such a systematic method will prove useful is in generating high-confidence contact networks and analyzing the graph or colocalization properties of these networks (Witten and Noble 2012; Paulsen et al. 2013). "
    ABSTRACT: Our current understanding of how DNA is packed in the nucleus is most accurate at the fine scale of individual nucleosomes and at the large scale of chromosome territories. However, accurate modeling of DNA architecture at the intermediate scale of ~50 kb-10 Mb is crucial for identifying functional interactions among regulatory elements and their target promoters. We describe a method, Fit-Hi-C, that assigns statistical confidence estimates to mid-range intra-chromosomal contacts by jointly modeling the random polymer looping effect and previously observed technical biases in Hi-C data sets. We demonstrate that our proposed approach computes accurate empirical null models of contact probability without any distribution assumption, corrects for binning artifacts and provides improved statistical power relative to a previously described method. High-confidence contacts identified by Fit-Hi-C preferentially link expressed gene promoters to active enhancers identified by chromatin signatures in human embryonic stem cells (ESCs), capture 77% of RNA polymerase II mediated enhancer-promoter interactions identified using ChIA-PET in mouse ESCs, and confirm previously validated, cell line-specific interactions in mouse cortex cells. Incorporating two sets of independent semi-automated genomic annotations in human ESCs, we observe that insulators and heterochromatin regions are hubs for high-confidence contacts while transcription start sites, promoters and strong enhancers are involved in fewer but potentially more targeted contacts. We also observe that regions containing binding peaks of master pluripotency factors such as NANOG and POU5F1 are highly enriched in high-confidence contacts for human ESCs. 
    Furthermore, we show that pairs of loci linked by high-confidence contacts exhibit similar replication timing in human and mouse ESCs and preferentially lie within the boundaries of previously described topological domains for all human and mouse cell lines analyzed here.
    Genome Research 02/2014; 24(6). DOI:10.1101/gr.160374.113 · 14.63 Impact Factor
    • "We generated 50 sets of random contact data in similar fashion to Witten and Noble (29): 500 nodes were randomly distributed uniformly in a unit cube. Euclidean distances between all pairs of nodes were measured and edges drawn between nodes with a distance among the smallest 2% of all distances, leading to an average node degree of 10. "
    ABSTRACT: Experimental techniques for the investigation of three-dimensional (3D) genome organization are being developed at a fast pace. Currently, the associated computational methods are mostly specific to the individual experimental approach. Here we present a general statistical framework that is widely applicable to the analysis of genomic contact maps, irrespective of the data acquisition and normalization processes. Within this framework DNA-DNA contact data are represented as a complex network, for which a broad number of directly applicable methods already exist. In such a network representation, DNA segments and contacts between them are denoted as nodes and edges, respectively. Furthermore, we present a robust method for generating randomized contact networks that explicitly take into account the inherent 3D nature of the genome and serve as realistic null-models for unbiased statistical analyses. By integrating a variety of large-scale genome-wide datasets we demonstrate that meiotic crossover sites display enriched genomic contacts and that cohesin-bound genes are significantly colocalized in the yeast nucleus. We anticipate that the complex network framework in conjunction with the randomization of DNA-DNA contact networks will become a widely used tool in the study of nuclear architecture.
    Nucleic Acids Research 11/2012; 41(2). DOI:10.1093/nar/gks1096 · 9.11 Impact Factor
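
The random contact data described in the excerpt quoted above (500 nodes placed uniformly in a unit cube, with edges drawn between the closest 2% of node pairs) can be sketched as follows. This is a minimal reading of that one-sentence description; the function name, the seed handling, and the rounding of the edge count are assumptions for illustration.

```python
import math
import random
from itertools import combinations


def random_contact_network(n_nodes=500, fraction=0.02, seed=0):
    """Place nodes uniformly in the unit cube and link the closest pairs.

    Returns the node coordinates and a list of (i, j) edges covering the
    `fraction` of node pairs with the smallest Euclidean distances.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n_nodes)]
    # All unordered pairs, sorted by Euclidean distance (shortest first).
    distances = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n_nodes), 2)
    )
    n_edges = round(fraction * len(distances))
    edges = [(i, j) for _, i, j in distances[:n_edges]]
    return points, edges


points, edges = random_contact_network()
average_degree = 2 * len(edges) / len(points)
```

With 500 nodes there are 124,750 node pairs, so the 2% cutoff keeps 2,495 edges and the average node degree is 2 × 2,495 / 500 = 9.98, consistent with the value of about 10 stated in the excerpt. Calling the function with 50 different seeds would give 50 such random data sets.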