On the assessment of statistical significance of three-dimensional colocalization of sets of genomic elements.

Department of Biostatistics, University of Washington, Seattle, WA 98109, USA.
Nucleic Acids Research (Impact Factor: 8.81). 01/2012; 40(9):3849-55. DOI: 10.1093/nar/gks012
Source: PubMed

ABSTRACT A growing body of experimental evidence supports the hypothesis that the 3D structure of chromatin in the nucleus is closely linked to important functional processes, including DNA replication and gene regulation. In support of this hypothesis, several research groups have examined sets of functionally associated genomic loci, with the aim of determining whether those loci are statistically significantly colocalized. This work presents a critical assessment of two previously reported analyses, both of which used genome-wide DNA-DNA interaction data from the yeast Saccharomyces cerevisiae, and both of which rely upon a simple notion of the statistical significance of colocalization. We show that these previous analyses rely upon a faulty assumption, and we propose a correct non-parametric resampling approach to the same problem. Applying this approach to the same data set does not support the hypothesis that transcriptionally coregulated genes tend to colocalize, but strongly supports the colocalization of centromeres, and provides some evidence of colocalization of origins of early DNA replication, chromosomal breakpoints and transfer RNAs.

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Recently developed methods that couple next-generation sequencing with chromosome conformation capture-based techniques, such as Hi-C and ChIA-PET, allow for characterization of genome-wide chromatin 3D structure. Understanding the organization of chromatin in three dimensions is a crucial next step in the unraveling of global gene regulation, and methods for analyzing such data are needed. We have developed HiBrowse, a user-friendly web-tool consisting of a range of hypothesis-based and descriptive statistics, using realistic assumptions in null-models. HiBrowse is supported by all major browsers, and is freely available at is implemented in Python, and source code is available for download by following instructions on the main site. Supplementary data are available at Bioinformatics online.
    Bioinformatics 02/2014; 30(11). DOI:10.1093/bioinformatics/btu082 · 4.62 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Recent studies used the contact data or three-dimensional (3D) genome reconstructions from Hi-C (chromosome conformation capture with next-generation sequencing) to assess the co-localization of functional genomic annotations in the nucleus. These analyses dichotomized data point pairs belonging to a functional annotation as "close" or "far" based on some threshold and then tested for enrichment of "close" pairs. We propose an alternative approach that avoids dichotomization of the data and instead directly estimates the significance of distances within the 3D reconstruction. We applied this approach to 3D genome reconstructions for Plasmodium falciparum, the causative agent of malaria, and Saccharomyces cerevisiae and compared the results to previous approaches. We found significant 3D co-localization of centromeres, telomeres, virulence genes, and several sets of genes with developmentally regulated expression in P. falciparum; and significant 3D co-localization of centromeres and long terminal repeats in S. cerevisiae. Additionally, we tested the experimental observation that telomeres form three to seven clusters in P. falciparum and S. cerevisiae. Applying affinity propagation clustering to telomere coordinates in the 3D reconstructions yielded six telomere clusters for both organisms. Distance-based assessment replicated key findings, while avoiding dichotomization of the data (which previously yielded threshold-sensitive results).
    BMC Genomics 11/2014; 15(1):992. DOI:10.1186/1471-2164-15-992 · 4.04 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: It is widely recognized that the three-dimensional (3D) architecture of eukaryotic chromatin plays an important role in processes such as gene regulation and cancer-driving gene fusions. Observing or inferring this 3D structure at even modest resolutions had been problematic, since genomes are highly condensed and traditional assays are coarse. However, recently devised high-throughput molecular techniques have changed this situation. Notably, the development of a suite of chromatin conformation capture (CCC) assays has enabled elicitation of contacts-spatially close chromosomal loci-which have provided insights into chromatin architecture. Most analysis of CCC data has focused on the contact level, with less effort directed toward obtaining 3D reconstructions and evaluating the accuracy and reproducibility thereof. While questions of accuracy must be addressed experimentally, questions of reproducibility can be addressed statistically-the purpose of this paper. We use a constrained optimization technique to reconstruct chromatin configurations for a number of closely related yeast datasets and assess reproducibility using four metrics that measure the distance between 3D configurations. The first of these, Procrustes fitting, measures configuration closeness after applying reflection, rotation, translation, and scaling-based alignment of the structures. The others base comparisons on the within-configuration inter-point distance matrix. Inferential results for these metrics rely on suitable permutation approaches. Results indicate that distance matrix-based approaches are preferable to Procrustes analysis, not because of the metrics per se but rather on account of the ability to customize permutation schemes to handle within-chromosome contiguity. It has recently been emphasized that the use of constrained optimization approaches to 3D architecture reconstruction are prone to being trapped in local minima. Our methods of reproducibility assessment provide a means for comparing 3D reconstruction solutions so that we can discern between local and global optima by contrasting solutions under perturbed inputs.
    Biostatistics 02/2014; 15(3). DOI:10.1093/biostatistics/kxu003 · 2.24 Impact Factor


Available from