Toward Breaking the Histone Code: Bayesian Graphical Models for Histone Modifications
ABSTRACT
BACKGROUND: Histones are proteins around which DNA wraps to form small spherical structures called nucleosomes. Histone modifications (HMs) are post-translational modifications to the histone tails. At a particular genomic locus, each HM can be either present or absent, and the combinatorial patterns of presence or absence of multiple HMs, known as 'histone codes', are believed to co-regulate important biological processes. We aim to use raw data on HM markers at different genomic loci to (1) decode the complex biological network of HMs in a single region and (2) demonstrate how the HM networks differ across regulatory regions. We suggest that these differences in network attributes form a significant link between histones and genomic functions.
METHODS AND RESULTS: We develop a powerful graphical model under the Bayesian paradigm. Posterior inference is fully probabilistic, allowing us to compute the probabilities of distinct dependence patterns of the HMs using graphs. Furthermore, our model-based framework allows for easy but important extensions for inference on differential networks under various conditions, such as different annotations of the genomic locations (e.g., promoters versus insulators). We applied these models to ChIP-Seq data from CD4+ T lymphocytes. The results confirmed many existing findings and provided a unified tool for generating promising hypotheses. Differential network analyses revealed new insights into the co-regulation of transcriptional activities by HMs in different genomic regions.
CONCLUSIONS: The use of Bayesian graphical models and borrowing strength across different conditions provide high power to infer histone networks and their differences.
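To make the idea of presence/absence patterns concrete, here is a minimal descriptive sketch. It is not the Bayesian graphical model described in the abstract. It only tabulates how often each binary pattern and each pairwise co-occurrence appears in a given region type. The loci, the region labels, and the choice of three marks are invented toy data for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (NOT from the study): each entry is one genomic locus,
# with a region annotation and a presence (1) / absence (0) call per mark.
HM_NAMES = ("H3K4me3", "H3K27me3", "H3K36me3")
LOCI = [
    ("promoter", (1, 0, 0)),
    ("promoter", (1, 0, 1)),
    ("promoter", (1, 0, 0)),
    ("insulator", (0, 1, 0)),
    ("insulator", (0, 1, 0)),
    ("insulator", (1, 1, 0)),
]


def pattern_counts(loci, region):
    """Count how often each presence/absence pattern occurs in one region type."""
    return Counter(pattern for r, pattern in loci if r == region)


def co_occurrence(loci, region):
    """Fraction of loci in a region where each pair of marks is present together."""
    rows = [pattern for r, pattern in loci if r == region]
    result = {}
    for i, j in combinations(range(len(HM_NAMES)), 2):
        both = sum(1 for p in rows if p[i] == 1 and p[j] == 1)
        result[(HM_NAMES[i], HM_NAMES[j])] = both / len(rows)
    return result


if __name__ == "__main__":
    for region in ("promoter", "insulator"):
        print(region, dict(pattern_counts(region=region, loci=LOCI)))
        print(region, co_occurrence(LOCI, region))
```

Comparing the two regions' tables gives a crude view of how co-occurrence differs by annotation. A graphical model goes further by separating direct dependence between two marks from dependence explained by other marks.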
ABSTRACT: Histone modifications are implicated in influencing gene expression. We have generated high-resolution maps for the genome-wide distribution of 20 histone lysine and arginine methylations as well as histone variant H2A.Z, RNA polymerase II, and the insulator binding protein CTCF across the human genome using the Solexa 1G sequencing technology. Typical patterns of histone methylations exhibited at promoters, insulators, enhancers, and transcribed regions are identified. The monomethylations of H3K27, H3K9, H4K20, H3K79, and H2BK5 are all linked to gene activation, whereas trimethylations of H3K27, H3K9, and H3K79 are linked to repression. H2A.Z associates with functional regulatory elements, and CTCF marks boundaries of histone methylation domains. Chromosome banding patterns are correlated with unique patterns of histone modifications. Chromosome breakpoints detected in T cell cancers frequently reside in chromatin regions associated with H3K4 methylations. Our data provide new insights into the function of histone methylation and chromatin organization in genome function.
Cell 06/2007; 129(4):823-37. DOI:10.1016/j.cell.2007.05.009 · 33.12 Impact Factor
ABSTRACT: Alterations in modifications of histones have been linked to deregulated expression of many genes with important roles in cancer development and progression. The effects of these alterations have so far been interpreted from a promoter-specific viewpoint, focussing on gene-gene differences in patterns of histone modifications. However, recent findings suggest that cancer tissues also display cell-cell differences in total amount of specific histone modifications. This novel cellular epigenetic heterogeneity is related to clinical outcome of cancer patients and may serve as a valuable marker of prognosis.
British Journal of Cancer 08/2007; 97(1):1-5. DOI:10.1038/sj.bjc.6603844 · 4.82 Impact Factor
ABSTRACT: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm--the graphical lasso--that is remarkably fast: It solves a 1000-node problem (approximately 500,000 parameters) in at most a minute and is 30-4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
Biostatistics 08/2008; 9(3):432-41. DOI:10.1093/biostatistics/kxm045 · 2.24 Impact Factor
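For reference, the lasso-penalized problem mentioned in this abstract is commonly written as the optimization below. The notation is an assumption added here and does not appear in the abstract: S is the empirical covariance matrix, Θ the inverse covariance (precision) matrix being estimated, and ρ ≥ 0 the penalty weight.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \rho\,\lVert \Theta \rVert_{1},
\qquad
\lVert \Theta \rVert_{1} = \sum_{i,j} \lvert \Theta_{ij} \rvert .
```

A zero entry in the estimated Θ corresponds to a missing edge in the graph, so larger values of ρ yield sparser graphs.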