Vol. 26 no. 13 2010, pages 1579–1586
Discover regulatory DNA elements using chromatin signatures
and artificial neural network
Hiram A. Firpi1, Duygu Ucar1 and Kai Tan1,2,∗
1Department of Internal Medicine and 2Department of Biomedical Engineering, University of Iowa, 2294 CBRB,
285 Newton Road, Iowa City, IA 52242, USA
Associate Editor: Alex Bateman
Advance Access publication May 7, 2010
Motivation: Recent large-scale chromatin state mapping efforts
have revealed characteristic chromatin modification signatures for
various types of functional DNA elements. Given the important
influence of chromatin states on gene regulation and the rapid
accumulation of genome-wide chromatin modification data, there
is a pressing need for computational methods to analyze these
data in order to identify functional DNA elements. However, existing
computational tools do not exploit data transformation and feature
extraction as a means to achieve a more accurate prediction.
Results: We introduce a new computational framework for
identifying functional DNA elements using chromatin signatures. The
framework consists of a data transformation step and a feature
extraction step, followed by a classification step using a time-delay
neural network. We implemented our framework in a software tool,
CSI-ANN (chromatin signature identification by artificial neural network).
When applied to predict transcriptional enhancers in the ENCODE
regions, CSI-ANN achieved a 65.5% sensitivity and 66.3% positive
predictive value, a 5.9% and 11.6% improvement, respectively, over
the best previous approach.
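To illustrate the classification step, the following is a minimal NumPy sketch of a time-delay layer, in which the same weights are applied to every window of consecutive genomic bins. It is not the CSI-ANN implementation: the function names, array shapes, tanh and sigmoid activations and the single hidden layer are all assumptions made for illustration.

```python
import numpy as np


def tdnn_layer(x, weights, bias):
    """Apply one time-delay layer (hypothetical helper, not from CSI-ANN).

    x:       (T, F) array, T genomic bins by F input features
    weights: (D, F, H) array, D delays by F features by H hidden units
    bias:    (H,) array
    Returns a (T - D + 1, H) array of tanh activations.
    """
    T, F = x.shape
    D, F_w, H = weights.shape
    if F != F_w:
        raise ValueError("feature dimension mismatch")
    if T < D:
        raise ValueError("input shorter than the delay window")
    out = np.empty((T - D + 1, H))
    for t in range(T - D + 1):
        window = x[t:t + D]  # (D, F): D consecutive bins starting at bin t
        # Contract the delay and feature axes; the same weights are reused
        # at every position t, which is what makes the layer time-delay.
        out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
    return np.tanh(out)


def tdnn_score(x, w1, b1, w2, b2):
    """Score every window position with a value in (0, 1).

    w2 is an (H,) output weight vector and b2 a scalar bias (assumed shapes).
    """
    hidden = tdnn_layer(x, w1, b1)        # (T - D + 1, H)
    logits = hidden @ w2 + b2             # (T - D + 1,)
    return 1.0 / (1.0 + np.exp(-logits))  # logistic output
```

With a delay window of D bins, an input of T bins yields T - D + 1 scores, one per window position. In practice the weights would be learned from labelled examples rather than set by hand.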
Availability and Implementation: available freely at
Supplementary Information: Supplementary Materials are available
at Bioinformatics online.
Received on March 26, 2010; revised on April 30, 2010; accepted on
May 5, 2010
∗To whom correspondence should be addressed.
Cis-acting regulatory DNA elements, such as promoters, enhancers
and insulators, play an essential role in establishing precise
temporal and tissue-specific gene expression patterns. Systematic
and precise mapping of these regulatory DNA elements, especially
enhancers, is a prerequisite for understanding gene expression
programs in both healthy and diseased cells. Experimentally,
enhancers can be mapped using chromatin immunoprecipitation
coupled with microarrays (ChIP-Chip) (Kim and Ren, 2006) or
short-read sequencing (ChIP-Seq) (Park, 2009). However, this
approach is limited by the availability of a large number of
ChIP-grade antibodies that specifically recognize the
transcription factors (TFs) of interest. Alternatively, enhancers
can be predicted computationally based on the observation that
they often contain dense clusters of TF binding sites (TFBS) in a
short stretch of DNA (<1000 bp) and are often conserved. Methods
relying on clustering of TFBS (Frith et al., 2003; Pennacchio et al.,
2007; Sinha et al., 2003) require prior knowledge of the binding
specificities of the TFs involved. Methods
based on sequence conservation (Blanchette et al., 2006; King et al.,
2005; Visel et al., 2008) require precise alignment of regulatory
DNA sequences from multiple species, and such conservation does not
necessarily hold for all elements.
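The TFBS-clustering idea described above can be sketched as a sliding-window scan over predicted binding-site positions. This is a simplified illustration rather than any of the cited methods; the function name find_tfbs_clusters, the 1000 bp window and the minimum of five sites are assumptions chosen for the example.

```python
def find_tfbs_clusters(site_positions, window=1000, min_sites=5):
    """Return merged (start, end) intervals in which at least `min_sites`
    predicted binding sites lie within a span shorter than `window` bp.

    site_positions: iterable of integer coordinates on one chromosome.
    The defaults are illustrative, not values taken from the cited methods.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    positions = sorted(site_positions)
    clusters = []
    left = 0
    for right, pos in enumerate(positions):
        # Advance the left edge until the span is shorter than `window` bp.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            start, end = positions[left], pos
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster: extend it instead of
                # reporting a second, overlapping interval.
                clusters[-1] = (clusters[-1][0], end)
            else:
                clusters.append((start, end))
    return clusters
```

Because overlapping windows are merged, a reported interval can be longer than the window itself when dense sites continue across several consecutive windows.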
Histone proteins in chromatin are subject to a number of
covalent modifications, primarily at their N-terminal tails, including
methylation, acetylation, phosphorylation, ubiquitylation and
ADP-ribosylation. These chromatin modifications have profound
influences on gene expression (Schones and Zhao, 2008). Numerous
genome-wide ChIP-Chip/Seq studies have provided data on the
distribution of histone modifications in various model organisms
and cell types. A picture is now emerging in which distinct
genomic regions such as enhancers, promoters and gene bodies
(both protein-coding and non-coding RNA genes) have distinct
chromatin signatures (Schones and Zhao, 2008). For example, high levels of histone H3 lysine
4 methylation have been found at gene promoters and at many
enhancers (Heintzman et al., 2007, 2009; Wang et al., 2008). In
addition, it has been shown that many regulatory elements carry
these epigenetic modifications only in specific cell/tissue types or
under specific environmental conditions, a specificity that cannot be determined
by comparative genomics based on sequence alone. Collectively,
these observations suggest that epigenetic signatures could be an
alternative and powerful way to pinpoint regulatory DNA elements
in the genome.
Given the rapid growth of genome-wide chromatin modification
data from different species and cell types, there is now a pressing
need for computational tools capable of integrating various histone
modification maps to discover regulatory DNA elements. Recently,
several computational methods have been developed to address
this need. Heintzman et al. (2007) were the first to develop a
computational tool for predicting promoters and enhancers in HeLa
cells using six histone modification maps covering 1% of the
human genome. Their algorithm predicts promoters and enhancers
based on correlation to the average histone modification profiles
trained on known examples. In spite of the success of the profile-
based method, it is limited in two aspects: (i) the contribution
of each histone modification mark to the classification method
© The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email: firstname.lastname@example.org
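The profile-based idea attributed above to Heintzman et al. (2007) can be sketched as follows: average the histone modification signal over known examples, then label a candidate window by its correlation with each average profile. This is a minimal sketch under stated assumptions, not the published algorithm; the function names, the array layout (marks by bins), the single Pearson correlation over all marks and the 0.5 cutoff are hypothetical choices.

```python
import numpy as np


def average_profile(examples):
    """Mean signal profile over known examples.

    examples: array-like of shape (N, M, W) for N training regions,
    M histone marks and W bins per window (assumed layout).
    """
    return np.asarray(examples, dtype=float).mean(axis=0)


def profile_correlation(window, profile):
    """Pearson correlation between a candidate window and a reference
    profile, both of shape (M, W), computed over the flattened values."""
    a = np.asarray(window, dtype=float).ravel()
    b = np.asarray(profile, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    # A constant window or profile has no defined correlation; report 0.
    return float(a @ b / denom) if denom > 0 else 0.0


def classify(window, promoter_profile, enhancer_profile, min_corr=0.5):
    """Label a window by its best-correlated profile, or 'background'
    when neither correlation reaches `min_corr` (an assumed cutoff)."""
    scores = {
        "promoter": profile_correlation(window, promoter_profile),
        "enhancer": profile_correlation(window, enhancer_profile),
    }
    label = max(scores, key=scores.get)
    return label if scores[label] >= min_corr else "background"
```

Because all marks are pooled into one correlation here, each mark contributes according to its signal variance rather than through a learned weight.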
Heintzman,N.D. and Ren,B. (2009) Finding distal regulatory elements in the human
genome. Curr. Opin. Genet. Dev., 19, 541–549.
Heintzman,N.D. et al. (2007) Distinct and predictive chromatin signatures of
transcriptional promoters and enhancers in the human genome. Nat. Genet., 39,
Heintzman,N.D. et al. (2009) Histone modifications at human enhancers reflect global
cell-type-specific gene expression. Nature, 459, 108–112.
Hon,G. et al. (2008) ChromaSig: a probabilistic approach to finding common chromatin
signatures in the human genome. PLoS Comput. Biol., 4, e1000201.
Kailath,T. (1967) The divergence and Bhattacharyya distance measures in signal
selection. IEEE Trans. Comm. Tech., 15, 52–60.
Khalil,A.M. et al. (2009) Many human large intergenic noncoding RNAs associate with
chromatin-modifying complexes and affect gene expression. Proc. Natl Acad. Sci.
USA, 106, 11667–11672.
Rev. Genomics. Hum. Genet., 7, 81–102.
King,D.C. et al. (2005) Evaluation of regulatory potential and conservation scores
for detecting cis-regulatory modules in aligned mammalian genome sequences.
Genome Res., 15, 1051–1060.
King,G.F. and Kuchel,P.W. (1994) Theoretical and practical aspects of NMR studies of
cells. Immunomethods, 4, 85–97.
Park,P.J. (2009) ChIP-seq: advantages and challenges of a maturing technology. Nat.
Rev. Genet., 10, 669–680.
Genome Res., 17, 201–211.
Raychaudhuri,S. et al. (2000) Principal components analysis to summarize microarray
experiments: application to sporulation time series. Pac. Symp. Biocomput.,
Schones,D.E. and Zhao,K. (2008) Genome-wide approaches to studying chromatin
modifications. Nat. Rev. Genet., 9, 179–191.
Sinha,S. et al. (2003) A probabilistic method to detect regulatory modules.
Bioinformatics, 19(Suppl. 1), i292–i301.
Smart,O. et al. (2007) Genetic programming of conventional features to detect seizure
precursors. Eng. Appl. Artif. Intell., 20, 1070–1085.
Strahl,B.D. and Allis,C.D. (2000) The language of covalent histone modifications.
Nature, 403, 41–45.
Su,A.I. et al. (2004) A gene atlas of the mouse and human protein-encoding
transcriptomes. Proc. Natl Acad. Sci. USA, 101, 6062–6067.
Thurman,R.E. et al. (2007) Identification of higher-order functional domains in the
human ENCODE regions. Genome Res., 17, 917–927.
Visel,A. et al. (2008) Ultraconservation identifies a small subset of extremely
constrained developmental enhancers. Nat. Genet., 40, 158–160.
Waibel,A. et al. (1989) Phoneme recognition using time-delay neural networks. IEEE
Trans. Acoust. Speech Sig. Proc., 37, 328–339.
Wang,Z. et al. (2008) Combinatorial patterns of histone acetylations and methylations
in the human genome. Nat. Genet., 40, 897–903.
Wang,Z. et al. (2009) Genome-wide mapping of HATs and HDACs reveals distinct
functions in active and inactive genes. Cell, 138, 1019–1031.
Won,K.J. et al. (2008) Prediction of regulatory elements in mammalian genomes using
chromatin signatures. BMC Bioinformatics, 9, 547.
Zhang,Y. et al. (2008) Identifying positioned nucleosomes with epigenetic marks in
human from ChIP-Seq. BMC Genomics, 9, 537.