Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques

Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Rockville, Maryland 20878, USA.
Genome Research (Impact Factor: 14.63). 01/2007; 16(12):1455-64. DOI: 10.1101/gr.4140006
Source: PubMed


Fields such as genomics and systems biology are built on the synergism between computational and experimental techniques. This type of synergism is especially important in accomplishing goals like identifying all functional transcription factor binding sites in vertebrate genomes. Precise detection of these elements is a prerequisite to deciphering the complex regulatory networks that direct tissue specific and lineage specific patterns of gene expression. This review summarizes approaches for in silico, in vitro, and in vivo identification of transcription factor binding sites. A variety of techniques useful for localized- and high-throughput analyses are discussed here, with emphasis on aspects of data generation and verification.

Available from: Peggy J Farnham, Dec 18, 2013
  • Source
    • "many of these motifs, they can be observed at thousands of places in the genome based on random permutation, thus leading to many false positive predictions. The second disadvantage is that, even if a sequence motif actually corresponds to a CRE, this does not convey information about the activity level of the CRE in a particular cell type (Elnitski et al., 2006). The recently developed ChIP-seq technology allows us to address both these shortcomings by exploiting the second characteristic of CRE, which is the marked absence of nucleosomes in these regions (Mathelier et al., 2015) (Figures 1B, 3). When inactive, the genomic region corresponding to a CRE is packed into nucleosomes"
    ABSTRACT: Functional annotation of the genome is important to understand the phenotypic complexity of various species. The road toward functional annotation involves several challenges ranging from experiments on individual molecules to large-scale analysis of high-throughput sequencing (HTS) data. HTS data is typically a result of the protocol designed to address specific research questions. The sequencing results in reads, which when mapped to a reference genome often leads to the formation of distinct patterns (read profiles). Interpretation of these read profiles is essential for their analysis in relation to the research question addressed. Several strategies have been employed at varying levels of abstraction ranging from a somewhat ad hoc to a more systematic analysis of read profiles. These include methods which can compare read profiles, e.g., from direct (non-sequence based) alignments to classification of patterns into functional groups. In this review, we highlight the emerging applications of read profiles for the annotation of non-coding RNA and cis-regulatory elements (CREs) such as enhancers and promoters. We also discuss the biological rationale behind their formation.
    Frontiers in Genetics 05/2015; 6:188. DOI:10.3389/fgene.2015.00188
  • Source
    • "Recently, several algorithms have been developed to identify DNA motifs in a given set of sequences and to determine if they are over-represented compared to that expected by chance [33]. The integration of these computational analyses with experimental techniques is becoming fundamental to identify genome-scale regulatory elements [35], [36], [37]. Examples of recent studies using motif analysis at a genomic scale include genome-wide identification of estrogen receptor binding sites [38], identification of CTCF-binding sites in the human genome [39] and identification of motifs associated with aberrant CpG island methylation [40]. "
    ABSTRACT: A variety of environmental toxicants have been shown to induce the epigenetic transgenerational inheritance of disease and phenotypic variation. The process involves exposure of a gestating female and the developing fetus to environmental factors that promote permanent alterations in the epigenetic programming of the germline. The molecular aspects of the phenomenon involve epigenetic modifications (epimutations) in the germline (e.g. sperm) that are transmitted to subsequent generations. The current study integrates previously described experimental epigenomic transgenerational data and web-based bioinformatic analyses to identify genomic features associated with these transgenerationally transmitted epimutations. A previously identified genomic feature associated with these epimutations is a low CpG density (<12/100bp). The current observations suggest the transgenerational differential DNA methylation regions (DMR) in sperm contain unique consensus DNA sequence motifs, zinc finger motifs and G-quadruplex sequences. Interaction of molecular factors with these sequences could alter chromatin structure and accessibility of proteins with DNA methyltransferases to alter de novo DNA methylation patterns. G-quadruplex regions can promote the opening of the chromatin that may influence the action of DNA methyltransferases, or factors interacting with them, for the establishment of epigenetic marks. Zinc finger binding factors can also promote this chromatin remodeling and influence the expression of non-coding RNA. The current study identified genomic features associated with sperm epimutations that may explain in part how these sites become susceptible for transgenerational programming.
    PLoS ONE 06/2014; 9(6):e100194. DOI:10.1371/journal.pone.0100194 · 3.23 Impact Factor
  • Source
    • "However, the data obtained in every ChIP-seq experiment demonstrate TF binding specific to cell line or tissue used and strongly dependant on environmental conditions [37]; [38]; [39]. Moreover, the loss of some tissue-specific features (for example as a result of cell immortalization) or different environmental changes may alter the genome wide pattern of TF binding inherent to differentiated cells of living organism [33]; [40]. Since it is known that specific spatial-temporal patterns of gene expression is controlled by combinatorial binding of different TF sets to regulatory units [27]; [41]; [42] it seems promising to search for these regions by integrative analysis of as many as possible of different ChIP-seq data. "
    ABSTRACT: A vast amount of SNPs derived from genome-wide association studies are represented by non-coding ones, therefore exacerbating the need for effective identification of regulatory SNPs (rSNPs) among them. However, this task remains challenging since the regulatory part of the human genome is annotated much poorly as opposed to coding regions. Here we describe an approach aggregating the whole set of ENCODE ChIP-seq data in order to search for rSNPs, and provide the experimental evidence of its efficiency. Its algorithm is based on the assumption that the enrichment of a genomic region with transcription factor binding loci (ChIP-seq peaks) indicates its regulatory function, and thereby SNPs located in this region are more likely to influence transcription regulation. To ensure that the approach preferably selects functionally meaningful SNPs, we performed enrichment analysis of several human SNP datasets associated with phenotypic manifestations. It was shown that all samples are significantly enriched with SNPs falling into the regions of multiple ChIP-seq peaks as compared with the randomly selected SNPs. For experimental verification, 40 SNPs falling into overlapping regions of at least 7 TF binding loci were selected from OMIM. The effect of SNPs on the binding of the DNA fragments containing them to the nuclear proteins from four human cell lines (HepG2, HeLaS3, HCT-116, and K562) has been tested by EMSA. A radical change in the binding pattern has been observed for 29 SNPs, besides, 6 more SNPs also demonstrated less pronounced changes. Taken together, the results demonstrate the effective way to search for potential rSNPs with the aid of ChIP-seq data provided by ENCODE project.
    PLoS ONE 10/2013; 8(10):e78833. DOI:10.1371/journal.pone.0078833 · 3.23 Impact Factor
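
The peak-aggregation idea in the last abstract above (a SNP is a stronger regulatory candidate when it falls inside ChIP-seq peaks for many different transcription factors) can be illustrated with a minimal sketch. This is not the cited authors' implementation. The function names, the toy coordinates, the single-chromosome simplification, and the half-open interval convention are all assumptions made for illustration. Only the threshold of at least 7 TF binding loci is taken from the abstract.

```python
def count_tf_overlaps(snp_pos, peaks_by_tf):
    """Return the number of distinct TFs with at least one peak covering snp_pos.

    Assumption: peaks are half-open (start, end) intervals on one chromosome.
    """
    return sum(
        any(start <= snp_pos < end for start, end in peaks)
        for peaks in peaks_by_tf.values()
    )


def candidate_rsnps(snps, peaks_by_tf, min_tfs=7):
    """Return names of SNPs covered by peaks from at least min_tfs distinct TFs."""
    return [
        name
        for name, pos in snps.items()
        if count_tf_overlaps(pos, peaks_by_tf) >= min_tfs
    ]


# Hypothetical toy data: seven TFs share a peak at 100-200, an eighth has one at 150-300.
toy_peaks = {f"TF{i}": [(100, 200)] for i in range(1, 8)}
toy_peaks["TF8"] = [(150, 300)]
toy_snps = {"snpA": 160, "snpB": 250, "snpC": 50}

print(candidate_rsnps(toy_snps, toy_peaks))  # prints ['snpA']
```

Real peak sets span many chromosomes and millions of intervals, so a practical version would likely index the peaks (for example with an interval tree) rather than scanning every peak for every SNP.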