GADEM: A Genetic Algorithm Guided Formation of Spaced Dyads Coupled with an EM Algorithm for Motif Discovery

Biostatistics Branch, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, NC 27709, USA.
Journal of computational biology: a journal of computational molecular cell biology (Impact Factor: 1.67). 02/2009; 16(2):317-29. DOI: 10.1089/cmb.2008.16TT
Source: PubMed

ABSTRACT: Genome-wide analyses of protein binding sites generate large amounts of data; a single ChIP dataset might contain 10,000 sites. Unbiased motif discovery in such datasets is generally not feasible with current methods that employ probabilistic models. We propose an efficient method, GADEM, which combines spaced dyads with an expectation-maximization (EM) algorithm. Candidate words (four to six nucleotides) for constructing spaced dyads are prioritized by their degree of overrepresentation in the input sequence data. Spaced dyads are converted into starting position weight matrices (PWMs). GADEM then employs a genetic algorithm (GA), with an embedded EM algorithm that refines the starting PWMs, to guide the evolution of a population of spaced dyads toward one whose entropy scores are more statistically significant. Spaced dyads whose entropy scores reach a pre-specified significance threshold are declared motifs. GADEM performed comparably to MEME on 500 sets of simulated "ChIP" sequences with embedded known P53 binding sites. Its major advantage over competing methods is computational efficiency on large ChIP datasets. We applied GADEM to six genome-wide ChIP datasets and identified approximately 15 to 30 motifs of various lengths in each. Remarkably, without any prior motif information, the expected known motif (e.g., P53 in the P53 data) was identified every time. GADEM discovered motifs of various lengths (6-40 bp) and characteristics in these datasets, which ranged from 0.5 to >13 million nucleotides, with run times of 5 to 96 h. GADEM can be viewed as an extension of the well-known MEME algorithm and is an efficient tool for de novo motif discovery in large-scale genome-wide data. The GADEM software is available at www.niehs.nih.gov/research/resources/software/GADEM/.
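The first two steps described in the abstract, ranking candidate words by overrepresentation and turning a spaced dyad into a starting PWM, can be sketched as below. This is a minimal illustration, not GADEM's implementation: the z-score against a zero-order (independent-base) background, the `match=0.85` probability placed on dyad letters, and the function names `base_freqs`, `rank_words`, and `dyad_to_pwm` are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

def base_freqs(seqs):
    """Zero-order background frequencies of A, C, G, T (other letters ignored)."""
    counts = Counter(c for s in seqs for c in s.upper() if c in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def rank_words(seqs, k, top=5):
    """Return the `top` k-mers with the largest overrepresentation z-scores.

    Expected counts assume a zero-order (independent bases) background,
    an illustrative assumption rather than GADEM's documented model.
    """
    bg = base_freqs(seqs)
    observed = Counter()
    n_windows = 0
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if all(c in BASES for c in word):  # skip windows containing N etc.
                observed[word] += 1
                n_windows += 1
    scores = {}
    for word in ("".join(p) for p in product(BASES, repeat=k)):
        p = math.prod(bg[c] for c in word)
        expected = n_windows * p
        sd = math.sqrt(n_windows * p * (1 - p))  # rough binomial approximation
        scores[word] = (observed[word] - expected) / sd if sd > 0 else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top]

def dyad_to_pwm(word1, gap, word2, bg, match=0.85):
    """Convert a spaced dyad (word1, gap, word2) into a starting PWM.

    Each column maps base -> probability. Word positions put `match` on the
    dyad's letter and share the remainder evenly; gap positions copy the
    background. `match=0.85` is a hypothetical value.
    """
    def word_column(letter):
        rest = (1 - match) / 3
        return {b: (match if b == letter else rest) for b in BASES}

    columns = [word_column(c) for c in word1.upper()]
    columns += [dict(bg) for _ in range(gap)]
    columns += [word_column(c) for c in word2.upper()]
    return columns
```

For example, `w = rank_words(seqs, 4, top=2)` followed by `dyad_to_pwm(w[0], 3, w[1], base_freqs(seqs))` pairs the two top-ranked 4-mers across a 3-nt gap into an 11-column starting matrix. Filling the gap with background probabilities leaves those columns uninformative until EM refinement sharpens them.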
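The embedded EM step that improves a starting PWM can be illustrated with a MEME-style "one occurrence per sequence" (OOPS) model: the E-step computes, for each sequence, the posterior probability that the site starts at each offset, and the M-step re-estimates the columns from those expected counts. The sketch below assumes sequences contain only A/C/G/T and a fixed motif width. It scores the result by relative entropy (information content in bits) against the background, a common entropy measure used here as a stand-in for GADEM's significance-tested entropy score. The names `em_refine`, `relative_entropy`, and `consensus` and the pseudocount of 0.1 are hypothetical.

```python
import math

BASES = "ACGT"

def site_log_ratio(site, pwm, bg):
    """log P(site | PWM) - log P(site | background)."""
    return sum(math.log(pwm[j][c] / bg[c]) for j, c in enumerate(site))

def em_refine(seqs, pwm, bg, iters=20, pseudo=0.1):
    """Refine a starting PWM with EM, assuming one site per sequence (OOPS)."""
    w = len(pwm)
    seqs = [s.upper() for s in seqs if len(s) >= w]
    for _ in range(iters):
        counts = [{b: pseudo for b in BASES} for _ in range(w)]
        for s in seqs:
            # E-step: posterior probability that the site starts at each offset
            logs = [site_log_ratio(s[i:i + w], pwm, bg) for i in range(len(s) - w + 1)]
            top = max(logs)
            weights = [math.exp(x - top) for x in logs]  # shift for numerical stability
            total = sum(weights)
            # accumulate expected letter counts for each motif column
            for i, wt in enumerate(weights):
                for j, c in enumerate(s[i:i + w]):
                    counts[j][c] += wt / total
        # M-step: normalize expected counts into column probabilities
        pwm = []
        for col in counts:
            n = sum(col.values())
            pwm.append({b: col[b] / n for b in BASES})
    return pwm

def relative_entropy(pwm, bg):
    """Information content of the PWM relative to the background, in bits."""
    return sum(p * math.log2(p / bg[b]) for col in pwm for b, p in col.items() if p > 0)

def consensus(pwm):
    """Most probable base in each column."""
    return "".join(max(col, key=col.get) for col in pwm)
```

With a weak starting matrix for `TTGACA` and a few sequences that each carry one copy, EM concentrates the posterior on the planted sites, so the refined matrix keeps the `TTGACA` consensus while its relative entropy rises well above that of the starting matrix.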
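The genetic algorithm that evolves a population of spaced dyads can be outlined as a generic loop of selection, crossover, mutation, and elitism. In GADEM the fitness of a dyad reflects the statistical significance of the entropy score of its EM-refined PWM; here `fitness` is any caller-supplied function, and the tournament selection, the crossover that joins one parent's left word to the other parent's gap and right word, the +/-1 gap mutation, and the parameter defaults are hypothetical operators chosen for illustration, not those documented for GADEM.

```python
import random

def evolve_dyads(words, fitness, pop_size=20, generations=30,
                 max_gap=10, mutation_rate=0.2, seed=0):
    """Evolve (word1, gap, word2) dyads toward higher `fitness`; return the best."""
    rng = random.Random(seed)  # seeded for reproducible runs

    def random_dyad():
        return (rng.choice(words), rng.randint(0, max_gap), rng.choice(words))

    def crossover(a, b):
        # child takes the left word from `a` and the gap and right word from `b`
        return (a[0], b[1], b[2])

    def mutate(dyad):
        w1, gap, w2 = dyad
        if rng.random() < mutation_rate:
            w1 = rng.choice(words)
        if rng.random() < mutation_rate:
            gap = min(max_gap, max(0, gap + rng.choice((-1, 1))))
        if rng.random() < mutation_rate:
            w2 = rng.choice(words)
        return (w1, gap, w2)

    def tournament(pop, scores, k=3):
        picks = rng.sample(range(len(pop)), k)
        return pop[max(picks, key=lambda i: scores[i])]

    pop = [random_dyad() for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(d) for d in pop]
        best = pop[max(range(len(pop)), key=lambda i: scores[i])]
        children = [best]  # elitism: the best dyad survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores), tournament(pop, scores))
            children.append(mutate(child))
        pop = children
    scores = [fitness(d) for d in pop]
    return pop[max(range(len(pop)), key=lambda i: scores[i])]
```

Because the best dyad is always carried into the next generation, the best fitness in the population never decreases. A dyad would be reported as a motif only if its score passed the significance threshold described above.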

    ABSTRACT: The majority of neural stem cells (NSCs) in the adult brain are quiescent, and this fraction increases with aging. Although signaling pathways that promote NSC quiescence have been identified, the transcriptional mechanisms involved are mostly unknown, largely due to lack of a cell culture model. In this study, we first demonstrate that NSC cultures (NS cells) exposed to BMP4 acquire cellular and transcriptional characteristics of quiescent cells. We then use epigenomic profiling to identify enhancers associated with the quiescent NS cell state. Motif enrichment analysis of these enhancers predicts a major role for the nuclear factor one (NFI) family in the gene regulatory network controlling NS cell quiescence. Interestingly, we found that the family member NFIX is robustly induced when NS cells enter quiescence. Using genome-wide location analysis and overexpression and silencing experiments, we demonstrate that NFIX has a major role in the induction of quiescence in cultured NSCs. Transcript profiling of NS cells overexpressing or silenced for Nfix and the phenotypic analysis of the hippocampus of Nfix mutant mice suggest that NFIX controls the quiescent state by regulating the interactions of NSCs with their microenvironment.
    Genes & development 08/2013; 27(16):1769-86. DOI:10.1101/gad.216804.113 · 12.64 Impact Factor
    ABSTRACT: DNA methylation patterns are important for establishing cell, tissue, and organism phenotypes, but little is known about their contribution to natural human variation. To determine their contribution to variability, we have generated genome-scale DNA methylation profiles of three human populations (Caucasian-American, African-American, and Han Chinese-American) and examined the differentially methylated CpG sites. The distinctly methylated genes identified suggest an influence of DNA methylation on phenotype differences, such as susceptibility to certain diseases and pathogens, and response to drugs and environmental agents. DNA methylation differences can be partially traced back to genetic variation, suggesting that differentially methylated CpG sites serve as evolutionarily established mediators between the genetic code and phenotypic variability. Notably, one-third of the DNA methylation differences were not associated with any genetic variation, suggesting that variation in population-specific sites takes place at the genetic and epigenetic levels, highlighting the contribution of epigenetic modification to natural human variation.
    Genome Research 08/2013; DOI:10.1101/gr.154187.112 · 13.85 Impact Factor
    ABSTRACT: A major role of the RNAi pathway in Schizosaccharomyces pombe is to nucleate heterochromatin, but it remains unclear whether this mechanism is conserved. To address this question in Drosophila, we performed genome-wide localization of Argonaute2 (AGO2) by chromatin immunoprecipitation (ChIP)-seq in two different embryonic cell lines and found that AGO2 localizes to euchromatin but not heterochromatin. This localization pattern is further supported by immunofluorescence staining of polytene chromosomes and cell lines, and these studies also indicate that a substantial fraction of AGO2 resides in the nucleus. Intriguingly, AGO2 colocalizes extensively with CTCF/CP190 chromatin insulators but not with genomic regions corresponding to endogenous siRNA production. Moreover, AGO2, but not its catalytic activity or Dicer-2, is required for CTCF/CP190-dependent Fab-8 insulator function. AGO2 interacts physically with CTCF and CP190, and depletion of either CTCF or CP190 results in genome-wide loss of AGO2 chromatin association. Finally, mutation of CTCF, CP190, or AGO2 leads to reduction of chromosomal looping interactions, thereby altering gene expression. We propose that RNAi-independent recruitment of AGO2 to chromatin by insulator proteins promotes the definition of transcriptional domains throughout the genome.
    Genes & development 08/2011; 25(16):1686-701. DOI:10.1101/gad.16651211 · 12.64 Impact Factor