ABSTRACT: Microbiota regulate intestinal physiology by modifying host gene expression along the length of the intestine, but the underlying regulatory mechanisms remain unresolved. Transcriptional specificity occurs through interactions between transcription factors (TFs) and cis-regulatory regions (CRRs) characterized by nucleosome-depleted accessible chromatin. We profiled transcriptome and accessible chromatin landscapes in intestinal epithelial cells (IECs) from mice reared in the presence or absence of microbiota. We show that regional differences in gene transcription along the intestinal tract were accompanied by major alterations in chromatin accessibility. Surprisingly, we discovered that microbiota modify host gene transcription in IECs without significantly impacting the accessible chromatin landscape. Instead, microbiota regulation of host gene transcription might be achieved by differential expression of specific TFs and enrichment of their binding sites in nucleosome-depleted CRRs near target genes. Our results suggest that the chromatin landscape in IECs is pre-programmed by the host in a region-specific manner to permit responses to microbiota through binding of open CRRs by specific TFs.
ABSTRACT: Identifying enhancers regulating gene expression remains an important and challenging task. While recent sequencing-based methods provide epigenomic characteristics that correlate well with enhancer activity, it remains onerous to comprehensively identify all enhancers across development. Here we introduce a computational framework to identify tissue-specific enhancers evolving under purifying selection. First, we incorporate high-confidence binding site predictions with target gene functional enrichment analysis to identify transcription factors (TFs) likely functioning in a particular context. We then search the genome for clusters of binding sites for these TFs, overcoming previous constraints associated with biased manual curation of TFs or enhancers. Applying our method to the placenta, we find 33 known and implicate 17 novel TFs in placental function, and discover 2,216 putative placenta enhancers. Using luciferase reporter assays, 31/36 (86%) tested candidates drive activity in placental cells. Our predictions agree well with recent epigenomic data in human and mouse, yet over half our loci, including 7/8 (87%) tested regions, are novel. Finally, we establish that our method is generalizable by applying it to 5 additional tissues: heart, pancreas, blood vessel, bone marrow, and liver.
Article · Jan 2014 · PLoS Computational Biology
ABSTRACT: Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.
Article · Nov 2013 · Philosophical Transactions of The Royal Society B Biological Sciences
ABSTRACT: Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT-based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.
ABSTRACT: Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified for zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs) as promising cis-regulatory elements with biological importance is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs (http://zebrafish.stanford.edu) are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.
Article · Jun 2013 · Nucleic Acids Research
ABSTRACT: The human genome encodes 1,500-2,000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high quality non-redundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (Predicting Regulatory Information from Single Motifs) approach obtains 2,543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
ABSTRACT: The identification of homologies, whether morphological, molecular, or genetic, is fundamental to our understanding of common biological principles. Homologies bridging the great divide between deuterostomes and protostomes have served as the basis for current models of animal evolution and development. It is now appreciated that these two clades share a common developmental toolkit consisting of conserved transcription factors and signaling pathways. These patterning genes sometimes show common expression patterns and genetic interactions, suggesting the existence of similar or even conserved regulatory apparatus. However, previous studies have found no regulatory sequence conserved between deuterostomes and protostomes. Here we describe the first such enhancers, which we call bilaterian conserved regulatory elements (Bicores). Bicores show conservation of sequence and gene synteny. Sequence conservation of Bicores reflects conserved patterns of transcription factor binding sites. We predict that Bicores act as response elements to signaling pathways, and we show that Bicores are developmental enhancers that drive expression of transcriptional repressors in the vertebrate central nervous system. Although the small number of identified Bicores suggests extensive rewiring of cis-regulation between the protostome and deuterostome clades, additional Bicores may be revealed as our understanding of cis-regulatory logic and sample of bilaterian genomes continue to grow.
ABSTRACT: Mutations in the Sonic hedgehog limb enhancer, the zone of polarizing activity regulatory sequence (commonly called the ZRS, located within the gene LMBR1), cause limb malformations. In humans, three classes of mutations have been proposed based on the limb phenotype: single base changes throughout the region cause preaxial polydactyly (PPD), single base changes at one specific site cause Werner mesomelic syndrome, and large duplications cause polysyndactyly. This study presents a novel mutation: a small insertion. In a Swedish family with autosomal-dominant PPD, we found a 13 base pair insertion within the ZRS, NG_009240.1:g.106934_106935insTAAGGAAGTGATT (traditional nomenclature: ZRS603ins13). Computational transcription factor-binding site predictions suggest that this insertion creates new binding sites and a mouse enhancer assay shows that this insertion causes ectopic gene expression. This study is the first to discover a small insertion in an enhancer that causes a human limb malformation and suggests a potential mechanism that could explain the ectopic expression caused by this mutation.
ABSTRACT: Enhancers are essential gene regulatory elements whose alteration can lead to morphological differences between species, developmental abnormalities, and human disease. Current strategies to identify enhancers focus primarily on noncoding sequences and tend to exclude protein coding sequences. Here, we analyzed 25 available ChIP-seq data sets that identify enhancers in an unbiased manner (H3K4me1, H3K27ac, and EP300) for peaks that overlap exons. We find that, on average, 7% of all ChIP-seq peaks overlap coding exons (after excluding peaks that overlap first exons). By using mouse and zebrafish enhancer assays, we demonstrate that several of these exonic enhancer (eExon) candidates can function as enhancers of their neighboring genes and that the exonic sequence is necessary for enhancer activity. Using ChIP, 3C, and DNA FISH, we further show that one of these exonic limb enhancers, Dync1i1 exon 15, has active enhancer marks and physically interacts with Dlx5/6 promoter regions 900 kb away. In addition, its removal by chromosomal abnormalities in humans could cause split hand and foot malformation 1 (SHFM1), a disorder associated with DLX5/6. These results demonstrate that DNA sequences can have a dual function, operating as coding exons in one tissue and enhancers of nearby gene(s) in another tissue, suggesting that phenotypes resulting from coding mutations could be caused not only by protein alteration but also by disrupting the regulation of another gene.
ABSTRACT: Genes expressed in the somatopleuric mesoderm, the embryonic domain giving rise to the vertebrate pelvis, appear important for pelvic girdle formation. Among such genes, Pbx family members and Emx2 were found to genetically interact in hindlimb and pectoral girdle formation. Here, we generated compound mutant embryos carrying combinations of mutated alleles for Pbx1, Pbx2, and Pbx3, as well as Pbx1 and Emx2, to examine potential genetic interactions during pelvic development. Indeed, Pbx genes share overlapping functions and Pbx1 and Emx2 genetically interact in pelvic formation. We show that, in compound Pbx1;Pbx2 and Pbx1;Emx2 mutants, pelvic mesenchymal condensation is markedly perturbed, indicative of an upstream control by these homeoproteins. We establish that expression of Tbx15, Prrx1, and Pax1, among other genes involved in the specification and development of select pelvic structures, is altered in our compound mutants. Lastly, we identify potential Pbx1-Emx2-regulated enhancers for Tbx15, Prrx1, and Pax1, using bioinformatics analyses.
Article · May 2011 · Developmental Dynamics
ABSTRACT: Humans differ from other animals in many aspects of anatomy, physiology, and behaviour; however, the genotypic basis of most human-specific traits remains unknown. Recent whole-genome comparisons have made it possible to identify genes with elevated rates of amino acid change or divergent expression in humans, and non-coding sequences with accelerated base pair changes. Regulatory alterations may be particularly likely to produce phenotypic effects while preserving viability, and are known to underlie interesting evolutionary differences in other species. Here we identify molecular events particularly likely to produce significant regulatory changes in humans: complete deletion of sequences otherwise highly conserved between chimpanzees and other mammals. We confirm 510 such deletions in humans, which fall almost exclusively in non-coding regions and are enriched near genes involved in steroid hormone signalling and neural function. One deletion removes a sensory vibrissae and penile spine enhancer from the human androgen receptor (AR) gene, a molecular change correlated with anatomical loss of androgen-dependent sensory vibrissae and penile spines in the human lineage. Another deletion removes a forebrain subventricular zone enhancer near the tumour suppressor gene growth arrest and DNA-damage-inducible, gamma (GADD45G), a loss correlated with expansion of specific brain regions in humans. Deletions of tissue-specific enhancers may thus accompany both loss and gain traits in the human lineage, and provide specific examples of the kinds of regulatory alterations and inactivation events long proposed to have an important role in human evolutionary divergence.
ABSTRACT: We developed the Genomic Regions Enrichment of Annotations Tool (GREAT) to analyze the functional significance of cis-regulatory regions identified by localized measurements of DNA binding events across an entire genome. Whereas previous methods took into account only binding proximal to genes, GREAT is able to properly incorporate distal binding sites and control for false positives using a binomial test over the input genomic regions. GREAT incorporates annotations from 20 ontologies and is available as a web application. Applying GREAT to data sets from chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) of multiple transcription-associated factors, including SRF, NRSF, GABP, Stat3 and p300 in different developmental contexts, we recover many functions of these factors that are missed by existing gene-based tools, and we generate testable hypotheses. The utility of GREAT is not limited to ChIP-seq, as it could also be applied to open chromatin, localized epigenomic markers and similar functional data sets, as well as comparative genomics sets.
Article · May 2010 · Nature Biotechnology
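The GREAT abstract mentions a binomial test over the input genomic regions. The following is a minimal sketch of that general idea and is not taken from the GREAT implementation. It assumes that a fraction `p_annotated` of the genome falls in regulatory domains of genes carrying some annotation term, and asks how surprising it is that `k_hits` of `n_regions` input regions land there. The function name, variable names, and all numbers are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs, chosen only for illustration.
n_regions = 100      # number of input genomic regions (e.g. ChIP-seq peaks)
k_hits = 12          # regions falling in the term's regulatory domains
p_annotated = 0.05   # assumed fraction of the genome covered by those domains

# Upper-tail probability of seeing at least k_hits by chance.
p_value = binomial_tail(n_regions, k_hits, p_annotated)
print(f"binomial enrichment p-value: {p_value:.4g}")
```

Direct summation is adequate for small inputs like these. For genome-scale region counts, a library routine for the binomial survival function would be the more practical choice.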