Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Nature (Impact Factor: 42.35). 07/2007; 447(7146):799-816. DOI: 10.1038/nature05874
Source: PubMed

ABSTRACT We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

  • ABSTRACT: Current ChIP-seq studies often compare multiple epigenetic profiles across several cell types and tissues simultaneously to study constitutive and differential regulation. Simultaneous analysis of multiple epigenetic features in many samples can yield substantially more power and specificity than analyzing individual features and/or samples separately. Yet there are currently few tools that can perform joint inference of constitutive and differential regulation in multi-feature, multi-condition contexts with statistical testing. Existing tools either test regulatory variation for one factor in multiple samples at a time, or for multiple factors in one or two samples. Many of them identify only binary rather than quantitative variation, which makes them sensitive to threshold choices. We propose a novel and powerful method called dCaP for simultaneously detecting constitutive and differential regulation of multiple epigenetic factors in multiple samples. Using simulation, we demonstrate the superior power of dCaP compared to existing methods. We then apply dCaP to two datasets from human and mouse ENCODE projects to demonstrate its utility. We show in the human dataset that the cell-type specific regulatory loci detected by dCaP are significantly enriched near genes with cell-type specific functions and disease relevance. We further show in the mouse dataset that dCaP captures genomic regions showing significant signal variations for TAL1 occupancy between two mouse erythroid cell lines. The novel TAL1 occupancy loci detected only by dCaP are highly enriched with GATA1 occupancy and differential gene expression, while those detected only by other methods are not. Here, we developed a novel approach that uses the cooperative property of proteins to detect differential binding in multivariate ChIP-seq samples with better power, aiming to complement existing approaches and provide new insights for method development in this field.
    BMC Genomics 12/2014; 15(Suppl 9):S12. · 4.04 Impact Factor
  • ABSTRACT: Long non-coding (lnc) RNAs are defined as non-protein coding RNAs distinct from housekeeping RNAs such as tRNAs, rRNAs, and snRNAs, and independent from small RNAs with specific molecular processing machinery such as micro- or piwi-RNAs. Recent studies of lncRNAs across different species have revealed a diverse population of RNA molecules of differing size and function. RNA sequencing studies suggest transcription throughout the genome, so there is a need to understand how sequence relates to functional and structural relationships amongst RNA molecules. Our synthesis of recent studies suggests that neither size, presence of a poly-A tail, splicing, direction of transcription, nor strand specificity are of importance to lncRNA function. Rather, relative genomic position in relation to a target is fundamentally important. In this review, we describe issues of key importance in functional assessment of lncRNA and how this might apply to lncRNAs important in neurodevelopment.
    Frontiers in Cellular Neuroscience 10/2013; 7. · 4.18 Impact Factor
  • ABSTRACT: Hypoxia has been implicated as a crucial microenvironmental factor that induces cancer metastasis. We previously reported that hypoxia could promote gastric cancer (GC) metastasis, but the underlying mechanisms are not clear. Long noncoding RNAs (lncRNAs) have recently emerged as important regulators of carcinogenesis that act on multiple pathways. However, whether lncRNAs are involved in hypoxia-induced GC metastasis remains unknown. In this study, we investigated the differentially expressed lncRNAs resulting from hypoxia-induced GC and normoxia conditions using microarrays and validated our results through real-time quantitative polymerase chain reaction. We found an lncRNA, AK058003, that is upregulated by hypoxia. AK058003 is frequently upregulated in GC samples and promotes GC migration and invasion in vivo and in vitro. Furthermore, AK058003 can mediate the metastasis of hypoxia-induced GC cells. Next, we identified γ-synuclein (SNCG), which is a metastasis-related gene regulated by AK058003. In addition, we found that the expression of SNCG is positively correlated with that of AK058003 in the clinical GC samples used in our study. Furthermore, we found that the SNCG gene CpG island methylation was significantly increased in GC cells depleted of AK058003. Intriguingly, SNCG expression is also increased by hypoxia, and SNCG upregulation by AK058003 mediates hypoxia-induced GC cell metastasis. These results advance our understanding of the role of lncRNA-AK058003 as a regulator of hypoxia signaling, and this newly identified hypoxia/lncRNA-AK058003/SNCG pathway may help in the development of new therapeutics.
    Neoplasia (New York, N.Y.) 12/2014; 16(12):1094-106. · 5.40 Impact Factor
