Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Nature (Impact Factor: 42.35). 07/2007; 447(7146):799-816. DOI: 10.1038/nature05874
Source: PubMed

ABSTRACT We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

  •
    ABSTRACT: DNase I hypersensitive sites (DHSs) define the accessible chromatin landscape and have revolutionised the discovery of distinct cis-regulatory elements in diverse organisms. Here, we report the first comprehensive map of human transcription factor binding site (TFBS)-clustered regions using Gaussian kernel density estimation based on genome-wide mapping of the TFBSs in 133 human cell and tissue types. Approximately 1.6 million distinct TFBS-clustered regions, collectively spanning 27.7% of the human genome, were discovered. The TFBS complexity assigned to each TFBS-clustered region was highly correlated with genomic location, cell selectivity, evolutionary conservation, sequence features, and functional roles. An integrative analysis of these regions using ENCODE data revealed transcription factor occupancy, transcriptional activity, histone modification, DNA methylation, and chromatin structures that varied based on TFBS complexity. Furthermore, we found that we could recreate lineage-branching relationships by simple clustering of the TFBS-clustered regions from terminally differentiated cells. Based on these findings, a model of transcriptional regulation determined by TFBS complexity is proposed.
    Scientific Reports 02/2015; 5:8465. DOI:10.1038/srep08465 · 5.08 Impact Factor
  •
    ABSTRACT: Long intergenic non-coding RNAs (lincRNAs) play important roles in many cellular processes. Here, we present the first systematic identification and characterization of lincRNAs in fetal porcine skeletal muscle. We obtained a total of 55.02 million 90-bp paired-end reads and assembled 54,550 transcripts using cufflinks. We developed a pipeline to identify 570 multi-exon lincRNAs by integrating a set of previous approaches. These putative porcine lincRNAs share many characteristics with mammalian lincRNAs, such as a relatively short length, small number of exons and low level of sequence conservation. We found that the porcine lincRNAs were preferentially located near genes mediating transcriptional regulation rather than those with developmental functions. We further experimentally analyzed the features of a conserved mouse lincRNA gene and found that isoforms 1 and 4 of this lincRNA were enriched in the cell nucleus and were associated with polycomb repressive complex 2 (PRC2). Our results provide a catalog of fetal porcine lincRNAs for further experimental investigation of the functions of these genes in the skeletal muscle developmental process.
    Scientific Reports 03/2015; 5:8957. DOI:10.1038/srep08957 · 5.08 Impact Factor
  •
    ABSTRACT: The three-dimensional (3D) architecture of the mammalian nucleus is now being unraveled thanks to the recent development of chromatin conformation capture (3C) technologies. Here we report the results of a combined multiscale analysis of genome-wide mean replication timing and chromatin conformation data that reveal some intimate relationships between chromatin folding and human DNA replication. We previously described megabase replication N/U-domains as mammalian multiorigin replication units, and showed that their borders are ‘master’ replication initiation zones that likely initiate cascades of origin firing responsible for the stereotypic replication of these domains. Here, we demonstrate that replication N/U-domains correspond to the structural domains of self-interacting chromatin, and that their borders act as insulating regions both in high-throughput 3C (Hi-C) data and high-resolution 3C (4C) experiments. Further analyses of Hi-C data using a graph-theoretical approach reveal that N/U-domain borders are long-distance, interconnected hubs of the chromatin interaction network. Overall, these results and the observation that a well-defined ordering of chromatin states exists from N/U-domain borders to centers suggest that ‘master’ replication initiation zones are at the heart of a high-order, epigenetically controlled 3D organization of the human genome.
    New Journal of Physics 11/2014; 16(11). DOI:10.1088/1367-2630/16/11/115014 · 3.67 Impact Factor
