-
Mark B Gerstein,
Zhi John Lu,
Eric L Van Nostrand,
Chao Cheng,
Bradley I Arshinoff,
Tao Liu,
Kevin Y Yip,
Rebecca Robilotto,
Andreas Rechtsteiner,
Kohta Ikegami, [......],
X Shirley Liu,
Valerie Reinke,
Stuart K Kim,
LaDeana W Hillier,
Steven Henikoff,
Fabio Piano,
Michael Snyder,
Lincoln Stein,
Jason D Lieb,
Robert H Waterston
ABSTRACT: We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Science 12/2010; 330(6012):1775-87. · 31.20 Impact Factor
-
Gary Temple,
Daniela S Gerhard,
Rebekah Rasooly,
Elise A Feingold,
Peter J Good,
Cristen Robinson,
Allison Mandich,
Jeffrey G Derge,
Jeanne Lewis,
Debonny Shoaf, [......],
Ralf Wagner,
Stanley Letovsky,
Jacqueline C Pulido,
Keith Robison,
Dominic Esposito,
James Hartley,
Vanessa E Wall,
Ralph F Hopkins,
Osamu Ohara,
Stefan Wiemann
ABSTRACT: Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
Genome Research 09/2009; 19(12):2324-33. · 13.61 Impact Factor
-
Ewan Birney,
John A Stamatoyannopoulos,
Anindya Dutta,
Roderic Guigó,
Thomas R Gingeras,
Elliott H Margulies,
Zhiping Weng,
Michael Snyder,
Emmanouil T Dermitzakis,
Robert E Thurman, [......],
David B Jaffe,
Jean L Chang,
Kerstin Lindblad-Toh,
Eric S Lander,
Maxim Koriabine,
Mikhail Nefedov,
Kazutoyo Osoegawa,
Yuko Yoshinaga,
Baoli Zhu,
Pieter J de Jong
ABSTRACT: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Nature 07/2007; 447(7146):799-816. · 36.28 Impact Factor
-
Ewan Birney,
John A. Stamatoyannopoulos,
Anindya Dutta,
Roderic Guigó,
Thomas R. Gingeras,
Elliott H. Margulies,
Zhiping Weng,
Michael Snyder,
Emmanouil T. Dermitzakis,
Robert E. Thurman, [......],
David B. Jaffe,
Jean L. Chang,
Kerstin Lindblad-Toh,
Eric S. Lander,
Maxim Koriabine,
Mikhail Nefedov,
Kazutoyo Osoegawa,
Yuko Yoshinaga,
Baoli Zhu,
Pieter J. de Jong
ABSTRACT: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Nature 06/2007; 447(7146):799-816. · 36.28 Impact Factor
-
Ewan Birney,
John A. Stamatoyannopoulos,
Anindya Dutta,
Roderic Guigó,
Thomas R. Gingeras,
Elliott H. Margulies,
Zhiping Weng,
Michael Snyder,
Emmanouil T. Dermitzakis,
Robert E. Thurman, [......],
David B. Jaffe,
Jean L. Chang,
Kerstin Lindblad-Toh,
Eric S. Lander,
Maxim Koriabine,
Mikhail Nefedov,
Kazutoyo Osoegawa,
Yuko Yoshinaga,
Baoli Zhu,
Pieter J. de Jong
ABSTRACT: We report the generation and analysis of functional data from multiple,
diverse experiments performed on a targeted 1% of the human genome as
part of the pilot phase of the ENCODE Project. These data have been
further integrated and augmented by a number of evolutionary and
computational analyses. Together, our results advance the collective
knowledge about human genome function in several major areas. First, our
studies provide convincing evidence that the genome is pervasively
transcribed, such that the majority of its bases can be found in primary
transcripts, including non-protein-coding transcripts, and those that
extensively overlap one another. Second, systematic examination of
transcriptional regulation has yielded new understanding about
transcription start sites, including their relationship to specific
regulatory sequences and features of chromatin accessibility and histone
modification. Third, a more sophisticated view of chromatin structure
has emerged, including its inter-relationship with DNA replication and
transcriptional regulation. Finally, integration of these new sources of
information, in particular with respect to mammalian evolution based on
inter- and intra-species sequence comparisons, has yielded new
mechanistic and evolutionary insights concerning the functional
landscape of the human genome. Together, these studies are defining a
path for pursuit of a more comprehensive characterization of human
genome function.
Nature 05/2007; 447:799-816. · 36.28 Impact Factor
-
Daniela S Gerhard,
Lukas Wagner,
Elise A Feingold,
Carolyn M Shenmen,
Lynette H Grouse,
Greg Schuler,
Steven L Klein,
Susan Old,
Rebekah Rasooly,
Peter Good, [......],
Angelique Schnerch,
Jacqueline E Schein,
Steven J M Jones,
Robert A Holt,
Agnes Baross,
Marco A Marra,
Sandra Clifton,
Kathryn A Makowski,
Stephanie Bosak,
Joel Malek
ABSTRACT: The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.
Genome Research 11/2004; 14(10B):2121-7. · 13.61 Impact Factor
-
Robert L Strausberg,
Elise A Feingold,
Lynette H Grouse,
Jeffery G Derge,
Richard D Klausner,
Francis S Collins,
Lukas Wagner,
Carolyn M Shenmen,
Gregory D Schuler,
Stephen F Altschul, [......],
Jeremy Schmutz,
Richard M Myers,
Yaron S N Butterfield,
Martin I Krzywinski,
Ursula Skalska,
Duane E Smailus,
Angelique Schnerch,
Jacqueline E Schein,
Steven J M Jones,
Marco A Marra
ABSTRACT: The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).
Proceedings of the National Academy of Sciences 12/2002; 99(26):16899-903. · 9.68 Impact Factor
-
Mark Gerstein,
Zhi John Lu,
Eric L Van Nostrand,
Chao Cheng,
Bradley I Arshinoff,
Tao Liu,
Kevin Y Yip,
Rebecca Robilotto,
Andreas Rechtsteiner,
Kohta Ikegami, [......],
X Shirley Liu,
Valerie Reinke,
Stuart K Kim,
LaDeana W Hillier,
Steven Henikoff,
Fabio Piano,
Michael Snyder,
Lincoln Stein,
Jason D Lieb,
Robert H Waterston
ABSTRACT: We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Science (New York, N.Y.), v.330, 1775-1787 (2010).
-
Ewan Birney,
John A Stamatoyannopoulos,
Anindya Dutta,
Roderic Guigó,
Thomas R Gingeras,
Elliott H Margulies,
Zhiping Weng,
Michael Snyder,
Emmanouil T Dermitzakis,
Robert E Thurman, [......],
David B Jaffe,
Jean L Chang,
Kerstin Lindblad-Toh,
Eric S Lander,
Maxim Koriabine,
Mikhail Nefedov,
Kazutoyo Osoegawa,
Yuko Yoshinaga,
Baoli Zhu,
Pieter J de Jong
ABSTRACT: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Nature. 447(7146):799-816.