-
Philippe Lamesch,
Tanya Z Berardini,
Donghui Li,
David Swarbreck,
Christopher Wilks,
Rajkumar Sasidharan,
Robert Muller,
Kate Dreher,
Debbie L Alexander,
Margarita Garcia-Hernandez,
Athikkattuvalasu S Karthikeyan,
Cynthia H Lee,
William D Nelson,
Larry Ploetz,
Shanker Singh,
April Wensel,
Eva Huala
ABSTRACT: The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is a genome database for Arabidopsis thaliana, an important reference organism for many fundamental aspects of biology as well as basic and applied plant biology research. TAIR serves as a central access point for Arabidopsis data, annotates gene function and expression patterns using controlled vocabulary terms, and maintains and updates the A. thaliana genome assembly and annotation. TAIR also provides researchers with an extensive set of visualization and analysis tools. Recent developments include several new genome releases (TAIR8, TAIR9 and TAIR10) in which the A. thaliana assembly was updated, pseudogenes and transposon genes were re-annotated, and new data from proteomics and next-generation transcriptome sequencing were incorporated into gene models and splice variants. Other highlights include progress on functional annotation of the genome and the release of several new tools, including Textpresso for Arabidopsis, which provides the capability to carry out full-text searches on a large body of research literature.
Nucleic Acids Research 12/2011; 40(Database issue):D1202-10. · 8.03 Impact Factor
-
ABSTRACT: The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) is a comprehensive Web resource of Arabidopsis biology for plant scientists. TAIR curates and integrates information about genes, proteins, gene function, gene expression, mutant phenotypes, biological materials such as clones and seed stocks, genetic markers, genetic and physical maps, biochemical pathways, genome organization, images of mutant plants, protein sub-cellular localizations, publications, and the research community. The various data types are extensively interconnected and can be accessed through a variety of Web-based search and display tools. This unit primarily focuses on some basic methods for searching, browsing, visualizing, and analyzing information about Arabidopsis genes and describes several new tools such as a new TAIR genome browser (GBrowse), and the TAIR synteny viewer (GBrowse_syn). We also describe how to use AraCyc for mining plant metabolic pathways.
Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] 06/2010; Chapter 1:Unit1.11.
-
David Swarbreck,
Christopher Wilks,
Philippe Lamesch,
Tanya Z Berardini,
Margarita Garcia-Hernandez,
Hartmut Foerster,
Donghui Li,
Tom Meyer,
Robert Muller,
Larry Ploetz,
Amie Radenbaugh,
Shanker Singh,
Vanessa Swing,
Christophe Tissier,
Peifen Zhang,
Eva Huala
ABSTRACT: The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is the model organism database for the fully sequenced and intensively studied model plant Arabidopsis thaliana. Data in TAIR is derived in large part from manual curation of the Arabidopsis research literature and direct submissions from the research community. New developments at TAIR include the addition of the GBrowse genome viewer to the TAIR site; a redesigned home page, navigation structure and portal pages to make the site more intuitive and easier to use; the launch of several TAIR web services; and a new genome annotation release (TAIR7) in April 2007. A combination of manual and computational methods was used to generate this release, which contains 27,029 protein-coding genes, 3889 pseudogenes or transposable elements and 1123 ncRNAs (32,041 genes in all, 37,019 gene models). A total of 681 new genes and 1002 new splice variants were added. Overall, 10,098 loci (one-third of all loci from the previous TAIR6 release) were updated for the TAIR7 release.
Nucleic Acids Research 02/2008; 36(Database issue):D1009-14. · 8.03 Impact Factor
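The gene counts quoted in the TAIR7 abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the paper: the dictionary keys and the function name `total_genes` are illustrative assumptions, and only the numbers come from the abstract.

```python
# Gene counts as reported in the TAIR7 abstract.
# The key names are hypothetical labels chosen for this sketch.
tair7_counts = {
    "protein_coding": 27_029,
    "pseudogenes_or_transposable_elements": 3_889,
    "ncRNAs": 1_123,
}

# Total number of genes stated in the abstract.
reported_total = 32_041


def total_genes(counts):
    """Sum the per-category gene counts."""
    return sum(counts.values())


computed_total = total_genes(tair7_counts)
print(computed_total, computed_total == reported_total)
```

The three categories sum to the reported 32,041 genes, so the figures in the abstract are internally consistent.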
-
Philippe Lamesch,
Ning Li,
Stuart Milstein,
Changyu Fan,
Tong Hao,
Gabor Szabo,
Zhenjun Hu,
Kavitha Venkatesan,
Graeme Bethel,
Paul Martin,
Jane Rogers,
Stephanie Lawlor,
Stuart McLaren,
Amélie Dricot,
Heather Borick,
Michael E Cusick,
Jean Vandenhaute,
Ian Dunham,
David E Hill,
Marc Vidal
ABSTRACT: Complete sets of cloned protein-encoding open reading frames (ORFs), or ORFeomes, are essential tools for large-scale proteomics and systems biology studies. Here we describe human ORFeome version 3.1 (hORFeome v3.1), currently the largest publicly available resource of full-length human ORFs (available at ). Generated by Gateway recombinational cloning, this collection contains 12,212 ORFs, representing 10,214 human genes, and corresponds to a 51% expansion of the original hORFeome v1.1. An online human ORFeome database, hORFDB, was built and serves as the central repository for all cloned human ORFs (http://horfdb.dfci.harvard.edu). This expansion of the original ORFeome resource greatly increases the potential experimental search space for large-scale proteomics studies, which will lead to the generation of more comprehensive datasets.
Genomics 04/2007; 89(3):307-15. · 3.02 Impact Factor
-
ABSTRACT: cDNA clones have long been valuable reagents for studying the structure and function of proteins. With recent access to the entire human genome sequence, it has become possible and highly productive to compare the sequences of mRNAs to their genes, in order to validate the sequences and protein-coding annotations of each (1,2). Thus, well-characterized collections of human cDNAs are now playing an essential role in defining the structure and function of human genes and proteins. In this review, we will summarize the major collections of human cDNA clones, discuss some limitations common to most of these collections and describe several noteworthy proteomics applications, focusing on the detection and analysis of protein-protein interactions (PPI). These human cDNA collections contain principally two types of cDNA clones. The largest collections comprise cDNAs with full-length protein coding sequences (FL-CDS). Some but not all of these cDNA clones may represent the entire mRNA sequence, but many are missing considerable non-coding UTR sequence, usually at the 5' end. A second type of cDNA clone, a 'full-ORF' (F-ORF) expression clone, is one where the annotated protein-coding sequence, excised of 5' UTR and 3' UTR sequence, has been transferred to a vector designed to facilitate transfer to other vectors for protein expression.
Human Molecular Genetics 05/2006; 15 Spec No 1:R31-43. · 7.64 Impact Factor
-
Jean-François Rual,
Kavitha Venkatesan,
Tong Hao,
Tomoko Hirozane-Kishikawa,
Amélie Dricot,
Ning Li,
Gabriel F Berriz,
Francis D Gibbons,
Matija Dreze,
Nono Ayivi-Guedehoussou, [......],
Jean Vandenhaute,
Huda Y Zoghbi,
Alex Smolyar,
Stephanie Bosak,
Reynaldo Sequerra,
Lynn Doucette-Stamm,
Michael E Cusick,
David E Hill,
Frederick P Roth,
Marc Vidal
ABSTRACT: Systematic mapping of protein-protein interactions, or 'interactome' mapping, was initiated in model organisms, starting with defined biological processes and then expanding to the scale of the proteome. Although far from complete, such maps have revealed global topological and dynamic features of interactome networks that relate to known biological properties, suggesting that a human interactome map will provide insight into development and disease mechanisms at a systems level. Here we describe an initial version of a proteome-scale map of human binary protein-protein interactions. Using a stringent, high-throughput yeast two-hybrid system, we tested pairwise interactions among the products of approximately 8,100 currently available Gateway-cloned open reading frames and detected approximately 2,800 interactions. This data set, called CCSB-HI1, has a verification rate of approximately 78% as revealed by an independent co-affinity purification assay, and correlates significantly with other biological attributes. The CCSB-HI1 data set increases by approximately 70% the set of available binary interactions within the tested space and reveals more than 300 new connections to over 100 disease-associated proteins. This work represents an important step towards a systematic and comprehensive human interactome project.
Nature 11/2005; 437(7062):1173-8. · 36.28 Impact Factor
-
ABSTRACT: The genome of Caenorhabditis elegans was the first animal genome to be sequenced. Although considerable effort has been devoted to annotating it, the standard WormBase annotation contains thousands of predicted genes for which there is no cDNA or EST evidence. We hypothesized that a more complete experimental annotation could be obtained by creating a more accurate gene-prediction program and then amplifying and sequencing predicted genes. Our approach was to adapt the TWINSCAN gene prediction system to C. elegans and C. briggsae and to improve its splice site and intron-length models. The resulting system has 60% sensitivity and 58% specificity in exact prediction of open reading frames (ORFs), and hence proteins, the best results we are aware of for any multicellular organism. We then attempted to amplify, clone, and sequence 265 TWINSCAN-predicted ORFs that did not overlap WormBase gene annotations. The success rate was 55%, adding 146 genes that were completely absent from WormBase to the ORF clone collection (ORFeome). The same procedure had a 7% success rate on 90 WormBase "predicted" genes that do not overlap TWINSCAN predictions. These results indicate that the accuracy of WormBase could be significantly increased by replacing its partially curated predicted genes with TWINSCAN predictions. The technology described in this study will continue to drive the C. elegans ORFeome toward completion and contribute to the annotation of the three Caenorhabditis species currently being sequenced. The results also suggest that this technology can significantly improve our knowledge of the "parts list" for even the best-studied model organisms.
Genome Research 05/2005; 15(4):577-82. · 13.61 Impact Factor
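The 55% success rate quoted in the abstract above follows directly from the two counts it gives (146 genes added out of 265 attempted ORFs). The snippet below is a small illustrative sketch; the helper `success_rate` and the variable names are assumptions made here, not anything defined in the paper.

```python
def success_rate(successes, attempts):
    """Return the success rate as a percentage of attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


# Counts taken from the abstract: 265 TWINSCAN-predicted ORFs attempted,
# 146 genes added to the ORFeome.
twinscan_attempted = 265
twinscan_cloned = 146

rate = success_rate(twinscan_cloned, twinscan_attempted)
print(f"TWINSCAN-only predictions: {rate:.1f}% success")
```

Rounded to the nearest whole percent, this reproduces the 55% figure stated in the abstract.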
-
Amélie Dricot,
Jean-François Rual,
Philippe Lamesch,
Nicolas Bertin,
Denis Dupuy,
Tong Hao,
Christophe Lambert,
Régis Hallez,
Jean-Marc Delroisse,
Jean Vandenhaute, [......],
Alastair P Macmillan,
Sally J Cutler,
Adrian M Whatmore,
Stephanie Bozak,
Reynaldo Sequerra,
Lynn Doucette-Stamm,
Marc Vidal,
David E Hill,
Jean-Jacques Letesson,
Xavier De Bolle
ABSTRACT: The bacteria of the Brucella genus are responsible for a worldwide zoonosis called brucellosis. They belong to the alpha-proteobacteria group, as do many other bacteria that live in close association with a eukaryotic host. Importantly, the Brucellae are mainly intracellular pathogens, and the molecular mechanisms of their virulence are still poorly understood. Using the complete genome sequence of Brucella melitensis, we generated a database of protein-coding open reading frames (ORFs) and constructed an ORFeome library of 3091 Gateway Entry clones, each containing a defined ORF. This first version of the Brucella ORFeome (v1.1) provides the coding sequences in a user-friendly format amenable to high-throughput functional genomic and proteomic experiments, as the ORFs are conveniently transferable from the Entry clones to various Expression vectors by recombinational cloning. The cloning of the Brucella ORFeome v1.1 should help to provide a better understanding of the molecular mechanisms of virulence, including the identification of bacterial protein-protein interactions as well as interactions between bacterial effectors and their host's targets.
Genome Research 11/2004; 14(10B):2201-6. · 13.61 Impact Factor
-
Denis Dupuy,
Qian-Ru Li,
Bart Deplancke,
Mike Boxem,
Tong Hao,
Philippe Lamesch,
Reynaldo Sequerra,
Stephanie Bosak,
Lynn Doucette-Stamm,
Ian A Hope,
David E Hill,
Albertha J M Walhout,
Marc Vidal
ABSTRACT: An important aspect of the development of systems biology approaches in metazoans is the characterization of expression patterns of nearly all genes predicted from genome sequences. Such "localizome" maps should provide information on where (in what cells or tissues) and when (at what stage of development or under what conditions) genes are expressed. They should also indicate in what cellular compartments the corresponding proteins are localized. Caenorhabditis elegans is particularly suited for the development of a localizome map since all its 959 adult somatic cells can be visualized by microscopy, and its cell lineage has been completely described. Here we address one of the challenges of C. elegans localizome mapping projects: that of obtaining a genome-wide resource of C. elegans promoters needed to generate transgenic animals expressing localization markers such as the green fluorescent protein (GFP). To ensure high flexibility for future uses, we utilized the newly developed MultiSite Gateway system. We generated and validated "version 1.1" of the Promoterome: a resource of approximately 6000 C. elegans promoters. These promoters can be transferred easily into various Gateway Destination vectors to drive expression of markers such as GFP, alone (promoter::GFP constructs), or in fusion with protein-encoding open reading frames available in ORFeome resources (promoter::ORF::GFP).
Genome Research 11/2004; 14(10B):2169-75. · 13.61 Impact Factor
-
Philippe Lamesch,
Stuart Milstein,
Tong Hao,
Jennifer Rosenberg,
Ning Li,
Reynaldo Sequerra,
Stephanie Bosak,
Lynn Doucette-Stamm,
Jean Vandenhaute,
David E Hill,
Marc Vidal
ABSTRACT: The first version of the Caenorhabditis elegans ORFeome cloning project, based on release WS9 of WormBase (August 1999), provided experimental verifications for approximately 55% of predicted protein-encoding open reading frames (ORFs). The remaining 45% of predicted ORFs could not be cloned, possibly as a result of mispredicted gene boundaries. Since the release of WS9, gene predictions have improved continuously. To test the accuracy of evolving predictions, we attempted to PCR-amplify from a highly representative worm cDNA library and Gateway-clone approximately 4200 ORFs missed earlier and for which new predictions are available in WS100 (May 2003). In this set we successfully cloned 63% of ORFs with supporting experimental data ("touched" ORFs), and 42% of ORFs with no supporting experimental evidence ("untouched" ORFs). Approximately 2000 full-length ORFs were cloned in-frame, 13% of which were corrected in their exon/intron structure relative to WS100 predictions. In total, approximately 12,500 C. elegans ORFs are now available as Gateway Entry clones for various reverse proteomics applications (ORFeome v3.1). This work illustrates why the cloning of a complete C. elegans ORFeome, and likely the ORFeomes of other multicellular organisms, needs to be an iterative process that requires multiple rounds of experimental validation together with gradually improving gene predictions.
Genome Research 11/2004; 14(10B):2064-9. · 13.61 Impact Factor
-
Jean-François Rual,
Tomoko Hirozane-Kishikawa,
Tong Hao,
Nicolas Bertin,
Siming Li,
Amélie Dricot,
Ning Li,
Jennifer Rosenberg,
Philippe Lamesch,
Pierre-Olivier Vidalain, [......],
Blake Simmons,
Reynaldo Sequerra,
Stephanie Bosak,
Lynn Doucette-Stamm,
Christian Le Peuch,
Jean Vandenhaute,
Michael E Cusick,
Joanna S Albala,
David E Hill,
Marc Vidal
ABSTRACT: The advent of systems biology necessitates the cloning of nearly entire sets of protein-encoding open reading frames (ORFs), or ORFeomes, to allow functional studies of the corresponding proteomes. Here, we describe the generation of a first version of the human ORFeome using a newly improved Gateway recombinational cloning approach. Using the Mammalian Gene Collection (MGC) resource as a starting point, we report the successful cloning of 8076 human ORFs, representing at least 7263 human genes, as mini-pools of PCR-amplified products. These were assembled into the human ORFeome version 1.1 (hORFeome v1.1) collection. After assessing the overall quality of this version, we describe the use of hORFeome v1.1 for heterologous protein expression in two different expression systems at proteome scale. The hORFeome v1.1 represents a central resource for the cloning of large sets of human ORFs in various settings for functional proteomics of many types, and will serve as the foundation for subsequent improved versions of the human ORFeome.
Genome Research 11/2004; 14(10B):2128-35. · 13.61 Impact Factor
-
Siming Li,
Christopher M Armstrong,
Nicolas Bertin,
Hui Ge,
Stuart Milstein,
Mike Boxem,
Pierre-Olivier Vidalain,
Jing-Dong J Han,
Alban Chesneau,
Tong Hao, [......],
Jean Vandenhaute,
Claude Sardet,
Mark Gerstein,
Lynn Doucette-Stamm,
Kristin C Gunsalus,
J Wade Harper,
Michael E Cusick,
Frederick P Roth,
David E Hill,
Marc Vidal
ABSTRACT: To initiate studies on how protein-protein interaction (or "interactome") networks relate to multicellular functions, we have mapped a large fraction of the Caenorhabditis elegans interactome network. Starting with a subset of metazoan-specific proteins, more than 4000 interactions were identified from high-throughput yeast two-hybrid (HT-Y2H) screens. Independent coaffinity purification assays experimentally validated the overall quality of this Y2H data set. Together with already described Y2H interactions and interologs predicted in silico, the current version of the Worm Interactome (WI5) map contains approximately 5500 interactions. Topological and biological features of this interactome network, as well as its integration with phenome and transcriptome data sets, lead to numerous biological hypotheses.
Science 02/2004; 303(5657):540-3. · 31.20 Impact Factor
-
Jérôme Reboul,
Philippe Vaglio,
Jean-François Rual,
Philippe Lamesch,
Monica Martinez,
Christopher M Armstrong,
Siming Li,
Laurent Jacotot,
Nicolas Bertin,
Rekin's Janky, [......],
Vasilis Papasotiropoulos,
Peter P Tolias,
Jason Ptacek,
Mike Snyder,
Raymond Huang,
Mark R Chance,
Hongmei Lee,
Lynn Doucette-Stamm,
David E Hill,
Marc Vidal
ABSTRACT: To verify the genome annotation and to create a resource to functionally characterize the proteome, we attempted to Gateway-clone all predicted protein-encoding open reading frames (ORFs), or the 'ORFeome,' of Caenorhabditis elegans. We successfully cloned approximately 12,000 ORFs (ORFeome 1.1), of which roughly 4,000 correspond to genes that are untouched by any cDNA or expressed-sequence tag (EST). More than 50% of predicted genes needed corrections in their intron-exon structures. Notably, approximately 11,000 C. elegans proteins can now be expressed under many conditions and characterized using various high-throughput strategies, including large-scale interactome mapping. We suggest that similar ORFeome projects will be valuable for other organisms, including humans.
Nature Genetics 06/2003; 34(1):35-41. · 35.53 Impact Factor
-
ABSTRACT: WorfDB (Worm ORFeome DataBase; http://worfdb.dfci.harvard.edu) was created to integrate and disseminate the data from the cloning of the complete set of approximately 19,000 predicted protein-encoding Open Reading Frames (ORFs) of Caenorhabditis elegans (also referred to as the 'worm ORFeome'). WorfDB serves as a central data repository enabling the scientific community to search for availability and quality of cloned ORFs. So far, ORF sequence tags (OSTs) obtained for all individual clones have allowed exon structure corrections for approximately 3400 ORFs originally predicted by the C. elegans sequencing consortium. In addition, we now have OSTs for approximately 4300 predicted genes for which no ESTs were available. The database contains this OST information along with data pertinent to the cloning process. WorfDB could serve as a model database for other metazoan ORFeome cloning projects.
Nucleic Acids Research 02/2003; 31(1):237-40. · 8.03 Impact Factor