[Show abstract][Hide abstract] ABSTRACT: GeneDB (http://www.genedb.org) is a genome database for prokaryotic and eukaryotic pathogens and closely related organisms. The resource provides a portal to genome sequence and annotation data, which is primarily generated by the Pathogen Genomics group at the Wellcome Trust Sanger Institute. It combines data from completed and ongoing genome projects with curated annotation, which is readily accessible from a web based resource. The development of the database in recent years has focused on providing database-driven annotation tools and pipelines, as well as catering for increasingly frequent assembly updates. The website has been significantly redesigned to take advantage of current web technologies, and improve usability. The current release stores 41 data sets, of which 17 are manually curated and maintained by biologists, who review and incorporate data from the scientific literature, as well as other sources. GeneDB is primarily a production and annotation database for the genomes of predominantly pathogenic organisms.
Nucleic Acids Research 11/2012; 40(Database issue):D98-108. · 8.81 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Antigenic variation enables pathogens to avoid the host immune response by continual switching of surface proteins. The protozoan blood parasite Trypanosoma brucei causes human African trypanosomiasis ("sleeping sickness") across sub-Saharan Africa and is a model system for antigenic variation, surviving by periodically replacing a monolayer of variant surface glycoproteins (VSG) that covers its cell surface. We compared the genome of Trypanosoma brucei with two closely related parasites Trypanosoma congolense and Trypanosoma vivax, to reveal how the variant antigen repertoire has evolved and how it might affect contemporary antigenic diversity. We reconstruct VSG diversification showing that Trypanosoma congolense uses variant antigens derived from multiple ancestral VSG lineages, whereas in Trypanosoma brucei VSG have recent origins, and ancestral gene lineages have been repeatedly co-opted to novel functions. These historical differences are reflected in fundamental differences between species in the scale and mechanism of recombination. Using phylogenetic incompatibility as a metric for genetic exchange, we show that the frequency of recombination is comparable between Trypanosoma congolense and Trypanosoma brucei but is much lower in Trypanosoma vivax. Furthermore, in showing that the C-terminal domain of Trypanosoma brucei VSG plays a crucial role in facilitating exchange, we reveal substantial species differences in the mechanism of VSG diversification. Our results demonstrate how past VSG evolution indirectly determines the ability of contemporary parasites to generate novel variant antigens through recombination and suggest that the current model for antigenic variation in Trypanosoma brucei is only one means by which these parasites maintain chronic infections.
Proceedings of the National Academy of Sciences 02/2012; 109(9):3416-21. · 9.81 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.
[Show abstract][Hide abstract] ABSTRACT: PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.
Nucleic Acids Research 01/2012; 40(Database issue):D695-9. · 8.81 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. 'User Comments' may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.
Nucleic Acids Research 01/2010; 38(Database issue):D457-62. · 8.81 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Trypanosoma brucei gambiense is the causative agent of chronic Human African Trypanosomiasis or sleeping sickness, a disease endemic across often poor and rural areas of Western and Central Africa. We have previously published the genome sequence of a T. b. brucei isolate, and have now employed a comparative genomics approach to understand the scale of genomic variation between T. b. gambiense and the reference genome. We sought to identify features that were uniquely associated with T. b. gambiense and its ability to infect humans.
An improved high-quality draft genome sequence for the group 1 T. b. gambiense DAL 972 isolate was produced using a whole-genome shotgun strategy. Comparison with T. b. brucei showed that sequence identity averages 99.2% in coding regions, and gene order is largely collinear. However, variation associated with segmental duplications and tandem gene arrays suggests some reduction of functional repertoire in T. b. gambiense DAL 972. A comparison of the variant surface glycoproteins (VSG) in T. b. brucei with all T. b. gambiense sequence reads showed that the essential structural repertoire of VSG domains is conserved across T. brucei.
This study provides the first estimate of intraspecific genomic variation within T. brucei, and so has important consequences for future population genomics studies. We have shown that the T. b. gambiense genome corresponds closely with the reference, which should therefore be an effective scaffold for any T. brucei genome sequence data. As VSG repertoire is also well conserved, it may be feasible to describe the total diversity of variant antigens. While we describe several as yet uncharacterized gene families with predicted cell surface roles that were expanded in number in T. b. brucei, no T. b. gambiense-specific gene was identified outside of the subtelomeres that could explain the ability to infect humans.
[Show abstract][Hide abstract] ABSTRACT: Candida dubliniensis is the closest known relative of Candida albicans, the most pathogenic yeast species in humans. However, despite both species sharing many phenotypic characteristics, including the ability to form true hyphae, C. dubliniensis is a significantly less virulent and less versatile pathogen. Therefore, to identify C. albicans-specific genes that may be responsible for an increased capacity to cause disease, we have sequenced the C. dubliniensis genome and compared it with the known C. albicans genome sequence. Although the two genome sequences are highly similar and synteny is conserved throughout, 168 species-specific genes are identified, including some encoding known hyphal-specific virulence factors, such as the aspartyl proteinases Sap4 and Sap5 and the proposed invasin Als3. Among the 115 pseudogenes confirmed in C. dubliniensis are orthologs of several filamentous growth regulator (FGR) genes that also have suspected roles in pathogenesis. However, the principal differences in genomic repertoire concern expansion of the TLO gene family of putative transcription factors and the IFA family of putative transmembrane proteins in C. albicans, which represent novel candidate virulence-associated factors. The results suggest that the recent evolutionary histories of C. albicans and C. dubliniensis are quite different. While gene families instrumental in pathogenesis have been elaborated in C. albicans, C. dubliniensis has lost genomic capacity and key pathogenic functions. This could explain why C. albicans is a more potent pathogen in humans than C. dubliniensis.
Genome Research 10/2009; 19(12):2231-44. · 13.85 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
[Show abstract][Hide abstract] ABSTRACT: Trypanosoma congolense is one of the most economically important pathogens of livestock in Africa. Culture-derived parasites of each of the three main insect stages of the T. congolense life cycle, i.e., the procyclic, epimastigote and metacyclic stages, and bloodstream stage parasites isolated from infected mice, were used to construct stage-specific cDNA libraries and expressed sequence tags (ESTs or cDNA clones) in each library were sequenced. Thirteen EST clusters encoding different variant surface glycoproteins (VSGs) were detected in the metacyclic library and 26 VSG EST clusters were found in the bloodstream library, 6 of which are shared by the metacyclic library. Rare VSG ESTs are present in the epimastigote library, and none were detected in the procyclic library. ESTs encoding enzymes that catalyze oxidative phosphorylation and amino acid metabolism are about twice as abundant in the procyclic and epimastigote stages as in the metacyclic and bloodstream stages. In contrast, ESTs encoding enzymes involved in glycolysis, the citric acid cycle and nucleotide metabolism are about the same in all four developmental stages. Cysteine proteases, kinases and phosphatases are the most abundant enzyme groups represented by the ESTs. All four libraries contain T. congolense-specific expressed sequences not present in the Trypanosoma brucei and Trypanosoma cruzi genomes. Normalized cDNA libraries were constructed from the metacyclic and bloodstream stages, and found to be further enriched for T. congolense-specific ESTs. Given that cultured T. congolense offers an experimental advantage over other African trypanosome species, these ESTs provide a basis for further investigation of the molecular properties of these four developmental stages, especially the epimastigote and metacyclic stages for which it is difficult to obtain large quantities of organisms. The T. congolense EST databases are available at: http://www.sanger.ac.uk/Projects/T_congolense/EST_index.shtml. The sequence data have been submitted to EMBL under the following accession numbers: FN263376-FN292969.
Molecular and Biochemical Parasitology 07/2009; 168(1):34-42. · 2.24 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: The increasing availability of genomic data for pathogens that cause tropical diseases has created new opportunities for drug discovery and development. However, if the potential of such data is to be fully exploited, the data must be effectively integrated and be easy to interrogate. Here, we discuss the development of the TDR Targets database (http://tdrtargets.org), which encompasses extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets and compound desirability information. By allowing the integration and weighting of this information, this database aims to facilitate the identification and prioritization of candidate drug targets for pathogens.
dressNature Reviews Drug Discovery 11/2008; 7(11):900-7. · 37.23 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.
[Show abstract][Hide abstract] ABSTRACT: In this review, we present an overview of the Gene Ontology (GO) structure and describe how the GO is implemented for Sz. pombe and made available via Sz. pombe GeneDB (http://www.genedb.org/genedb/pombe/). We give a detailed progress report of Sz. pombe GO annotation, providing the current status of both manual and automatic annotations. Fission yeast has at least one GO annotation for 98.3% of its genes (excluding annotations to 'unknown' terms), greater than the current percentage coverage for any other organism. Approximately 65% (3225 gene products) have at least one annotation to each of the three ontologies (biological process, cellular component and molecular function). Approximately 30% (1443 gene products) have GO terms derived directly from small-scale experiments in fission yeast, supporting the validity of fission yeast as a model eukaryote and a reference organism.
[Show abstract][Hide abstract] ABSTRACT: Theileria annulata and T. parva are closely related protozoan parasites that cause lymphoproliferative diseases of cattle. We sequenced the genome of T. annulata and compared it with that of T. parva to understand the mechanisms underlying transformation and tropism. Despite high conservation of gene sequences and synteny, the analysis reveals unequally expanded gene families and species-specific genes. We also identify divergent families of putative secreted polypeptides that may reduce immune recognition, candidate regulators of host-cell transformation, and a Theileria-specific protein domain [frequently associated in Theileria (FAINT)] present in a large number of secreted proteins.
[Show abstract][Hide abstract] ABSTRACT: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including approximately 900 pseudogenes and approximately 1700 T. brucei-specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.
[Show abstract][Hide abstract] ABSTRACT: Centralisation of tools for analysis of genomic data is paramount in ensuring that research is always carried out on the latest currently available data. As such, World Wide Web sites providing a range of online analyses and displays of data can play a crucial role in guaranteeing consistency of in silico work. In this respect, the protozoan parasite research community is served by several resources, either focussing on data and tools for one species or taking a broader view and providing tools for analysis of data from many species, thereby facilitating comparative studies. In this paper, we give a broad overview of the online resources available. We then focus on the GeneDB project, detailing the features and tools currently available through it. Finally, we discuss data curation and its importance in keeping genomic data 'relevant' to the research community.
International Journal for Parasitology 05/2005; 35(5):481-93. · 3.40 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: GeneDB (http://www.genedb.org/) is a genome database for prokaryotic and eukaryotic organisms. The resource provides a portal through which data generated by the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be made publicly available. It combines data from finished and ongoing genome and expressed sequence tag (EST) projects with curated annotation, that can be searched, sorted and downloaded, using a single web based resource. The current release stores 11 datasets of which six are curated and maintained by biologists, who review and incorporate information from the scientific literature, public databases and the respective research communities.
Nucleic Acids Research 02/2004; 32(Database issue):D339-43. · 8.81 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: The Malaria Genome Sequencing Consortium Meeting was held on 5–6 June 2001 in Cambridge, UK, and was followed by a workshop discussing the generation of new parasite-specific gene ontology (GO) terms. Present at the meeting, which focused primarily on malaria, were representatives from the research community, the sequencing centres, including the Sanger Centre (Cambridge, UK) and The Institute for Genome Research (TIGR; Rockville, MD, USA), the European Bioinformatics Institute (EBI; Cambridge, UK), Plasmo DB and the GO consortium (Cambridge, UK).
Trends in Parasitology 11/2001; 17(10):463-4. · 6.22 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: During 1993-1994, scientists from developing and developed countries planned and initiated a number of parasite genome projects and several consortiums for the mapping and sequencing of these medium-sized genomes were established, often based on already ongoing scientific collaborations. Financial and other support came from WHO/TDR, Wellcome Trust and other funding agencies. Thus, the genomes of Plasmodium falciparum, Schistosoma mansoni, Trypanosoma cruzi, Leishmania major, Trypanosoma brucei, Brugia malayi and other pathogenic nematodes are now under study. From an initial phase of network formation, mapping efforts and resource building (EST, GSS, phage, cosmid, BAC and YAC library constructions), sequencing was initiated in gene discovery projects but soon also on a small chromosome, and now on a fully fledged genome scale. Proteomics, functional analysis, genetic manipulation and microarray analysis are ongoing to different degrees in the respective genome initiatives, and as the funding for the whole genome sequencing becomes secured, most of the participating laboratories, apart from larger sequencing centres, become oriented to post-genomics. Bioinformatics networks are being expanded, including in developing countries, for data mining, annotation and in-depth analysis.
International Journal for Parasitology 06/2001; 31(5-6):532-6. · 3.40 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: The initiation of genome projects on helminths of medical importance promises to yield new drug targets and vaccine candidates in unprecedented numbers. In order to exploit this emerging data it is essential that the user community is aware of the scope and quality of data available, and that the genome projects provide analyses of the raw data to highlight potential genes of interest. Core bioinformatics support for the parasite genome projects has promoted these approaches. In the Brugia genome project, a combination of expressed sequence tag sequencing from multiple cDNA libraries representing the complete filarial nematode lifecycle, and comparative analysis of the sequence dataset, particularly using the complete genome sequence of the model nematode C. elegans, has proved very effective in gene discovery.