[Show abstract][Hide abstract] ABSTRACT: GeneDB (http://www.genedb.org) is a genome database for prokaryotic and eukaryotic pathogens and closely related organisms. The resource provides a portal to genome sequence and annotation data, which is primarily generated by the Pathogen Genomics group at the Wellcome Trust Sanger Institute. It combines data from completed and ongoing genome projects with curated annotation, which is readily accessible from a web based resource. The development of the database in recent years has focused on providing database-driven annotation tools and pipelines, as well as catering for increasingly frequent assembly updates. The website has been significantly redesigned to take advantage of current web technologies, and improve usability. The current release stores 41 data sets, of which 17 are manually curated and maintained by biologists, who review and incorporate data from the scientific literature, as well as other sources. GeneDB is primarily a production and annotation database for the genomes of predominantly pathogenic organisms.
Nucleic Acids Research 11/2012; 40(Database issue):D98-108. · 8.81 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. 'User Comments' may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.
Nucleic Acids Research 01/2010; 38(Database issue):D457-62. · 8.81 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Candida dubliniensis is the closest known relative of Candida albicans, the most pathogenic yeast species in humans. However, despite both species sharing many phenotypic characteristics, including the ability to form true hyphae, C. dubliniensis is a significantly less virulent and less versatile pathogen. Therefore, to identify C. albicans-specific genes that may be responsible for an increased capacity to cause disease, we have sequenced the C. dubliniensis genome and compared it with the known C. albicans genome sequence. Although the two genome sequences are highly similar and synteny is conserved throughout, 168 species-specific genes are identified, including some encoding known hyphal-specific virulence factors, such as the aspartyl proteinases Sap4 and Sap5 and the proposed invasin Als3. Among the 115 pseudogenes confirmed in C. dubliniensis are orthologs of several filamentous growth regulator (FGR) genes that also have suspected roles in pathogenesis. However, the principal differences in genomic repertoire concern expansion of the TLO gene family of putative transcription factors and the IFA family of putative transmembrane proteins in C. albicans, which represent novel candidate virulence-associated factors. The results suggest that the recent evolutionary histories of C. albicans and C. dubliniensis are quite different. While gene families instrumental in pathogenesis have been elaborated in C. albicans, C. dubliniensis has lost genomic capacity and key pathogenic functions. This could explain why C. albicans is a more potent pathogen in humans than C. dubliniensis.
Genome Research 10/2009; 19(12):2231-44. · 14.40 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
[Show abstract][Hide abstract] ABSTRACT: Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences.
Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text.
Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/
[Show abstract][Hide abstract] ABSTRACT: Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated, and it has a close phylogenetic relationship to Plasmodium vivax, the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome and other sequenced Plasmodium genomes. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs, contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.
[Show abstract][Hide abstract] ABSTRACT: Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.
[Show abstract][Hide abstract] ABSTRACT: Eimeria tenella is an intracellular protozoan parasite that infects the intestinal tracts of domestic fowl and causes coccidiosis, a serious and sometimes lethal enteritis. Eimeria falls in the same phylum (Apicomplexa) as several human and animal parasites such as Cryptosporidium, Toxoplasma, and the malaria parasite, Plasmodium. Here we report the sequencing and analysis of the first chromosome of E. tenella, a chromosome believed to carry loci associated with drug resistance and known to differ between virulent and attenuated strains of the parasite. The chromosome--which appears to be representative of the genome--is gene-dense and rich in simple-sequence repeats, many of which appear to give rise to repetitive amino acid tracts in the predicted proteins. Most striking is the segmentation of the chromosome into repeat-rich regions peppered with transposon-like elements and telomere-like repeats, alternating with repeat-free regions. Predicted genes differ in character between the two types of segment, and the repeat-rich regions appear to be associated with strain-to-strain variation.
Genome Research 04/2007; 17(3):311-9. · 14.40 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Theileria annulata and T. parva are closely related protozoan parasites that cause lymphoproliferative diseases of cattle. We sequenced the genome of T. annulata and compared it with that of T. parva to understand the mechanisms underlying transformation and tropism. Despite high conservation of gene sequences and synteny, the analysis reveals unequally expanded gene families and species-specific genes. We also identify divergent families of putative secreted polypeptides that may reduce immune recognition, candidate regulators of host-cell transformation, and a Theileria-specific protein domain [frequently associated in Theileria (FAINT)] present in a large number of secreted proteins.
[Show abstract][Hide abstract] ABSTRACT: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including approximately 900 pseudogenes and approximately 1700 T. brucei-specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.
[Show abstract][Hide abstract] ABSTRACT: The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins, a high proportion of which have long, repetitive amino acid tracts. There are many genes for polyketide synthases and ABC transporters, suggesting an extensive secondary metabolism for producing and exporting small molecules. The genome is rich in complex repeats, one class of which is clustered and may serve as centromeres. Partial copies of the extrachromosomal ribosomal DNA (rDNA) element are found at the ends of each chromosome, suggesting a novel telomere structure and the use of a common mechanism to maintain both the rDNA and chromosomal termini. A proteome-based phylogeny shows that the amoebozoa diverged from the animal-fungal lineage after the plant-animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.
[Show abstract][Hide abstract] ABSTRACT: Centralisation of tools for analysis of genomic data is paramount in ensuring that research is always carried out on the latest currently available data. As such, World Wide Web sites providing a range of online analyses and displays of data can play a crucial role in guaranteeing consistency of in silico work. In this respect, the protozoan parasite research community is served by several resources, either focussing on data and tools for one species or taking a broader view and providing tools for analysis of data from many species, thereby facilitating comparative studies. In this paper, we give a broad overview of the online resources available. We then focus on the GeneDB project, detailing the features and tools currently available through it. Finally, we discuss data curation and its importance in keeping genomic data 'relevant' to the research community.
International Journal for Parasitology 05/2005; 35(5):481-93. · 3.64 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei . The 26-megabase genome contains 9068 predicted genes, including ~900 pseudogenes and ~1700 T. brucei –specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei , T. cruzi , and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major . Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.
[Show abstract][Hide abstract] ABSTRACT: GeneDB (http://www.genedb.org/) is a genome database for prokaryotic and eukaryotic organisms. The resource provides a portal through which data generated by the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be made publicly available. It combines data from finished and ongoing genome and expressed sequence tag (EST) projects with curated annotation, that can be searched, sorted and downloaded, using a single web based resource. The current release stores 11 datasets of which six are curated and maintained by biologists, who review and incorporate information from the scientific literature, public databases and the respective research communities.
Nucleic Acids Research 02/2004; 32(Database issue):D339-43. · 8.81 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Since the sequencing of the first two chromosomes of the malaria parasite, Plasmodium falciparum, there has been a concerted effort to sequence and assemble the entire genome of this organism. Here we report the sequence of chromosomes 1, 3-9 and 13 of P. falciparum clone 3D7--these chromosomes account for approximately 55% of the total genome. We describe the methods used to map, sequence and annotate these chromosomes. By comparing our assemblies with the optical map, we indicate the completeness of the resulting sequence. During annotation, we assign Gene Ontology terms to the predicted gene products, and observe clustering of some malaria-specific terms to specific chromosomes. We identify a highly conserved sequence element found in the intergenic region of internal var genes that is not associated with their telomeric counterparts.
[Show abstract][Hide abstract] ABSTRACT: We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.