TcSNP: A database of genetic variation in Trypanosoma cruzi

The TcSNP database (http://snps.tcruzi.org) integrates information on genetic variation (polymorphisms and mutations) for different stocks, strains and isolates of Trypanosoma cruzi, the causative agent of Chagas disease. The database incorporates sequences (genes from the T. cruzi reference genome, mRNAs, ESTs and genomic sequences); multiple sequence alignments obtained from these sequences; and single-nucleotide polymorphisms and small indels identified by scanning these multiple sequence alignments. Information in TcSNP can be readily interrogated to arrive at gene sets, or SNP sets of interest based on a number of attributes. Sequence similarity searches using BLAST are also supported. This first release of TcSNP contains nearly 170 000 high-confidence candidate SNPs, derived from the analysis of annotated coding sequences. As new sequence data become available, TcSNP will incorporate these data, mapping new candidate SNPs onto the reference genome sequences.
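As a rough illustration of what scanning a multiple sequence alignment for candidate SNPs can involve, the sketch below flags alignment columns that carry more than one nucleotide. It is not the TcSNP pipeline: the function name `find_candidate_snps`, the `min_minor_count` threshold, and the choice to skip gaps and ambiguous bases (so that small indels are not reported) are all assumptions made for this example.

```python
from collections import Counter


def find_candidate_snps(alignment, min_minor_count=1):
    """Return (column, allele_counts) for polymorphic alignment columns.

    `alignment` is a list of equal-length aligned sequences (strings).
    Gaps ('-') and ambiguous bases are ignored, so indels are not reported.
    `min_minor_count` is a hypothetical filter: the second most frequent
    base must be seen at least this many times.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")

    snps = []
    for col in range(len(alignment[0])):
        # Count only unambiguous nucleotides in this column.
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() in "ACGT"
        )
        if len(counts) >= 2 and sorted(counts.values())[-2] >= min_minor_count:
            snps.append((col, dict(counts)))
    return snps


if __name__ == "__main__":
    toy_alignment = ["ACGTA", "ACCTA", "AC-TA", "ACGTT"]
    for column, alleles in find_candidate_snps(toy_alignment):
        print(column, alleles)
```

Raising `min_minor_count` is one simple way to discard columns where the minor allele is supported by a single sequence, which is where sequencing errors are most likely to masquerade as polymorphisms.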
... A number of aspects were considered to reduce these risks: i) we decided to focus our analysis on coding sequences where possible, as they are generally less variable than non-coding sequences; ii) we used the T. cruzi SNP database (TcSNP, http://snps.tcruzi.org) [46] to select the best regions for primer design, avoiding regions with candidate SNPs; and iii) we also limited the size of amplification products to optimize the quality of the final sequence (see Methods). In this latter case, an optimal size was set to ~750-800 bp, which would allow us to get complete sequence coverage, with good quality on both strands. ...
... Sequence polymorphisms identified between different strains/clones are available as supplementary material, and will also be available in a future release of the TcSNP database (http://snps.tcruzi.org, [46]). Figure S1 Distribution of observed SNPs in the TcSMO-like genes of T. cruzi. ...
Article
In Trypanosoma cruzi the isoprenoid and sterol biosynthesis pathways are validated targets for chemotherapeutic intervention. In this work we present a study of the genetic diversity observed in genes from these pathways. Using a number of bioinformatic strategies, we first identified genes that were missing and/or were truncated in the T. cruzi genome. Based on this analysis we obtained the complete sequence of the ortholog of the yeast ERG26 gene and identified a non-orthologous homolog of the yeast ERG25 gene (sterol methyl oxidase, SMO), and we propose that the orthologs of ERG25 have been lost in trypanosomes (but not in Leishmanias). Next, starting from a set of 16 T. cruzi strains representative of all extant evolutionary lineages, we amplified and sequenced ∼24 Kbp from 22 genes, identifying a total of 975 SNPs or fixed differences, of which 28% represent non-synonymous changes. We observed genes with a density of substitutions ranging from those close to the average (∼2.5/100 bp) to some showing a high number of changes (11.4/100 bp, for the putative lathosterol oxidase gene). All the genes of the pathway are under apparent purifying selection, but genes coding for the sterol C14-demethylase, the HMG-CoA synthase, and the HMG-CoA reductase have the lowest density of missense SNPs in the panel. Other genes (TcPMK, TcSMO-like) have a relatively high density of non-synonymous SNPs (2.5 and 1.9 every 100 bp, respectively). However, none of the non-synonymous changes identified affect a catalytic or ligand binding site residue. A comparative analysis of the corresponding genes from African trypanosomes and Leishmania shows similar levels of apparent selection for each gene. This information will be essential for future drug development studies focused on this pathway.
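The split between synonymous and non-synonymous changes reported above rests on a simple comparison: translate the codon with and without the substitution and check whether the amino acid changes. The following is a minimal sketch of that comparison, not the authors' method. It assumes the standard genetic code, an in-frame coding sequence, and 0-based positions; `classify_snp` and `CODON_TABLE` are names invented for the example.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def classify_snp(cds, pos, alt):
    """Classify a single-base substitution in an in-frame coding sequence.

    `cds` starts at the first base of a codon, `pos` is a 0-based index
    into `cds`, and `alt` is the substituted base (all illustrative
    conventions, not taken from the source).
    """
    cds, alt = cds.upper(), alt.upper()
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


if __name__ == "__main__":
    cds = "ATGGCTAAA"                  # Met-Ala-Lys
    print(classify_snp(cds, 5, "C"))   # GCT -> GCC
    print(classify_snp(cds, 3, "A"))   # GCT -> ACT
```

Dividing the number of substitutions in each class by the gene length (per 100 bp) then gives densities of the kind quoted in the abstract.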
... Regarding the tissue parasitism quantification by PCR and RT-PCR, several authors mentioned some of the limitations that further complicate the choice of primers for quantification of this parasite [24][25][26][27]: high parasitic genetic variability, the considerable size of its genome, which is not completely encoded in the available databases, cross-reactions with other parasites, including Leishmania spp. and Trypanosoma rangeli, and the geographical specificity of each T. cruzi strain. ...
Article
Symptoms in the acute phase of Chagas disease are usually mild and nonspecific. However, after several years, severe complications like dilated heart failure and even death may arise in the chronic phase. Due to the lack of specific symptoms in the acute phase, the aim of this work was to describe and analyze the cardiac histopathology during this phase in a CD1 mouse model by assessing parasitism, fibrotic damage, and the presence and composition of a cellular infiltrate, to determine its involvement in the pathogenesis of lesions in the cardiac tissue. Our results indicate that the acute phase lasts about 62 days post-infection (dpi). A significant increase in parasitemia was observed from 15 dpi, reaching a maximum at 33 dpi (4.1 × 10^6). The presence of amastigote nests was observed at 15–62 dpi, with a maximum count of 27 nests at 35 dpi. An infiltrate consisting primarily of macrophages and neutrophils was found in the cardiac tissue within the first 30 days, but the abundance of lymphocytes showed a ≥8-fold increase at 40–62 dpi. Unifocal interstitial fibrosis was identified after 9 dpi, which subsequently showed a ≥16-fold increase at 40–60 dpi, along with a 50% mortality rate in the model under study. The increased area of fibrotic lesions revealed progression in the extent of fibrosis, mainly at 50–62 dpi. The presence of perivasculitis and thrombus circulation disorders was seen in the last days (62 dpi); finally, cases of myocytolysis were observed at 50 and 62 dpi. These histopathological alterations, combined with collagen deposition, seem to lead to the development of interstitial fibrosis and damage to the cardiac tissue during the acute phase of infection. This study provides a more complete understanding of the patterns of histopathological abnormalities involved in the acute phase, which could help the development of new therapies to aid the preclinical tests of drugs for their application in Chagas disease.
... (76) In 2009, the TcSNP database was created to integrate information on genetic variation of different T. cruzi genotypes. (77) In 2007, Andersson and colleagues built a database for T. cruzi repeated genes which could help understand the complexity of the parasite genome. (78) Unfortunately, despite their relevance and usefulness for the research community, all these initiatives were discontinued, with the TriTrypDB (79) and the NCBI databases (80) currently being the main sources of T. cruzi genetic data. ...
Article
Chagas disease is an enduring public health issue in many Latin American countries, receiving insufficient investment in research and development. Strategies for disease control and management currently lack efficient pharmaceuticals, commercial diagnostic kits with improved sensitivity, and vaccines. Genetic heterogeneity of Trypanosoma cruzi is a key aspect for novel drug design since pharmacological technologies rely on the degree of conservation of parasite target proteins. Therefore, there is a need to expand the knowledge regarding parasite genetics which, if fulfilled, could leverage Chagas disease research and development, and improve disease control strategies. The growing capacity of whole-genome sequencing technology and its adoption as disease surveillance routine may be key for solving this long-lasting problem.
... The most surprising revelations from this comparative analysis were not the variability in unique gene content between these isolates, but rather the extremes of the high similarity in core gene content and the comparatively huge diversity in gene family-rich portions of the genomes. As anticipated based upon previous strain-based screens [39,61], a considerable degree of variation exists in the form of SNPs/indels and additionally, a substantial number of strain-specific copy number variations were identified. However, the core (non-gene family) genome contains only ~20 strain-unique gene models, and in all cases, these are hypothetical genes encoding proteins with no recognizable protein domain structures. ...
Article
The protozoan Trypanosoma cruzi almost invariably establishes life-long infections in humans and other mammals, despite the development of potent host immune responses that constrain parasite numbers. The consistent, decades-long persistence of T. cruzi in human hosts arises at least in part from the remarkable level of genetic diversity in multiple families of genes encoding the primary target antigens of anti-parasite immune responses. However, the highly repetitive nature of the genome, largely a result of these same extensive families of genes, has prevented a full understanding of the extent of gene diversity and its maintenance in T. cruzi. In this study, we have combined long-read sequencing and proximity ligation mapping to generate very high-quality assemblies of two T. cruzi strains representing the apparent ancestral lineages of the species. These assemblies not only reveal the full repertoire of the members of large gene families in the two strains, demonstrating extreme diversity within and between isolates, but also provide evidence of the processes that generate and maintain that diversity, including extensive gene amplification, dispersion of copies throughout the genome and diversification via recombination and in situ mutations. Gene amplification events also yield significant copy number variations in a substantial number of genes presumably not required for or involved in immune evasion, thus forming a second level of strain-dependent variation in this species. The extreme genome flexibility evident in T. cruzi also appears to create unique challenges with respect to preserving core genome functions and gene expression that sets this species apart from related kinetoplastids.
... tanpaku.org/tdb/), TriTrypDB [10], TrypanoCyc [11], TcSNP [12], and TcruziDB [13]. Nevertheless, none of the databases include details about the inhibitors or small molecules active against Trypanosoma sp. ...
Article
African Trypanosomiasis and American Trypanosomiasis are diseases that affect thousands of people yearly, and more than twenty-five million people are at risk of acquiring them. Treatment is generally expensive, and most of the available drugs are highly toxic and cause fatal side effects. Hence, there is a constant need for new treatment strategies for Trypanosomiasis. Combination therapy and the repurposing or redesign of existing inhibitors as new drugs are of high importance in addressing these hurdles, particularly drug resistance. Here we report TrypInDB, a searchable online resource of small-molecule inhibitors with varying degrees of activity towards Trypanosoma sp. Information on more than 14,000 small molecules from more than 700 published research articles was collected and made available as an easy-to-search database. Four major sets of information were made available for each collected inhibitor, viz., General information (activity values, source of the inhibitors, enzyme targets, etc.), Structural information, Toxicity information, and Literature information. More than 25 different items of information about each inhibitor were collected or predicted and made accessible for searching. The database is designed to be queried easily with multiple-field filters, with provisions for sub-structure searches and searches for similar FDA-approved drugs. The database supports the easy export of queried records and structures in multiple formats. In addition, TrypInDB is actively integrated into LeishInDB. We believe that the scope of TrypInDB permits the research community to exploit the available data for repurposing the inhibitors as well as for the investigation of new therapeutics. Database URL: http://trypindb.biomedinformri.com/
... Homologous proteins were removed from this group, as described for Group 1. Group 3 is composed of selected members of the MASP (mucin-associated surface proteins) family. This is a multigene family composed of ~1300 genes, with a high level of polymorphisms (26,27), localized at the surface of infective forms of the parasite. This protein family has earlier been proposed to participate in host-parasite interactions (28). ...
Article
Complete characterization of antibody specificities associated to natural infections is expected to provide a rich source of serologic biomarkers with potential applications in molecular diagnosis, follow-up of chemotherapeutic treatments, and prioritization of targets for vaccine development. Here, we developed a highly-multiplexed platform based on next-generation high-density peptide microarrays to map these specificities in Chagas Disease, an exemplar of a human infectious disease caused by the protozoan Trypanosoma cruzi. We designed a high-density peptide microarray containing more than 175,000 overlapping 15mer peptides derived from T. cruzi proteins. Peptides were synthesized in situ on microarray slides, spanning the complete length of 457 parasite proteins with fully overlapped 15mers (1 residue shift). Screening of these slides with antibodies purified from infected patients and healthy donors demonstrated both a high technical reproducibility as well as epitope mapping consistency when compared with earlier low-throughput technologies. Using a conservative signal threshold to classify positive (reactive) peptides we identified 2,031 disease-specific peptides and 97 novel parasite antigens, effectively doubling the number of known antigens and providing a tenfold increase in the number of fine mapped antigenic determinants for this disease. Finally, further analysis of the chip data showed that optimizing the amount of sequence overlap of displayed peptides can increase the protein space covered in a single chip by at least ~3 fold without sacrificing sensitivity. In conclusion, we show the power of high-density peptide chips for the discovery of pathogen-specific linear B-cell epitopes from clinical samples, thus setting the stage for high-throughput biomarker discovery screenings and proteome-wide studies of immune responses against pathogens. Copyright © 2015, The American Society for Biochemistry and Molecular Biology.
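The tiling scheme described above (fully overlapped 15mers with a 1-residue shift) can be sketched in a few lines of Python. This is only an illustration of the idea under assumed names (`tile_peptides`, `length`, `shift`), not the authors' design software. It also shows why a larger shift, meaning less overlap, lets the same number of array spots cover more protein sequence.

```python
def tile_peptides(protein, length=15, shift=1):
    """Split a protein sequence into overlapping peptides.

    `length` is the peptide size and `shift` the offset between the start
    positions of consecutive peptides. With shift=1, neighbouring peptides
    overlap by length - 1 residues. Residues left over at the C-terminus
    that do not fill a whole peptide are not covered by this simple sketch.
    """
    if length <= 0 or shift <= 0:
        raise ValueError("length and shift must be positive")
    last_start = len(protein) - length
    return [protein[i:i + length] for i in range(0, last_start + 1, shift)]


if __name__ == "__main__":
    toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"   # made-up 22-residue sequence
    full = tile_peptides(toy_protein)          # 1-residue shift
    sparse = tile_peptides(toy_protein, shift=3)
    print(len(full), "peptides with shift 1")
    print(len(sparse), "peptides with shift 3")
```

For a protein of N residues the number of peptides is roughly (N - length) / shift + 1, so tripling the shift cuts the peptide count per protein to about a third.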
... Similarly, the number of SNP markers in kinetoplastida has revealed a few loci, but only data for T. cruzi are available at dbSNP (http://www.ncbi.nlm.nih.gov/snp/) and TcSNP (http://snps.tcruzi.org/) (Ackermann et al., 2009). Because the number of both MS and SNP markers is relatively small for T. rangeli (Grisard et al., 1999; Urrea et al., 2011), we searched for and used MS and SNP markers to assess the genetic variability and population structure of T. rangeli strains isolated from distinct geographical regions, hosts and vectors. ...
Article
Assessment of the genetic variability and population structure of Trypanosoma rangeli, a non-pathogenic American trypanosome, was carried out through microsatellite (MS) and single-nucleotide polymorphism (SNP) analyses. Two approaches were used for MS typing: data mining in expressed sequence tag (EST)/open reading frame expressed sequence tags (ORESTES) libraries and PCR-based Isolation of Microsatellite Arrays (PIMA) from genomic libraries. All microsatellites found were evaluated for their abundance, frequency and usefulness as markers. Genotyping of T. rangeli strains and clones was performed for 18 loci amplified by PCR from EST/ORESTES libraries. The presence of SNPs in the nuclear, multi-copy, spliced leader gene was assessed in 18 T. rangeli strains, and the results show that T. rangeli has a predominantly clonal population structure, allowing a robust phylogenetic analysis. MS typing revealed a subdivision of the KP1(-) genetic group, which may be influenced by geographical location and/or by the co-evolution of parasite and vectors occurring within the same geographical areas. The hypothesis of parasite-vector co-evolution was corroborated by SNP analysis of the spliced leader gene. Taken together, the results suggest three T. rangeli groups: i) the T. rangeli Amazonian group; ii) the T. rangeli KP1(-) group; and iii) the T. rangeli KP1(+) group. The latter two groups possibly evolved from the Amazonian group to produce KP1(+) and KP1(-) strains.
... Some large gene families such as the trans-sialidase superfamily, mucins, mucin-associated proteins (MASP) and dispersed gene family protein 1 (DGF-1), initially assigned to multiple OrthoMCL gene clusters, were merged according to their current annotation and sequence similarity. Information on non-synonymous SNPs between allelic copies of T. cruzi genes was obtained from the TcSNP Database of T. cruzi genetic variation [60]. To calculate genome-wide Codon Adaptation Indexes, we used EMBOSS CAI [51,61]. ...
Article
The availability of complete pathogen genomes has renewed interest in the development of diagnostics for infectious diseases. Synthetic peptide microarrays provide a rapid, high-throughput platform for immunological testing of potential B-cell epitopes. However, their current capacity prevents the experimental screening of complete "peptidomes". Therefore, computational approaches for prediction and/or prioritization of diagnostically relevant peptides are required. In this work we describe a computational method to assess a defined set of molecular properties for each potential diagnostic target in a reference genome. Properties such as sub-cellular localization or expression level were evaluated for the whole protein. At a higher resolution (short peptides), we assessed a set of local properties, such as repetitive motifs, disorder (structured vs. natively unstructured regions), trans-membrane spans, genetic polymorphisms (conserved vs. divergent regions), predicted B-cell epitopes, and sequence similarity against human proteins and other potential cross-reacting species (e.g. other pathogens endemic in overlapping geographical locations). A scoring function based on these different features was developed, and used to rank all peptides from a large eukaryotic pathogen proteome. We applied this method to the identification of candidate diagnostic peptides in the protozoan Trypanosoma cruzi, the causative agent of Chagas disease. We measured the performance of the method by analyzing the enrichment of validated antigens in the high-scoring top of the ranking. Based on this measure, our integrative method outperformed alternative prioritizations based on individual properties (such as B-cell epitope predictors alone). Using this method we ranked ~10 million 12-mer overlapping peptides derived from the complete T. cruzi proteome. Experimental screening of 190 high-scoring peptides allowed the identification of 37 novel epitopes with diagnostic potential, while none of the low-scoring peptides showed significant reactivity. Many of the metrics employed are dependent on standard bioinformatic tools and data, so the method can be easily extended to other pathogen genomes.
Preprint
Background. Sterols, such as cholesterol, are important components of cellular membranes. But unlike mammalian cells, the main sterols found in the membranes of trypanosomes and fungi are ergosterol and other 24-methyl sterols, which are required for growth and viability. In spite of this strict requirement, this group of organisms has evolved different strategies to produce and/or obtain sterols. Trypanosoma cruzi is the causative agent of Chagas Disease. In this parasite, one of the few validated targets for chemotherapeutic intervention is the sterol biosynthesis pathway. In this work we present a study of the genetic diversity observed in genes of the isoprenoid and sterol biosynthesis pathways in T. cruzi, and a comparative analysis of the diversity found in other trypanosomatids. Methodology/Principal Findings. Using a number of bioinformatic strategies, we first filled a number of gaps in the pathway by identifying the sequences of genes that were missing and/or were truncated in the draft T. cruzi genome. Based on this analysis we identified a non-orthologous homolog of the yeast ERG25 gene (sterol methyl oxidase, SMO) and propose that the orthologs of ERG25 have been lost in trypanosomes (but not in leishmanias). Next, starting from a set of 16 T. cruzi strains representative of six major evolutionary lineages, we amplified and sequenced ~24 Kbp from 18 genes of the pathway, and identified a total of 975 SNPs or fixed differences, of which 28% represent nonsynonymous changes. We observed different patterns of accumulation of nucleotide changes for different genes of the pathway, from genes with a density of substitutions close to the average (~2.5/100 bp) to some showing a high number of changes (11.4/100 bp, for a putative lathosterol oxidase gene). The majority of genes are under apparent purifying selection. However, two genes (TcPMK, TcSMO-like) have a ratio of nonsynonymous to synonymous changes that is close to neutrality. None of the nonsynonymous changes identified affect a catalytic or a ligand binding site residue. However, after mapping these changes on top of available structural data, we identified a number of changes that are in the close vicinity (7 Angstrom) of key residues, and that could therefore be functionally important. A comparative analysis of the corresponding T. brucei and Leishmania genes, obtained from available complete genomes, highlights a high degree of conservation of the pathway, but with differences in the genes that are under apparent purifying selection in each case. Conclusions/Significance. We have identified a number of genes of the sterol biosynthesis pathway that were missing from the T. cruzi genome assembly. Also, we have identified unequal apparent selection acting on these genes, which may provide essential information for future drug development studies focused on this pathway.
Article
Phospholipase A1 (PLA1) has been described in the infective stages of Trypanosoma cruzi as a membrane-bound/secreted enzyme that significantly modified the host cell lipid profile, with generation of second lipid messengers and concomitant activation of Protein Kinase C. In the present work we determined higher levels of PLA1 expression in the infective amastigotes and trypomastigotes than in the non-infective epimastigotes of the lethal RA strain. In addition, we found similar expression patterns but distinct PLA1 activity levels in bloodstream trypomastigotes from Cvd and RA (lethal) and K98 (non-lethal) T. cruzi strains, obtained at their corresponding parasitemia peaks. This was likely due to the presence of different levels of anti-T. cruzi PLA1 antibodies in sera of infected mice, which modulated the enzyme activity. Moreover, these antibodies significantly reduced in vitro parasite invasion, indicating the participation of T. cruzi PLA1 in the early events of parasite-host cell interaction. We also demonstrated the presence of lysophospholipase activity in live infective stages that could account for self-protection against the toxic lysophospholipids generated by T. cruzi PLA1 action. At the genome level, we identified at least eight putative genes that code for T. cruzi PLA1, with high amino acid sequence variability in their amino- and carboxy-terminal regions; a selected putative PLA1 gene was cloned and expressed as a recombinant protein that possessed PLA1 activity. Collectively, the results presented here point to T. cruzi PLA1 as a novel virulence factor implicated in parasite invasion.
Article
Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.
Article
The increasing availability of genomic data for pathogens that cause tropical diseases has created new opportunities for drug discovery and development. However, if the potential of such data is to be fully exploited, the data must be effectively integrated and be easy to interrogate. Here, we discuss the development of the TDR Targets database (http://tdrtargets.org), which encompasses extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, as well as computationally predicted druggability for potential targets and compound desirability information. By allowing the integration and weighting of this information, this database aims to facilitate the identification and prioritization of candidate drug targets for pathogens.
Article
Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits. The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2, 3, 4, 5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence as a template on which to layer often unmapped, fragmentary sequence data and to use base quality values to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel, Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
Chapter
Designing PCR and sequencing primers are essential activities for molecular biologists around the world. This chapter assumes acquaintance with the principles and practice of PCR, as outlined in, for example, refs. 1–4.
Article
We review recent advances in the study of population structure and phylogenetic diversity of parasites belonging to the genera Trypanosoma and Leishmania. In all species properly analyzed, these parasites exhibit a basically clonal population structure, with occasional bouts of genetic exchange or hybridization, and a strong structuration of their populations into discrete evolutionary lineages. On an evolutionary scale, the impact of sex appears to be greater in African than in American trypanosomes. The taxonomic status of some Leishmania 'species' is questionable.
Article
Chagas disease, caused by the protozoan Trypanosoma cruzi, has a variable clinical course, ranging from symptomless infection to severe chronic disease with cardiovascular or gastrointestinal involvement or even overwhelming acute episodes. The factors influencing this clinical variability have not been elucidated, but genetic variation of both the host and parasite is likely to be important. Here, Andréa M. Macedo and Sérgio D.J. Pena review the evidence showing a role for the genetic constitution of T. cruzi in determining the clinical characteristics of Chagas disease, and propose a 'clonal-histotropic model' for the pathogenesis of this disease.
Article
We have assessed the phylogenetic status of the Trypanosoma cruzi Genome Project CL Brener reference strain by multilocus enzyme electrophoresis (MLEE) and multiprimer random amplified polymorphic DNA (RAPD) including a set of cloned stocks representative of the whole genetic diversity of T. cruzi. MLEE and RAPD data gave congruent phylogenetic results. The CL Brener reference strain fell into the second major phylogenetic subdivision of T. cruzi, and was genetically very close to the Tulahuen reference strain. No reliable RAPD character and only one MLEE character permitted us to distinguish between the CL Brener and Tulahuen reference strains. In contrast, many RAPD and MLEE characters were able to distinguish between the CL Brener reference strain and the other T. cruzi genotypes analyzed here, in particular the formerly described principal zymodemes I, II and III. It is suspected that both CL Brener and Tulahuen are hybrid genotypes, a fact that should be taken into account when interpreting sequence data. Moreover, our study confirms that the species T. cruzi is genetically very heterogeneous. We recommend future comparison of sequencing data from the CL Brener reference strain with those of at least one radically distinct T. cruzi genotype, belonging to the other major phylogenetic subdivision of this species.