Trypanosomatid comparative genomics: Contributions to the study of parasite biology and different parasitic diseases.
ABSTRACT In 2005, draft sequences of the genomes of Trypanosoma brucei, Trypanosoma cruzi and Leishmania major, also known as the Tri-Tryp genomes, were published. These protozoan parasites are the causative agents of three distinct insect-borne diseases, namely sleeping sickness, Chagas disease and leishmaniasis, all with a worldwide distribution. Despite the large estimated evolutionary distance among them, a conserved core of ~6,200 trypanosomatid genes was found among the Tri-Tryp genomes. Extensive analysis of these genomic sequences has greatly increased our understanding of the biology of these parasites and their host-parasite interactions. In this article, we review the recent advances in the comparative genomics of these three species. This analysis also includes data on additional sequences derived from other trypanosomatid species, as well as recent data on gene expression and functional genomics. In addition to facilitating the identification of key parasite molecules that may provide a better understanding of these complex diseases, genome studies offer a rich source of new information that can be used to define potential new drug targets and vaccine candidates for controlling these parasitic infections.
Trypanosomatid comparative genomics:
Contributions to the study of parasite biology and different parasitic diseases
Santuza M. Teixeira1, Rita Márcia Cardoso de Paiva1, Monica M. Kangussu-Marcolino2
and Wanderson D. DaRocha2
1Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais,
Belo Horizonte, MG, Brazil.
2Departamento de Bioquímica e Biologia Molecular, Universidade Federal do Paraná, Curitiba, PR, Brazil.
In 2005, draft sequences of the genomes of Trypanosoma brucei, Trypanosoma cruzi and Leishmania major, also
known as the Tri-Tryp genomes, were published. These protozoan parasites are the causative agents of three dis-
tinct insect-borne diseases, namely sleeping sickness, Chagas disease and leishmaniasis, all with a worldwide
distribution. Despite the large estimated evolutionary distance among them, a conserved core of ~6,200 trypanoso-
matid genes was found among the Tri-Tryp genomes. Extensive analysis of these genomic sequences has greatly
increased our understanding of the biology of these parasites and their host-parasite interactions. In this article, we
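review the recent advances in the comparative genomics of these three species. This analysis also includes data on
additional sequences derived from other trypanosomatid species, as well as recent data on gene expression and functional
genomics. In addition to facilitating the identification of key parasite molecules that may provide a better understanding
of these complex diseases, genome studies offer a rich source of new information that can be used to define
potential new drug targets and vaccine candidates for controlling these parasitic infections.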
Key words: Trypanosoma brucei, Trypanosoma cruzi, Leishmania major, genome, RNAseq.
Received: August 8, 2011; Accepted: October 18, 2011.
Tri-Tryp Diseases and The Tri-Tryp Genomes
Trypanosoma brucei, Trypanosoma cruzi and
Leishmania major are unicellular protozoa of considerable
medical importance since they are the etiologic agents of
sleeping sickness (African trypanosomiasis), Chagas dis-
ease (American trypanosomiasis) and leishmaniasis, re-
mined by their insect vectors: in the case of sleeping
sickness, a blood sucking fly of the genus Glossina, also
known as the tsetse fly, for Chagas disease, a reduviid bug
tomine sandfly. While sleeping sickness occurs in sub-
Saharan Africa, Chagas disease is prevalent in Latin Amer-
ica. Leishmaniasis is considered to be endemic in 88 coun-
tries, 72 of which are developing countries in Asia, South
America and Africa. Together, these three parasitic dis-
eases represent a huge burden since approximately 0.5 mil-
lion people are infected with T. brucei, 10 million with T.
cruzi and an estimated 12 million with different species of
cies, T. brucei gambiense and T. b. rhodesiense, other spe-
cies and subspecies of African trypanosomes cause the dis-
ease known as nagana in domestic animals, imposing a fur-
ther economic burden on several African countries.
Different forms of leishmaniasis are caused by at least 20
leishmanial species: cutaneous leishmaniasis, with an esti-
mated 1.5 million cases, and visceral leishmaniasis, with
about 500,000 new cases annually, are the most common.
Although control of the arthropod vectors of these diseases
is an achievable goal and has been successful against the T.
cruzi vector in parts of Latin America, the alarming resur-
parts of Asia and Latin America is a constant reminder of
the need for better forms of chemotherapy and prevention
American trypanosomiasis and the different forms of
can be found at
Trypanosoma brucei, T. cruzi and Leishmania spp.
are hemoflagellates of the family Trypanosomatidae (order
Kinetoplastida) that is characterized by the presence of a
single flagellum and one mitochondrion containing a uni-
que organelle known as the kinetoplast which contains the
Genetics and Molecular Biology, 35, 1, 1-17 (2012)
Copyright © 2012, Sociedade Brasileira de Genética. Printed in Brazil
Send correspondence to Wanderson Duarte DaRocha. Departa-
mento de Bioquímica e Biologia Molecular, Universidade Federal
do Paraná, Caixa Postal 19046, 81531-990 Curitiba, PR, Brazil.
mitochondrial DNA (Simpson et al., 2006). Each parasite
has a complex life cycle that involves humans as one of
gent members of the Eukaryotae (Haag et al., 1998), these
parasites have peculiar aspects of gene expression, includ-
ing polycistronic transcription of most of their genomes
(Martínez-Calvillo et al., 2010), RNA polymerase I-me-
diated transcription of protein-coding genes (Gunzl et al.,
2003), RNA trans-splicing to generate mature, capped
mRNAs (LeBowitz et al., 1993) and extensive RNA edit-
ing to generate functional mRNAs transcribed from mito-
chondrial genes (Hajduk et al., 1993). Apart from their
medical relevance, these peculiar characteristics make
these parasites very interesting models for studying ge-
nome evolution and other aspects of genome function. On
the other hand, the early evolutionary divergence of these
organisms has resulted in biochemical characteristics that
are not common in higher eukaryotes, such as enzymes re-
lated to antioxidant metabolism (Olin-Sandoval et al.,
2010) as well as sterol and glycosylphosphatidylinositol
(GPI) biosynthesis (Lepesheva et al., 2011; Koeller and
Heise, 2011) that have been exploited as promising drug
Genome sequencing of Tri-Tryp parasites began in
the early 1990s with the analysis of 518 expressed sequence
tags (ESTs) generated from mRNA isolated from blood-
stream forms of T. b. rhodesiense (El-Sayed et al., 1995).
was as efficient as EST analyses for discovering new genes
in the African trypanosome (El-Sayed and Donelson,
1997). In 1996, an EST analysis of cDNA libraries con-
lished (Levick et al., 1996), and the first EST analysis of T.
cruzi epimastigote forms was published in 1997 (Brandão
et al., 1997). During this period, pulsed-field gel electro-
phoretic analysis of chromosomes and the sequencing of
large DNA fragments from cosmid, bacterial artificial
chromosome and yeast artificial chromosome libraries
genomes (Blackwell and Melville, 1999). In 1999, the se-
quence of a 257-kilobase region spanning almost the entire
chromosome 1 of L. major revealed the unusual distribu-
acteristic of all Tri-Tryp genomes. The complete sequence
of L. major chromosome 1 revealed 79 protein-coding
genes, with the first 29 genes all encoded on one DNA
strand and the remaining 50 genes encoded on the opposite
strand (Myler et al., 1999).
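The strand organization described for L. major chromosome 1 can be illustrated with a minimal sketch that groups consecutive genes on the same strand into directional clusters and reports where the coding strand changes. The function names and the "-"/"+" strand labels are hypothetical choices made for this illustration; only the gene counts (29 followed by 50) come from the text above.

```python
from itertools import groupby


def directional_clusters(strands):
    """Group consecutive genes on the same strand.

    Returns a list of (strand, index_of_first_gene, gene_count) tuples.
    """
    clusters = []
    index = 0
    for strand, run in groupby(strands):
        count = len(list(run))
        clusters.append((strand, index, count))
        index += count
    return clusters


def strand_switch_positions(strands):
    """Return the gene indices at which the coding strand changes."""
    return [start for _, start, _ in directional_clusters(strands)[1:]]


# Toy representation of chromosome 1: 29 genes on one strand, then 50 on
# the other. Which strand is labelled "-" or "+" is an arbitrary assumption.
chr1 = ["-"] * 29 + ["+"] * 50
clusters = directional_clusters(chr1)
switches = strand_switch_positions(chr1)
print(clusters)
print(switches)
```

With this input the sketch reports two directional clusters and a single strand switch between the 29th and 30th genes, which is the pattern described by Myler et al. (1999).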
terial operons, with protein coding genes densely packed
within directional clusters in one strand separated by strand
switch regions (i.e., changes in the coding strand) (Figu-
tiates bi-directionally between two divergent gene clusters
(Martínez-Calvillo et al., 2003, 2004) to produce polycistronic
pre-mRNAs that are subsequently processed. Remarkably,
with the exception of the spliced leader (SL)
promoter, no RNA polymerase II promoter has been
identified, and only a few transcription factors are known
(Cribb and Serra, 2009; Cribb et al., 2010). Even more surprisingly,
although orthologs of all conserved components
of the RNA polymerase II complex were identified in the
Tri-Tryp genomes (Ivens et al., 2005), the transcription of
some trypanosomatid genes, such as the VSG (Variant Surface
Glycoprotein) and procyclin genes of T. brucei, as well
as several exogenous genes transfected into T. cruzi, is
mediated by RNA polymerase I (Gunzl et al., 2003). Once
the polycistronic pre-mRNA is produced, two coupled reactions
(trans-splicing and polyadenylation) result in mature
monocistronic transcripts.
Trans-splicing means that every mature mRNA has
an identical capped sequence of 39 nucleotides, known as
the spliced leader (SL), at the 5’ end (Liang et al., 2003).
Figure 1 - The Tri-Tryp life cycles. The life cycles of
Leishmania major, Trypanosoma cruzi and T. brucei, the etiological
agents of leishmaniasis, Chagas disease and sleeping sickness, respectively,
are shown, with the parasitic forms that are present in the insect
vectors and the mammalian hosts. Leishmania major proliferates as promastigotes
(P) in the sand fly midgut. The parasite is transmitted during
bites by this fly and invades mammalian macrophages in the metacyclic
promastigote (M) form. Inside the cell, the M form is converted into
amastigotes (A), which divide before being released during cell lysis.
Trypanosoma cruzi replicates as epimastigotes (E) in the reduviid bug
the mammalian host. After differentiation into proliferative amastigotes
cell lysis and invade new cells. Trypanosoma brucei differentiates from
being transformed into infective, metacyclic forms (M) in the salivary
glands. After being injected into the host during a blood meal, M forms
and can reach the central nervous system. As parasite numbers increase,
these forms are replaced by non-proliferative stumpy forms (S).
Whilst no sequence consensus for polyadenylation or SL
addition has been found, several studies have demonstrated
gions guide SL addition and poly-adenylation, resulting in
mature mRNAs (LeBowitz et al., 1993) (Figure 3).
Intergenic sequences involved in the processing of T. cruzi,
T. brucei and Leishmania mRNA have been thoroughly in-
vestigated by comparing mRNA with genomic sequences,
initially using EST databases (Benz et al., 2005; Campos et
al., 2008; Smith et al., 2008) and, more recently, using
2010; Kolev et al., 2010; Nilsson et al., 2010). In addition
to providing valuable information on the mechanisms of
gene expression in these organisms, these analyses also
yielded data that allowed the optimization of transfection
vectors used for expressing foreign genes and for genetic
manipulation in trypanosomatids.
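The kind of intergenic scan implied by these analyses can be sketched as a search for a polypyrimidine tract followed by a candidate SL addition site. This is only an illustration: the minimum tract length of eight residues, the use of the first AG dinucleotide downstream of the tract as the candidate acceptor, the function names and the toy sequence are all assumptions made here, not parameters taken from the studies cited above.

```python
import re


def find_polypyrimidine_tracts(seq, min_len=8):
    """Return (start, end) pairs for runs of C/T at least min_len long.

    Coordinates are 0-based and end-exclusive. min_len is an assumed
    threshold chosen only for illustration.
    """
    pattern = re.compile(r"[CT]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def candidate_sl_acceptor(seq, min_len=8):
    """Return the index of the first AG downstream of the first tract.

    Treating that AG as the SL addition site is an assumption of this
    sketch. Returns None when no tract or no downstream AG is found.
    """
    tracts = find_polypyrimidine_tracts(seq, min_len)
    if not tracts:
        return None
    _, tract_end = tracts[0]
    position = seq.upper().find("AG", tract_end)
    return position if position != -1 else None


# Hypothetical intergenic fragment, not a real trypanosomatid sequence.
toy_intergenic = "GAGACATTCTTTCCTCTTTGCAAGATGGCT"
tracts = find_polypyrimidine_tracts(toy_intergenic)
acceptor = candidate_sl_acceptor(toy_intergenic)
print(tracts, acceptor)
```

In practice, predicted sites of this kind would be checked against mapped transcript ends, which is how the EST-based and later transcriptome-based studies mentioned above located processing sites.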
quences have already provided interesting insights into the
genetic and evolutionary bases of the distinct and shared
lifestyles of these parasites. Probably the most striking
finding is that the three genomes display high levels of
synteny and share a conserved set of ~6,200 genes, 94% of
which are arranged in syntenic directional gene clusters
(El-Sayed et al., 2005a). Alignment of the deduced protein
sequences of the majority of the clusters of orthologous
genes across the three organisms reveals an average 57%
polycistronic transcription units (blue arrows: plus strand encoded open reading frames or ORFs; red arrows: minus strand encoded ORFs). In panel B, a
genomic region at around 960 kb is magnified to show the gene synteny in the genomes of various trypanosomatids (blue and red boxes correspond to +
and – strand-encoded ORFs, respectively). The orange line in both panels corresponds to the chromosome position. Sequence information used to draw
panel A and the graphic representation in panel B were obtained from the Tri-Tryp database (Aslett et al., 2010).
tween T. cruzi and L. major that reflected the expected
phylogenetic relationships (Lukes et al., 1997; Haag et al.,
1998; Stevens et al., 1999; Wright et al., 1999). The major-
ity of species-specific genes occurs on non-syntenic chro-
mosomes and consists of members of large surface antigen
families. Structural RNAs, retroelements and gene family
expansion are also often associated with breaks in the con-
servation of gene synteny (El-Sayed et al., 2005a). Multi-
gene family expansions are generally species-specific and
most pronounced in the T. cruzi genome. As discussed be-
low, a number of T. cruzi multi-gene families encode sur-
face proteins, such as trans-sialidases, mucin-associated
surface proteins (MASP) and mucins TcMUC and GP63
that likely play important roles in host-parasite interactions
Bartholomeu et al., 2009). Based on their location in re-
gions of synteny breaks these arrays may be subject to ex-
tensive rearrangements during the parasite’s evolution and
are thus directly associated with the specificities of each of
the three parasitic diseases.
The Genetic Diversity of T. cruzi and the
Genomes of Different Parasite Strains
Chagas disease, caused by T. cruzi, is endemic in
10 million people are infected and the “domiciliation” of
risk of infection. With no vaccine or effective drug treat-
ment available, the main strategy for control must rely on
the prevention of transmission by the insect vectors and
blood transfusions. The parasite proliferates in the midgut
of several species of a triatomid hematophagous vector.
After reaching the insect’s hindgut, epimastigote forms
differentiate into non-dividing, infective metacyclic trypo-
mastigotes that are excreted in the insect’s feces. Trypo-
that are separated by divergent or convergent strand-switch regions. RNA Pol II transcription start sites (TSS) are usually located upstream of the first
gene of the PTU (Martínez-Calvillo et al., 2004) or can be located as an internal TSS (Kolev et al., 2010). At the TSS (large bent arrow), the histone variants
H2AZ and H2BV (Siegel et al., 2009), modified histones [K9/K14 acetylated and K4 tri-methylated histone (Respuela et al., 2008; Thomas et al.,
2009; Wright et al., 2010) and K10 acetylated histone H4 (Siegel et al., 2009)], bromodomain factor BDF3 (Siegel et al., 2009) and transcription factors
capped splice leader RNA through a trans-splicing reaction coupled to polyadenylation. These processing reactions are guided by polypyrimidine tracts
(PolyPy) that are present in every intergenic region. Mature mRNAs are exported to the cytoplasm where their stability and translation efficiencies are
largely dependent on cis-acting elements present in their untranslated region (UTR) (Araujo et al., 2011). Transcriptomic analyses also showed that
polycistronic pre-mRNAs can undergo alternative RNA processing that may result in changes in the initiator AUG, thereby altering protein translation (A),
targeting and/or function (B). Alternative splicing and poly-adenylation can also result in the inclusion/exclusion of regulatory elements present in the 5’
UTRs (C) or 3’ UTRs (D), thereby altering gene expression (Kolev et al., 2010; Nilsson et al., 2010; Siegel et al., 2010).
mucous membranes or skin lesions during feeding by the
insect. Once inside the mammalian host, trypomastigotes
invade different types of cells where they transform into
proliferative intracellular amastigotes. After a number of
entiate into trypomastigotes that are released into the
by an insect during a blood meal, they start a new cycle
population consists of a large number of strains with dis-
tinct characteristics related to morphology, growth rate,
parasitemia curves, virulence, pathogenicity, drug sensitiv-
ity, antigenic profile, metacyclogenesis and tissue tropism
(Buscaglia and Di Noia, 2003).
Despite the broad genetic diversity observed among
different strains and isolates, early studies based on differ-
ent genotyping strategies identified two major lineages in
the parasite population, named T. cruzi I and T. cruzi II
(Souto et al., 1996; Momen 1999). These divergent lin-
sylvatic cycle (T. cruzi I) and the domestic cycle (T. cruzi
II) of Chagas disease (Zingales et al., 1998), as well as dis-
tinct sylvatic host associations (Buscaglia and Di Noia,
2003). Further analyses led some authors to propose the
sub-division of T. cruzi II into five sub-groups: T. cruzi IIa,
ses of the T. cruzi strains became more confusing when ad-
ditional data indicated the existence of not just two, but
three major groups in the T. cruzi population, in addition to
hybrid strains (Miles et al., 1978; Augusto-Pinto et al.,
2003; de Freitas et al., 2006). After intense debate, in 2009
an international consensus recognized the existence of six
major strains, also known as discrete typing units (DTUs)
I-VI (Zingales et al., 2009) (Table 1). Since Chagas disease
presents a variety of clinical forms, these studies are highly
can potentially explain differences in disease pathogenesis,
host preferences and, most importantly, provides essential
information for the identification of new drug targets and
good antigenic candidates for better diagnosis and vaccine
strains belonging to T. cruzi V and VI are the predominant
causes of human disease in South America (Zingales et al.,
2009), whereas T. cruzi I strains are more abundant among
ological and molecular factors underlying T. cruzi popula-
tion structure and the epidemiology of Chagas disease are
the genetic variability found in the T. cruzi population is an
essential aspect to be considered when analyzing this para-
CL Brener, a clone derived from a hybrid T. cruzi
strain belonging to T. cruzi VI, was chosen as a reference
strain for the initial T. cruzi genome project. The hybrid na-
ture of the CL Brener clone became clear only after the ge-
nome sequencing had begun, when analyses of nuclear and
mitochondrial sequences showed that this strain resulted
from a fusion event that had occurred between ancient ge-
notypes corresponding to strains belonging to T. cruzi II
and III groups (El-Sayed et al., 2005a; de Freitas et al.,
2006). Prior to this knowledge, the choice of the clone CL
Brener, initially classified as a member of sub-group IIe,
was based on five characteristics: (1) it was isolated from
the domiciliary vector Triatoma infestans, (2) its pattern of
infectivity in mice was very well known, (3) it had prefer-
ential tropism for heart and muscle cells, (4) it showed a
was susceptible to drugs used to treat Chagas disease (Zin-
gales et al., 1997). In addition, several genomic studies had
previously used this strain for karyotype analyses (Branche
et al., 2006) and the generation of physical maps and ESTs
from all three stages of the parasite life cycle (Cano et al.,
et al., 1998; Porcel et al., 2000; Cerqueira et al., 2005).
The T. cruzi CL Brener haploid genome, estimated to
be 55 Mb, was sequenced using the WGS (whole genome
shotgun) strategy. Because of its hybrid nature and the high
level of allelic polymorphism, a 14X coverage, much
higher than the usual 8-10X coverage, was required to dis-
tinguish the ambiguities derived from allelic variations
from those produced by sequencing errors. In contrast to
(El-Sayed et al., 2005b) was published as an assembly of
5,489 scaffolds built from 8,740 contigs. Four years later,
based on synteny maps for the T. brucei chromosomes,
Weatherly et al. (2009) assembled the T. cruzi contigs and
scaffolds initially in 11 pairs of homologous “T. brucei-
like” chromosomes and, ultimately, in 41 T. cruzi chromo-
sate during mitosis and are therefore not visualized in
metaphasic cells, the predicted number of T. cruzi chromosomes
was based on pulsed-field gel electrophoresis
(PFGE) analyses (Branche et al., 2006), which turned
largely syntenic with the other Tri-Tryp (T. brucei and L.
Table 1 - Classification of T. cruzi strains.
Group (a)         Equivalence to former classification (b)   Strains
T. cruzi I        T. cruzi I/DTU I                            Sylvio X-10, Dm28c
T. cruzi II       T. cruzi II/DTU IIb                         Esmeraldo, Y
T. cruzi III      T. cruzi III/DTU IIc                        CM17
T. cruzi IV       DTU IIa                                     CanIII
T. cruzi V (c)
T. cruzi VI (c)   DTU IIe                                     CL Brener
DTU = discrete typing unit. (a) Zingales et al. (2009). (b) Momen (1999) (T.
cruzi I and II classification), Brisse et al. (2000) (DTU I, IIa-e), de Freitas
et al. (2006) (T. cruzi I, II and III). (c) Hybrid strains.
major) genomes, with most species-specific genes, such as
surface protein gene families, occurring in internal and
(El-Sayed et al., 2005a).
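The coverage figures quoted above for the CL Brener project can be made concrete with a small calculation. The sketch below assumes that fold coverage is simply the total number of sequenced bases divided by the haploid genome size; the function names are hypothetical, and the calculation says nothing about how the reads were actually distributed across the two haplotypes.

```python
def fold_coverage(total_sequenced_bases, genome_size_bases):
    """Average fold coverage: sequenced bases divided by genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_sequenced_bases / genome_size_bases


def bases_needed(target_coverage, genome_size_bases):
    """Total sequenced bases required to reach a target fold coverage."""
    return target_coverage * genome_size_bases


# Haploid genome size estimate taken from the text (55 Mb).
haploid_size = 55_000_000

needed_14x = bases_needed(14, haploid_size)  # coverage used for CL Brener
needed_8x = bases_needed(8, haploid_size)    # lower end of the usual range
extra_fraction = needed_14x / needed_8x - 1  # additional sequencing effort

print(needed_14x, needed_8x, round(extra_fraction, 2))
```

Under these assumptions, 14X coverage of a 55 Mb haploid genome corresponds to 770 Mb of sequence, compared with 440 Mb at 8X.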
represented by a redundant dataset since homologous re-
bled separately, generating two sets of contigs, each corresponding
to one haplotype. To identify the two haplotypes,
reads from the genome of the cloned Esmeraldo strain, a
member of T. cruzi II that represents one of the CL
Brener parental strains (de Freitas et al., 2006), were gener-
the two haplotypes are referred to as “Esmeraldo-like” or
“non-Esmeraldo-like” sequences (Aslett et al., 2010).
The haploid CL Brener genome has an estimated
12,000 genes. As with the other Tri-Tryps, the T. cruzi
genes are organized in long polycistronic clusters that are
transcribed by RNA polymerase II and processed into
monocistronic mRNAs that accumulate differentially dur-
plete sequence of the T. cruzi genome was the dramatic ex-
pansion of families encoding surface proteins (El-Sayed et
al., 2005a). Compared to T. brucei and L. major, T. cruzi
has the largest set of multi-gene families, perhaps because
of its unique capacity to invade and multiply within different
types of host cells. Long terminal repeat (LTR) and
non-LTR retroelements and other sub-telomeric repeats also
contribute to the large proportion of repetitive sequences (50%
of the genome). The largest protein gene
family encodes a group of surface proteins known as trans-
cules identified as virulence factors of T. cruzi that are responsible
for transferring sialic acid from host sialoglycoconjugates
to the terminal β-galactose on T. cruzi
mucins. Mucin-associated surface proteins (MASP) are the
second largest T. cruzi gene family, with a total of 1,377
members. Although MASP sequences correspond to ~6%
of the parasite diploid genome, they were only identified
during annotation of the T. cruzi genome. MASPs are
glycosylphosphatidylinositol (GPI)-anchored surface pro-
teins that are preferentially expressed in trypomastigotes;
C-terminal domains and a strikingly variable and repetitive
central region (Bartholomeu et al., 2009). Together with
account for ~17% of all protein-coding genes and are orga-
nized as dispersed clusters of tandem and interspersed re-
Other large families consist of the previously de-
scribed RHS and DGF-1 genes whose functions are un-
known and which, like the TS genes, occur mostly at
sub-telomeric locations. Examples of other gene families
with more than 10 members also present in the T. cruzi ge-
nome include glycosyltransferases, protein kinases and
phosphatases, kinesins, amino acid transporters and heli-
cases, in addition to several gene families encoding hypo-
thetical proteins (El-Sayed et al., 2005a). The collapse of
nearly identical repeats in some gene families, such as the
gene cluster encoding α- and β-tubulins, meant that not all
copies of the family were included in the original genome
Arner et al. (2007) described an analysis of the total
concluded that 18% of all protein coding sequences existed
in 14 or more copies. In addition to the need to evade the
families in the T. cruzi genome in which a large number of
gene copies can lead to the enhanced expression of various
proteins may help to overcome a major problem in this ge-
erating high levels of mRNA from single copy genes. It is
also likely that many of the striking polymorphisms among
T. cruzi isolates that are reflected in several epidemiologi-
tributable to variability within regions containing gene
families. Whole genome comparisons of distinct T. cruzi
eral groups began sequencing the genome of representative
strains of other major T. cruzi lineages. As indicated above,
the hybrid nature of the CL Brener genome provided data
for two genomes, with “Esmeraldo-like” and “non-Esme-
from T. cruzi II and III groups, respectively (see Table 1).
Recently, Franzén et al. (2011) published a draft genome
sequence of Sylvio X10, a strain belonging to T. cruzi I
group, which is the predominant agent of Chagas disease in
Central America and in the Amazon. Although rarely iso-
lated from humans in endemic areas in southern countries
of Latin America where most cases of Chagas disease with
mega-syndromes occur, T. cruzi I strains are highly abun-
dant among wild hosts and vectors (Zingales et al., 1998;
Buscaglia and Di Noia, 2003). Thus, the distinct ecological
niches occupied by T. cruzi I and II strains, together with
the fact that these strains are highly divergent in phylogenetic
analyses, prompted Franzén et al. (2011) to sequence
the genome of a representative of the T. cruzi I group
In agreement with previous analyses, the Sylvio X10
genome was estimated to be ~44 Mb in size, i.e., smaller
than the CL Brener genome. Indeed, smaller genomes
seem to be a general feature of T. cruzi I strains (Branche
et al., 2006; Franzén et al., 2011). As expected, the archi-
tectures of the two genomes were very similar, with highly
conserved syntenic regions corresponding to the gene-
dense “core” of the coding regions organized for long
the presence of repetitive sequences meant that the Sylvio
X10 genome was represented as fragmented contigs. The
technical difficulties associated with the assembly of repet-
itive sequences meant that only about 49% of the generated
Sylvio X10 sequence data was incorporated into contigs,
leaving 710,109 reads that were not included in the assembly.
Consequently, the draft genome of Sylvio X10 was assembled
into 7,092 contigs, which is fewer than the
number of contigs reported for the draft genome of CL
Brener. The alignment of these contigs to both CL Brener
haplotypes showed that the mean nucleotide identity was
greater between Sylvio X10 and non-Esmeraldo (98.2%)
than between Sylvio X10 and Esmeraldo (97.5%). This
ing that sequences from T. cruzi I strains are more closely
related to T. cruzi III (represented by the non-Esmeraldo
Esmeraldo-like haplotype) (Cerqueira et al., 2008; Ruval-
caba-Trejo and Sturm, 2011).
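The identity values reported above (98.2% versus 97.5%) are the kind of figure obtained by comparing aligned sequences position by position. The following minimal sketch computes percent identity between two pre-aligned sequences, skipping gapped columns; this definition, the function name and the toy sequences are assumptions made for illustration and do not reproduce the procedure used by Franzén et al. (2011).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions between two aligned sequences.

    Columns in which either sequence has a gap are ignored, which is
    one of several possible conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignments, not real T. cruzi sequences.
query = "ACGTACGTAC"
haplotype_1 = "ACGTACGTAC"
haplotype_2 = "ACGTTCGTAC"

identity_1 = percent_identity(query, haplotype_1)
identity_2 = percent_identity(query, haplotype_2)
print(identity_1, identity_2)
```

A genome-wide mean, such as the values quoted for the two CL Brener haplotypes, would then be obtained by combining such per-alignment values over all aligned contigs, for instance weighted by alignment length.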
In contrast to the hybrid CL Brener genome, for
which the amount of heterozygosity in the core genome
was estimated to be 5.5% (El-Sayed et al., 2005a), the dip-
loid Sylvio X10 genome was homozygous (< 0.08%
heterozygosity). Most importantly, analysis of the core
gene content of CL Brener and Sylvio X10 revealed six
open reading frames that were missing in the Sylvio X10
sequence reads indicated that several multicopy gene fami-
lies, including DGF, mucin, MASP and GP63 contained
substantially fewer genes in Sylvio X10 than in CL Brener.
A 5.9 Mb size difference between the Sylvio X10/1 and CL
Brener genomes largely reflected the expansion of these
gene families. However, the extent to which these genomic
variations are related to strain differences in host prefer-
ence and the ability to cause Chagas disease remains to be
The advent of next-generation sequencing technolo-
gies has ushered in a new era in comparative sequencing by
allowing the exploration of a wide range of evolutionary
and pathological questions within the T. cruzi lineage. Sev-
T. cruzi isolates. A consortium of laboratories funded by
the National Institutes of Health/National Institute of Al-
lergy and Infectious Diseases (NIAID) and the National
T. cruzi strains representative of each one of the six main
groups, such as Esmeraldo (T. cruzi II), 3869 (T. cruzi III),
Can III (T. cruzi IV), NRcl3 (T. cruzi V) and Tula cl2 (T.
cruzi VI) (N. El-Sayed, personal communication). Our lab-
oratory has been involved in the sequencing of another T.
cruzi I strain (Dm28c) and CL-14, a non-virulent strain that
belongs to the T. cruzi VI group (S. Teixeira, unpublished).
In contrast to CL Brener, BALB/c mice injected with
CL-14 trypomastigotes showed no parasitemia but devel-
trypomastigotes from the CL Brener or Y strains (Lima et
al., 1995). Our goal in this work is to use comparative anal-
yses of the CL Brener and CL-14 genomes to identify po-
tential sequences that can restore the virulence of CL-14
and then test these in transfection protocols.
In addition to investigations of the nuclear genome,
several studies have examined the mitochondrial genome
of kinetoplastids, which contains a mass of concatenated
DNA known as kinetoplast DNA (kDNA) that is easily
identified near the insertion of the flagellum (Brener,
1973). In T. cruzi, kDNA consists of a highly structured
disk-shaped network of thousands of concatenated mini-
circles 0.5-10 kb in size and dozens of concatenated
are present exclusively in kinetoplastids, maxicircles are
the homologues of mtDNA molecules found in other euka-
ryotes (Lukes et al., 1997). Following publication of the T.
cruzi genome, Westenberger et al. (2006) described the
complete sequences of maxicircle DNAs corresponding to
groups T. cruzi II (from sequences of the Esmeraldo strain)
somatid mitochondrial genes, sequence analyses showed
most of its genes that were corrected at the RNA level by a
complex U-insertion/deletion process known as RNA edit-
ing (Hajduk et al., 1993). Key elements of this repair pro-
cess include gRNAs (guide RNAs) which are encoded
also present in maxicircles. The gRNAs hybridize to the 3’
end of a target message and direct U insertion
and deletion by the so-called editosome machinery (Stuart
and Panigrahi, 2002).
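As a purely illustrative toy model of the U-insertion/deletion step described above, the following sketch applies a list of edits to a pre-edited RNA string. The function name `apply_u_edits`, the edit encoding (0-based position plus a signed count of U residues) and the example sequences are assumptions made for illustration only; the base-pairing of gRNAs and the activity of the editosome are not modelled.

```python
from typing import List, Tuple


def apply_u_edits(pre_mrna: str, edits: List[Tuple[int, int]]) -> str:
    """Apply U insertions (+n) and deletions (-n) to a pre-edited RNA.

    Each edit is (position, delta), with the position given in the
    coordinates of the pre-edited sequence (0-based). Edits are applied
    from the 3' end towards the 5' end so that upstream coordinates
    remain valid while the sequence changes length.
    """
    seq = pre_mrna
    for pos, delta in sorted(edits, reverse=True):
        if delta > 0:
            # Insert `delta` uridines immediately before `pos`.
            seq = seq[:pos] + "U" * delta + seq[pos:]
        elif delta < 0:
            n = -delta
            # Only uridines may be removed in this toy model.
            if seq[pos:pos + n] != "U" * n:
                raise ValueError(f"no run of {n} U at position {pos}")
            seq = seq[:pos] + seq[pos + n:]
    return seq


# Hypothetical sequences, not taken from any real maxicircle transcript.
print(apply_u_edits("GAGA", [(2, 2)]))      # two U residues inserted
print(apply_u_edits("GAUUUGA", [(2, -1)]))  # one U residue deleted
```

Processing the edits from the highest position downwards is a bookkeeping convenience in this sketch, although it also mirrors the general 3' to 5' direction in which editing of a transcript is usually described.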
The complete sequences of the 25 kb T. cruzi maxi-
circles revealed 18 tightly clustered mitochondrial pro-
tein-coding genes and two rRNA genes that were syntenic
with previously sequenced maxicircles of T. brucei and
Leishmania tarentolae. Fifteen of the 18 protein-coding
genes were edited. Outside the coding region, strain-spe-
cific repetitive regions and a variable region that was
unique for each strain were identified (Westenberger et al.,
2006). More recently, comparative analyses of the mito-
chondrial genomes of T. cruzi I, II and III were reported af-
ter Ruvalcaba-Trejo and Sturm (2011) generated the se-
quence of the coding region of the maxicircle from Sylvio
X10. In agreement with the nuclear genomic analysis,
ported a close evolutionary relationship between T. cruzi I
and III. Based on their mitochondrial DNA analyses, these
authors proposed a model in which an ancestral strain be-
of a TcI-TcII hybridization event that resulted in the gener-
ation of T. cruzi III and T. cruzi IV strains. A subsequent
‘back-cross’ hybridization between T. cruzi II and T. cruzi
III strains resulted in the T. cruzi V and VI strains, such as
CL Brener, that carry the maxicircle from their T. cruzi III
Comparative Genomics of Leishmania Species
That Cause Distinct Forms of Leishmaniasis
the bites of phlebotomine sand flies that are endemic in
tropical and subtropical regions worldwide. More than 20
species are responsible for a wide spectrum of diseases,
known as leishmaniasis (Murray et al., 2005). Parasites in
this genus are classified into two subgenera according to
the part of the sandfly gut where colonization and develop-
ment occur: the subgenus Leishmania (Leishmania) con-
sists of parasites with mid and foregut development,
whereas the subgenus Leishmania (Viannia) consists of
1977; Bates, 2007). Depending on the species of
ical forms of leishmaniasis with symptoms ranging from
self-healing cutaneous lesions (L. major/L. tropica/L.
mexicana) to fatal visceral leishmaniasis (L. donovani/L.
infantum/L. chagasi). Infection by Leishmania can also re-
sult in mucosal leishmaniasis (mainly caused by L.
braziliensis) and diffuse cutaneous leishmaniasis (mainly
caused by L. amazonensis/L. guyanensis/L. aethiopica)
(Desjeux, 1996). In addition to the species of Leishmania,
other factors such as the genetic variability of the human
host may determine the disease tropism and clinical mani-
festations in leishmaniasis (Blackwell et al., 2009;
Sakthianandeswaren et al., 2009).
The World Health Organization (WHO) estimates
that there are over two million new cases of leishmaniasis
tracting this disease in 88 countries on five continents
(Asia, Africa, Europe, North America and South America)
Leishmania spp. alternate between the alimentary tract of
the sandfly vector, where they grow as extracellular flagel-
lated promastigotes and differentiate into infective non-
dividing metacyclic forms, and the phagolysosome of the
vertebrate host macrophages, where they differentiate into
aflagellated, replicative amastigotes (Figure 1). There is no
effective vaccine against Leishmania and the available
therapeutic arsenal is extremely limited (Mauel, 2002).
Thus, completion of the genome sequences of several
Leishmania species (Ivens et al., 2005; Peacock et al.,
2007) represents a long awaited aspiration for groups in-
volved in the discovery and development of new drugs and
Leishmania major Friedlin was chosen as the
Leishmania reference strain for the Tri-Tryp genome pro-
ject. The L. major haploid genome (~32.8 Mb) is distrib-
uted among 36 relatively small chromosomes ranging from
0.28 to 2.8 Mb in size (Wincker, 1996) and was sequenced
lication of the Tri-Tryp genome sequence, the complete se-
quences of chromosomes 1 and 3 from L. major were
published (Myler et al., 1999; Worthey et al., 2003) and an
optical map of the entire genome was generated (Zhou et
al., 2004). In 2007, the complete genomes of two other
Leishmania species, L. infantum and L. braziliensis, were
also described (Peacock et al., 2007). Leishmania
infantum, also known as L. chagasi in Latin America, was
chosen as the second Leishmania species to have its ge-
nome sequenced on the basis of its virulence in animals,
transmissibility in sandflies and adaptability to laboratory
experimentation (Denise et al., 2006). This species is the
causative agent of visceral leishmaniasis, the most serious
form of the disease and frequently fatal if left untreated.
The New World species L. braziliensis, within the subge-
nus L. (Viannia), is the third and most divergent species se-
quenced. The L. infantum and L. braziliensis genome
proach with five- and six-fold coverage, respectively (Pea-
are from strains that cause distinct types of leishmanial dis-
eases, are adapted for maintenance and manipulation in the
laboratory and are also frequently used in studies in vitro
and in animal models of infection (Laurentino et al., 2004;
Ivens et al., 2005; Denise et al., 2006). The complete se-
quences of all three Leishmania genomes can be accessed
in the Tri-Tryp database; sequencing of the genome from a
fourth species (L. mexicana) is in progress.
Surprisingly, comparative genomic studies of the
three evolutionarily and geographically distinct species, L.
major and L. infantum (Old World species) and L.
braziliensis (New World species), showed very little diver-
gence among the species in terms of genomic sequence and
organization (Peacock et al., 2007; Lynn and McMaster,
2008), despite a divergence of 20-100 million years within
the Leishmania genus (Lukes et al., 1997). The haploid ge-
nome of the three species has an estimated 8,300 genes,
with more than 99% of them maintaining synteny in the
three Leishmania genomes (Peacock et al., 2007). As with
the other Tri-Tryps, the majority of Leishmania genes are
annotated as genes of unknown function. Of the 8,300
genes, only 200 were identified as being differentially dis-
tributed between the three genomes. The L. braziliensis ge-
nome possesses 47 genes that are absent from the other two
species, while 27 and 5 genes are specific for the L.
infantum and L. major genomes, respectively. Such high
conservation in overall genome sequences and genome
ble and has not undergone large genomic re-arrangements
species-specific parasite genes may contribute to differen-
tial pathogenesis and tissue tropism (Peacock et al., 2007;
Smith et al., 2007).
Thus far, the only gene for which there is experimen-
tal evidence indicating direct involvement in the differen-
tial tropism among Leishmania diseases is the A2 locus.
Initially identified as an amastigote-specific gene family in
asite virulence and visceralization (Zhang et al., 2003).
Multiple copies of the A2 gene alternating with a distinct
eral species of the L. donovani group that is responsible for
visceral diseases. Although its precise function is still un-
known, the product of the A2 gene, an endoplasmic reticu-
lum protein with a large repetitive domain, may be related
to the parasite stress response (McCall and Matlashewski,
2010). In L. major, which does not cause visceral
leishmaniasis, the A2 gene is a pseudogene and introduc-
tion of the L. donovani A2 gene into L. major enhanced the
ability of L. major to survive in visceral organs of suscepti-
ble BALB/c mice (Zhang et al., 2003). More recently, the
expression of A2 in L. tarentolae, a lizard parasite that is
not pathogenic in mammals, significantly increased the
in the liver of BALB/c mice (Mizbani et al., 2011). In addi-
tion to experiments suggesting a possible role for the A2
gene in the differential tropism of cutaneous and visceral
Leishmania parasites, the A2 gene has been used as a prom-
ising vaccine candidate and diagnostic antigen (Fernandes
et al., 2008).
sion and that varies considerably among the three
Leishmania genomes is the GP63 locus. Also known as the
surface metalloproteases expressed in all trypanosomatids
examined so far (Yao et al., 2003). GP63 sequences identi-
fied in the three Leishmania genomes and in the T. cruzi
and T. brucei genomes vary in their gene copy number, and
this may have implications for differences in the disease
phenotype (Voth et al., 1998). A similar conclusion may
apply to the amastin multi-gene family that encodes a fam-
ily of amastigote-specific, highly glycosylated hydropho-
bic surface proteins also present in T. cruzi but which has
al., 1995; Jackson, 2010). Interestingly, in the genome of T.
brucei, which does not have an intracellular stage, only two
copies of a highly divergent amastin sequence are present
(Jackson, 2010). The identification of these species-
specific genes represents an initial step towards the charac-
terization of parasite factors that may determine the speci-
ficities of each type of parasitic infection. On the other
hand, antigens common to all Leishmania species could be
used as potential vaccine candidates (Peacock et al., 2007).
However, apart from differences in gene sequences, it is
the different parasitic diseases may be a consequence of
differential gene expression that occurs throughout the var-
ious stages of life cycle in each parasite species.
As discussed above, Leishmania genes are arranged
in the genome as directional gene clusters that resemble
prokaryotic polycistronic transcription units (Martínez-
Calvillo et al., 2004). This type of gene organization and
polycistronic transcription have profound implications for
the regulation of gene expression, which must rely on
post-transcriptional mechanisms (Boucher et al., 2002;
Since the dependency on promoter-based transcription ini-
tiation mechanisms for the control of mRNA levels is
greatly reduced, greater emphasis is placed on post-
transcriptional regulatory mechanisms controlling mRNA
stability and translation, as well as protein turnover (Clay-
phology, biochemical properties and disease phenotypes,
ingly few differences in gene expression when mRNA lev-
els in different life-cycle stages or in the same stage but in
different Leishmania species were compared (Peacock et
al., 2007). Global interspecies analyses in L. major and L.
infantum have shown that only 10%-12% of differentially
expressed genes are common to both species (Rochette et al.,
2008; Depledge et al., 2009). A more careful examination
of the protein expression patterns and the elucidation of
regulatory mechanisms will provide unique insights into
this important aspect of the parasite’s biology. This infor-
mation will also improve our understanding of the host-
parasite interaction and lead to the development of new
strategies for controlling leishmaniasis.
Eukaryotic genomes contain an abundance of re-
peated DNA and some of these repeated sequences are mo-
bile elements. Transposable elements (TEs) are defined as
DNA sequences that are able to move from one location to
occupy a high proportion of a species’ genome. Retro-
posons, also known as non-long-terminal-repeat (LTR)
retrotransposons, are ubiquitous elements that transpose
of most eukaryotes (Ivens et al., 2005; Peacock et al.,
2007). Trypanosoma brucei and T. cruzi contain long au-
tonomous retroposons of the ingi clade (Tbingi and L1Tc,
respectively) and short nonautonomous truncated versions
(TbRIME and NARTc, respectively), as well as degenerate
that represent the most abundant transposable elements in
these genomes (< 3% of the nuclear genome).
In contrast, L. major contains only remnants of ex-
tinct retroposons (LmDIREs) and short nonautonomous
heterogenous elements (LmSIDERs). Recently, small de-
generate retroposons (< 0.55 kb) containing the “79-bp sig-
nature” known as LmSIDERs (for short interspersed
degenerate retroposons) have also been identified in the
braziliensis (Peacock et al., 2007). Unexpectedly, in L.
SLACS/CZAR, which is associated with tandemly re-
peated spliced leader sequences in an arrangement similar
to that of the SLACS or CZAR element in T. brucei or T.
cruzi, respectively, is fully active (Aksoy et al., 1987;
Villanueva et al., 1991). However, in contrast to the Afri-
can trypanosome genomes and similar to L. major (Ivens et
retroposons were detected in the L. braziliensis genome
(Bringaud et al., 2009).
Mobile elements are involved in creating mutations
and genomic rearrangements and, in many eukaryotes,
these effects can be regulated through an RNA silencing
mechanism such as RNA interference (RNAi) (Shi et al.,
2004b; Girard and Hannon, 2008). Since first reported in
1998 (Fire et al., 1998), RNAi has swept through all fields
of eukaryotic biology and has proven to be a very useful
which the introduction or expression of short double-
stranded RNAs leads to the rapid destruction of cognate
number of advantages: it is fast, requires very little se-
quence information, and reduces the expression of multiple
gene copies, an action that is especially advantageous in
asexual diploid organisms (LaCount et al., 2000; Shi et al.,
2000; Wang et al., 2000).
Soon after it was described in C. elegans, RNAi was
rapidly identified in T. brucei (Ngo et al., 1998). As dis-
cussed below, RNAi has proven to be a powerful new tool
for functional genomic studies in T. brucei. Unexpectedly,
experimental evidence as well as searches of genome data-
bases quickly showed that RNAi is absent in L. major and
T. cruzi (Robinson and Beverley, 2003; DaRocha et al.,
2004). It came as a surprise when Peacock et al. (2007) re-
involved with RNAi. These authors showed that the L.
braziliensis genome contains orthologs carrying domains
characteristic of the Dicer protein as well as genes with the
typical argonaute domains PAZ and PIWI, the latter con-
taining conserved amino acid residues that are essential for
[Figure 4 caption, partially recovered: [...] DICER into small interfering dsRNA molecules (siRNA) 19-23 nt long. siRNAs associate with Argonaute (AGO), the catalytic core of the RISC (RNA-induced silencing complex), with one siRNA strand being released and the guide siRNA strand then mediating the degradation or inhibiting the [...] genetic analysis of trypanosomatid species based on the predicted protein sequences of the housekeeping gene GAPDH (glyceraldehyde-3-phosphate dehydrogenase) and ubiquitin. The neighbor-joining tree was generated using sequences found in the parasite genome databases (www.genedb.org) and MEGA4 software. The absence or presence of genes encoding components of active RNAi machinery (as identified by Lye et al., 2010) is indicated on [...]]
functional TbAGO1 (Shi et al., 2004a). In addition, an
N-terminal RGG domain, present in TbAGO1 and shown
to be essential for its association with polyribosomes, is
also present in the L. braziliensis Ago1 gene (Shi et al.,
2004a). More recently, Lye et al. (2010) demonstrated that
the RNAi pathway is functional in L. braziliensis and in
other species within the Leishmania subgenus Viannia (L.
guyanensis and L. panamensis) (Figure 4B). Thus, this di-
vergent species of Leishmania appears to have retained not
only the mechanisms for RNAi-mediated regulation but
also potentially active retroposons (Peacock et al., 2007),
which might have contributed to the greater divergence
within the L. braziliensis genome compared with the other
Leishmania species. For molecular parasitologists, these
findings came as very good news since RNAi knockdown,
which has proven effective in T. brucei as a tool for the sys-
tematic analysis of gene function, is now also applicable in
some species of Leishmania.
Functional Genomics of T. brucei and Gene
Expression Studies in a High Throughput Era
Infection by T. brucei occurs after metacyclic trypo-
mastigotes are injected into the bloodstream by the bite of a
into long slender trypomastigotes (bloodstream form -
BSF) that divide, colonize the body fluids and can trans-
form into a nonproliferative bloodstream form, also known
as the short stumpy form. After a blood meal, trypomas-
tigotes acquired by the insect vector differentiate into pro-
to salivary glands where they transform into metacyclic
trypomastigotes (Matthews, 2005) (Figure 1). The ability
neurological manifestations associated with sleeping sick-
In contrast to T. cruzi and Leishmania, T. brucei de-
rect exposure to a strong antibody response in the
bloodstream requires a sophisticated immune evasion strat-
egy, known as variant surface glycoprotein (VSG) switching
(Pays et al., 2007). VSGs are bloodstream-specific surface
proteins with a hypervariable N-terminal domain that is ex-
posed extracellularly and a more conserved C-terminal do-
main buried in the parasite’s surface coat (Van der Ploeg et
al., 1982; Turner and Barry, 1989). Since they are tightly
packed at the surface, VSGs are able to shield other invari-
tactic for evasion is based on antigenic variation in which
the expression of a specific VSG is replaced by another
gene from a supply of almost a thousand copies at a rate of
approximately one event per 100 cell divisions (Van der
Ploeg et al., 1982; Turner and Barry, 1989). VSG genes are
expression distinguishes African trypanosomes from the
other two groups of pathogenic trypanosomatids; the lack
of antigenic variation in T. cruzi and Leishmania spp.
tem by entering and multiplying inside host cells.
After completion of the T. b. brucei genome, it was
found that of the 9,068 predicted genes, including 904
pseudogenes, there were 1,700 T. brucei-specific genes,
806 of them (or ~9% of all predicted genes) corresponding to
VSGs (El-Sayed et al., 2005b). Several VSG genes occur
within hundreds of mini-chromosomes (50-150 kb in size)
that are also part of the T. brucei genome. Surprisingly,
analyses of all VSG sequences showed that only 57 are
fully functional genes and have all the recognizable fea-
tures of a typical VSG (Berriman et al., 2005). It has long
been known that VSG variability can be generated in T.
brucei by creating mosaic genes that include pseudogenes
(Roth et al., 1989). Besides producing variability in VSG
epitopes, these rearrangements may have the capacity to
generate proteins with new functions, as in the case of the
SRA gene discussed below (Vanhamme et al., 2003).
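The proportions quoted in this paragraph can be checked with a few lines of arithmetic. The gene counts below are those given in the text; the variable names are illustrative, and taking the 9,068 predicted genes as the denominator for the "9%" figure is an assumption about how that percentage was derived.

```python
# Gene counts quoted in the text for the T. b. brucei genome.
predicted_genes = 9068    # total predicted genes, pseudogenes included
species_specific = 1700   # T. brucei-specific genes
vsg_genes = 806           # VSG genes among the species-specific set
functional_vsgs = 57      # VSGs with all features of a typical VSG


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


vsg_share_of_genome = percent(vsg_genes, predicted_genes)
vsg_share_of_specific = percent(vsg_genes, species_specific)
functional_share_of_vsgs = percent(functional_vsgs, vsg_genes)

print(f"VSGs among all predicted genes:      {vsg_share_of_genome}%")
print(f"VSGs among T. brucei-specific genes: {vsg_share_of_specific}%")
print(f"Fully functional VSGs among VSGs:    {functional_share_of_vsgs}%")
```

Under these assumptions, VSGs account for close to half of the species-specific genes, while fewer than one in ten VSG sequences is a fully functional gene, which is consistent with the emphasis placed on mosaic gene formation from pseudogenes.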
As part of their “life in the bloodstream”, African
trypanosomes that are pathogenic to humans have devel-
oped a mechanism to withstand other types of attack from
the host immune system. In contrast to T. brucei brucei,
which does not infect humans, T. b. gambiense and T. b.
ent in normal human serum (NHS). TLF activity has been
attributed to apolipoprotein L1 (ApoL1) and haptoglobin
review, see Wheeler, 2010). NHS resistance in T. b.
rhodesiense is conferred by a truncated VSG known as se-
endosomes and binds and neutralizes TLF (De Greef and
Hamers, 1994; De Greef et al., 1992; Vanhamme et al.,
2003; Pérez-Morga et al., 2005; Wheeler, 2010).
Trypanosoma b. gambiense lacks the SRA gene but still in-
fects humans. The killing of T. b. brucei requires the bind-
ing of TLF-1 and trafficking to the parasite acidic
lysosome. It has been proposed that, in T. b. gambiense,
this T. brucei sub-species to attack by the human innate im-
mune system (Kieft et al., 2010).
Since T. b. brucei is harmless to humans and T. b.
gambiense is the most clinically relevant subspecies, Jack-
son et al. (2010) sequenced the genome of T. b. gambiense
using the whole-genome shotgun approach combined with
bacterial artificial chromosome large insert sequencing.
Comparison of the sequences from a draft genome assem-
bly of 281 contigs (~22.1 Mb) with the T. b. brucei genome
data (Berriman et al., 2005) revealed a very similar compo-
sition in terms of gene content, gene synteny and sequence
identity (86.4% of T. b. gambiense coding sequences vary
by < 1% from their orthologs in T. b. brucei). Even though
these subspecies behave differently in humans, there were
very few differences in their genomes. One of these differ-
ences involved a gene encoding a putative iron-ascorbate
oxidoreductase that is specific to a few T. b. brucei strains
cruzi). Thus, it seems that not only differences in gene con-
tent per se, but also individual single nucleotide poly-
morphisms, insertions/deletions (indels) and variations in gene expression, or a
combination of these factors, may contribute to this pheno-
typic variation (Jackson et al., 2010).
Coordinated changes in gene expression are vital for
trypanosomes since they must deal with rapid changes in
their environment (nutrient availability, temperature, host
defenses, the presence of drugs, etc.) and be able to dissemi-
nate by alternating between different hosts. In addition,
unique mechanisms for regulating gene expression are re-
quired since there is an almost complete absence of trans-
criptional control at the level of initiation. The lack of
transcriptional regulatory elements, including a typical
RNA polymerase II promoter, led to the conclusion that
most factors involved in controlling gene expression act at
the post-transcriptional level. By using approaches such as
differential display, RNA fingerprinting, differential
screening of cDNA libraries, random sequencing of cDNA
clones and DNA microarrays, several groups have shown
that significant changes in mRNA levels occur during the
al., 1995; Mathieu-Daudé et al., 1996; Diehl et al., 2002;
Cerqueira et al., 2005; Kabani et al., 2009). Experimental
evidence also indicates that, in addition to mRNA stability,
changes in polysomal mobilization constitute an important
mechanism for regulating gene expression (Alves et al.,
2010). With the advent of next generation sequencing tech-
nologies such as RNAseq, a global description of gene ex-
pression patterns in T. brucei has finally been achieved
(Kolev et al., 2010; Nilsson et al., 2010; Siegel et al., 2010;
Veitch et al., 2010). This approach has provided much
more complete information about mRNA structure and ex-
pression levels, including the patterns of spliced leader
(SL) and poly-A additions, as well as alternative pre-
analyses have confirmed that only two T. brucei genes
(poly-A polymerase and DNA/RNA helicase) contain
Nilsson et al. (2010) showed that 2,500 alternative
splicing events occur during processing of T. brucei genes,
with a large number of these being regulated during the life
cycle (a total of 600 genes have transcripts with more than
one trans-spliced variant). The alternative splicing reac-
tions can alter the message dramatically, as shown by
Nilsson et al. (2010) and represented in Figure 3, with
changes in the SL addition site leading to alterations in reg-
ulatory elements in the 5’UTR (resulting in the modifica-
tion of gene expression) or the initiator AUG (generating a
different N-terminus or even causing a complete change in
the translated ORF). Aminoacyl tRNA synthetase tran-
scripts are interesting examples of trans-spliced variants
since differences in the enzyme N-terminus result in
changes in the protein targeting signal: distinct mRNAs
produce proteins that are directed to mitochondria or re-
main in the cytoplasm (Nilsson et al., 2010). Likewise, al-
ternative splicing is a potential mechanism for dual
localization of the T. cruzi LYT1 gene (Benabdellah et al.,
2007). In addition to providing a mechanism for generating
changes in the proteome, such flexibility in the selection of
transcription may also initiate at internal sites in gene clus-
cur at various points in the primary transcript (Kolev et al.,
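To make the consequences of alternative spliced leader (SL) addition sites more concrete, the toy sketch below compares two SL addition sites on the same precursor and reports whether the first AUG downstream of each site is the same codon, an in-frame downstream codon or an out-of-frame codon. The function names, the invented sequences and the simplification that translation starts at the first AUG after the SL addition site are all assumptions for illustration; splice acceptor signals and 5'UTR regulatory elements are not modelled.

```python
from typing import Optional


def first_start_codon(pre_mrna: str, sl_site: int) -> Optional[int]:
    """Index of the first AUG at or after the SL addition site, or None."""
    idx = pre_mrna.find("AUG", sl_site)
    return None if idx == -1 else idx


def compare_sl_sites(pre_mrna: str, upstream: int, downstream: int) -> str:
    """Classify the effect of moving the SL addition site downstream."""
    a = first_start_codon(pre_mrna, upstream)
    b = first_start_codon(pre_mrna, downstream)
    if a is None or b is None:
        return "no start codon"
    if a == b:
        return "same start codon (only the 5'UTR differs)"
    if (b - a) % 3 == 0:
        return "in-frame downstream start (different N-terminus)"
    return "out-of-frame start (different ORF)"


# Hypothetical precursors, not real T. brucei sequences.
IN_FRAME = "GGAGAUGCCCAAAAUGGGGUAA"   # AUG codons at positions 4 and 13
OUT_OF_FRAME = "GGAGAUGCCCAAAUGGGGUAA"  # AUG codons at positions 4 and 12

print(compare_sl_sites(IN_FRAME, 0, 2))
print(compare_sl_sites(IN_FRAME, 0, 6))
print(compare_sl_sites(OUT_OF_FRAME, 0, 6))
```

The three cases correspond to the outcomes described by Nilsson et al. (2010): a change restricted to the 5'UTR, a protein with a different N-terminus (as in the dual targeting of aminoacyl tRNA synthetases), and a complete change in the translated ORF.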
The mechanisms involved in transcription initiation
in trypanosomes have always been an intriguing question.
Recently, chromatin immunoprecipitation in combination
with conventional Sanger sequencing (Respuela et al.,
2008) or next-generation sequencing technology (Siegel et
vided a much awaited analysis of the distribution patterns
of modified histones throughout the T. brucei genome. As
summarized in Figure 3, the divergent strand-switch re-
gions (SSR) have been found to contain an enrichment of
histone variants (H2AZ and H2BV), acetylated histone 4 at
lysine 10 (H4K10ac), histone 3 acetylated at residues K9
and K14 and trimethylated at K4, the bromodomain factor
BDF3, and the transcription factors TRF4 and SNAP50
(Respuela et al., 2008; Siegel et al., 2009; Thomas et al.,
2009; Wright et al., 2010). The histone variants H3V and
H4V are present at transcription termination sites (Siegel et
al., 2009). Similar patterns of chromatin modifications
from transcription start sites (TSS) located at the beginning
polycistronic unit, suggesting the presence of internal TSS,
in agreement with RNAseq data (Siegel et al., 2009; Kolev
et al., 2010; Wright et al., 2010).
The increasing number of published genomes and
transcriptomic data, partly as a consequence of the intro-
duction of new sequencing platforms, has created enor-
mous challenges for the field of functional genomics. One
of the most useful techniques that have been used to identify
gene function in T. brucei is the inducible
system mediated by T7 RNA polymerase and the tetracy-
cline repressor (Wirtz et al., 1999), used together with
RNAi gene knockdown. As indicated before, in contrast to
T. cruzi and L. major, RNAi is functional in T. brucei (Ngo
et al., 1998; Robinson and Beverley 2003; DaRocha et al.,
2004; Lye et al., 2010). The first large scale functional
genomics study using RNAi was reported by Morris et al.
(2002), who transfected T. brucei procyclic forms with a
random RNAi library. This library was generated by using
a construct to inducibly express dsRNA via two head-
to-head tetracycline-regulated T7 RNA polymerase pro-
moters. By selecting parasites that expressed the dsRNA
but were unable to bind lectin, these authors identified
clones with a reduced glycosylation of surface proteins and
showed that these clones had lower expression of hexo-
kinase (Morris et al., 2002).
A few years later, the functions of 197 ORFs from T.
b. brucei chromosome I were tested by RNAi. RNAi-
induced parasites were tested for growth, nuclear and
kinetoplast abnormalities by DAPI staining, and a pleio-
tropic phenotype, morphology and motility. At least one of
these phenotypes was found in 68 individual knockdowns
et al. (2009) identified novel components of the para-
flagellar rod (PFR) structure by comparing proteomic pro-
files before and after ablating the expression of individual
ated by two-dimensional difference gel electrophoresis,
and the spots with a two-fold change in volume were sub-
ysis combined with RNAi knockdown led to the identifica-
tion of 30 proteins as potential PFR components, 20 of
which were novel proteins. High-throughput cloning sys-
tems also enhance functional analyses. By using the Gate-
way® technology to create yeast two-hybrid vectors, eight
non-redundant protein-protein interactions were detected
among proteins with a PFR structure (Lacomble et al.,
2009). These authors were able to construct a map showing
the complex interaction of PFR proteins.
The availability of efficient methods for genetic ma-
nipulation and for testing gene function through RNAi
knockdown has led to major advances in genomic, trans-
this species a model organism for studying basic aspects of
trypanosomatid biology. More recently, with the discovery
of functional RNAi machinery in L. braziliensis and other
Leishmania species, gene function studies can now be con-
ducted in this group and will soon provide valuable new in-
formation about Leishmania-specific genes. However, as is
becoming increasingly apparent from the data generated by
comparative genomic analyses, each of the trypanosomatid
species has its peculiarities. Researchers thus face the chal-
parasite and its corresponding diseases. With our current
knowledge of each Tri-Tryp disease and the new research
methods that are being developed we can expect many new
studies to emerge from hypotheses based on the Tri-Tryp
genomic data. As a consequence of these new findings,
better methods of controlling and preventing these diseases
The work from SMT and RMCP was funded by
Conselho Nacional de Desenvolvimento Científico e Tec-
nológico (CNPq), Fundação de Amparo à Pesquisa do
Estado de Minas Gerais (FAPEMIG), Instituto Nacional de
Ciência e Tecnologia de Vacinas (INCTV) and the Howard
Hughes Medical Institute (HHMI). The work from WDR
and MMKM was supported by FAPEMIG, Fundação
Araucária de Apoio ao Desenvolvimento Científico e Tec-
nológico do Paraná (Fundação Araucária), CAPES/Reuni,
PPSUS/MS and CNPq.
Aksoy S, Lalor TM, Martin J, Van der Ploeg LH and Richards FF
(1987) Multiple copies of a retroposon interrupt spliced
leader RNA genes in Trypanosoma gambiense. EMBO J 6:3819-3826.
Alves LR, Avila AR, Correa A, Holetz FB, Mansur FC, Manque
PA, de Menezes JP, Buck GA, Krieger MA and Goldenberg
of proteins with translated mRNAs in Trypanosoma cruzi.
Araujo PR, Burle-Caldas GA, Silva-Pereira RA, Bartholomeu
DC, DaRocha WD and Teixeira SM (2011) Development of
a dual reporter system to identify regulatory cis-acting ele-
ments in untranslated regions of Trypanosoma cruzi
mRNAs. Parasitol Int 60:161-169.
Arner E, Kindlund E, Nilsson D, Farzana F, Ferella M, Tammi
MT and Andersson B (2007) Database of Trypanosoma
cruzi repeated genes: 20,000 additional gene variants. BMC
Aslett M, Aurrecoechea C, Berriman M, Brestelli J, Brunk BP,
Carrington M, Depledge DP, Fischer S, Gajria B, Gao X, et
al. (2010) TriTrypDB: A functional genomic resource for
the Trypanosomatidae. Nucleic Acids Res 38:D457-D462.
Augusto-Pinto L, Teixeira SM, Pena SD and Machado CR (2003)
MSH2 gene support the existence of three phylogenetic lin-
eages presenting differences in mismatch-repair efficiency.
Baida RC, Santos MR, Carmo MS, Yoshida N, Ferreira D,
Ferreira AT, El-Sayed NM, Andersson B and da Silveira JF
(2006) Molecular characterization of serine-, alanine-, and
proline-rich proteins of Trypanosoma cruzi and their
possible role in host cell infection. Infect Immun 74:1537-1546.
Bartholomeu DC, Cerqueira GC, Leão AC, DaRocha WD, Pais
FS, Macedo C, Djikeng A, Teixeira SM and El-Sayed NM
(2009) Genomic organization and expression profile of the
mucin-associated surface protein (masp) family of the
human pathogen Trypanosoma cruzi. Nucleic Acids Res
tigotes by phlebotomine sand flies. Int J Parasitol 37:1097-
Benabdellah K, González-Rey E and González A (2007) Alterna-
tive trans-splicing of the Trypanosoma cruzi LYT1 gene
the encoded protein. Mol Microbiol 65:1559-1567.
Benz C, Nilsson D, Andersson B, Clayton C and Guilbride DL
(2005) Messenger RNA processing sites in Trypanosoma
brucei. Mol Biochem Parasitol 143:125-134.
Berriman M, Ghedin E, Hertz-Fowler C, Blandin G, Renauld H,
et al. (2005) The genome of the African trypanosome
Trypanosoma brucei. Science 309:416-422.
Blackwell JM and Melville SE (1999) Status of protozoan ge-
Blackwell JM, Fakiola M, Ibrahim ME, Jamieson SE, Jeronimo
SB, Miller EN, Mishra A, Mohamed HS, Peacock CS, Raju
M, et al. (2009) Genetics and visceral leishmaniasis: Of
mice and man. Parasite Immunol 31:254-266.
Boucher N, McNicoll F, Dumas C and Papadopoulou B (2002)
integrated into different loci of Leishmania. Mol Biochem
ative karyotyping as a tool for genome structure analysis of
Trypanosoma cruzi. Mol Biochem Parasitol 147:30-38.
Brandão A, Urmenyi T, Rondinelli E, Gonzalez A, de Miranda
AB and Degrave W (1997) Identification of transcribed
sequences (ESTs) in the Trypanosoma cruzi genome project.
Mem Inst Oswaldo Cruz 92:863-866.
Brener Z (1973) Biology of Trypanosoma cruzi. Annu Rev Mi-
Bringaud F, Berriman M and Hertz-Fowler C (2009)
Trypanosomatid genomes contain several subfamilies of
ingi-related retroposons. Eukaryot Cell 8:1532-1542.
Brisse S, Dujardin JC and Tibayrenc M (2000) Identification of
six Trypanosoma cruzi lineages by sequence-characterised
amplified region markers. Mol Biochem Parasitol 111:95-
Buscaglia CA and Di Noia JM (2003) Trypanosoma cruzi clonal
diversity and the epidemiology of Chagas’ disease.
Microbes Infect 5:419-427.
Campos PC, Bartholomeu DC, DaRocha WD, Cerqueira GC and
Teixeira SM (2008) Sequences involved in mRNA processing
in Trypanosoma cruzi. Int J Parasitol 38:1383-1389.
Cano MI, Gruber A, Vazquez M, Cortés A, Levin MJ, González
A, Degrave W, Rondinelli E, Zingales B, Ramirez JL, et al.
(1995) Molecular karyotype of clone CL Brener chosen for
the Trypanosoma cruzi genome project. Mol Biochem
Cerqueira GC, DaRocha WD, Campos PC, Zouain CS and Tei-
xeira SM (2005) Analysis of expressed sequence tags from
Trypanosoma cruzi amastigotes. Mem Inst Oswaldo Cruz
Cerqueira GC, Bartholomeu DC, DaRocha WD, Hou L, Frei-
tas-Silva DM, Machado CR, El-Sayed NM and Teixeira SM
(2008) Sequence diversity and evolution of multigene fami-
lies in Trypanosoma cruzi. Mol Biochem Parasitol
Clayton CE (2002) Life without transcriptional control? From fly
to man and back again. EMBO J 21:1881-1888.
Cribb P and Serra E (2009) One- and two-hybrid analysis of the
interactions between components of the Trypanosoma cruzi
spliced leader RNA gene promoter binding complex. Int J
Cribb P, Esteban L, Trochine A, Girardini J and Serra E (2010)
Trypanosoma cruzi TBP shows preference for C/G-rich
DNA sequences in vitro. Exp Parasitol 124:346-349.
DaRocha WD, Otsu K, Teixeira SM and Donelson JE (2004)
Tests of cytoplasmic RNA interference (RNAi) and
construction of a tetracycline-inducible T7 promoter system in
Trypanosoma cruzi. Mol Biochem Parasitol 133:175-186.
Gonçalves VF, Teixeira SM, Chiari E, Junqueira AC, Fer-
nandes O, Macedo AM, et al. (2006) Ancestral genomes,
sex, and the population structure of Trypanosoma cruzi.
PLoS Pathog 2:e24.
(SRA) gene of Trypanosoma brucei rhodesiense encodes a
variant surface glycoprotein-like protein. Mol Biochem
De Greef C, Chimfwembe E, Kihang’a Wabacha J, Bajyana
Songa E and Hamers R (1992) Only the serum-resistant
bloodstream forms of Trypanosoma brucei rhodesiense
express the serum resistance associated (SRA) protein. Ann
Soc Belg Med Trop 72(Suppl 1):13-21.
Denise H, Poot J, Jimenez M, Ambit A, Hermman DC, Ver-
the CPA cysteine peptidase in the Leishmania infantum ge-
nome strain JPCM5. BMC Mol Biol 7:42.
Depledge DP, Evans KJ, Ivens AC, Aziz N, Maroof A, Kaye PM
and Smith DF (2009) Comparative expression profiling of
Leishmania: Modulation in gene expression between
species and in different host genetic backgrounds. PLoS Negl
Trop Dis 3:e476.
Desjeux P (1996) Leishmaniasis. Public health aspects and
control. Clin Dermatol 14:417-423.
Di Noia JM, Sánchez DO and Frasch AC (1995) The protozoan
Trypanosoma cruzi has a family of genes resembling the
mucin genes of mammalian cells. J Biol Chem 270:24146-
Diehl S, Diehl F, El-Sayed NM, Clayton C and Hoheisel JD
(2002) Analysis of stage-specific gene expression in the
bloodstream and the procyclic form of Trypanosoma brucei
using a genomic DNA-microarray. Mol Biochem Parasitol
El-Sayed NM and Donelson JE (1997) A survey of the
Trypanosoma brucei rhodesiense genome using shotgun
sequencing. Mol Biochem Parasitol 84:167-178.
JE (1995) cDNA expressed sequence tags of Trypanosoma
brucei rhodesiense provide new insights into the biology of
the parasite. Mol Biochem Parasitol 73:75-90.
El-Sayed NM, Myler PJ, Blandin G, Berriman M, Crabtree J,
Aggarwal G, Caler E, Renauld H, Worthey EA, Hertz-Fowler
C, et al. (2005a) Comparative genomics of
trypanosomatid parasitic protozoa. Science 309:404-409.
El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal
G, Tran AN, Ghedin E, Worthey EA, Delcher AL, Blandin
G, et al. (2005b) The genome sequence of Trypanosoma
cruzi, etiologic agent of Chagas disease. Science 309:409-
Fernandes AP, Costa MM, Coelho EA, Michalick MS, de Freitas
E, Melo MN, Luiz Tafuri W, Resende DM, Hermont V,
lenge with Leishmania (Leishmania) chagasi in beagle dogs
vaccinated with recombinant A2 protein. Vaccine 26:5888-
Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE and Mello
CC (1998) Potent and specific genetic interference by dou-
ble-stranded RNA in Caenorhabditis elegans. Nature
Franzén O, Ochaya S, Sherwood E, Lewis MD, Llewellyn MS,
Miles MA and Andersson B (2011) Shotgun sequencing
analysis of Trypanosoma cruzi I Sylvio X10/1 and compari-
Girard A and Hannon GJ (2008) Conserved themes in small-
RNA-mediated transposon control. Trends Cell Biol
Günzl A, Bruderer T, Laufer G, Schimanski B, Tu LC, Chung
HM, Lee PT and Lee MG (2003) RNA polymerase I
transcribes procyclin genes and variant surface glycoprotein
gene expression sites in Trypanosoma brucei. Eukaryot Cell
Haag J, O’hUigin C and Overath P (1998) The molecular
phylogeny of trypanosomes: Evidence for an early divergence of
the Salivaria. Mol Biochem Parasitol 91:37-49.
Hajduk SL, Harris ME and Pollard VW (1993) RNA editing in
kinetoplastid mitochondria. FASEB J 7:54-63.
Henriksson J, Porcel B, Rydåker M, Ruiz A, Sabaj V, Galanti N,
Cazzulo JJ, Frasch AC and Pettersson U (1995)
Chromosome specific markers reveal conserved linkage groups in
Trypanosoma cruzi. Mol Biochem Parasitol 73:63-74.
Holzer TR, McMaster WR and Forney JD (2006) Expression pro-
filing by whole-genome interspecies microarray hybridiza-
tion reveals differential gene expression in procyclic pro-
amastigotes in Leishmania mexicana. Mol Biochem
Ivens AC, Peacock CS, Worthey EA, Murphy L, Aggarwal G,
Berriman M, Sisk E, Rajandream MA, Adlem E, Aert R, et
al. (2005) The genome of the kinetoplastid parasite,
Leishmania major. Science 309:436-442.
Jackson AP (2010) The evolution of amastin surface
glycoproteins in trypanosomatid parasites. Mol Biol Evol 27:33-45.
Jackson AP, Sanders M, Berry A, McQuillan J, Aslett MA, Quail
al. (2010) The genome sequence of Trypanosoma brucei
nosomiasis. PLoS Negl Trop Dis 4:e658.
Kabani S, Fenn K, Ross A, Ivens A, Smith TK, Ghazal P and
vivo-derived bloodstream parasite stages and dynamic anal-
ysis of mRNA alterations during synchronous differentia-
tion in Trypanosoma brucei. BMC Genomics 10:e427.
Kieft R, Capewell P, Turner CM, Veitch NJ, MacLeod A and
Hajduk S (2010) Mechanism of Trypanosoma brucei
gambiense (group 1) resistance to human trypanosome lytic
factor. Proc Natl Acad Sci USA 107:16137-16141.
Koeller CM and Heise N (2011) The sphingolipid biosynthetic
pathway is a potential target for chemotherapy against
Chagas disease. Enzyme Res 2011:e648159.
Kolev NG, Franklin JB, Carmi S, Shi H, Michaeli S and Tschudi
C (2010) The transcriptome of the human pathogen
Trypanosoma brucei at single-nucleotide resolution. PLoS
Lacomble S, Portman N and Gull K (2009) A protein-protein
interaction map of the Trypanosoma brucei paraflagellar rod.
PLoS One 4:e7685.
LaCount DJ, Bruse S, Hill KL and Donelson JE (2000) Dou-
ble-stranded RNA interference in Trypanosoma brucei us-
ing head-to-head promoters. Mol Biochem Parasitol
Lainson R, Ward RD and Shaw JJ (1977) Leishmania in phlebo-
tomid sandflies: VI. Importance of hindgut development in
parasites of the
Laurentino EC, Ruiz JC, Fazelini G, Myler PJ, Degrave W,
Leishmania braziliensis genome by shotgun sequencing.
Mol Biochem Parasitol 13:81-86.
pling of poly(A) site selection and trans-splicing in
Leishmania. Genes Dev 7:996-1007.
(2007) Genomic and proteomic expression analysis of
Leishmania promastigote and amastigote life stages: The
Leishmania genome is constitutively expressed. Mol
Biochem Parasitol 152:35-46.
Lepesheva GI, Villalta F and Waterman MR (2011) Targeting
Trypanosoma cruzi sterol 14α-demethylase (CYP51). Adv
Levick MP, Blackwell JM, Connor V, Coulson RM, Miles A,
Smith HE, Wan KL and Ajioka JW (1996) An expressed se-
quence tag analysis of a full-length, spliced-leader cDNA li-
brary from Leishmania major promastigotes. Mol Biochem
splicing in trypanosomatids: Mechanism, factors, and regu-
lation. Eukaryot Cell 2:830-840.
Lima MT, Lenzi HL and Gattass CR (1995) Negative tissue
parasitism in mice injected with a noninfective clone of
Trypanosoma cruzi. Parasitol Res 81:6-12.
Lukes J, Jirků M, Dolezel D, Kral’ová I, Hollar L and Maslov DA
(1997) Analysis of ribosomal RNA genes suggests that
trypanosomes are monophyletic. J Mol Evol 44:521-527.
Lye LF, Owens K, Shi H, Murta SM, Vieira AC, Turco SJ,
Tschudi C, Ullu E and Beverley SM (2010) Retention and
zoans. PLoS Pathog 6:e1001161.
lution – Diverse diseases. Trends Parasitol 24:103-105.
Martínez-Calvillo S, Yan S, Nguyen D, Fox M, Stuart K and
chromosome 1 initiates in both directions within a single re-
gion. Mol Cell 11:1291-1299.
Martínez-Calvillo S, Nguyen D, Stuart K and Myler PJ (2004)
Transcription initiation and termination on Leishmania
major chromosome 3. Eukaryot Cell 3:506-517.
Martínez-Calvillo S, Vizuet-de-Rueda JC, Florencio-Martínez
LE, Manning-Cela RG and Figueroa-Angulo EE (2010)
Mathieu-Daudé F, Cheng R, Welsh J and McClelland M (1996)
Screening of differentially amplified cDNA products from
conformation polymorphism (SSCP) gels. Nucleic Acids
Matthews KR (2005) The developmental cell biology of
Trypanosoma brucei. J Cell Sci 118:283-290.
Mauel J (2002) Vaccination against Leishmania infections. Curr
Drug Targets Immune Endocr Metabol Disord 2:201-226.
of the A2 virulence factor in Leishmania: Evidence that A2
is a stress response protein. Mol Microbiol 77:518-530.
Miles MA, Souza A, Povoa M, Shaw JJ, Lainson R and Toye PJ
(1978) Isozymic heterogeneity of Trypanosoma cruzi in the