Phylogeny of the CDC25 homology domain reveals rapid differentiation of Ras pathways between early animals and fungi.
ABSTRACT The members of the Ras-like superfamily of small GTP-binding proteins are molecular switches that are in general regulated in time and space by guanine nucleotide exchange factors and GTPase activating proteins. The Ras-like G-proteins Ras, Rap and Ral are regulated by a variety of guanine nucleotide exchange factors that are characterized by a CDC25 homology domain. Here we study the evolution of the Ras pathway by determining the evolutionary history of CDC25 homology domain coding sequences. We identified CDC25 homology domain coding sequences in animals, fungi and a wide range of protists, but not in plants. This suggests that the CDC25 homology domain originated in or before the last eukaryotic ancestor but was subsequently lost in plants. We provide evidence that at least seven different ancestral Ras guanine nucleotide exchange factors were present in the ancestor of fungi and animals. Differences between present-day fungi and animals are the result of loss of ancestral Ras guanine nucleotide exchange factors early in fungal and animal evolution combined with lineage-specific duplications and domain acquisitions. In addition, we identify Ral guanine exchange factors and Ral in early diverged fungi, dating the origin of Ral signaling back to before the divergence of animals and fungi. We conclude that the Ras signaling pathway evolved by gradual change as well as through differential sampling of the ancestral CDC25 homology domain repertoire by both fungi and animals. Finally, a comparison of the domain composition of the Ras guanine nucleotide exchange factors shows that domain addition and diversification occurred both prior to and after the fungal-animal split.
Teunis J.P. van Dam a,b,⁎, Holger Rehmann b, Johannes L. Bos b, Berend Snel a
a Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
b Department of Physiological Chemistry, Centre for Biomedical Genetics and Cancer Genomics Centre, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht,
Article info
Received 8 June 2009
Accepted 22 June 2009
Available online 28 June 2009
Keywords: Ras guanine exchange factors; CDC25 homology domain
© 2009 Elsevier Inc. All rights reserved.
Ras signaling pathways are involved in the regulation of a wide
range of cellular processes, including
movement, division, secretion and cell differentiation. It is therefore
not surprising that Ras G-proteins and their up- or downstream
pathways are a prime target for cancer research.
The Ras G-proteins are members of the Ras-like superfamily of
small G-proteins, which includes Ras, Rho, Rab, Arf and Ran proteins.
These proteins cycle between an inactive GDP-bound and an
active GTP-bound conformation. This cycling is regulated by guanine
nucleotide exchange factors (GEFs) and GTPase activating proteins
(GAPs). GEFs exchange the G-protein-bound GDP for GTP, which is more
abundant in the cell, while GAPs increase the intrinsic GTP-hydrolyzing
activity of G-proteins by several orders of magnitude. Each family of
small G-proteins has its own set of GEFs and GAPs, and these sets are not
homologous to one another; for example, the catalytic region of RhoGEFs is a
tandem PH–DH domain, whereas RasGEFs have a CDC25 homology domain
(CDC25 HD) with an unrelated protein fold.
The CDC25 HD was first identified in the budding yeast Saccharomyces
cerevisiae protein CDC25, and GEFs containing this domain
exclusively regulate the Ras, Rap and Ral proteins (collectively referred to
here as the Ras⁎ proteins), a subset of the Ras-like superfamily of small
G-proteins. In the yeast S. cerevisiae, three different proteins with a
CDC25 HD are present for five Ras⁎ proteins, whereas in human cells
thirty CDC25 HD proteins are present for sixteen Ras⁎ proteins. S.
cerevisiae does not contain Ral proteins, suggesting that Ral is an
animal innovation. Fungi and animals have very different sets of
RasGEFs, but we do not know how these differences arose.
An implicit consensus within the field was that Ras⁎ G-proteins
were a fungal–animal (Opisthokont) invention, but recent genomic
evidence suggests that Ras⁎ is much older and that its origins can
be traced back to the last eukaryotic common ancestor (LECA).
Cellular Signalling 21 (2009) 1579–1585
Abbreviations: CDC25 HD, CDC25 Homology domain; LECA, Last eukaryotic
common ancestor; HMM, Hidden Markov model.
⁎ Corresponding author. Theoretical Biology and Bioinformatics, Dept. of Biology,
Science Faculty, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands.
Tel.: +31 302533004; fax: +31 302513655.
E-mail addresses: T.J.P.vanDamfirstname.lastname@example.org (T.J.P. van Dam),
H.Rehmann@umcutrecht.nl (H. Rehmann), J.L.Bos@umcutrecht.nl (J.L. Bos),
B.Snel@uu.nl (B. Snel).
0898-6568/$ – see front matter © 2009 Elsevier Inc. All rights reserved.
However, no study has investigated the origins of the
CDC25 HD. Given the role of CDC25 HD-containing proteins as the
principal activators of the Ras⁎ G-proteins, knowledge of the domain's origin and
evolution may give new insight into the evolution of the Ras⁎
signaling pathways. We therefore studied the origin and evolution of
this domain using the increasing number of sequenced genomes of
eukaryotic species from all major phyla.
We found that, similar to Ras⁎ G-proteins, the CDC25 HD is an
ancient protein domain whose origin dates back to the LECA.
Furthermore, the data suggest that at least seven and possibly twelve
ancestral domains were already present at that stage in evolution.
Interestingly, we observe an unusually strong functional and evolutionary
relationship between Ras⁎ and the CDC25 HD. Finally, we found that
both the Ral protein and RalGEFs are present in early diverged fungi
(i.e. primitive fungi), suggesting an earlier origin of these proteins
than originally thought.
We acquired a number of genomes from the STRING database,
version 7. In addition, we acquired best-model protein
sequences of the remaining genomes from their respective project
download sites. For a full overview of genomes and their sources, see
Supplementary material Table 1.
2.2. Genome search and CDC25 domain identification
The SMART RasGEF hidden Markov model (HMM) was used to
initially search 30 genomes. All CDC25 HD sequences with an E-value
lower than 1e−5 were gathered and aligned
using MUSCLE 3.6. Poorly aligning sequences and sequences that
introduced large insertions in the alignment were removed, and the
remaining sequences were re-aligned. A custom HMM profile was
created from this alignment using hmmbuild and hmmcalibrate of the
HMMER package, version 2.3.2. The HMMER profile is provided as
Supplementary data. The genome protein dataset was then searched
using the custom CDC25 HD HMM profile. All proteins with an E-value of
1e−5 or lower were collected, and the sequences with E-values above
1e−5 were manually checked for significant sequence similarity
using local PSI-BLAST.
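The E-value triage described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (identifier, E-value), the function name `triage_hits` and the example hits are hypothetical, and parsing of actual HMMER output is deliberately left out. Only the 1e−5 threshold is taken from the text.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (sequence identifier, E-value of the best HMM hit).
Hit = Tuple[str, float]

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def triage_hits(hits: Iterable[Hit],
                cutoff: float = E_VALUE_CUTOFF) -> Tuple[List[Hit], List[Hit]]:
    """Split hits into those accepted directly and those needing manual review."""
    accepted: List[Hit] = []
    review: List[Hit] = []
    for seq_id, evalue in hits:
        if evalue <= cutoff:
            accepted.append((seq_id, evalue))   # E-value of 1e-5 or lower
        else:
            review.append((seq_id, evalue))     # checked by hand (PSI-BLAST in the text)
    return accepted, review


# Made-up hits, for illustration only.
example_hits = [("protA", 3e-40), ("protB", 1e-5), ("protC", 0.02)]
accepted, review = triage_hits(example_hits)
```

In this sketch a hit exactly at the threshold is accepted, matching the wording "1e−5 or lower".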
2.3. Multiple sequence alignment and phylogenetic tree construction
All gathered protein sequences were aligned using MAFFT with
the “--localpair” option. As many gaps were introduced by only a few
sequences, positions at which fewer than 10% of the
sequences had a residue were discarded. Phylogenetic trees were constructed using
maximum likelihood (PhyML and RAxML) and neighbor
joining (QuickTree). For PhyML, the WAG model for amino acid
substitution was used with a discrete gamma model using 6 categories,
with an estimated gamma parameter and an estimated proportion of
invariable sites. The bootstrap analysis is based on 100 iterations.
RAxML was run using the PROTGAMMAIWAG model, and 100
iterations for bootstrap analysis were performed with 1688 as the seed
number. The phylogenetic trees were analyzed by manual annotation
in terms of duplication, loss and speciation using the species tree (see
Fig. 1), bootstrap values and domain compositions. The alignment and
the phylogenetic trees (in Newick format) are provided as Supplementary data.
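The column filter mentioned above (discarding positions where fewer than 10% of the sequences have a residue) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' script: the alignment is assumed to be held as a dictionary of equal-length gapped strings, gaps are assumed to be "-" or ".", and the function name `drop_sparse_columns` is hypothetical.

```python
from typing import Dict

GAP_CHARS = {"-", "."}  # assumed gap symbols


def drop_sparse_columns(alignment: Dict[str, str],
                        min_occupancy: float = 0.10) -> Dict[str, str]:
    """Remove alignment columns in which fewer than `min_occupancy`
    of the sequences carry a residue."""
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(names)
    kept_columns = []
    for col in range(length):
        residues = sum(1 for n in names if alignment[n][col] not in GAP_CHARS)
        if residues / n_seqs >= min_occupancy:
            kept_columns.append(col)

    return {n: "".join(alignment[n][c] for c in kept_columns) for n in names}


# Toy alignment: one of eleven sequences inserts a residue in the second column.
toy = {"seq0": "MKAL"}
toy.update({f"seq{i}": "M-AL" for i in range(1, 11)})
filtered = drop_sparse_columns(toy)
```

In the toy alignment the second column is occupied in 1 of 11 sequences (about 9%), so it is removed and every sequence becomes "MAL".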
2.4. Domain evolution analysis
The domain compositions of the RasGEF sequences were analyzed
by projecting domain structures of the RasGEF sequences onto the
phylogenetic trees. Firstly, domains were identified in the full-length
RasGEF sequences using the hmmpfam program of the HMMER
package and the Pfam_ls collection of HMM profiles from the Pfam
database, with the provided gathering cut-off values. Secondly,
insignificant Pfam hits were called ‘true’ when the HSP in an all-vs.-all
BLAST of RasGEFs returned a hit (E-value < 1) within a protein with a
significant Pfam hit of the same domain model. Thirdly, all RasGEF
sequences were searched with BLAST against a sequence database containing
the sequences of all domains called in steps 1 and 2, using a cut-off
value of 1e−5. Hits that overlapped more than 20% with Pfam hits
were excluded. A custom Perl script was used to draw domain
structures in a phylogenetic tree in Scalable Vector Graphics format.
Where needed, sequences were analyzed manually to detect domains
that were not detected using the described method but for which
contextual evidence exists that they should be present.
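The overlap rule in the third step can be sketched in Python. The text does not say how the 20% overlap is measured; the sketch below assumes it is the shared length divided by the length of the BLAST hit, with 1-based inclusive coordinates. The function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end) on the protein


def overlap_fraction(hit: Interval, other: Interval) -> float:
    """Fraction of `hit` covered by `other` (assumed definition of overlap)."""
    start = max(hit[0], other[0])
    end = min(hit[1], other[1])
    shared = max(0, end - start + 1)
    return shared / (hit[1] - hit[0] + 1)


def filter_blast_hits(blast_hits: List[Interval],
                      pfam_hits: List[Interval],
                      max_overlap: float = 0.20) -> List[Interval]:
    """Keep BLAST hits that overlap no Pfam hit by more than `max_overlap`."""
    return [
        hit for hit in blast_hits
        if all(overlap_fraction(hit, pfam) <= max_overlap for pfam in pfam_hits)
    ]


# Made-up coordinates, for illustration only.
pfam_hits = [(100, 200)]
blast_hits = [(10, 90), (150, 250), (190, 290)]
kept = filter_blast_hits(blast_hits, pfam_hits)
```

Here the hit at 150–250 shares 51 of its 101 positions with the Pfam hit and is excluded, while the hit at 190–290 shares only 11 positions and is kept.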
2.5. Phylogenetic analysis of Ras-like G-proteins
The genome protein dataset was searched using the Pfam HMM
profile for Ras. All sequences with a bit score above zero were
selected. Due to the high similarity of Ras to other small G-protein
sequences, the HMM profile does not distinguish between the various
small G-protein superfamilies, and this step is therefore very
inclusive. In order to select only the Ras-like G-protein subfamily
members, an alignment of all selected protein sequences was made
using MAFFT with the “--localpair” option. A neighbor joining tree
was constructed with the QuickTree program and analyzed. All
known Ras-like subfamily members were contained in one clade,
which did not contain proteins of the other subfamilies. All
sequences in this clade were selected and human RhoA/B/C
sequences were added as outgroup. These selected sequences were
aligned using MAFFT with the “--localpair” option. To construct the
final phylogenetic tree PhyML was used with a discrete gamma
model using four categories with estimated gamma parameter and
estimated proportions of invariable sites.
3.1. Presence of CDC25 homology domain containing proteins in
We have analyzed the genomes of a large variety of different
eukaryotic species representing all major phyla for the presence of the
to genomes in which CDC25 HD was already identified (i.e. animals and
Phytophthora infestans), ciliates (Tetrahymena thermophila) and
excavates (Naegleria gruberi, Trichomonas vaginalis). As these
divergent eukaryotes represent phyla that probably diverged from the
Unikonts (animals, fungi and amoebozoa) at the eukaryotic root,
the CDC25 HD was most likely present in the LECA (see Fig. 1). This
within this phylum.
The number of CDC25 HD coding sequences per genome is highly
variable across species (7 to 30 in animals, see Fig. 1). Complex
RasGEF repertoire: they contain up to 30 CDC25 HD coding sequences.
Interestingly, we observed similarly high numbers of CDC25 HD coding
sequences in the early diverging fungus Rhizopus oryzae (31), the
amoeba Dictyostelium discoideum (29) and the excavate N. gruberi (26).
Most notable are the amoeba D. discoideum, whose life cycle includes both
single-cell and multicellular stages, and N. gruberi, whose life cycle
T.J.P. van Dam et al. / Cellular Signalling 21 (2009) 1579–1585
speculate that the CDC25 HD paralogy number might correlate with
cellular lifestyle complexity rather than organismal complexity.
3.2. Co-occurrence of Ras⁎ G-proteins and the CDC25 homology domain
Previously the origin of Ras⁎ G-proteins has been firmly put in the
LECA [4,18,19]. Since our results also suggest an early origin of the CDC25
HD coding sequence also have Ras⁎ G-proteins and vice versa, with the
sole exception of Encephalitozoon cuniculi (Fig. 1). This microsporidium
We observe that Ras⁎ G-proteins and CDC25 HD-containing
proteins have a high degree of phylogenetic co-occurrence. Such
high co-occurrence is not generally observed in eukaryotes and is
thus indicative of an unusually strong functional and evolutionary link
between the CDC25 HD and Ras⁎ proteins.
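One simple way to quantify phylogenetic co-occurrence is the fraction of genomes in which two components are either both present or both absent. The Python sketch below illustrates that idea only; the paper does not state which measure was used, and the function name, genome labels and presence/absence values are hypothetical.

```python
from typing import Dict, List, Tuple


def cooccurrence(profile_a: Dict[str, bool],
                 profile_b: Dict[str, bool]) -> Tuple[float, List[str]]:
    """Return the fraction of shared genomes in which both profiles agree
    (both present or both absent), plus the genomes where they disagree."""
    genomes = sorted(set(profile_a) & set(profile_b))
    if not genomes:
        raise ValueError("profiles share no genomes")
    mismatches = [g for g in genomes if profile_a[g] != profile_b[g]]
    agreement = (len(genomes) - len(mismatches)) / len(genomes)
    return agreement, mismatches


# Hypothetical presence/absence profiles; not data from the paper.
ras_like = {"genome_1": True, "genome_2": True, "genome_3": False,
            "genome_4": True, "genome_5": True}
cdc25_hd = {"genome_1": True, "genome_2": True, "genome_3": False,
            "genome_4": True, "genome_5": False}

score, exceptions = cooccurrence(ras_like, cdc25_hd)
```

With these made-up profiles the two components agree in four of five genomes, and the single disagreeing genome is reported separately, analogous to the lone exception noted in the text.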
3.3. Phylogenetic reconstruction of CDC25 HD evolution
Due to high variability in the domain composition of RasGEFs in
general, large parts of these proteins are not homologous and it is
Fig. 1. Species tree, according to Simpson and Roger and the NCBI taxonomy database, of the selected genomes with the number of identified CDC25 HD-containing proteins and
the occurrence of Ras⁎ G-proteins. Possible alternative rootings of the eukaryotic tree of life are indicated with a star.
therefore impossible to make full-length alignments. Therefore, to
determine how all CDC25 HD-containing proteins are related and to
time duplication and loss events we constructed multiple phyloge-
netic trees based on the CDC25 HD. Additionally, the use of only the
CDC25 HD allows us to analyze the evolution of domain compositions
independently and to infer when domains have been acquired or lost
relative to the duplication and speciation events of the CDC25 HD.
We constructed phylogenetic trees from a multiple sequence
alignment using various phylogenetic methods such as maximum
likelihood and neighbor joining (see methods). We analyzed a total of
three gene trees based on different methods by comparing them to the
species phylogeny (tree of life) and annotated the nodes in the trees in
terms of duplication and speciation events. This allows for a
comparative genomics interpretation of the resulting large phylogeny.
Species that can be used to define orthologous groups for fungi and
animals (the amoeba D. discoideum, the Phytophthora species P. sojae and
P. infestans, the ciliate T. thermophila, and the excavates N. gruberi and T.
vaginalis) were used as outgroups.
We derived twelve fungal/animal orthologous groups of CDC25
HD coding sequences (Fig. 2) by annotating the phylogenetic trees in
terms of duplication and speciation events. We will refer to these
orthologous groups as classes henceforth. Of the twelve classes, five
(C3G, SOS, RasGRP, RasGRF and LTE) consistently contained outgroup
species, suggesting with high certainty that the origin of these classes
lies within the LECA (Fig. 2). Two classes (RalGEF and RapGEF) consist of
fungal and animal sequences, strongly suggesting that these classes
represent ancestral genes in the Opisthokont ancestor (ancestor of
animals and fungi). The remaining five classes contain either
exclusively fungal sequences (SH3-GEF, Bem2) or exclusively animal sequences
(NSP, PLCε and Very-KIND). However, since none of these five classes shows a
close relationship with any other class, and their
connection to other classes suggests an origin by duplication that
predates the LECA, we postulate here that these five classes are
potentially older than either the fungal ancestor or the animal ancestor.
There is no evidence, such as domain composition or consistent
proximity of clusters within the trees, that indicates whether separate
clusters should be merged.
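The taxon-membership reasoning used to date each class can be written as a small decision rule. The Python sketch below is an illustration of that rule under stated assumptions, not the authors' annotation procedure (which was manual): the labels "animal", "fungi" and "outgroup" and the function name `earliest_origin` are hypothetical, and only the class memberships stated in the text are used.

```python
from typing import Dict, Set


def earliest_origin(groups_present: Set[str]) -> str:
    """Infer the earliest supported origin of a class from which taxon
    groups ("animal", "fungi", "outgroup") contribute sequences to it."""
    if "outgroup" in groups_present:
        return "LECA"                      # outgroup species present
    if {"animal", "fungi"} <= groups_present:
        return "Opisthokont ancestor"      # both fungal and animal sequences
    if groups_present == {"animal"}:
        return "animal lineage"
    if groups_present == {"fungi"}:
        return "fungal lineage"
    raise ValueError(f"cannot date a class with members: {groups_present}")


# Memberships as described in the text for four of the twelve classes.
membership: Dict[str, Set[str]] = {
    "RasGRP": {"animal", "fungi", "outgroup"},
    "RalGEF": {"animal", "fungi"},
    "Bem2": {"fungi"},
    "NSP": {"animal"},
}
origins = {name: earliest_origin(groups) for name, groups in membership.items()}
```

This rule only gives a lower bound from taxon membership. As the text notes, tree topology suggests that the lineage-specific classes may be older than this rule alone indicates.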
By analyzing the phylogenetic trees we derived the gene family
dynamics for each of the twelve classes (See the Supplementary
material for a phylogenetic description for each class). If the LECA
indeed contained all 12 classes, it implies that not four, but nine
classes have been lost in either the animal or fungal ancestor (Fig. 2).
In the case that the five classes which contain only fungal or animal
sequences should have been merged into one or two classes, our
results are affected only quantitatively. In addition to loss in the
animal and fungal ancestors, we also find two classes (RalGEF and
RapGEF) that contain animals and early diverged fungi, thereby also
displaying a similar pattern of loss early in fungal evolution,
although not immediately after the animal–fungal split.
The selective loss of the CDC25 HD classes during animal and
fungal evolution offers an explanation for the substantial difference in
the RasGEF repertoire between fungi and animals. Our results
complement the prevailing notion that in animals Ras⁎ signaling has
gained complexity largely by duplications of CDC25 HD coding
sequences as our results imply that fungi have greatly reduced their
Ras⁎ signaling complexity.
3.4. Ral signaling in fungi
The RalGEF class contains CDC25 HD coding sequences of RasGEF
proteins that are specific for the Ral proteins and not for
the other Ras⁎ proteins. The presence of orthologs of RalGEF in
primitive fungi is fascinating because Ral proteins and the RalGEFs
(RalGDS and RalGPS) were previously only identified in animals.
The phylogeny of the CDC25 HD displays a total of twelve fungal genes
clustering consistently with RalGDS and RalGPS genes (Fig. 3A).
Eleven of these fungal genes belong to the Zygomycota species
R. oryzae and Phycomyces blakesleeanus and consistently cluster at the
base of animal RalGDS-like genes but share no other characteristics,
such as the Ras association (RA) domains in animal
RalGDS. A single gene of B. dendrobatidis (a chytrid fungus) is positioned
at the base of animal RalGPS-like genes and, very interestingly, shares
the Pleckstrin Homology (PH) domain and the SH3 binding motif with
the mammalian RalGPS. The phylogeny of Ras-like proteins shows that
these fungal species also contain small G-proteins orthologous to
animal Ral (Fig. 3B). This strongly suggests that the fungal RalGEFs
that cluster with animal RalGEFs are functional
RalGEFs. Moreover, it also shows that the origin of Ral signaling does
not lie within the animal ancestor but in the Opisthokont ancestor and
that Ral signaling has subsequently been lost in derived fungi (i.e. the
basidiomycetes and ascomycetes).
3.5. RasGEF domain compositions and domain shuffling
Based on our CDC25 HD phylogeny we next investigated the
conservation of the domain composition of the various classes. Fig. 4
shows the tree and the domain compositions of the RasGRP class, one
of three orthologous groups that contains both animal and fungal
genes (RasGRP, RalGEF and RapGEF). The animal RasGRP has a CDC25
HD, the Ras Exchange Motif (REM, also known as RasGEFN), a Protein
kinase C conserved region 1 domain (C1) and EF-hand motifs, while
the fungal orthologs have a Miro domain followed by the REM and
CDC25 HD. This implies that the various domains were added to the
CDC25 HD after the animal–fungal split. Similarly, in the
RalGEF and RapGEF classes we find genes belonging to early diverged
fungi displaying only the basic RasGEF structure (REM followed by the
CDC25 HD), while the animal orthologs contain additional domains
(e.g. cAMP binding domain, RA domain, PH domain) (see Supplementary
Figs. S1–3). As indicated, the B. dendrobatidis RalGPS gene is
unique in that it is the only fungal CDC25 HD-containing ortholog
Fig. 2. Evolutionary reconstruction of the CDC25 homology domain. The twelve classes
of CDC25 HD are shown. The earliest origin of a class based on outgroup species is
indicated by the font style: bold, originated in the last eukaryotic common ancestor;
italics, originated in the Opisthokont; plain, originated either in fungi or animals.
The earlier origins of several classes as suggested by phylogenetic tree topology and
annotation are shown in gray. Classes that have been lost are struck through. Classes indicated
with ⁎ have been lost in derived fungi (basidiomycetes and ascomycetes), but not in early
diverged fungi (e.g. chytrids and zygomycetes).
whose domain composition is identical to that of its animal orthologs. Since
additional domains play a critical role in the spatial and temporal
control of GEFs, the difference in acquired domains between
animal and fungal CDC25 HD-containing proteins likely represents the
evolving complexity of regulation.
The projection of the domain composition also reveals three
potential cases of convergent evolution of domain composition in the
RasGEF family (RA, PH and RhoGEF domains). For instance, the RA
domain is present in RalGDS-like proteins but also in Epac and PDZ-GEF
proteins. Yet these genes do not cluster together in the
phylogenetic trees and instead are separated by ancient duplications.
Convergent evolution is also supported by the position of the RA
domain in these classes: the RalGDS genes have the RA domain at the
C-terminus, while Epac and PDZ-GEFs have their RA domain N-terminal
to the CDC25 HD. Indeed, phylogenetic analysis of the RA
domain reveals that the Epac and PDZ-GEF RA domains are more
similar to human unconventional Myosin-9b and to fungal Cyr1 (S.
cerevisiae), an adenylate cyclase, than to RalGDS-like proteins (data
not shown). This indicates that the same domain type has been acquired
independently in different RasGEF classes early in animal evolution.
The Bem2 gene in S. cerevisiae and its orthologs in other yeasts
have a peculiar domain composition: the REM domain is located C-terminal
to the CDC25 HD according to Pfam and SMART
(Fig. 5A), while it is normally located N-terminal to the CDC25 HD.
However, by comparing the full-length sequences to a database of
single domains previously identified using HMMs (this is similar to
the schnipsel approach used in the SMART database, see methods) we
detected a second CDC25 HD C-terminal to the REM domain (Fig. 5B).
Similarly, the Bem2 ortholog in R. oryzae (an early diverged fungus)
contains a second REM domain N-terminal to the first CDC25 HD
(Fig. 5C). The identification of a second CDC25 HD in yeast
and early diverged fungi strongly implies that the Bem2-like genes in
fungi were originally composed of two REM-CDC25 HD cassettes.
Such a domain organization is not observed in any other known
RasGEF and may suggest internal domain cassette duplication.
However, the phylogeny of the CDC25 HD that includes both CDC25
HDs of Bem2 is inconclusive: the two domains do not cluster together
using RAxML or QuickTree and are separated by a few other sequences
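The notion of a REM-CDC25 HD cassette can be made concrete by counting how often a REM domain is directly followed by a CDC25 HD along the protein. The Python sketch below illustrates this on three simplified architectures loosely modeled on the descriptions of Fig. 5A to C. All coordinates are invented, other domains are omitted, and the domain labels and function name are hypothetical.

```python
from typing import List, Tuple

Domain = Tuple[str, int, int]  # (domain name, start, end); coordinates are made up


def count_cassettes(domains: List[Domain],
                    first: str = "REM",
                    second: str = "CDC25_HD") -> int:
    """Count adjacent (first, second) domain pairs in N- to C-terminal order."""
    ordered = sorted(domains, key=lambda d: d[1])
    names = [name for name, _, _ in ordered]
    return sum(1 for a, b in zip(names, names[1:]) if a == first and b == second)


# REM annotated C-terminal to a single CDC25 HD (as in the Pfam/SMART view).
annotated = [("CDC25_HD", 100, 300), ("REM", 400, 500)]
# After detecting a second CDC25 HD C-terminal to the REM domain.
with_second_hd = annotated + [("CDC25_HD", 550, 750)]
# With an additional REM domain N-terminal to the first CDC25 HD.
tandem = [("REM", 10, 80)] + with_second_hd

counts = [count_cassettes(a) for a in (annotated, with_second_hd, tandem)]
```

Under these assumptions the three architectures contain zero, one and two cassettes respectively, which is the pattern that motivates the tandem-cassette interpretation.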
The addition of regulatory domains to CDC25 HD-containing
proteins early in the evolution of animals and fungi represents the
evolving complexity in the regulation of Ras⁎ proteins. Additionally,
with the identification of a tandem REM-CDC25 HD cassette in Bem2
we find that the CDC25 HD itself can be added to existing CDC25 HD-
Fig. 3. A) Subtree of RasGEFs containing both types of RalGEFs: RalGPS and RalGDS. Zygomycota proteins (black lines) cluster consistently with RalGDS proteins while a
B. dendrobatidis gene (black lines) clusters with RalGPS. B) Subtree of Ras-like G-proteins showing Ral-like G-proteins. Fungal Ral-like proteins are indicated by black lines.