Amino acids biosynthesis and nitrogen assimilation pathways: A great genomic deletion during eukaryotes evolution

Article in BMC Genomics 12(Suppl 4):S2 · December 2011
DOI: 10.1186/1471-2164-12-S4-S2 · Source: PubMed
PROCEEDINGS Open Access
RLM Guedes1, F Prosdocimi2,3, GR Fernandes1,2, LK Moura2, HAL Ribeiro1, JM Ortega1*
From 6th International Conference of the Brazilian Association for Bioinformatics and Computational Biology
(X-meeting 2010)
Ouro Preto, Brazil. 15-18 November 2010
Abstract
Background: Besides being building blocks for proteins, amino acids are also key metabolic intermediates in living cells. Surprisingly, a variety of organisms are incapable of synthesizing some of them, thus named Essential Amino Acids (EAAs). How certain ancestral organisms successfully competed for survival after losing key genes involved in amino acid anabolism remains an open question. Comparative genomics searches on current protein databases, including sequences from both complete and incomplete genomes among diverse taxonomic groups, help us to understand the distribution of amino acid auxotrophy.
Results: Here, we applied a methodology based on clustering of homologous genes to seed sequences from autotrophic organisms Saccharomyces cerevisiae (yeast) and Arabidopsis thaliana (plant). Thus we depict evidence of presence/absence of EAA biosynthetic and nitrogen assimilation enzymes at phyla level. Results show broad loss of the phenotype of EAA biosynthesis in several groups of eukaryotes, followed by multiple secondary gene losses. A subsequent inability for nitrogen assimilation is observed in derived metazoans.
Conclusions: A Great Deletion model is proposed here as a broad phenomenon generating the phenotype of amino acid essentiality followed, in metazoans, by organic nitrogen dependency. This phenomenon is probably associated with a relaxed selective pressure conferred by heterotrophy and, taking advantage of available homologous clustering tools, a complete and updated picture of it is provided.
Background
The creation and analysis of groups of orthologous genes have been widely used for gene function prediction and for evolutionary and divergence time studies [1]. Moreover, orthology is also a valuable source for evolutionary comprehension of pathways through phylogenetic analysis. With respect to a central issue in cellular metabolism, the order of appearance of universal cellular metabolisms was estimated by Cunchillos and Lecointre [2,3], with amino acid catabolism and anabolism being respectively the first and second pathways to appear, even earlier than glycolysis and gluconeogenesis. Amino acid biosynthesis, rather than being a linear and universal series of reactions with homologues occurring in different organisms, sometimes relies on alternative pathways, as shown by Hernández-Montes et al. [4]. Moreover, gene loss and pathway depletion, important events in genome evolution, can be inferred from the orthologous groups through comparative genomics. Today, a vast amount of information is provided by intensive genome sequencing, and the efforts to group homologous genes have reached a high standard.
* Correspondence: miguel@icb.ufmg.br
1 Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, MG, Brazil
Full list of author information is available at the end of the article
Guedes et al. BMC Genomics 2011, 12(Suppl 4):S2
http://www.biomedcentral.com/1471-2164/12/S4/S2
© 2011 Guedes et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Amino acid anabolism is responsible for about 20% of the energy that cells spend on protein synthesis [5,6]. The nutritional requirements of essential amino acids and nitrogen are of striking importance and they have been estimated as ~22 mg/kg of EAAs and 3 mg/kg of N in the human body [7,8]. More recent approaches for dietary requirement calculations, using amino acid oxidation as an indicator, reveal that the requirement is over fivefold what the classical approaches indicated, and the requirement has now been determined for each of the nine human EAAs [9]. It is generally understood that plants, as well as fungi, synthesize all amino acids required for protein synthesis and that evolutionary processes culminated in the human inability to synthesize nine amino acids (histidine, phenylalanine, tryptophan, valine, isoleucine, leucine, lysine, methionine and threonine), thus called essential amino acids (EAAs), which must be obtained through diet. Amino acids also constitute our source of organic nitrogen. There have been few attempts to understand why some amino acids have become essential. However, genome deletion events have happened in the past and many organisms have lost a number of important enzymes necessary for de novo biosynthetic pathways. Hitherto, the pattern of loss versus retention for amino acid biosynthetic pathways was analyzed for a few protists and metazoans by Payne and Loomis [10]. They verified that the set of essential amino acids is the same in animals and protists. Curiously, most of the retained amino acids are intermediates in secondary pathways like purine ring biosynthesis and nitrogen metabolism.
An overview of the presence/absence of the enzymes that compose the amino acid biosynthetic pathways, among distinct phyla in the tree of life, could be accomplished with (i) rich protein databases such as the UniProt Knowledgebase (UniProtKB) [11], comprising over 10 million full-length sequences, and (ii) the current initiatives to group these proteins by evolutionary relatedness (called homologues), such as COG (Cluster of Orthologous Groups) [12] and KEGG Orthology [13]. Unfortunately, these initiatives consider only proteins derived from complete genomes and thus a large amount of information is currently lost, with over 6 million remaining full-length proteins that belong to organisms with still incomplete genomes.
Here, we applied a methodology that takes into account all available protein information to depict, at phyla level, the EAA biosynthetic and nitrogen assimilation enzyme scenarios, in order to inspect how and when amino acid auxotrophy first appeared during evolution. A Great Genomic Deletion model is proposed to explain the phenotypic inability to synthesize amino acids that appears independently in distinct, phylogenetically distant clades of eukaryotes. Such events should be followed by subsequent steps of gene loss due to relaxed selective pressure on already incomplete pathways, leading to an eventual loss of all genes for a particular biosynthesis pathway in some clades. Accordingly, in all metazoans except Cnidaria, dependence on organic nitrogen accompanies the evolution of heterotrophy, so that organisms become dependent even on non-essential amino acids (NEAAs) to supply their nitrogen requirements.
Results
Clustering homologues of amino acid biosynthetic enzymes
To determine the distribution of amino acid biosynthetic enzymes, a homologue clustering process was developed to allow the use of both complete and incomplete genomes [14,15]. The procedure starts with the Seed Linkage software [14], which clusters cognate proteins from multiple organisms beginning with a single seed sequence, through connectivity saturation with it. Since basal eukaryotes such as plants and fungi are autotrophic, sequences coding for all the enzymes used in the biosynthesis of EAAs from the plant Arabidopsis thaliana and the fungus Saccharomyces cerevisiae were manually inspected using KEGG Pathway and used as seeds to search for homologues. Moreover, our group has been developing a procedure to enrich secondary databases such as COG [12] and KEGG Orthology (to be published) with UniRef50 clusters [16] available from UniProt, therefore allowing the inclusion of data from incompletely sequenced genomes. Additional file 1 (Sequences and genome status distribution) reflects the abundance of proteins derived from incomplete genomes and evidences the importance of their inclusion. In this work we took advantage of a home-built UniRef50 Enriched KEGG Orthology database (UEKO) to additionally cluster sequences with the seed sequences mentioned above. Since these searches recruit sequences from diverse clades, which may or may not contain organisms with completely sequenced genomes, we represented this information in Figure 1 as: (a) black filled circles for phyla containing complete genomes; (b) grey filled circles for clades with at least one draft genome available, but no complete genome; and (c) empty circles for phyla with neither complete nor draft genomes. Protein fragments are not included in the search for homologues because they may represent full-length proteins that were partially sequenced at the mRNA level or incompletely modeled from the genome. Moreover, since some full-length proteins might not have been captured in databases due to high sequence divergence, a second search round queried UniProt with all clustered sequences. This step also captures partial sequences (entries labeled as fragments in UniProt) which were approved by the coverage filtering applied (see Methods for details). These additional significant hits are represented by triangles in Figure 1. Furthermore, enzymes required for the biosynthesis of the indicated amino acids are ordered in the anabolic pathway from left to
right. All pathways refer to EAA biosynthesis except serine and glycine (the rightmost ones), used as experimental controls. Serine is represented with two alternative pathways observed in human and other eukaryotes: S(1), from 3P-D-glycerate; and S(2), from pyruvate. Glycine is represented by three pathways: G(1) and G(2), both coming from serine; and G(3), coming from threonine. As expected, serine and glycine biosynthesis were found to be potentially proficient in almost all phyla. This control supports the searching mechanism and attests to the efficacy of the methods applied. A few exceptions were observed and deserve comment: (i) The serine biosynthetic pathways were found to be absent in Rhodophyta, although the complete genome of Cyanidioschyzon merolae is available. We manually inspected this result with regular BLAST searches and did not find additional evidence, although a translation of a partial CDS was obtained for glycine biosynthetic enzyme G1 (Figure 1, triangle); (ii) Serine biosynthesis seems absent in Apicomplexa as well, a clade comprising two Plasmodium complete genomes lacking enzymes S1 and S4; (iii) Considering the animals, although we were able to find serine biosynthetic enzymes, we fail to support the NEAA character of glycine for Mollusca. However, evidence could be obtained for ancient organisms such as Placozoa and Porifera. For the microsporidian E. cuniculi, an obligate intracellular parasitic fungus with a complete genome, it has been reported that the repertoire for the biosynthesis of amino acids is restricted to asparagine synthetase and serine hydroxymethyltransferase genes, so serine was considered an EAA [17]. Thus, absence of evidence may not guarantee the absence of the gene. However, out of 28 phyla, discarding both the four clades with no genome project, or with one still in progress (open circles), and the ones with complete genomes (filled symbols), we could not provide evidence of glycine biosynthesis for two phyla (Fornicata and Mollusca). Evidence for serine, however, has been provided in all of them.
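
The presence/absence reading of Figure 1 can be sketched in a few lines of code. This is a minimal illustration and not the authors' pipeline: the pathway step lists and the phylum names are hypothetical placeholders, and only the underlying idea comes from the text, namely that a pathway counts as potentially proficient when every step has at least one detected homologue. A missing step here means absence of evidence, which, as stated above, does not guarantee absence of the gene.

```python
from typing import Dict, List, Set

# Hypothetical pathway definitions: enzyme codes listed in anabolic order.
# The codes imitate the labels of Figure 1, but the step lists are illustrative only.
PATHWAYS: Dict[str, List[str]] = {
    "S(1)": ["S1", "S2", "S3"],
    "G(1)": ["G1"],
}

# Hypothetical detections: enzyme codes with at least one clustered homologue per phylum.
DETECTED: Dict[str, Set[str]] = {
    "PhylumA": {"S1", "S2", "S3", "G1"},
    "PhylumB": {"S2", "S3"},
}


def missing_steps(pathway: str, detected: Set[str]) -> List[str]:
    """Return the steps of a pathway that have no detected homologue."""
    return [enzyme for enzyme in PATHWAYS[pathway] if enzyme not in detected]


def summarize(detected_by_phylum: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """For each phylum and pathway, list the undetected steps (empty list = complete)."""
    return {
        phylum: {name: missing_steps(name, enzymes) for name in PATHWAYS}
        for phylum, enzymes in detected_by_phylum.items()
    }
```

Calling `summarize(DETECTED)` yields an empty list for every pathway in which all steps were detected, and the list of undetected enzyme codes otherwise.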
Data presented in Figure 1 clearly depict the presence of complete biosynthetic pathways for EAAs in both plants (Chlorophyta and Streptophyta) and fungi (Ascomycota and Basidiomycota), as stated above. In previous work we hypothesized that a great event of genome deletion, in which many of the intermediate enzymes of amino acid biosynthetic pathways vanished, ended up affecting the usage of EAAs in chordate proteomes [18,19]. In 2006, Payne and Loomis [10], using Pfam protein signatures, reported that protists and animals share essentiality for the nine amino acids. Here we provide a broader analysis covering all genomes available today, trying to map how and when the Great Genomic Deletion happened. Evidence was found suggesting that this loss of the capability to synthesize EAAs is conspicuous at the base of metazoan evolution, simultaneously affecting the complete set of EAAs. The phenomenon is characterized as an initial phenotypic deficiency, observed in Choanozoa, followed by multiple secondary gene losses. Accordingly, some enzymes found in Chordata, such as K14, M4 and M9, are missing in Arthropoda. Remarkably, some components such as VIL1 and M7 are maintained in most metazoan clades, despite pathway loss.
Figure 1 Essential amino acid anabolic pathways. Schematic representation for presence/absence of anabolic enzymes for nine essential amino acids and the non-essential amino acids serine and glycine. Eukaryotic taxonomic tree displayed at phyla level. Circles represent detection of complete proteins and triangles detection of complete and fragmented proteins. Black: phyla containing complete genomes; Grey: at most organisms with draft genomes; White: phyla with no complete or draft genomes. Saccharomyces cerevisiae (Ascomycota) and Arabidopsis thaliana (Streptophyta) were used as seeds. The 4 distinct aminotransferases in phenylalanine pathway are: (i) aspartate aminotransferase (ii) histidinol-phosphate aminotransferase (iii) aromatic amino acid aminotransferase (iv) tyrosine aminotransferase. The 4 distinct methyltransferases in methionine pathway are: (i) 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase (ii) homocysteine S-methyltransferase (iii) betaine-homocysteine methyltransferase (iv) 5-methyltetrahydrofolate-homocysteine methyltransferase. The 3 distinct transaminases in glycine pathway are: alanine-glyoxylate transaminase, serine-glyoxylate transaminase and serine-pyruvate transaminase.
Actually, a Great Deletion causing concurrent phenotypic loss of amino acid biosynthesis capability affects both metazoan and non-metazoan eukaryotes. Several clades containing complete genomes (black filled symbols), such as Rhodophyta, Euglenozoa and Apicomplexa, show a similar EAA pattern. Moreover, some evidence is provided suggesting the absence of complete pathways in the non-Dikarya fungi Microsporidia and Neocallimastigomycota. This gives support to separate events of Great Genomic Deletion for the origin of EAA auxotrophy in at least three other branches. Similarly to Choanozoa, clades such as Heterokontophyta and Rhizaria present various enzymes and some complete pathways. Evidence of complete pathways for all EAAs but histidine (H) was obtained in Heterokontophyta. Valine (V), isoleucine (I), lysine (K) and threonine (T) are potentially synthesized in Rhizaria, as is methionine (M) in Euglenozoa and Amoebozoa. However, it is possible that other EAAs may also be synthesized in some of these clades. The anabolic capabilities suggested by the current data might be underestimated because we have only draft genomes available for most of these organisms. The Choanozoa clade contains only draft genomes. Though we observed more enzymes than in metazoan clades, a final picture of Choanozoan phenylalanine biosynthesis, for example, might require completion of genome sequencing. Further gene loss occurs during metazoan evolution; however, for Placozoa, Porifera and Cnidaria, the Great Genomic Deletion seems to be well established. Since the first available sponge genome is still an ongoing project and its proteins are not yet deposited in UniProt, we manually inspected the deduced proteome using regular BLAST alignments (see Methods) and evidenced auxotrophy for all nine EAAs. The same simple approach was applied to all phyla (Figure 1, triangles). Other clades that do not present any enzymes, such as Apusozoa and Jakobida, were omitted from Figure 1.
Lysine biosynthesis
Inspection of Figure 1 reveals a remarkable difference in the lysine (K) biosynthesis pathways present in fungi and plants. Since the occurrence of an α-aminoadipate (AAA) pathway K(1) in fungi [20], as opposed to a diaminopimelate (DAP) pathway K(2) known to be present in plants, algae and bacteria [21,22], has already been reported, we set out to depict the complete scenario for K biosynthesis including prokaryotes (Figure 2). A third pathway K(3), preferentially used by Archaea but also reported to exist in bacterial groups [23], was also considered; therefore sequences from the archaeon Pyrococcus horikoshii were also used as seeds for homologue sequence clustering. Data support the view that the K(2) pathway, found to be complete in plants, is often present in prokaryotic clades of bacteria and archaea, in agreement with previous findings [21,22]. Curiously, nine bacterial clades (Acidobacteria, Chlorobi, Deferribacteres, Deinococcus-Thermus, Fusobacteria, Chlamydiae, Synergistetes, Tenericutes and Thermotogae), all of which contain complete genomes, do not present the K12 enzyme, but there are three other alternative subsets of enzymes present in prokaryotes that could circumvent this step in lysine biosynthesis. Chlamydiae may represent evidence of amino acid essentiality extended to prokaryotes, since diaminopimelate decarboxylase (K14) is absent and there are no known alternatives to this reaction. The set of enzymes responsible for the K(3) pathway was found to occur in prokaryotes, and it is complete in the archaeal clades Crenarchaeota and Euryarchaeota, as well as in the bacterial clades Chloroflexi and Proteobacteria, and probably in Actinobacteria and Bacteroidetes. Remarkably, the first four enzymes that constitute this pathway are coincident with the K(1) pathway (indicated by gray shading). The complete K(1) pathway occurs in Proteobacteria (and possibly in Actinobacteria, Bacteroidetes and Firmicutes, as evidenced by regular BLAST) and fungi. Thus, it is tempting to assume that a variant synthesis of K occurred in Archaea and, being modified in one of the four bacterial phyla above (with the addition of three enzymes: aminoadipate-semialdehyde dehydrogenase, saccharopine dehydrogenase NADP+ and saccharopine dehydrogenase NAD+), ended up constituting the fungi-occurring K biosynthetic pathway. The eukaryotic clades Rhizaria and Heterokontophyta, which present the K(2) pathway, appear to group with plants.
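
The reasoning about K12 and K14 above distinguishes a step that can be bypassed by an alternative subset of enzymes from a step with no known alternative. A minimal sketch of that logic follows, under stated assumptions: the toy step list is not the real step order of Figure 2, and `ALT_A` and `ALT_B` are hypothetical names standing in for one of the alternative enzyme subsets mentioned in the text.

```python
from typing import FrozenSet, List, Set

# A step is a list of alternatives; each alternative is a set of enzymes
# that must all be detected for that alternative to satisfy the step.
Step = List[FrozenSet[str]]

# Toy DAP-like pathway (illustrative, not the layout of Figure 2).
# The K12 step has a hypothetical bypass; the K14 step has none.
TOY_K2: List[Step] = [
    [frozenset({"K10"})],
    [frozenset({"K12"}), frozenset({"ALT_A", "ALT_B"})],
    [frozenset({"K14"})],
]


def step_satisfied(step: Step, detected: Set[str]) -> bool:
    """A step is satisfied when at least one alternative is fully detected."""
    return any(alternative <= detected for alternative in step)


def pathway_complete(steps: List[Step], detected: Set[str]) -> bool:
    """A pathway is potentially proficient when every step is satisfied."""
    return all(step_satisfied(step, detected) for step in steps)


def unsatisfied_steps(steps: List[Step], detected: Set[str]) -> List[int]:
    """Return the zero-based indices of steps with no satisfied alternative."""
    return [i for i, step in enumerate(steps) if not step_satisfied(step, detected)]
```

Under this model, a clade lacking K12 but carrying the bypass enzymes still counts as potentially proficient, whereas a clade lacking K14 does not, which mirrors the argument made for Chlamydiae.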
Figure 2 Lysine anabolic pathways. Schematic representation for presence/absence of enzymes involved in lysine biosynthesis. K(1) represents Fungi α-aminoadipate (AAA) pathway; K(2) bacteria, plants, and algae diaminopimelate (DAP) pathway; K(3) archaea α-aminoadipate (AAA) variant pathway. Taxonomic tree displayed at phyla level. Circles represent detection of complete proteins and triangles detection of complete and fragmented proteins. Colors are as for Figure 1. Saccharomyces cerevisiae (Ascomycota), Arabidopsis thaliana (Streptophyta) and Pyrococcus horikoshii (Euryarchaeota) were used as seeds.
Nitrogen auxotrophy
Consumption of amino acids is an important route for nitrogen assimilation into other biological compounds for heterotrophic organisms, such as those comprised by some of the clades shown in Figure 1 (e.g. Chordata). Assimilation of free ammonium in eukaryotes is done by a cytoplasmic reaction catalyzed by glutamate dehydrogenase (EC:1.4.1.4), which incorporates ammonium into alpha-ketoglutarate yielding glutamate, using electrons from the reduced cytoplasmic coenzyme NADPH.
Two isoforms are present in fungi and one in plants, the latter having the additional option to not only assimilate nitrogen but also to fix it, often in association with nitrogen-fixing bacteria. Thus, to investigate whether the Great Genomic Deletion of biosynthetic enzymes for EAAs co-occurred with heterotrophy for nitrogen, we generated clusters of the assimilative isoforms (EC:1.4.1.4) and, as a control, the mitochondrial enzymes (EC:1.4.1.2), which tend to operate in the reverse direction, i.e. glutamate degradation, by oxidizing it and delivering ammonium, loading electrons onto the NAD+ coenzyme. In yeast, the cytoplasmic assimilative isoforms are named GDH1 and GDH3, and the catabolic (mitochondrial) one is known as GDH2. Arabidopsis thaliana proteins were also used as seeds together with the Saccharomyces cerevisiae sequences: one known as putative GDH, which grouped with the fungal assimilative ones, and three catabolic GDHs, which grouped with the human mitochondrial GLUD1, though not with the yeast catabolic GDH2. Results are shown in Figure 3A. The left column shows a cluster that groups assimilative isoforms, with the two from yeast and the putative GDH from A. thaliana. The catabolic mitochondrial isoforms from yeast (central column) and plant (right column) formed two independent clusters. In metazoan organisms, an assimilative enzyme was found in the basal group Cnidaria, all others being dependent on amino acid consumption to build nitrogenated compounds such as DNA, Porifera included. Assimilative isoforms were also lacking in Choanozoa, although complete genomes are unavailable. The same was observed for Placozoa. Comparing these results with those shown in Figure 1, it is remarkable that Choanozoa, while still registering many amino acid biosynthetic enzymes (37 out of 61, redundancy eliminated), shows a simultaneous deletion in both EAA biosynthesis and nitrogen assimilation. It is also apparent that the Great Genomic Deletion attains its almost final broad distribution in Cnidaria, which may be the last metazoan clade still capable of assimilating nitrogen from free ammonium. Therefore a few biosynthetic enzymes remain, in this clade and other Metazoa, probably because of connective functions in metabolism (e.g. EC: 1.2.1.31 aminoadipate-semialdehyde dehydrogenase K5 and EC: 1.5.1.7 saccharopine dehydrogenase K7 also participate in the lysine degradation pathway). We have also observed that mammalian GDH (GLUD1) presents a specialized allosteric control [24] which might have turned the enzyme toward glutamate catabolism rather than anabolism. Such control was first observed in Ciliophora [25] and it is thought to have been transferred by lateral gene transfer to the metazoan ancestor [26]. To confirm the grouping in three clusters of enzymes with such similar activities, Figure 3B shows a phylogenetic tree built with eukaryotic glutamate dehydrogenase sequences, which clustered the isoforms in total accordance with the data shown in Figure 3A.
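
The inference drawn from Figure 3A can be written down as a small lookup. The sketch below only restates what the paragraph above reports for four clades (an assimilative isoform detected in Cnidaria, none detected in Porifera, Placozoa and Choanozoa) together with the roles assigned to the two EC classes; the function names and the wording of the returned labels are assumptions made for illustration.

```python
from typing import Dict, List

# Roles of the two glutamate dehydrogenase classes as described in the text.
GDH_ROLES: Dict[str, Dict[str, str]] = {
    "1.4.1.4": {"cofactor": "NADPH", "role": "assimilative"},
    "1.4.1.2": {"cofactor": "NAD+", "role": "catabolic"},
}

# Detection of an assimilative (EC 1.4.1.4) isoform, as reported in the text.
ASSIMILATIVE_DETECTED: Dict[str, bool] = {
    "Cnidaria": True,
    "Porifera": False,
    "Placozoa": False,
    "Choanozoa": False,
}


def nitrogen_status(clade: str) -> str:
    """Label a clade by whether an assimilative GDH homologue was detected."""
    if ASSIMILATIVE_DETECTED[clade]:
        return "can assimilate free ammonium"
    return "no assimilative GDH detected (organic nitrogen dependency suggested)"


def clades_without_assimilative_gdh() -> List[str]:
    """Return, in alphabetical order, the clades with no assimilative isoform detected."""
    return sorted(clade for clade, found in ASSIMILATIVE_DETECTED.items() if not found)
```

As with the biosynthetic enzymes, a negative entry reflects lack of detection in the available (often draft) genomes, so the label is a suggestion and not a demonstrated phenotype.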
The non-Metazoa eukaryotes with complete genomes, such as Alveolata, Apicomplexa and Euglenozoa, lack EAA biosynthetic enzymes (Figure 1) but keep the capability of nitrogen assimilation (Figure 3). Fornicata and Parabasalia, although represented only by draft genomes, were shown to contain the nitrogen assimilation enzyme even if they appear to be auxotrophic for all EAAs. Rhizaria, for which draft but no complete genomes are available, lacks detection of any isoform of glutamate dehydrogenase but still presents some EAA biosynthetic capability. It is possible that the dependency on organic nitrogen was attained earlier in Rhizaria, although complete sequencing is required for a sound conclusion. In general, data support a tendency for nitrogen heterotrophy succeeding amino acid essentiality. In Rhodophyta, a clade containing completely sequenced genomes, surprisingly no catabolic homologues were found; however, a sequence that clusters with the assimilative isoforms has been found.
We also investigated nitrogen assimilation in prokaryotes. Homologues of assimilative enzymes are present and detected by our clustering procedure but, although homologues of the catabolic seeds were found in bacterial clades, assimilative enzymes were not found in Aquificae, Chlamydiae and Synergistetes, all of them containing complete genomes. This absence is consistent with the lysine auxotrophy suggested in Chlamydiae (Figure 2) and supports the idea that EAA auxotrophy is associated with the lack of nitrogen assimilation even in the prokaryotic clades. It is hard to infer differential enzymatic activity in prokaryotes, since the annotated sequences available often report mixed use of coenzyme, either NADPH or NAD, although the homology tools had grouped them distinctively. If the homology is related to function, it may indicate that these organisms also demand the consumption of NEAA to constitute a source of organic nitrogen. The presented scenario suggests that the loss of nitrogen assimilation, forcing consumption of NEAA, shortly succeeds the Great Genomic Deletion of EAA biosynthetic enzymes in metazoans. If this hypothesis is true, the Cnidaria would be an exception.
Figure 3 Glutamate dehydrogenases. Schematic representation for presence/absence of glutamate dehydrogenases. A: Left column: assimilative GDH1 and GDH3 from Saccharomyces cerevisiae and putative GDH from Arabidopsis thaliana; Central column: catabolic GDH2 from Saccharomyces cerevisiae; Right column: catabolic GDH1, GDH2 and GDH3 from Arabidopsis thaliana. Taxonomic tree displayed at phyla level. Circles represent detection of complete proteins and triangles detection of complete and fragmented proteins. Colors are as for Figure 1. Saccharomyces cerevisiae (Ascomycota) and Arabidopsis thaliana (Streptophyta) were used as seeds. B: Phylogenetic tree with eukaryotic sequences from glutamate dehydrogenase isoforms. Green branches: EC:1.4.1.4; Red branches: EC:1.4.1.2; Blue branches: EC:1.4.1.3.
EAA biosynthetic enzymes maintained
The remaining EAA biosynthetic enzymes in organisms that do not have the complete amino acid pathway (Figure 1) are more susceptible to evolutionary modifications. It is also possible that paralogue subfunctionalization occurred in the common ancestor of animals, fungi and plants, and thus the divergent copy has remained to the detriment of the original gene. Considering both hypotheses, we set out to analyze enzymes from EAA and functional NEAA pathways present in metazoans. Phylogenetic trees for acetolactate synthase (VIL1 code in Figure 1) and for a group of alanine-glyoxylate, serine-glyoxylate and serine-pyruvate transaminases (G1 code in Figure 1) are represented in Figure 4. As expected, the distance between the ancestors of the two prototrophic groups, plant (green circles) and fungi (yellow circles), varies: 0.4 and 0.7 for VIL1 (Figure 4A) and G1 (Figure 4B), respectively. The distance from the ancestor of plants (green circles) to metazoans (red circles) is relatively higher for the remaining enzyme VIL1, 1.0 (2.5-fold the 0.4 measured from plant to fungi), than for the NEAA biosynthetic enzyme G1, 0.7 (1.0-fold the 0.7 measured from plant to fungi). Thus, the remaining EAA enzymes are experiencing higher divergence after the attainment of amino acid auxotrophy.
Figure 4 Phylogenetic analyses for EAA and NEAA enzymes. Phylogenetic trees for (A) acetolactate synthase (VIL1 code in Figure 1), an enzyme for EAA valine, isoleucine and leucine biosynthesis and (B) a group of alanine-glyoxylate, serine-glyoxylate and serine-pyruvate transaminases (G1 code in Figure 1), a NEAA biosynthetic enzyme for glycine biosynthesis. The green, yellow and red circles mark the plant (Streptophyta), fungi (Dikarya) and animal (Metazoa) branches, respectively. In (A), the distances (given in substitutions per site) from the green circle to the yellow and red circles are, respectively, 0.4 and 1.0. In (B), these values are, respectively, 0.7 and 0.7.
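
The fold values quoted for Figure 4 follow from dividing the plant-to-metazoa distance by the plant-to-fungi distance, which serves as the background. A short worked example is given below; the distances are the ones stated in the text, while the dictionary layout and the function name are assumptions made for illustration.

```python
from typing import Dict

# Distances in substitutions per site, as quoted in the text for Figure 4.
DISTANCES: Dict[str, Dict[str, float]] = {
    "VIL1": {"plant_to_fungi": 0.4, "plant_to_metazoa": 1.0},  # remaining EAA enzyme
    "G1": {"plant_to_fungi": 0.7, "plant_to_metazoa": 0.7},    # NEAA enzyme
}


def normalized_ratio(enzyme: str) -> float:
    """Plant-to-metazoa distance divided by the plant-to-fungi background distance."""
    d = DISTANCES[enzyme]
    return d["plant_to_metazoa"] / d["plant_to_fungi"]
```

With these inputs the ratio is 2.5 for VIL1 and 1.0 for G1, matching the fold values given in the paragraph.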
To support this observation, Figure 5 shows the ratios calculated for 12 enzymes. Only trees that show significant bootstraps for the branches of interest were considered. Enzyme codes in bars are described as in Figure 1. The Y axis at the right side corresponds to the distance measured from plant (Streptophyta) to the ancestor of fungi (Dikarya). This distance was assumed as a background distance to normalize the distances measured from plant (green bars) to the clades indicated in the X axis. The three enzymes on the right, S1, G1 and G2, belong to NEAA pathways, and the ratios are low. For the enzymes H5, FW7, F8, VIL1, VIL3, MT3 and M7, the ratios shown by green bars are conversely high, ranging from around 1.5 up to 7 fold. These preliminary data suggest that additional evolutionary modifications have occurred at distinct levels in the enzymes maintained after the loss of biosynthetic capability. The M(2) pathway appears incomplete in Basidiomycota (Figure 1; M8 is absent); however, the MT3 enzyme used here is present in the threonine pathway, which is complete in this clade. K6 and K10 are involved in pathways that are incomplete in plants and fungi, respectively. Accordingly, the distance measured from plant to fungi is high, and so is the drift from plant to Chordata (K6) or to Arthropoda (K10), therefore yielding balanced lower ratios. Since the ancestor of fungi and plants seems to be equally distant from both of these two groups, and the divergence between plant and the Fungi/Metazoa group tends to a trifurcation (see Figure 4), the yellow bars (which represent the distance from fungi to the animal clades in the X axis divided by the background distance from plant to fungi) are similar to the ratios represented by the green bars, independently of how much modification has occurred in the animal sequences (e.g. VIL1, MT3, G1). Furthermore, a detailed inspection of phylogenetic trees seems to indicate that subfunctionalized paralogues have appeared in basal clades such as Fungi, and those divergent paralogues remain in the more recent groups of organisms, while the copy that previously participated in the biosynthesis was actually deleted in animals. Note some Streptophyta and Ascomycota divergent paralogues (outparalogues) [27] grouped with animal sequences under 100% bootstrap (Figure 4A). Accordingly, similar divergent paralogues were observed for the M7 enzyme (Ascomycota and Basidiomycota divergent paralogues grouped with animal sequences, 98% bootstrap; see Additional file 2: Phylogenetic tree of 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase (M7)). Moreover, for the K10 enzyme, which participates in the K biosynthetic pathway that is defective in fungi, a divergent paralogue from Streptophyta groups with fungal enzymes (92% bootstrap) near the Arthropoda sequence (Additional file 3: Phylogenetic tree of dihydrodipicolinate synthase (K10)). Thus, the enzymes remaining from biosynthetic pathways show higher divergence, and this might have been acquired due to subfunctionalization in ancient clades.
Discussion
The advance in genome sequencing and computational
methods for clustering homologous proteins has been
helping the scientific community to reevaluate several
aspects of basic biology. Here we have applied clustering
of protein sequences chosen from two clades of organisms
that are known to be autotrophic for the biosynthesis
of Essential Amino Acids (EAAs). Furthermore, we
searched for the enzymes responsible for nitrogen
assimilation, incorporating ammonium into glutamate.
Lack of cytoplasmic glutamate dehydrogenase leads to a
dependency on amino acid consumption as the source
of organic nitrogen, i.e., the organism in a certain sense
actually becomes auxotrophic for both EAAs and NEAAs
(Non-Essential Amino Acids), in order to build other
nitrogen-containing molecules.
Figure 5 Relative distance of Metazoa enzymes from
homologues of EAA and from NEAA biosynthetic enzymes
present in plant and fungi. Phylogenetic trees were obtained for
12 enzymes, using all eukaryotic clustered proteins. Codes for
enzymes are the same as in Figure 1 and are shown over the bars.
For normalization, a background distance from the plant phylum
Streptophyta to the fungi subkingdom Dikarya was measured and
represented by triangles (right Y axis). The distance from either
Streptophyta (green bars) or Dikarya (yellow bars), to the branches
that group the clades indicated below the bars, were measured and
normalized by the distance Streptophyta/Dikarya, yielding the ratio
represented by bars (left Y axis). Only the three enzymes on the
right (S1, G1 and G2) participate in the biosynthesis of NEAAs: serine
(S1) and glycine (G1 and G2). K6 and K10 are enzymes that
compose lysine biosynthetic pathways which are not complete,
respectively, in Streptophyta or Dikarya (see Figure 1). Abbreviations:
Art, Arthropoda; Cho, Choanozoa; Cni, Cnidaria; Nem, Nematoda; Pla,
Placozoa.
The work presented here takes advantage of both the
Seed Linkage software and a home-built UniRef50
Enriched KEGG Orthology database (UEKO) as sources
of information, to rapidly group homologues of fungi
and plant amino acid sequences, respectively represented
by Saccharomyces cerevisiae and Arabidopsis
thaliana. KEGG Orthology contains to date more than
1 million sequences from nearly 1,000 genomes, and it
was enriched by a procedure developed by our group to
attain 2,442,384 sequences from 25,024 organisms,
constituting the UEKO database (to be published elsewhere
and further distributed). Counting the total recruited
sequences reported in this work (31,392), the percentage
of recruitment by (i) Seed Linkage, (ii) original KO or
(iii) the enriched portion of KO (UEKO) was, respectively,
6%, 44% and 50%. Moreover, 26% of all detected
enzymes for the phyla represented in Figures 1, 2 and 3
were exclusively detected by Seed Linkage software and/
or UEKO database. These numbers reinforce the relevance
of developing homologue-searching
capability, improving the ability of the KEGG Orthology
database to build a scenario for the biological processes
of interest such as those presented here. Moreover, on
top of the search for homologues represented by circles
in the Figures, a complementary search using the 31,392
clustered sequences allowed the investigation of all
UniProt sequences, including fragments (e.g. UniProt
accession B7QGP4, VIL1 from Arthropoda) and some
full-length proteins not accessed by the initial search (e.g.
UniProt accession D3AYE6, complete protein K14, from
Amoebozoa; actually a more recent version of KO
already incorporates this entry). It is important to notice
that, in UniProt, the technical term fragment is applied
to partial CDS sequences, a product of incompletely
sequenced mRNA, as well as amino acid sequences
modeled from the genome that lack initial methionine.
Thus they might represent additional evidence of the
enzyme presence rather than a remnant pseudogene.
Stringent criteria (1 × 10^-10 e-value, 50% identity and 50%
subject coverage cutoffs) were adjusted with extensive
manual inspection, and additional evidence was
included as triangles in the Figures. One piece of evidence
recorded as a triangle drew our attention, since it came
from a clade containing the complete genome of the
well-annotated organism Drosophila melanogaster (Figure 1,
enzyme VIL1, phylum Arthropoda). Manual inspection
reveals that the evidence yielded by the additional search
(represented by a triangle) was a hit from Ixodes
scapularis (a genome under assembly status), but remarkably,
the gene was found to be missing in the fly. Thus,
this represents a recent gene loss within a non-functional
pathway.
The main interest of this work was to depict the evolution
of amino acid essentiality, or heterotrophy.
Grouping organisms at the phylum level allowed easy
labeling of clades that comprise organisms with sequenced
or draft genomes, as shown in Figures 1, 2 and 3, making
it possible to infer deletion events distinctively in
these clades. It is important to notice that many phyla
contain complete genomes, which allowed us to figure
out the deletion process with more certainty. However,
picturing the entire scenario allowed the analysis
to be extended to the branched clades, although this
requires additional caution in interpretation. Although
outside the scope of this work, this suggests a demand
for a planned choice of genomes to be completely
sequenced, since, as clearly shown here, we lack information
from several phyla such as the ones represented
with empty circles (e.g. Cryptophyta, Haptophyta,
Neocallimastigomycota and Glaucophyta). Enzymes not
found by our analysis require further attention and
searches using more sensitive methods and detailed manual
or even experimental analysis, to detect divergent
sequences; in other words, the absence of evidence is
not evidence of absence. However, the present work
exemplifies a method that can be easily applied to other
scenarios of gene/pathway loss.
The scenario of amino acid auxotrophy supports the
hypothesis of a Great Genomic Deletion model of
amino acid biosynthesis in association with heterotrophy.
This phenomenon has probably occurred several
times, particularly at the origin of metazoans. This deletion
has likely been associated with endosymbiotic
relationships or with the development of systems
specialized in nutrient absorption. It seems that amino
acid essentiality originated as a phenotypic loss
of pathways early in Choanozoa, followed by multiple
losses during metazoan evolution. Similar progressions of
deletions occur close to Heterokontophyta and Rhizaria,
culminating in Apicomplexa. Rhodophyta and
Microsporidia also attain the auxotrophy.
Moreover, remaining enzymes set apart from their original
roles in amino acid biosynthetic metabolism seem
to be more prone to evolutionary changes, whilst
enzymes present in complete pathways are more structurally
conserved among distant phyla (Figures 4 and 5).
Although a detailed investigation is needed, our preliminary
analysis suggests that the copies which remained
in metazoan genomes may have undergone subfunctionalization,
and sometimes this might have occurred in more
ancestral organisms (Figure 4 and additional files 2 and
3). Thus, in some sense, the orthologue enzyme might
actually have been deleted in animals, and the divergent
copy is the one remaining. These divergent copies are
sometimes named outparalogues. We are currently
investigating substitution rate ratios and promoter elements
in these genes.
Subsequent deletion includes the enzymes implicated
in nitrogen assimilation, which takes place just after the
broad deletion of EAA biosynthetic enzymes (since,
except for metazoans, other eukaryotic clades that lack
biosynthetic pathways still contain a nitrogen assimilative
enzyme), as observed in more derived metazoans, but
not Cnidaria. Most Cnidaria are carnivorous, so one
possibility is that Cnidaria may benefit from the assimilation
of organic nitrogen under long periods of fasting;
however, this finding needs additional investigation.
Thus, the simplest explanation is that the loss of nitrogen
assimilative enzymes is related to lower selective
pressure associated with the origin of the most heterotrophic
organisms, animals.
To our knowledge this is the first initiative to clarify
the complete scenario using powerful homologous
grouping approaches and the total repertoire of
sequenced genomes.
Conclusions
The procedures described here provide a deeper analysis
of amino acid and nitrogen heterotrophy among distinct
taxa, extended to include the entire set of available pro-
teins. They show that amino acid essentiality was a
broad phenomenon in eukaryotes, followed by the sub-
sequent nutritional requirement of organic nitrogen, in
animals.
Methods
Software and databases
Seed Linkage clustering software [14] and a detailed
explanation of its usage can be obtained at
http://www.biodados.icb.ufmg.br/eaa/. Seed Linkage requires BLAST
(version used was 2.2.20), MySQL (version 5.0.77) [28]
and PHP (version 5.1.6) [29].
The protein database is composed of UniProtKB
entries (version used was 2010_09) available at
http://www.biodados.icb.ufmg.br/eaa/. Except where otherwise
indicated, all fragmented proteins were removed from
analyses by parsing the description line in FASTA files.
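The fragment-removal step can be pictured with a minimal sketch. It assumes that partial entries are recognisable by a "(Fragment)" or "(Fragments)" tag in the description part of a UniProtKB FASTA header; the exact parsing rule used in this work is not stated, so the function names and the two example records (hypothetical accessions and sequences) are illustrative assumptions only.

```python
def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_fragment(header):
    """Assumed rule: the description (text before ' OS=') carries a fragment tag."""
    description = header.split(" OS=")[0]
    return "(Fragment)" in description or "(Fragments)" in description


def drop_fragments(lines):
    """Keep only the records whose description line is not tagged as a fragment."""
    return [(h, s) for h, s in iter_fasta(lines) if not is_fragment(h)]


# Hypothetical records, not real UniProtKB entries.
EXAMPLE_FASTA = """\
>tr|X00001|X00001_TOYSP Hypothetical enzyme OS=Toy species
MKTAYIAKQR
>tr|X00002|X00002_TOYSP Hypothetical enzyme (Fragment) OS=Toy species
AYIAK
"""

kept = drop_fragments(EXAMPLE_FASTA.splitlines())
print([header.split("|")[1] for header, _ in kept])  # ['X00001']
```

Filtering on the description line alone keeps the step cheap, since no sequence needs to be inspected to decide whether a record is retained.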
To enrich KEGG Orthology clusters with incomplete
genome proteins, UniRef50 Enriched KEGG Orthology
(UEKO) was built with the procedure described by
Fernandes et al. [15]. A local MySQL database was used.
Procedure
Amino acid biosynthetic pathways were depicted by
manual inspection of KEGG Pathway [30], from which UniProtKB
identifiers for the enzymes used in this work could be
retrieved for the model autotrophic organisms Saccharomyces
cerevisiae, Arabidopsis thaliana and, for the
archaeal lysine biosynthesis, Pyrococcus horikoshii. The
procedure starts with the selected sequences used as seeds
for a Seed Linkage search in UniProtKB. The homologous
cluster is enriched by (i) entries in KEGG Orthology
(KO) belonging to the same KO where the seed is found
and (ii) UEKO entries for this same KO. All steps were
conducted with MySQL queries and PERL v5.8.8 [31]
scripts. To verify the recruitment, seed sequences were
used in PSI-BLAST alignments with the recruited
sequences, with the PSI-BLAST iterations stopped
whenever the score obtained for the seed sequence itself
decreased to below 50% of the initial score. Results of the
search for homologues are represented by circles in the
Figures. For more details see additional file 4: List of seed
sequences and additional file 5: List of clusters.
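The enrichment and verification steps above can be sketched as follows. This is an illustration under stated assumptions, not the scripts used in this work (which relied on MySQL and PERL): the cluster is modelled as a plain set union of the three sources, and the PSI-BLAST stopping rule is applied to a list of seed self-scores that are supplied by hand. The function names, accessions and scores are hypothetical.

```python
def enrich_cluster(seed_linkage_hits, ko_members, ueko_members):
    """Merge the three recruitment sources into one non-redundant cluster."""
    return set(seed_linkage_hits) | set(ko_members) | set(ueko_members)


def last_accepted_iteration(seed_self_scores, min_fraction=0.5):
    """Return the index of the last PSI-BLAST iteration to keep.

    seed_self_scores[i] is the score obtained by the seed sequence against
    itself in iteration i. Iterations stop as soon as that score falls
    below min_fraction times the score of the first iteration.
    """
    if not seed_self_scores:
        raise ValueError("at least one iteration score is required")
    threshold = min_fraction * seed_self_scores[0]
    last_ok = 0
    for index, score in enumerate(seed_self_scores):
        if score < threshold:
            break
        last_ok = index
    return last_ok


# Hypothetical accessions and scores, for illustration only.
cluster = enrich_cluster(["P1", "P2"], ["P2", "P3"], ["P4"])
print(sorted(cluster))                                     # ['P1', 'P2', 'P3', 'P4']
print(last_accepted_iteration([800, 760, 610, 390, 350]))  # 2
```

In the second call the threshold is 400 (50% of 800), so the iteration scoring 390 is the first one rejected and iteration index 2 is the last one kept.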
Simple BLASTp analyses (10^-10 e-value cutoff) were
also conducted with all UniProt proteins, comprising
both UniProt complete and fragment entries, for each
phylum against all clustered proteins in this project.
Resulting output was filtered to remove alignments with
less than both 50% identity and 50% subject coverage.
Results of this analysis are represented by triangles in
the Figures.
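A minimal sketch of this filtering step is given below. It assumes that an alignment is kept only when it meets all three cutoffs (e-value, identity and subject coverage) and that subject coverage is the aligned span on the subject divided by the subject length; both readings are assumptions, as are the `Hit` record layout and the example values, none of which come from the original analysis.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_id: str
    subject_id: str
    percent_identity: float  # 0 to 100
    subject_start: int       # 1-based, inclusive
    subject_end: int         # 1-based, inclusive
    subject_length: int
    evalue: float


def subject_coverage(hit):
    """Percentage of the subject sequence spanned by the alignment."""
    aligned = abs(hit.subject_end - hit.subject_start) + 1
    return 100.0 * aligned / hit.subject_length


def passes_cutoffs(hit, max_evalue=1e-10, min_identity=50.0, min_coverage=50.0):
    """Assumed rule: a hit must satisfy every cutoff to be retained."""
    return (
        hit.evalue <= max_evalue
        and hit.percent_identity >= min_identity
        and subject_coverage(hit) >= min_coverage
    )


# Hypothetical hits, for illustration only.
hits = [
    Hit("q1", "s1", 62.0, 10, 210, 300, 1e-40),  # passes all cutoffs
    Hit("q2", "s1", 48.0, 1, 300, 300, 1e-30),   # identity too low
    Hit("q3", "s2", 80.0, 1, 100, 400, 1e-25),   # coverage too low (25%)
    Hit("q4", "s2", 90.0, 1, 400, 400, 1e-5),    # e-value too high
]

print([h.query_id for h in hits if passes_cutoffs(h)])  # ['q1']
```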
Taxonomy information
All UniProtKB identifiers could be associated with an
organism taxonomy ID with the file available at
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping.
Further association of organism taxonomy ID with
phyla classification was achieved through a local database
built with NCBI taxonomy information obtained at
ftp://ftp.ncbi.nih.gov/pub/taxonomy.
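The two-step association (protein identifier to taxonomy ID, then taxonomy ID to phylum) can be sketched as a walk up the taxonomy tree. The sketch assumes the taxonomy has already been loaded into three dictionaries (parent, rank and name per taxonomy ID); the work itself used a local MySQL database, and every identifier and tree node below is a toy value rather than a real NCBI or UniProtKB entry.

```python
def clade_of(tax_id, parent, rank, name, target_rank="phylum"):
    """Walk up the tree from tax_id until a node of target_rank is found.

    Returns the clade name, or None when no ancestor has that rank.
    The 'seen' set stops the walk at a root that is its own parent.
    """
    seen = set()
    current = tax_id
    while current is not None and current not in seen:
        if rank.get(current) == target_rank:
            return name.get(current)
        seen.add(current)
        current = parent.get(current)
    return None


# Toy taxonomy (hypothetical IDs): 1 is the root and points to itself.
parent = {1: 1, 10: 1, 20: 10, 30: 20, 40: 1, 50: 40}
rank = {1: "no rank", 10: "kingdom", 20: "phylum", 30: "species",
        40: "kingdom", 50: "species"}
name = {1: "root", 10: "Toy kingdom A", 20: "Toy phylum A", 30: "Toy species A",
        40: "Toy kingdom B", 50: "Toy species B"}

# Toy protein-to-organism mapping (hypothetical accessions).
accession_to_taxid = {"ACC_A": 30, "ACC_B": 50}

for accession, tax_id in accession_to_taxid.items():
    print(accession, clade_of(tax_id, parent, rank, name))
# ACC_A Toy phylum A
# ACC_B None
```

The `target_rank` parameter is there because not every clade of interest is ranked as a phylum (Dikarya, for instance, is described in this work as a subkingdom), so a grouping of this kind may need more than one rank.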
Genome statuses were obtained by NCBI Genome
Project analysis at: http://www.ncbi.nlm.nih.gov/genomeprj.
Phylogenetic analyses
For phylogenetic analysis Prankster [32] was used for
multiple sequence alignment and MEGA4 [33] to construct
the phylogenetic tree using the neighbor-joining
method [34] with 500 bootstrap replicates. Branch distances
were obtained from phylogenetic trees, from the
ancestors of Streptophyta, Dikarya and clades of metazoans.
Only branches with significant bootstrap were used.
With the distances, a ratio was calculated as below:
$$\text{Ratio} = \frac{\text{Distance}_{F-T}}{\text{Distance}_{S-D}}$$
where F (from) is either the Streptophyta or Dikarya
ancestor and T (to) is an animal ancestor (see Figure 5,
X axis); and S and D are the ancestors of Streptophyta
and Dikarya, respectively. Phylogenetic trees used to
compose Figure 5 can be accessed at our server at
http://www.biodados.icb.ufmg.br/eaa/.
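As a worked illustration of this ratio, the sketch below applies it to the distances quoted in the Figure 4 legend, taking Streptophyta (green circle) as F, Metazoa (red circle) as T, and the Streptophyta to Dikarya distance (green to yellow circle) as the background. The function name is an assumption; only the four distance values come from the text.

```python
def normalized_ratio(distance_from_to, distance_background):
    """Distance F-T divided by the background distance S-D."""
    if distance_background <= 0:
        raise ValueError("background distance must be positive")
    return distance_from_to / distance_background


# Figure 4, panel A: plant to fungi = 0.4, plant to animals = 1.0
ratio_a = normalized_ratio(1.0, 0.4)
# Figure 4, panel B: plant to fungi = 0.7, plant to animals = 0.7
ratio_b = normalized_ratio(0.7, 0.7)

print(round(ratio_a, 2), round(ratio_b, 2))  # 2.5 1.0
```

A ratio near 1 means the animal sequences are about as far from plants as fungi are, while a ratio well above 1 indicates additional divergence on the animal branch.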
Additional material
Additional file 1: Sequences and genome status distribution.
Distribution of UniProtKB sequences among available genomes in three
sequencing status groups: Complete, Draft plus In Progress and
Incomplete.
Additional file 2: Phylogenetic tree of
5-methyltetrahydropteroyltriglutamate-homocysteine
methyltransferase (M7). A phylogenetic tree of one of the four
methyltransferases illustrated in Figure 1 for methionine biosynthesis. The red
circle represents the Chordata and Cnidaria ancestor, the yellow circle the Dikarya
ancestor and the green circle the Streptophyta ancestor. Available at
[http://www.biodados.icb.ufmg.br/eaa/].
Additional file 3: Phylogenetic tree of dihydrodipicolinate synthase
(K10). A phylogenetic tree of one of the enzymes illustrated in Figure 1
for lysine biosynthesis. The red circle represents Arthropoda, the yellow circle the
Dikarya ancestor and the green circle the Streptophyta and Chlorophyta
ancestor. Available at [http://www.biodados.icb.ufmg.br/eaa/].
Additional file 4: List of seed sequences. A detailed list of sequences
used as initiators for the clustering process with UniProtKB identifier, NCBI
taxonomy identifier and Enzyme Commission (EC) number. Available at
[http://www.biodados.icb.ufmg.br/eaa/].
Additional file 5: List of clusters. A detailed list of created clusters for
all enzymes with UniProtKB identifier and NCBI taxonomy identifier.
Available at [http://www.biodados.icb.ufmg.br/eaa/].
List of abbreviations
COG: Cluster of Orthologous Groups; EAAs: Essential Amino Acids; GDH:
Glutamate dehydrogenase; KEGG: Kyoto Encyclopedia of Genes and
Genomes; KO: KEGG Orthology; NEAAs: Non-Essential Amino Acids; UEKO:
UniRef50 Enriched KEGG Orthology.
Acknowledgements
Authors thank Dr. Darren Natale from PIR (USA) and Elisa Donnard (LICR) for
critically reviewing this manuscript, Henrique Velloso for helping with
taxonomic data and Laryssa Santos Queiroz with pathway inspections. This
work has been sponsored by the Brazilian Ministry of Education (CAPES) and
Foundation for Research Support of Minas Gerais State (FAPEMIG).
This article has been published as part of BMC Genomics Volume 12
Supplement 4, 2011: Proceedings of the 6th International Conference of the
Brazilian Association for Bioinformatics and Computational Biology (X-
meeting 2010). The full contents of the supplement are available online at
http://www.biomedcentral.com/1471-2164/12?issue=S4
Author details
1 Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas,
Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, MG, Brazil.
2 Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia,
Universidade Católica de Brasilia, Brasilia, 70790-160, DF, Brazil.
3 Instituto de Bioquímica Médica, Universidade Federal do Rio de Janeiro,
Rio de Janeiro, 21941-902, RJ, Brazil.
Authors' contributions
The work presented here was carried out in collaboration between all
authors. FP and JMO defined the research theme. RLMG developed the
clustering procedure, created the dataset and conducted the experiments.
RLMG and GRF created the figures. RLMG, FP and LKM conducted
phylogenetic analyses. GRF created the procedure of Uniref50 enrichment of
KEGG Orthology database. HALR developed the PSI-BLAST validation
method. JMO, FP and RLMG wrote the paper. All authors supervised and
approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Published: 22 December 2011
References
1. Blair J, Shah P, Hedges SB: Evolutionary sequence analysis of complete
eukaryote genomes. BMC Bioinformatics 2005, 6:53.
2. Cunchillos C, Lecointre G: Early steps of metabolism evolution inferred by
cladistic analysis of amino acid catabolic pathways. Comptes Rendus
Biologies 2002, 325:119-129.
3. Cunchillos C, Lecointre G: Ordering events of biochemical evolution.
Biochimie 2007, 89:555-573.
4. Hernández-Montes G, Díaz-Mejía JJ, Pérez-Rueda E, Segovia L: The hidden
universal distribution of amino acid biosynthetic networks: a genomic
perspective on their origins and evolution. Genome Biol 2008, 9:R95.
5. Reeds PJ, Wahle KWJ, Haggarty P: Energy costs of protein and fatty acid
synthesis. Proceedings of the Nutrition Society 1982, 41:155-159.
6. Aoyagi Y, Tasaki I, Okumura J, Muramatsu T: Energy cost of whole-body
protein synthesis measured in vivo in chicks. Comp Biochem Physiol A
Comp Physiol 1988, 91:765-768.
7. Millward DJ: Metabolic Demands for Amino Acids and the Human
Dietary Requirement: Millward and Rivers (1988) Revisited. The Journal of
Nutrition 1998, 128:2563S-2576S.
8. Millward DJ, Rivers JP: The nutritional role of indispensable amino acids
and the metabolic basis for their requirements. Eur J Clin Nutr 1988,
42:367-393.
9. Elango R, Ball R, Pencharz P: Amino acid requirements in humans: with a
special emphasis on the metabolic availability of amino acids. Amino
Acids 2009, 37:19-27.
10. Payne SH, Loomis WF: Retention and Loss of Amino Acid Biosynthetic
Pathways Based on Analysis of Whole-Genome Sequences. Eukaryotic Cell
2006, 5:272-276.
11. Consortium TU: The Universal Protein Resource (UniProt) in 2010. Nucleic
Acids Research 2010, 38:D142-D148.
12. Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D,
Mazumder R, Mekhedov S, Nikolskaya A, et al: The COG database: an
updated version includes eukaryotes. BMC Bioinformatics 2003, 4:41.
13. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource
for deciphering the genome. Nucleic Acids Research 2004, 32:D277-D280.
14. Barbosa-Silva A, Satagopam V, Schneider R, Ortega JM: Clustering of
cognate proteins among distinct proteomes derived from multiple links
to a single seed sequence. BMC Bioinformatics 2008, 9:141.
15. Fernandes GR, Barbosa DVC, Prosdocimi F, Pena IA, Santana-Santos L,
Coelho Junior O, Barbosa-Silva A, Velloso HM, Mudado MA, Natale DA, et al:
A procedure to recruit members to enlarge protein family databases:
the building of UECOG (UniRef-Enriched COG Database) as a model.
Genetics and molecular research GMR 2008, 7:910-924.
16. Suzek B, Huang H, McGarvey P, Mazumder R, Wu C: UniRef:
comprehensive and non-redundant UniProt reference clusters.
Bioinformatics 2007, 23:1282-1288.
17. Katinka MD, Duprat S, Cornillot E, Metenier G, Thomarat F, Prensier G,
Barbe V, Peyretaillade E, Brottier P, Wincker P, et al: Genome sequence and
gene compaction of the eukaryote parasite Encephalitozoon cuniculi.
Nature 2001, 414:450-453.
18. Prosdocimi F, Mudado MA, Ortega JM: A set of amino acids found to
occur more frequently in human and fly than in plant and yeast
proteomes consists of non-essential amino acids. Computers in Biology
and Medicine 2007, 37:159-165.
19. Santana-Santos L, Prosdocimi F, Ortega JM: Essential amino acid usage
and evolutionary nutrigenomics of eukaryotes: insights into the
differential usage of amino acids in protein domains and extra-domains.
Genetics and molecular research GMR 2008, 7:839-852.
20. Miyazaki T, Miyazaki J, Yamane H, Nishiyama M: α-Aminoadipate
aminotransferase from an extremely thermophilic bacterium, Thermus
thermophilus. Microbiology 2004, 150:2327-2334.
21. Velasco AM, Leguina JI, Lazcano A: Molecular Evolution of the Lysine
Biosynthetic Pathways. Journal of Molecular Evolution 2002, 55:445-449.
22. Hudson AO, Bless C, Macedo P, Chatterjee SP, Singh BK, Gilvarg C,
Leustek T: Biosynthesis of lysine in plants: evidence for a variant of the
known bacterial pathways. Biochim Biophys Acta 2005, 1721:27-36.
23. Nishida H, Nishiyama M, Kobashi N, Kosuge T, Hoshino T, Yamane H: A
Prokaryotic Gene Cluster Involved in Synthesis of Lysine through the
Amino Adipate Pathway: A Key to the Evolution of Amino Acid
Biosynthesis. Genome Research 1999, 9:1175-1183.
24. Smith TJ, Schmidt T, Fang J, Wu J, Siuzdak G, Stanley CA: The Structure of
Apo Human Glutamate Dehydrogenase Details Subunit Communication
and Allostery. Journal of Molecular Biology 2002, 318:765-777.
25. Allen A, Kwagh J, Fang J, Stanley CA, Smith TJ: Evolution of Glutamate
Dehydrogenase Regulation of Insulin Homeostasis Is an Example of
Molecular Exaptation. Biochemistry 2004, 43:14431-14443.
26. Andersson J, Roger A: Evolution of glutamate dehydrogenase genes:
evidence for lateral gene transfer within and between prokaryotes and
eukaryotes. BMC Evolutionary Biology 2003, 3:14.
27. Sonnhammer ELL, Koonin EV: Orthology, paralogy and proposed
classification for paralog subtypes. Trends in Genetics 2002, 18:619-620.
28. MySQL. [http://www.mysql.com].
29. PHP. [http://www.php.net].
30. Kanehisa M: KEGG: From genes to biochemical pathways. Bioinformatics:
Databases and Systems Kluwer Academic Publishers; 1999, 63-76.
31. Perl. [http://www.perl.org/].
32. Löytynoja A, Goldman N: An algorithm for progressive multiple
alignment of sequences with insertions. Proceedings of the National
Academy of Sciences of the United States of America 2005, 102:10557-10562.
33. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary
Genetics Analysis (MEGA) Software Version 4.0. Molecular Biology and
Evolution 2007, 24:1596-1599.
34. Saitou N, Nei M: The neighbor-joining method: a new method for
reconstructing phylogenetic trees. Molecular Biology and Evolution 1987,
4:406-425.
doi:10.1186/1471-2164-12-S4-S2
Cite this article as: Guedes et al.: Amino acids biosynthesis and nitrogen
assimilation pathways: a great genomic deletion during eukaryotes
evolution. BMC Genomics 2011 12(Suppl 4):S2.