Phylogenetic mapping of intron positions: a case study of translation initiation factor eIF2gamma.
ABSTRACT Eukaryotic translation initiation factor 2 (eIF2) is a G protein that delivers the methionyl initiator tRNA to the small ribosomal subunit and releases it upon GTP hydrolysis after the recognition of the initiation codon. eIF2 is composed of three subunits, alpha, beta, and gamma. Subunit gamma shows the strongest conservation, and it confers both tRNA and GTP/GDP binding. Using intron positioning and protein sequence alignment, here we show that eIF2gamma is a suitable phylogenetic marker for eukaryotes. We determined or completed the sequences of 13 arthropod eIF2gamma genes. Analyzing the phylogenetic distribution of 52 different intron positions in 55 distantly related eIF2gamma genes, we identified ancient ones and shared derived introns in our data set. Obviously, intron positioning in eIF2gamma is evolutionarily conserved. However, there were episodes of complete and partial intron losses followed by intron gains. We identified 17 clusters of intron positions based on their distribution. The evolution of these clusters appears to be connected with preferred exon length and can be used to estimate the relative timing of intron gain because nearby precursor introns had to be erased from the gene before the new introns could be inserted. Moreover, we identified a putative case of intron sliding that constitutes a synapomorphic character state supporting monophyly of Coleoptera, Lepidoptera, and Diptera excluding Hymenoptera. We also performed tree reconstructions using the eIF2gamma protein sequences and intron positioning as phylogenetic information. Our results support the monophyly of Viridoplantae, Ascomycota, Homobasidiomyceta, and Apicomplexa.
Veiko Krauss, Marek Pecyna, Katrin Kurz, and Heinz Sass
Department of Genetics, University of Leipzig, Leipzig, Germany
In recent decades, we have witnessed significant progress in reconstructing phylogenies based on molecular data (nucleotide or amino acid sequences). Good examples are the analysis of small-subunit ribosomal RNA of 2,551 species (Van de Peer et al. 2000), as well as the analysis of over 500 proteins of six genomes (Wolf, Rogozin, and Koonin 2004). However, the results of these studies often conflict with respect to phylogeny. Therefore, it is necessary to improve the phylogenetic analysis using additional molecular markers and tighter species sampling. Beyond gene sequences, such markers include singular character states such as transposable element insertions, gene order changes, code variants, and intron positions (reviewed in Rokas and Holland).
Among these singular character states, intron position appears to be rather unreliable (Krzywinski and Besansky 2002; Wada et al. 2002). Likely cases of intron insertion and loss have been documented (Rzhetsky et al. 1997; Logsdon, Stoltzfus, and Doolittle 1998; Feiber, Rangarajan, and Vaughn 2002; Roy, Fedorov, and Gilbert 2003; Brady and Danforth 2004). Based on recent genome projects, large-scale comparisons of intron positions have been carried out (Fedorov, Merican, and Gilbert 2002; Rogozin et al. 2003). Their results suggested that intron positioning is more dynamic than previously assumed. Therefore, comprehensive analyses of novel marker genes, focusing on both intron position and sequence data, would be useful.
We sought to analyze the evolution of sequence and exon-intron structure of a strongly conserved single-copy gene in a representative sample of eukaryotic species. For this purpose, we have chosen the γ subunit of eukaryotic translation initiation factor 2 (eIF2). By delivering the initiator methionyl-tRNA to the small subunit of ribosomes, eIF2 ensures specificity of initiation codon selection (Kapp and Lorsch 2004; Roll-Mecak et al. 2004). eIF2γ is the most strongly conserved subunit of the heterotrimeric eIF2 and is found in Eukaryota and Archaea.
In a preliminary study (Krauss and Reuter 2000), we described eIF2γ gene structures of six arthropod species and showed that the gene is fused with the functionally unrelated Su(var)3-9 histone methyltransferase gene in holometabolous insects. Here, we have sequenced eIF2γ genes of 13 other arthropod species and collected database sequences from 40 additional, selected eukaryotic species for our analysis. Examining the cladistic distribution of 52 different intron positions in 55 distantly related eIF2γ genes, we identified ancient and shared derived introns. Our analysis has shown that intron positioning in eIF2γ is evolutionarily conserved. However, there were episodes of complete or partial intron losses followed by intron gains. Using a maximum-parsimony analysis based on an intron presence/absence matrix, we showed that introns are phylogenetically informative. We note that in phylogenetic mapping of intron positions, sampling of taxa has to be as complete as possible.
Materials and Methods
Sources of Arthropods Utilized
Species trapped in the vicinity of Leipzig (Sachsen,
Germany) were Lithobius forficatus (centipede), Oniscus
asellus (woodlouse), Enallagma cyathigerum (damselfly),
Forficula auricularia (earwig), and Aphis sambuci (aphid).
Arthropods captured around Ruhla (Thüringen, Germany)
were Araneus quadratus (spider), Cercopis vulnerata
(cicada), and Scoliopterix libatrix (butterfly). Allacma
fusca (springtail) was trapped near Ilsenburg (Sachsen-
Anhalt, Germany), and Lepismachilis spp. (bristletail) was
found in the vicinity of Pfarrwerfen (Salzburg, Austria).
Additional species used from commercial stocks were
Daphnia magna (water flea), Locusta migratoria (locust),
and Bombyx mori (silk worm).
Key words: eIF2γ, intron evolution, molecular phylogenetics, intron clustering, arthropod phylogeny, intron sliding.
E-mail address: email@example.com.
Mol. Biol. Evol. 22(1):74–84. 2005
Advance Access publication September 8, 2004
Molecular Biology and Evolution vol. 22 no. 1 © Society for Molecular Biology and Evolution 2005; all rights reserved.
Isolation of eIF2γ Genes Using PCR
DNA was isolated by standard protocols. Trizol reagent (Invitrogen) was used to isolate total RNA. cDNA was synthesized using Hminus-M-MLV reverse transcriptase (Fermentas) and a polyT primer. Degenerate primers based on the amino acid sequences of already known eIF2γ proteins were designed to partially amplify the eIF2γ gene from genomic DNA and/or cDNA of arthropod species.
The degenerate oligonucleotide primers used were Ef120
TXGCXCAYGG-3′), Ef440 (5′-CCRTTXARCATXG-
NCAYGG-3′), Ef440c (5′-TCCATCACAGCTGCTCCGTTCAACATNGTNGCCAT-3′), Efdeg3 (5′-GARCA-
and Efdeg6 (5′-TTTGTACCAACACCDATHARDCCNCCNGG-3′). Primer positions within eIF2γ are shown in
(Eppendorf) at annealing temperatures between 45 °C and 65 °C. The initial PCR product (320 bp to 900 bp) was purified using Spin PCRapid Kit (Macherey & Nagel) and sequenced. Species-specific primers were designed based on the sequence obtained, in order to recover the 5′ ends and 3′ ends of eIF2γ transcripts by 5′ RACE (rapid amplification of cDNA ends) and 3′ RACE, respectively (GeneRacer Kit, Invitrogen). Alternatively, inverse PCR products from digested and ligated genomic DNA preparations were purified, cloned, and sequenced. The specific sequencing strategy used for each of the analyzed species is given in figure 1 of Supplementary Material online. Species-specific primer sequences are available upon request.
Sequences were determined either by direct sequenc-
ing of the PCR fragment or by sequencing of two or three
independent clones from different PCR reactions. PCR
fragments were subcloned using pGEM-T PCR cloning kit
(Promega). Transcribed regions were sequenced as RT-
PCR products (directly or as a clone). Sequencing was per-
formed on ABI 3100 equipment (ABI) using BigDye
Sequencing Chemistry (ABI). For sequence analysis,
MacVector version 7.2 (Accelrys) was used.
Sequence Sampling and Annotation
eIF2γ-orthologous DNA sequences from genome sequencing projects were sampled from databases using Blast. In particular, we used TBlastN (Altschul et al. 1997) based on nine already known eIF2γ sequences (Krauss and Reuter 2000) to retrieve eIF2γ-like genomic sequences from finished and unfinished genome projects deposited at the NCBI database. Additionally, single-trace sequences were screened at the TraceSite of NCBI (http://www.ncbi.nih.gov/blast/tracemb.html) using discontiguous MEGABlast and were assembled manually. Independently, we screened for similar EST sequences utilizing TBlastN and assembled these sequences if possible. The orthology of these candidate sequences was verified by multiple alignment and phylogenetic analysis. We excluded all angiosperm sequences, with the exception of Arabidopsis and Oryza eIF2γ genes, from the sequence set because usage of incomplete angiosperm EST and genome data would complicate both gene assembly and phylogenetic analysis through frequent gene duplications. We also excluded all vertebrate sequences, with the exceptions of Homo and Takifugu, because protein identity between these species is exceedingly high (>95%), and we could not find any gene structure differences between vertebrate eIF2γ genes.
Alignment and Mapping of Introns
Amino acid sequences were aligned using MacVector 7.2 (Accelrys) and revised by eye. The divergent ends of eIF2γ proteins were deleted from the final data set. Intron positions at the corresponding nucleotide sequences were
in similarity. All identified exon boundaries are supported by typical splice-site sequences of U2-dependent spliceosomal introns. This exon-intron structure was confirmed by cDNA sequence if available. Introns located upstream or downstream of the conserved eIF2γ ORF were not considered. We evaluated the location of introns with respect to (1) Drosophila melanogaster eIF2γ amino acid residue numbering and (2) phase in the ORF, which results in bipartite naming of all identified intron positions. Intron phase was named 0 if the intron lies between two consecutive codons; 1 if the intron lies between the first and the second nucleotide of a codon; and 2 if the intron lies between the second and the third nucleotide of a codon.
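The bipartite naming can be illustrated with a short Python sketch. The function names are hypothetical, and the numbering convention (a phase-0 intron is named after the codon that follows it) is an assumption rather than something stated in the Methods; it was chosen because it reproduces the exon lengths quoted in the Results for the intron pairs 39-0/44-1 (16 nt) and 133-1/141-0 (23 nt).

```python
def intron_name(coding_offset: int) -> str:
    """Return 'residue-phase' for an intron preceded by `coding_offset` ORF nucleotides.

    Assumed convention: residue numbering starts at 1, and a phase-0 intron
    is named after the codon that immediately follows it.
    """
    if coding_offset < 0:
        raise ValueError("coding_offset must be non-negative")
    residue = coding_offset // 3 + 1   # codon that contains or follows the intron
    phase = coding_offset % 3          # 0, 1, or 2 nucleotides of that codon lie upstream
    return f"{residue}-{phase}"


def coding_offset(name: str) -> int:
    """Inverse of intron_name: number of ORF nucleotides upstream of the intron."""
    residue_text, phase_text = name.split("-")
    residue, phase = int(residue_text), int(phase_text)
    if residue < 1 or phase not in (0, 1, 2):
        raise ValueError(f"not a valid intron name: {name!r}")
    return 3 * (residue - 1) + phase


def exon_length(upstream_intron: str, downstream_intron: str) -> int:
    """Length in nt of the exon bounded by two named intron positions."""
    return coding_offset(downstream_intron) - coding_offset(upstream_intron)
```

Under this assumed convention, `exon_length("39-0", "44-1")` evaluates to 16 and `exon_length("133-1", "141-0")` to 23.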
The programs MrBayes version 3.0 (Ronquist and
Huelsenbeck 2003), Tree-Puzzle version 5.0 (Schmidt
et al. 2001), PAUP* version 4.0b10 (Swofford 2002), and
MacVector 7.2 (Accelrys) were used for phylogenetic
analyses. Tree constructions were performed through the
Bayesian inference (BI) method by MrBayes using the JTT
substitution model, 500,000 replicates (every 100th was
saved), and a burn-in of 2,000, resulting in 3,000 trees.
The posterior probability tree from this analysis was
computed using PAUP*. For a maximum-likelihood (ML)
analysis, we used quartet-puzzling by Tree-Puzzle 5.0 with
25,000 puzzling steps and the WAG substitution model,
and we assumed rate heterogeneity with eight gamma rate
by heuristic bootstrapping (1,000 steps) using PAUP* and
the branch-swapping algorithm tree-bisection-reconnection
(TBR). Finally, a neighbor-joining (NJ) analysis was
calculated by MacVector using bootstrapping (1,000 steps)
and a Poisson-corrected distance.
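The tree count given for the Bayesian analysis follows from simple bookkeeping, sketched below in Python. The function name is hypothetical, and the sketch assumes that the burn-in is counted in saved trees rather than in generations and that the starting tree is not counted among the saved samples.

```python
def retained_trees(generations: int, sample_every: int, burn_in_trees: int) -> int:
    """Number of sampled trees left after discarding the burn-in.

    Assumptions: one tree is saved every `sample_every` generations, and
    `burn_in_trees` is expressed in saved trees, not in generations.
    """
    saved = generations // sample_every
    if burn_in_trees > saved:
        raise ValueError("burn-in exceeds the number of saved trees")
    return saved - burn_in_trees


# 500,000 generations sampled every 100th give 5,000 saved trees;
# removing a burn-in of 2,000 leaves 3,000.
print(retained_trees(500_000, 100, 2_000))
```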
We have utilized all intron positions of 37 genes in an independent tree reconstruction. Intronless genes were excluded from this analysis because a total erasure of introns from the eIF2γ gene has taken place several times in parallel during evolution (see below). We built an input matrix based on presence/absence of a given intron and implemented a branch-and-bound maximum-parsimony (MP) search using PAUP* 4.0b10 (Swofford 2002). Characters were considered as unordered.
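A presence/absence matrix of this kind can be assembled as in the following Python sketch. It is not the procedure used for the published matrix: the function name and the gene labels in the usage note are hypothetical, and intron positions are assumed to be given as 'residue-phase' names, so that two introns count as the same character only if both location and phase agree.

```python
def presence_absence_matrix(introns_by_gene):
    """Build binary intron characters from a mapping gene -> iterable of intron names.

    Intronless genes are dropped, mirroring their exclusion from the MP analysis.
    Returns (positions, rows): the sorted character list and, for each retained
    gene, a string of '1' (intron present) and '0' (intron absent).
    """
    kept = {gene: set(names) for gene, names in introns_by_gene.items() if names}

    def sort_key(name):
        residue, phase = name.split("-")
        return int(residue), int(phase)

    positions = sorted({n for names in kept.values() for n in names}, key=sort_key)
    rows = {
        gene: "".join("1" if p in names else "0" for p in positions)
        for gene, names in kept.items()
    }
    return positions, rows
```

For the toy input `{"gene_A": ["44-1", "39-0"], "gene_B": ["39-0"], "gene_C": []}`, the characters are `["39-0", "44-1"]`, the rows are `"11"` for gene_A and `"10"` for gene_B, and the intronless gene_C is omitted. The rows could then be written into a data matrix for a parsimony program, with each column treated as an unordered binary character.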
Results and Discussion
Structure of eIF2γ Genes
We included 55 eIF2γ genes from 52 different organisms in our structural analysis (table 1 in Supplementary Material online). Recent genome projects have shown that, except in angiosperms (see Materials and Methods) and mammals, the eIF2γ gene is a single-copy gene. In mice and humans, a second gene copy is essential for male fertility (Mazeyrat et al. 2001) and is expressed only in the males of these species (Ehrmann et al. 1998). These second copies were excluded from our analysis because they show accelerated sequence evolution.
FIG. 1.—Alignment of selected eIF2γ proteins. Four eukaryotic main groups are represented by Drosophila melanogaster (Dme, animals), Coprinus cinereus (Cci, fungi), Chlamydomonas reinhardtii (Cre, plants), and Theileria annulata (Tan, protists). Additionally, two structurally determined eIF2γ proteins from Archaea are shown (Pab, Pyrococcus abyssi; Mja, Methanococcus jannaschii). Corresponding secondary structures are given according to Schmitt, Blanquet, and Mechulam (2002) and Roll-Mecak et al. (2004). The three domains of eIF2γ (G domain, Domain II, and Domain III) are marked below the alignment. Above the sequences, binding sites of the degenerate oligonucleotide primers used are shown. Intron positions are marked and named according to their position in the Drosophila melanogaster eIF2γ sequence. Intron positions that are found in two or more species are underlined.
eIF2γ genes contain up to 11 introns in the conserved region of the ORF (Supplementary Material table 1), which ranges from amino acid alignment position 14 to position 477 in figure 1. Intronless as well as intron-rich eIF2γ genes
and deuterostomates (Supplementary Material table 1). This indicates that erasure of all introns from the gene structure (most probably by retrotransposition) has occurred several times independently during evolution. Nevertheless, eIF2γ introns mapped onto the multiple protein alignment show a remarkable conservation of intron locations.
An important initial assumption of our analysis is the homology of each specific intron position. We assume that introns have only very rarely been gained at homologous sites in different evolutionary lineages, as implied by the proto–splice-site theory (Dibb and Newman 1989; Sadusky, Newman, and Dibb 2004), as compared with intron insertion at different sites. The strong conservation of the eIF2γ protein sequence in the eukaryotic species (fig. 1) excludes alignment ambiguities that would result in wrong homologization or distinction of intron positions found in different species. Therefore, we considered only those intron positions as homologous that were identical in both location and phase.
Altogether, we found 52 different intron positions in eIF2γ genes, and 22 of these introns were identified in only one of the analyzed species. The other 30 introns are present in two or more of those species and are very likely evolutionarily conserved. According to the intron-early theory, these introns predated the origin of eukaryotes and had an important role by assembling a functional protein from short-coding DNA sequences (Gilbert 1987). Thus, we mapped the location of exon boundaries in the tertiary structure described for two archaeal orthologs (fig. 1) (Schmitt, Blanquet, and Mechulam 2002; Roll-Mecak et al. 2004). eIF2γ consists of three domains: G domain, domain II, and domain III. There is no local correlation of conserved intron positions with domain borders (fig. 1). We could not find any specific location of introns with respect to secondary structure elements. Furthermore, we noticed that the bacterial and mitochondrial protein EF-Tu, an elongation factor of translation, shows significant homology to all three domains of eIF2γ in sequence and structure (Schmitt, Blanquet, and Mechulam 2002). Thus, all analyzed eIF2γ genes likely evolved from an intronless ortholog. Accordingly, intron positioning in eIF2γ has occurred independently from an eventual exon shuffling during early evolution. Hence, it follows that the pattern of eIF2γ intron-exon boundaries should reveal suitable markers of eukaryotic phylogeny.
Distribution of Intron Positions and Exon Lengths
Next, we examined the exon length distribution in eIF2γ genes (fig. 2A), which shows a maximum between 150 and 180 nt. This differs from an estimation of 90 to 120 nt based on a database of gene structures sampled from several model organisms (Deutsch and Long 1999). In agreement with this study, exons smaller than 60 nt are rare. A 16-nt exon between intron positions 39-0 and 44-1 was found in the related fungi Coprinus and Phanerochaete, and a 23-nt exon between intron positions 133-1 and 141-0 was identified in the fungus Rhizopus. The rarity of small exons probably has functional reasons. It was shown that exons shorter than 50 nt are poorly included in mRNA unless accompanied by strengthened splice sites or accessory sequences that act as splice enhancers (Hwang and Cohen 1997; Carlo, Sierra, and Berget 2000). Thus, small exons should have evolved less frequently than larger ones. In addition, such exons may occur more frequently in fungi than in other eukaryotes, which is consistent with the data of Deutsch and Long (1999).
To analyze the distribution of all intron positions occurring in eIF2γ genes, we plotted the distances between them (fig. 2B). We obtained a distribution with two maxima around 10 and 50 nt. Between the two maxima, we identified a minimum spanning intron position distances between 33 and 40 nt. With this observed distribution, we could arrange the 52 intron positions into 17 clusters, each consisting of one to six introns (fig. 2D). Inside each cluster, intron positions are separated by a maximum of 33 nt, whereas intron clusters have a minimal distance of 40 nt. This weak clustering of intron positions revives the question of whether only specific intron positions are homologous to each other or whether whole intron clusters, evolutionarily related by the hypothetical process of intron sliding, are the units of evolution.
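The grouping rule described above can be sketched in Python as a single pass over the sorted positions. This is an illustration, not the procedure actually used: the function name is hypothetical, positions are assumed to be nucleotide offsets within the ORF, and the 33-nt limit is assumed to apply to neighboring positions (single-linkage grouping) rather than to the total span of a cluster.

```python
def cluster_positions(offsets, max_gap=33):
    """Group intron positions (nt offsets in the ORF) into clusters.

    A position joins the current cluster if it lies at most `max_gap` nt
    downstream of the previous position; otherwise it opens a new cluster.
    """
    clusters = []
    for offset in sorted(set(offsets)):
        if clusters and offset - clusters[-1][-1] <= max_gap:
            clusters[-1].append(offset)
        else:
            clusters.append([offset])
    return clusters
```

With the illustrative offsets `[114, 130, 397, 420, 500]`, the neighbor distances are 16, 267, 23, and 80 nt, so the function returns `[[114, 130], [397, 420], [500]]`.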
We examined the distribution of introns between the clusters. If different intron positions inside one cluster are homologous to each other, the abundance of introns found in each cluster should be independent of the number of different intron positions that belong to a cluster. However, we would expect a linear correlation of intron abundance with the number of intron positions in each cluster if each intron position had evolved independently. We found such a correlation (fig. 2C). An additional argument for evolutionary independence of each intron position is the rarity of intron sliding; that is, the movement of an existing intron to a nearby position (Stoltzfus et al. 1997; Rogozin, Lyons-Weiler, and Koonin 2000). Therefore, we suggest that apparent clusters of intron positions are mainly the result of intron erasure and subsequent insertion of novel introns at positions compatible with preferred exon sizes. A corresponding, relatively
functional limitations of the nonsense-mediated decay (NMD) for transcripts harboring premature termination codons (Lynch and Richardson 2002; Lynch and Kewalramani 2003). Intron positions play a guiding role during the recognition of premature termination codons by NMD and, therefore, might have been evolutionarily forced to a more uniform distribution than under a model of random
It appears that a clustered distribution of introns is not specific for the eIF2γ gene. Wada et al. (2002) demonstrated at least one similar intron cluster (four different small shifts of their intron position 7) in deuterostome EF-1α genes. A very similar pattern of intron distribution was revealed in the insect chemoreceptor superfamily of Drosophila melanogaster (Robertson, Warr, and Carlson
Tree Analysis Based on eIF2γ Gene Structure
We identified at least partial gene structures of 51 out
of 55 analyzed eIF2γ genes. Seven intronless genes and
seven incompletely analyzed gene structures were excluded
from the data set (table 1 in Supplementary Material
online). An MP tree reconstruction (see Materials and
Methods) was performed using the remaining 37 gene
structures, represented by a presence/absence matrix
including all intron positions (table 2 in Supplementary
Material online). The resulting unrooted tree (fig. 3) shows
a remarkable phylogenetic information content and supports,
for example, the following monophyletic groups:
Apicomplexa, Nematoda, Viridiplantae, Angiospermae,
Deuterostomia, Homobasidiomycetes, and Pezizomycotina
(Ascomycota sensu stricto). Other groupings are clearly
spurious, such as the branching of the diatom Thalassiosira
with nematodes or the branching of the flatworm
Schistosoma with Coelomata. Interestingly, most
representatives of Endopterygota (metamorphosing insects) did
not group with the other arthropods. The common
branching between those other arthropods (Daphnia, Apis,
Aphis, and Allacma), Schistosoma, and the represented
deuterostomes can be explained by symplesiomorphic
and 451-2), probably acquired from the last common
ancestor of all bilaterians, in their eIF2γ genes. This finding
may be related to results of recent studies (reviewed in
Raible and Arendt 2004) that revealed that human and
flatworm genes seem to be closer to the bilaterian roots
than are Drosophila and Caenorhabditis genes. This thesis
[Figure 2, panels A to D. Labels recovered from the plots: (A) number of exons; (B) occurrences of intron distances, distances of intron positions; (C) occupation rate of intron cluster, number of intron positions, R² = 0.6836; (D) scale bar, 50 amino acids.]
FIG. 2. Intron positions of eIF2γ genes are clustered. (A) Exon length distribution. Only internal exons that code for the
conserved portion of the ORF are represented. (B) Nucleotide distances of all intron positions identified in at least one eIF2γ gene. (C) Correlation between the number
of intron positions and the intron occupation rate of each cluster. The intron occupation rate is the ratio of the sum of introns found in all
analyzed genes in each cluster to the number of analyzed genes. (D) Schematic representation of 17 intron clusters identified in eIF2γ gene structures.
Each intron position is shown as a vertical hatch mark.
78 Krauss et al.
is based on comparisons of gene content and similarity
between orthologs. Our results point to a possibly slower
evolution of gene structures in deuterostomes and some
flatworms as well. Additionally, arthropods appear to have
evolved at differing rates in this respect.
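The two ingredients of the analysis above, a presence/absence matrix of intron positions and a parsimony criterion, can be sketched in a few lines. This is only an illustration: the paper's trees were reconstructed with the methods given in Materials and Methods, whereas the sketch below merely scores fixed candidate trees with the Fitch small-parsimony count for binary characters. The taxon names, intron sets, and candidate topologies are hypothetical toy data, and all function names are assumptions.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def presence_absence(genes: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the sorted intron positions and a 0/1 row per gene."""
    positions = sorted({p for introns in genes.values() for p in introns})
    matrix = {name: [int(p in introns) for p in positions]
              for name, introns in genes.items()}
    return positions, matrix


def fitch_changes(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch pass for one character: (possible states at this node, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_cost = fitch_changes(tree[0], states)
    right_states, right_cost = fitch_changes(tree[1], states)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree: Tree, matrix: Dict[str, List[int]], n_chars: int) -> int:
    """Total number of intron gains/losses a tree requires under parsimony."""
    total = 0
    for i in range(n_chars):
        column = {name: row[i] for name, row in matrix.items()}
        total += fitch_changes(tree, column)[1]
    return total


# Hypothetical gene structures for four taxa (toy data).
genes = {
    "taxon_A": {"160-1", "295-0"},
    "taxon_B": {"160-1", "295-0"},
    "taxon_C": {"159-1", "289-0"},
    "taxon_D": {"159-1"},
}
positions, matrix = presence_absence(genes)
tree_1: Tree = (("taxon_A", "taxon_B"), ("taxon_C", "taxon_D"))
tree_2: Tree = (("taxon_A", "taxon_C"), ("taxon_B", "taxon_D"))
length_1 = tree_length(tree_1, matrix, len(positions))
length_2 = tree_length(tree_2, matrix, len(positions))
print(positions, length_1, length_2)
```

The topology that groups taxa sharing derived intron positions requires fewer changes, which is the sense in which shared introns carry phylogenetic signal. As noted in the text, such trees also reflect lineage-specific intron loss and should be read with that caveat.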
Examination of other gene structures will show
whether the mode of successive intron deletion and
insertion is about as clocklike as sequence evolution.
More likely, ancient episodes of general intron loss might
have erased most or all of the phylogenetically informative
intron positions from some lineages. This might have
occurred in the Pezizomycotina, which have lost all ancient
eIF2γ introns and have acquired two novel introns (see
below). In this case, we cannot expect a correct phylogenetic
reconstruction from intron position data alone,
irrespective of the number of gene structure samples.
Intron presence/absence trees reflect both phylogeny and
lineage-specific modes of intron evolution; they should
not be interpreted as typical phylogenetic trees.
eIF2γ Sequence Phylogeny
Phylogenetic analysis was carried out using BI, ML,
MP, and NJ methods and an eIF2γ amino acid sequence
alignment (see Materials and Methods, and see figure 2 of
Supplementary Material online). A nucleotide
sequence alignment was deliberately avoided because of the
high amount of homoplasy that would be expected from coding
sequences separated by long divergence times. The aIF2γ
sequences of the archaeal species Pyrococcus abyssi and
Methanococcus jannaschii were used as outgroups. The BI
tree, annotated with additional branching information
from the other trees, is presented (fig. 4). Interesting results
of these analyses are (1) the significant support of the
Coelomata hypothesis in contrast to the Ecdysozoa
hypothesis and (2) the sister-relationship between Daphnia
and Allacma. The latter result might support a novel
phylogenetic hypothesis (Nardi et al. 2003), according to which
hexapods are not monophyletic, and both Collembola (i.e.,
Allacma) and ectognathous insects evolved independently
from crustacean-like arthropods. However, the strongly
supported, but evidently untenable, relationship of Lithobius
and Strongylocentrotus argues for a cautious
interpretation of the eIF2γ tree. Combined analyses of eIF2γ
and other sequence data will deliver more soundly based
Phylogenetic Mapping of eIF2γ Intron Gains and Losses
Because all compared single-copy eIF2γ genes are
certainly orthologous to each other, intron insertion events
were traced on the phylogenetic tree (fig. 5). For this
purpose, a consensus tree resulting from eIF2γ sequence
analysis and commonly supported phylogeny was used.
Fourteen intron locations appear to be ancestral within
eIF2γ genes, as determined by their common occurrence in
at least two highly divergent lineages of animals, fungi,
plants, or protists. Six of these intron positions (18-0, 34-0,
39-0, 189-0, 372-0, and 436-0) are present in no more than
four of the analyzed eIF2γ genes and in only two of those
lineages (table 2 in Supplementary Material online). These
rarely identified but seemingly ancient introns might
alternatively have been acquired through two relatively recent,
parallel intron insertions in two different evolutionary
lineages at "proto-splice sites." Such a scenario was
anticipated based on the co-occurrence of cryptic and natural
splice sites in actin genes of different species (Sadusky,
Newman, and Dibb 2004). We therefore excluded those
introns (18-0, 34-0, 39-0, 189-0, 372-0, and 436-0) from
tracing (fig. 5).
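The classification rule described above can be written down as a small decision function. It is a sketch under stated assumptions: a position counts as putatively ancient if it occurs in at least two of the four major lineages, and it is set aside as ambiguous if it occurs in no more than four genes and in exactly two lineages. The gene names and intron sets are hypothetical toy data chosen to be in line with the distributions mentioned in the text, and the function name `classify` is an assumption.

```python
from typing import Dict, Set, Tuple

# gene name -> (major lineage, intron positions); hypothetical toy data.
GeneTable = Dict[str, Tuple[str, Set[str]]]


def classify(position: str, genes: GeneTable, max_rare_genes: int = 4) -> str:
    """Classify one intron position by its distribution across major lineages."""
    carriers = [lineage for lineage, introns in genes.values() if position in introns]
    lineages = set(carriers)
    if len(lineages) < 2:
        return "lineage-specific"
    if len(carriers) <= max_rare_genes and len(lineages) == 2:
        # Rare and confined to two lineages: possibly two parallel
        # insertions at a proto-splice site, so excluded from tracing.
        return "ambiguous"
    return "ancient"


genes: GeneTable = {
    "animal_1": ("animals", {"212-1", "18-0"}),
    "animal_2": ("animals", {"212-1"}),
    "fungus_1": ("fungi", {"212-1", "130-2"}),
    "fungus_2": ("fungi", {"130-2"}),
    "protist_1": ("protists", {"212-1", "18-0"}),
    "plant_1": ("plants", {"212-0"}),
}

labels = {p: classify(p, genes) for p in ["212-1", "18-0", "130-2", "212-0"]}
for position, label in labels.items():
    print(position, label)
```

The threshold of four genes is taken from the text; with denser sampling of gene structures, a position classified as ambiguous here could move into either of the other two classes.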
On the other hand, some of the introns that appear
taxon specific might be undetected ancient introns.
However, most, if not all, of the other 38 introns were
probably gained relatively late within one specific lineage,
which is supported by their highly unequal distribution
between the lineages (table 2 in Supplementary Material
online). They are unlikely to be ancient, because then the
original eIF2γ gene would have been extraordinarily
fragmented by introns, and multiple independent losses
must have occurred in multiple different lineages.
Twenty-two of those intron positions were found in only one of all
analyzed eIF2γ genes. More interestingly, 16 other intron
positions are shared by two or more lineages as putative
synapomorphic (shared derived) characters. The tree
analysis of gene-structure data (fig. 3) showed that these
intron positions are indeed of high phylogenetic value
because they mostly override the impact of probable ancient
FIG. 3. MP analysis of presence/absence of introns from 37 eIF2γ
genes. A consensus tree computed from 65 most-parsimonious trees
(requiring 79 changes) is shown, together with selected taxonomic groups
that are supported by this analysis. Note that this intron tree contains only
incomplete phylogenetic information (see text).
intron positions, which were lost or gained in parallel in
Cladistic Patterns of Specific Introns
We further examined whether specific intron positions
of eIF2γ might be phylogenetically informative.
Several cases of successive losses and gains of only
slightly different intron positions were documented (fig.
5), resulting in a nested distribution of the evolutionarily
newer introns. Such nearby introns cannot coexist in one
gene structure for two reasons. First, exon sizes smaller
than 50 nt are rare and functionally detrimental (see
above) (Hwang and Cohen 1997; Carlo, Sierra, and Berget
2000). Second, co-occurring processes of intron gain and
loss were suggested to be driven by a balance between the
additional mutational load of intron-containing alleles and
selective pressure for an efficient mechanism of NMD,
which is provided by a sufficiently tight, overdispersed
distribution of introns; that is, exon sizes are more uniform
than expected under random insertion (Lynch and
Kewalramani 2003). Therefore, such intron changes
represent reliable synapomorphic character states.
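The first argument above is simple arithmetic: two introns that are only a few nucleotides apart would enclose an exon far shorter than 50 nt. The following sketch makes this explicit for some position pairs named in the text. It assumes that a label such as 159-1 (codon number and phase) corresponds to the nucleotide coordinate 3 × codon + phase; the function names are hypothetical, and the 50-nt limit is the value given in the text.

```python
MIN_EXON_NT = 50  # exons shorter than this are described as rare in the text


def to_nt(position: str) -> int:
    """Assumed mapping of a 'codon-phase' label to a nucleotide coordinate."""
    codon, phase = position.split("-")
    return 3 * int(codon) + int(phase)


def implied_exon_length(pos_a: str, pos_b: str) -> int:
    """Length of the exon that would lie between two coexisting introns."""
    return abs(to_nt(pos_a) - to_nt(pos_b))


def can_coexist(pos_a: str, pos_b: str, min_exon: int = MIN_EXON_NT) -> bool:
    """True if both introns could be present without creating a too-short exon."""
    return implied_exon_length(pos_a, pos_b) >= min_exon


pairs = [("159-1", "160-1"), ("127-2", "130-2"),
         ("289-0", "295-0"), ("23-0", "81-1")]
for a, b in pairs:
    print(a, b, implied_exon_length(a, b), can_coexist(a, b))
```

Under this mapping, 127-2 and 130-2 are 9 nt apart and 159-1 and 160-1 are 3 nt apart, in agreement with the distances stated elsewhere in the text.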
The following cases of nested intron distributions are
particularly informative. First, intron 212-1 was found in
several species of animals, fungi, and protists. The nearby
FIG. 4. Bayesian inference (BI) tree of eIF2γ proteins, with groups of interest highlighted. A cladogram as drawn by PAUP* is shown. The
posterior probability (BI) is given above each node in percent of trees showing the same topology. The quartet-puzzling value (ML) is given below each
supported node. If the same topology is also supported by MP and/or NJ analyses, the corresponding node is marked with a dot (MP) and/or
a concentric circle (NJ).
intron position 212-0 was identified only in angiosperm
plants and may demarcate a monophyletic group of plants,
because intron 212-1 had very likely been lost before
intron 212-0 evolved. Second, intron 127-2 was
detected in protists, in some animals, and in one fungus (the
9 nt away, is 130-2, which was exclusively found in all five
analyzed Pezizomycotina species. In this case, we assume
that ancient Ascomycota eIF2γ genes were intronless,
because all other analyzed ascomycete species (Saccharomyces,
Candida, Kluyveromyces, Eremothecium, and
Schizosaccharomyces) have completely intronless eIF2γ
genes, and all analyzed Pezizomycotina species contain
introns specific to this taxon (130-2 and 460-2).
Interestingly, the establishment of spliceosomal introns
in formerly intronless genes has already been reported for
Pezizomycotina species (Bhattacharya et al. 2000, and
references therein). Third, nearly all analyzed eIF2γ genes
of Coelomata contain intron 159-1, with the exception of
the analyzed Coleoptera, Lepidoptera, and Diptera, which
instead contain the taxon-specific intron 160-1 (Anopheles
is intronless in this region). In contrast, both the aphid
Aphis sambuci and the bee Apis mellifera have the
plesiomorphic intron 159-1, as do deuterostomes and distantly
related arthropods. Therefore, we propose a nested monophyletic
taxon, a group including Diptera, Lepidoptera, and
[Figure 5 branch labels, listed in extraction order; their placement on the tree is not recoverable from the text: gain of 130-2 and 460-2; gain of 369-1; loss of 65-0 and 372-0; change from 451-2 to 454-1; change from 44-1 to 41-0; gain of 133-1 and 141-0; gain of 206-0; gain of 109-0; gain of 356-0 and 402-0; gain of 136-1 and 456-0; gain of 110-0, 247-2, and 342-0; gain of 266-0, 311-2, and 387-0; gain of 316-2 and 378-2; gain of 81-2; gain of 136-0, change from 212-1 to 212-0; gain of 285-0; loss of all introns; sliding from 159-1 to 160-1, change from 289-0 to 295-0; Su(var)3-9 in 81-1; gain of 81-1; gain of 159-1; gain of 23-0, 233-0, 257-1, and 289-0; gain of 44-1, 65-0, 87-0, 102-2, 127-2, 212-1, 394-0, and 451-2; gain of 196-0, 285-1, 334-1, and 421-1.]
FIG. 5. Phylogenetic mapping of eIF2γ gene structure changes. Intron gains and losses as well as the Su(var)3-9 insertion in intron 81-1 are
shown. A consensus tree resulting from eIF2γ sequence analysis and commonly supported phylogeny was used. Only one intron acquisition at each
position is shown, assuming that introns with identical positions are homologous. Putative intron losses are given only for highly parsimonious cases,
often connected with a nearby insertion of a novel intron.
Coleoptera but not Hymenoptera (fig. 5). This novel taxon is
additionally supported by intron 295-0 of Clytus arietis
(Coleoptera) and Bombyx mori (Lepidoptera), whose
distribution is nested within that of the nearby intron 289-0 (found in
flatworms, deuterostomes, and several arthropods, including
Apis mellifera). Our novel grouping contradicts the
commonly supported insect phylogeny, which considers
the Coleoptera an outgroup to Hymenoptera+Diptera
(Wheeler et al. 2001, and references therein). However,
Diptera+Lepidoptera+Coleoptera excluding Hymenoptera
was at least supported by Ross (1965) based on
1 and 160-1 led us to assume that intron 160-1 might have
originated by sliding of the 159-1 intron. Thus, we compared
the 5′ and 3′ splice regions of both introns (fig. 6).
We found a significant conservation of some nucleotide
positions in the 3′ splice sites of both introns, which argues,
indeed, for the possibility of intron sliding involving at least
not necessarily need to occur contemporaneously because of the
implicated one-codon shift of both splice sites.
Of all analyzed intron positions in eIF2γ, 23-0 and
81-1 appear more robust against erasure than any others.
The persistence of intron 23-0, which was found in
nearly all analyzed metazoan genes, may be caused by
a regulatory element in this near-promoter intron. The
pancrustacean-specific intron 81-1 is even more interesting.
During the evolution of eIF2γ in insects, a gene copy
of Su(var)3-9 was inserted into this intron (Krauss and
Reuter 2000). Since this event, both eIF2γ and Su(var)3-9
use the same promoter and two or three common exons
and, thus, are considered one fused gene. Synthesis of
the two structurally and functionally unrelated proteins is
mediated by alternative splicing (Krauss and Reuter 2000).
We also identified a Su(var)3-9–specific exon in the 81-1
intron of Apis mellifera, Cercopis vulnerata, and Enallagma
cyathigerum (V. Krauss, unpublished data) (fig. 5).
Thus, the eIF2γ intron 81-1 might have been
protected against erasure by the Su(var)3-9–specific exon. The
intron itself is clearly older than this gene fusion and was
almost certainly established during the early evolution of
Pancrustacea. This is supported by (1) the presence of this
intron in the woodlouse Oniscus asellus (Malacostraca)
and in the springtail Allacma fusca, as well as by (2) the
absence of intron 81-1 and instead the presence of the older
intron 87-0 in the eIF2γ gene of the centipede Lithobius
forficatus. The absence of 81-1 in Daphnia (Branchiopoda)
may be secondary or, more interestingly, may indicate a
sister-relationship between Hexapoda and Malacostraca, as
already suggested (Burmester 2001).
Intron Positioning in eIF2γ Reveals Insights into
Phylogenetics and into Modes of Intron Evolution
Our results support the following hypothesis
about the evolutionary history of eIF2γ in eukaryotes.
Intronless eIF2γ genes were probably inherited by unicellular
eukaryotes. The first introns might have been acquired
during early evolution and passed on to protists, plants,
fungi, and animals. Other introns were gained significantly
later. The value of these late introns for phylogenetics
depends critically on their evolutionary polarization
through nearby older introns, which have to be lost before
the insertion of the novel introns, except in the case of intron
sliding, where intron loss and gain occur simultaneously.
As demonstrated by the ultrashort eIF2γ exons found in
fungal species (16 or 23 nt long, respectively), the detection
of phylogenetically nested, synapomorphic introns can be
complicated by unusual splicing features. Therefore, the
reliability of this intron age classification depends critically
(1) on the strong conservation of gene sequence, which
makes the secure differentiation from nearby intron positions
possible, and (2) on the sampling of gene structures,
which has to be as dense as possible. The use of intron
positioning for maximum-parsimony tree reconstruction
of remotely related eukaryotes was relatively
successful, which argues for a high phylogenetic information
content of intron positions. This analysis is substantially
backed by tree reconstruction based on protein
sequences. For instance, both amino acid sequences and
intron positions argue against an Ecdysozoa group
containing both arthropods and nematodes. The strongly
persistent intron 81-1 adds evidence to the Pancrustacea
hypothesis. Two parallel, nested intron distributions delivered
evidence for a novel monophyletic taxon, a
Diptera+Lepidoptera+Coleoptera clade excluding Hymenoptera
conserved gene structures will continue to improve the
knowledge of both intron evolution and higher-level
The newly reported sequences of eIF2γ from Araneus
quadratus, Lithobius forficatus, Daphnia magna, Oniscus
asellus, Allacma fusca, Lepismachilis spp., Enallagma
cyathigerum, Forficula auricularia, Locusta migratoria,
Aphis sambuci, Cercopis vulnerata, Scoliopteryx libatrix,
and Bombyx mori are deposited in the DDBJ/EMBL/GenBank
database (accession numbers AJ290958 and
AJ715857 to AJ715871). Supplementary tables and figures
are available online at the MBE Web site.
159-1 YTD YTB ATW G | gtrvdwbhnh......dnwnndhhhhhhag | CK GGN AAY GAR YCN
L L I A G N E S/P
160-1 YTN CTH ATY GCD G | gtrwdtn......dyyhnyybbdyktrmag | GY AAY GAR TCH
L L I A G N E S
FIG. 6. Comparison of intron 159-1 and 160-1 splice regions. Absolute consensus sequences are shown for seven 159-1 splice sites (different
Coelomata including Hymenoptera) and six 160-1 splice sites (Diptera, Coleoptera, and Lepidoptera), respectively. Intron sequences are shown in
lowercase, and the encoded amino acid sequence is given beneath the exon sequences. Conserved nucleotides between both consensus sequences are
underlined, whereas differences are shown in bold. Note that only three nucleotide exchanges are necessary to shift the 159-1 intron position 3 nt
downstream, because the alanine-coding and glycine-coding nucleotide sequences downstream from the 159-1 3′ splice site would support the shift.
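The consensus lines of fig. 6 use the standard IUPAC ambiguity codes (for example, Y for C or T and R for A or G). How such an absolute consensus can be derived from aligned splice-site sequences, and how two consensus symbols can be checked for a shared nucleotide, is sketched below. The three input sequences are hypothetical and are not the sequences behind fig. 6; the function names are assumptions.

```python
from typing import Iterable

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC_BASES.items()}


def absolute_consensus(sequences: Iterable[str]) -> str:
    """One IUPAC symbol per column, covering every nucleotide observed there."""
    seqs = [s.upper() for s in sequences]
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return "".join(CODE_FOR[frozenset(column)] for column in zip(*seqs))


def compatible(code_a: str, code_b: str) -> bool:
    """True if two IUPAC symbols share at least one nucleotide."""
    bases_a = set(IUPAC_BASES[code_a.upper()])
    bases_b = set(IUPAC_BASES[code_b.upper()])
    return bool(bases_a & bases_b)


# Hypothetical aligned exon ends upstream of a splice site (toy data).
sites = ["CTGCTCATAG", "TTACTTATTG", "CTACTAATAG"]
consensus = absolute_consensus(sites)
print(consensus)
print(compatible("Y", "H"), compatible("R", "Y"))
```

Columns at which the symbols of the two consensus sequences are compatible correspond to positions where no nucleotide exchange would be required for the proposed shift of the intron position.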
Acknowledgments
We would like to thank J. Hebler, C. Gräbsch,
R. Kirschner, A. Anton, G. Müller, C. Wierzchacz, I.
Patties, and A. Howe for help in sequencing. We gratefully
acknowledge the sequencing of the yet unpublished
genomes of Theileria annulata, Toxoplasma gondii,
Thalassiosira pseudonana, Chlamydomonas reinhardtii,
Dictyostelium discoideum, Phytophthora sojae, Magna-
porthe grisea, Aspergillus fumigatus, Coccidioides immi-
tis, Gibberella zeae, Ustilago maydis, Cryptococcus
neoformans, Coprinus cinereus, Rhizopus oryzae, Schis-
tosoma mansoni, Schmidtea mediterranea, Brugia malayi,
Strongylocentrotus purpuratus, Apis mellifera, Drosophila
virilis, and Drosophila pseudoobscura. We would like to
thank S. Phalke for critical reading of the manuscript. This
work was supported by a grant from the Deutsche
Forschungsgemeinschaft to V.K. and H.S.
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang,
Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped
BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Res. 25:3389–3402.
Bhattacharya, D., F. Lutzoni, V. Reeb, D. Simon, J. Nason, and
F. Fernandez. 2000. Widespread occurrence of spliceosomal
introns in the rDNA genes of ascomycetes. Mol. Biol. Evol.
Brady, S. G., and B. N. Danforth. 2004. Recent intron gain in
elongation factor-1alpha of colletid bees (Hymenoptera:
Colletidae). Mol. Biol. Evol. 21:691–696.
Burmester, T. 2001. Molecular evolution of the arthropod
hemocyanin superfamily. Mol. Biol. Evol. 18:184–195.
Carlo, T., R. Sierra, and S. M. Berget. 2000. A 5′ splice
site-proximal enhancer binds SF1 and activates exon bridging of
a microexon. Mol. Cell Biol. 20:3988–3995.
Deutsch, M., and M. Long. 1999. Intron-exon structures of
Dibb, N. J., and A. J. Newman. 1989. Evidence that introns arose
at proto-splice sites. EMBO J. 8:2015–2021.
Ehrmann, I. E., P. S. Ellis, S. Mazeyrat et al. (12 co-authors).
1998. Characterization of genes encoding translation initiation
factor eIF-2gamma in mouse and human: sex chromosome
localization, escape from X-inactivation and evolution. Hum.
Mol. Genet. 7:1725–1737.
Fedorov, A., A. F. Merican, and W. Gilbert. 2002. Large-scale
comparison of intron positions among animal, plant, and
fungal genes. Proc. Natl. Acad. Sci. USA 99:16128–16133.
Feiber, A. L., J. Rangarajan, and J. C. Vaughn. 2002. The
evolution of single-copy Drosophila nuclear 4f-rnp genes:
spliceosomal intron losses create polymorphic alleles. J. Mol.
Gilbert W. 1987. The exon theory of genes. Cold Spring Harb.
Symp. Quant. Biol. 52:901–905.
Hwang, D. Y., and J. B. Cohen. 1997. U1 small nuclear
RNA-promoted exon selection requires a minimal distance between
the position of U1 binding and the 3′ splice site across the
exon. Mol. Cell Biol. 17:7099–7107.
Kapp, L.D., and J. R. Lorsch. 2004. GTP-dependent recognition
of the methionine moiety on initiator tRNA by translation
factor eIF2. J. Mol. Biol. 335:923–936.
Krauss, V., and G. Reuter. 2000. Two genes become one: the
genes encoding heterochromatin protein Su(var)3-9 and
translation initiation factor subunit eIF-2gamma are joined
to a dicistronic unit in holometabolic insects. Genetics 156:
Krzywinski, J., and N. J. Besansky. 2002. Frequent intron loss in
the white gene: a cautionary tale for phylogeneticists. Mol.
Biol. Evol. 19:362–366.
Logsdon, J. M., A. Stoltzfus, and W. F. Doolittle. 1998. Mo-
lecular evolution: recent cases of spliceosomal intron gain?
Curr. Biol. 8:R560–R563.
Lynch, M., and A. Kewalramani. 2003. Messenger RNA sur-
veillance and the evolutionary proliferation of introns. Mol.
Biol. Evol. 20:563–571.
Lynch, M., and A. O. Richardson. 2002. The evolution
of spliceosomal introns. Curr. Opin. Genet. Dev. 12:
Mazeyrat, S., N. Saut, V. Grigoriev, S. K. Mahadevaiah, O. A.
Ojarikre, A. Rattigan, C. Bishop, E. M. Eicher, M. J. Mitchell,
and P. S. Burgoyne. 2001. A Y-encoded subunit of the
translation initiation factor Eif2 is essential for mouse
spermatogenesis. Nat. Genet. 29:49–53.
Nardi, F., G. Spinsanti, J. L. Boore, A. Carapelli, R. Dallai, and
F. Frati. 2003. Hexapod origins: monophyletic or para-
phyletic? Science 299:1887–1889.
Raible, F., and D. Arendt. 2004. Metazoan evolution: some animals
are more equal than others. Curr. Biol. 14:R106–R108.
Robertson, H. M., C. G. Warr, and J. R. Carlson. 2003.
Molecular evolution of the insect chemoreceptor gene
superfamily in Drosophila melanogaster. Proc. Natl. Acad.
Sci. USA 100 (suppl 2):14537–14542.
Rogozin, I. B., J. Lyons-Weiler, and E. V. Koonin. 2000. Intron
sliding in conserved gene families. Trends Genet. 16:430–432.
Rogozin, I. B., Y. I. Wolf, A. V. Sorokin, B. G. Mirkin, and E. V.
Koonin. 2003. Remarkable interkingdom conservation of
intron positions and massive, lineage-specific intron loss and
gain in eukaryotic evolution. Curr. Biol. 13:1512–1517.
Rokas, A., and P. W. H. Holland. 2000. Rare genomic changes as
a tool for phylogenetics. Trends Ecol. Evol. 15:454–459.
Roll-Mecak, A., P. Alone, C. Cao, T. E. Dever, and S. K. Burley.
2004. X-ray structure of translation initiation factor eIF2-
gamma: implications for tRNA and eIF2alpha binding.
J. Biol. Chem. 279:10634–10642.
Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian
phylogenetic inference under mixed models. Bioinformatics
Ross, H. H. 1965. A textbook of entomology. 3rd edition. Wiley,
Roy, S. W., A. Fedorov, and W. Gilbert. 2003. Large-scale com-
parison of intron positions in mammalian genes shows intron
loss but no gain. Proc. Natl. Acad. Sci. USA 100:7158–7162.
Rzhetsky, A., F. J. Ayala, L. C. Hsu, C. Chang, and A. Yoshida.
1997. Exon/intron structure of aldehyde dehydrogenase genes
supports the ‘‘introns-late’’ theory. Proc. Natl. Acad. Sci. USA
Sadusky, T., A. J. Newman, and N. J. Dibb. 2004. Exon junction
sequences as cryptic splice sites: implications for intron
origin. Curr. Biol. 14:505–509.
Schmitt, E., S. Blanquet, and Y. Mechulam. 2002. The large sub-
unit of initiation factor aIF2 is a close structural homologue
of elongation factors. EMBO J. 21:1821–1832.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. V. Haeseler.
2001. TREE-PUZZLE 5.0. Maximum likelihood analysis for
nucleotide, amino acid, and two-state data. http://www.
Stoltzfus, A., J. M. Logsdon, J. D. Palmer, and W. F. Doolittle.
1997. Intron ‘‘sliding’’ and the diversity of intron positions.
Proc. Natl. Acad. Sci. USA 94:10739–10744.
Swofford, D. L. 2002. PAUP*: phylogenetic analysis using
parsimony (*and other methods). Version 4.0b10. Sinauer
Associates, Sunderland, Mass.
Van de Peer, Y., S. L. Baldauf, W. F. Doolittle, and A. Meyer.
2000. An updated and comprehensive rRNA phylogeny of
(crown) eukaryotes based on rate-calibrated evolutionary
distances. J. Mol. Evol. 51:565–576.
Wada, H., M. Kobayashi, R. Sato, N. Satoh, H. Miyasaka, and
Y. Shirayama. 2002. Dynamic insertion-deletion of introns in
deuterostome EF-1alpha genes. J. Mol. Evol. 54:118–128.
Wheeler, W. C., M. Whiting, Q. D. Wheeler, and J. M.
Carpenter. 2001. The phylogeny of the extant hexapod orders.
Wolf, Y. I., I. B. Rogozin, and E. V. Koonin. 2004. Coelomata
and not Ecdysozoa: evidence from genome-wide phylogenetic
analysis. Genome Res. 14:29–36.
Mark Ragan, Associate Editor
Accepted August 31, 2004