C4 Photosynthesis Evolved in Grasses via Parallel Adaptive Genetic Changes
Phenotypic convergence is a widespread and well-recognized evolutionary phenomenon. However, the responsible molecular mechanisms often remain unknown, mainly because the genes involved have not been identified. A well-known example of physiological convergence is the C4 photosynthetic pathway, which evolved independently more than 45 times [1]. Here, we address the molecular bases of the convergent C4 phenotypes in grasses (Poaceae) by reconstructing the evolutionary history of genes encoding a key C4 enzyme, phosphoenolpyruvate carboxylase (PEPC). PEPC genes belong to a multigene family encoding distinct isoforms, of which only one is involved in C4 photosynthesis [2]. Using phylogenetic analyses, we show that grass C4 PEPCs appeared at least eight times independently from the same non-C4 PEPC. Twenty-one amino acids evolved under positive selection and converged to similar or identical amino acids in most of the grass C4 PEPC lineages. This is the first record of such a high level of molecular convergent evolution, illustrating the repeatability of evolution. These amino acids were responsible for a strong phylogenetic bias grouping all C4 PEPCs together. The C4-specific amino acids detected must be essential for C4 PEPC enzymatic characteristics, and their identification opens new avenues for the engineering of the C4 pathway in crops.
Current Biology 17, 1241–1247, July 17, 2007 ©2007 Elsevier Ltd All rights reserved DOI 10.1016/j.cub.2007.06.036
Melvin R. Duvall,
and Guillaume Besnard
Department of Ecology and Evolution
University of Lausanne
Royal Botanic Gardens, Kew
TW9 3DS Surrey
Department of Biological Sciences
Northern Illinois University
DeKalb, Illinois 60115
Results and Discussion
Congruence between Gene and Species Trees Recovered Only with Nearly Neutral Sites
We assembled a data set of 169 PEPC-encoding genes, of which 127 were sequenced in this study. In the phylogenetic tree inferred on PEPC coding sequences (Figure 1), all grass genes encoding C4 PEPC (ppc-C4), except that of Centropodia, cluster together, whereas the closest non-C4 genes (referred to as ppc-B2) form a paraphyletic group. The same pattern occurred whether based on amino acid or nucleotide sequences and regardless of the phylogenetic method used. The species relationships deduced from ppc-C4 as well as from ppc-B2 were highly incongruent with the species tree inferred from other markers [3–5]. The obtained gene tree can be explained only by postulating a very high number of gene duplications and losses or horizontal gene transfers, making this topology very unlikely. Because the C4 trait evolved several times independently in the grass family [1, 3], a single origin of the C4 PEPC is unlikely.
A potential source of bias, which could be responsible for the ppc-C4 grouping found in our analyses, is the evolutionary forces driving C4 PEPC evolution. Because of the crucial role played by this PEPC isoform in the C4 photosynthetic pathway, it could have been the target of strong selective pressures that would have drastically altered the amino acid sequences. If a high enough number of identical amino acids appeared independently in the different C4 lineages, the codon positions that determine the transition between non-C4 and C4-characteristic amino acids would tend to group the C4 PEPCs together and thus be misleading in a phylogenetic context. This hypothesis would then predict that the trees constructed with sites less affected by selection, for example the third positions of the codons or the intron sequences, would have a different topology reflecting the species relationships. This prediction was verified with grass PEPC: species relationships deduced from the third positions and intron topology (Figure 2) were congruent with accepted species trees [3–5]. In this tree, the different ppc-C4 lineages grouped into supported clusters with ppc-B2 of related species (Figure 2). The species tree is thus recovered when nearly neutral sites are used. This pattern could not be attributed to codon usage bias between the different lineages because codon frequencies were approximately constant across the phylogenetic tree (data not shown).
The bias observed in the topology obtained with all positions suggests that a proportion of amino acids essential for the C4 function converged between the different C4 lineages, as confirmed by the positive selection analyses (see below). When the 21 codons under positive selection were removed from the phylogenetic analyses, a topology congruent with the species tree was obtained on the 421 remaining codons (Supplemental Data available online). This result confirmed that the clustering of ppc-C4 was in large part due to these few codons under positive selection.
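The two site-filtering steps described above (keeping only third codon positions, or removing the codons under positive selection) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the toy sequence are hypothetical, and the sequence is assumed to be in frame and free of gaps.

```python
def third_positions(cds: str) -> str:
    """Return the third nucleotide of every codon of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds[2::3]


def drop_codons(cds: str, codons_to_drop: set[int]) -> str:
    """Remove the codons whose 0-based index is listed in codons_to_drop."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(c for i, c in enumerate(codons) if i not in codons_to_drop)


# Toy in-frame sequence (hypothetical, 4 codons): ATG GCT TCA AAC
toy = "ATGGCTTCAAAC"
print(third_positions(toy))   # third base of each codon
print(drop_codons(toy, {2}))  # sequence without the third codon

# Size check with the figures reported in the text:
# 442 codons analysed, 21 removed, 421 remaining.
print(442 - 21)
```

Applied to every sequence of an alignment, the first function yields a third-position data set and the second yields the data set without the selected codons; tree inference itself is left to the dedicated phylogenetic programs.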
A phylogenetic bias resulting from molecular convergence was already proposed for other genes in other organisms [6, 7]. However, these studies were not able to recover the species tree from the coding sequence. Thus, the phylogenetic bias they observed could be due to a complex gene evolutionary history (e.g., exon or gene transfer). Our study is the first to clearly show that phylogenetic reconstruction methods can be misleading because of a small proportion of convergent codons. It also highlights the importance of understanding how different parts of the data influence phylogenetic reconstruction. Although using all codon positions increases the number of characters and thus the accuracy of tree construction, third positions and introns can, under certain conditions, better represent the true evolutionary history of the genes [8].
*Correspondence: firstname.lastname@example.org (P.-A.C.),
PEPC Evolved through Parallel Changes
It was shown before that an alanine conserved in all the known non-C4 PEPCs changed to a serine in all C4 PEPCs (position 780 in Zea mays, CAA33317) [9], representing a strong example of parallel changes. Other modifications of the coding sequence are expected because C4 PEPCs present catalytic properties and sensitivities toward repressors different from non-C4 isoforms [10, 11]. In order to identify sites that underwent adaptive changes during C4 evolution in grasses, we performed positive selection tests that use an ω (dN/dS ratio) greater than 1 as evidence of past positive selection [12, 13]. Different codon models were optimized on the third positions and intron topology and compared with likelihood ratio tests. The model allowing a proportion of codons to evolve under positive selection in branches defined a priori as the foreground branches (in this case, branches leading to ppc-C4, identified by an alanine-to-serine transition at position 780) was significantly better than the null models (A versus M1a, df = 2, p value < 0.0001; A versus A′, df = 1, p value = 0.066; see Experimental Procedures for further details on the models). This result shows that ppc-C4 evolved under adaptive molecular evolution. To take into account the uncertainty in topology, the three codon models were run again with 11 alternative topologies sampled during the Bayesian search. Of the 442 codons considered, 21 (4.8%) were identified as having evolved under positive selection in branches leading to ppc-C4 with a posterior probability greater than 0.95 in all analyses (Figure 3). When the nominal value of the test is raised to 0.99 and 0.999, the number of codons under positive selection is reduced to 15 (3.4%) and 12 (2.7%), respectively (Figure 3).
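The likelihood ratio tests and the proportions quoted above can be illustrated with a short sketch. It assumes the usual chi-square approximation for the test statistic and handles only the two degrees of freedom used here (df = 1 and df = 2). The log-likelihood values in the example are hypothetical, since the source does not report them, and the function names are illustrative.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if stat < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 and df = 2 are handled in this sketch")


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """Likelihood ratio test: 2 * (lnL_alt - lnL_null) against chi-square(df)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return chi2_sf(max(stat, 0.0), df)


# Hypothetical log-likelihoods (not taken from the study).
print(lrt_pvalue(lnl_null=-1000.0, lnl_alt=-990.0, df=2))

# Proportions of the 442 codons detected at each posterior-probability cutoff.
for n_sites in (21, 15, 12):
    print(n_sites, round(100 * n_sites / 442, 1))
```

The closed forms avoid any dependency outside the standard library; a general chi-square routine would be needed for other degrees of freedom.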
These results show that the same positions evolved under positive selection in the different grass ppc-C4 lineages. Many of these sites were mutated recurrently to an identical amino acid (Figure 3). In addition, some of the amino acids under positive selection identified in grasses underwent the same transitions between non-C4 and C4 PEPC in other C4 systems, even in very distant families such as Asteraceae or Amaranthaceae (Figure 3). In addition to the alanine-to-serine transition at position 780, amino acids at positions 517, 577, 665, and 761 recurrently changed from the same C3 amino acid to an identical C4-specific amino acid in grasses and other families (Figure 3). The evolution of a C4 PEPC thus proceeded through many parallel changes in a high number of independent C4 lineages, highlighting the repeatability of some evolutionary processes.
Figure 1. Maximum Likelihood Tree Containing All Grass and Main Monocot and Dicot Genes Encoding PEPC
This tree was constructed on nucleotide coding sequence with PhyML under a GTR+I+G model. Genes belonging to the different grass gene lineages and the main dicot clades are compressed. Uncompressed tree is available in Supplemental Data. Support values of 100 bootstrap replicates are indicated above branches when greater than 50%. The position of the Centropodia forskalii gene with a serine at position 780 is indicated by an asterisk. The logarithm of the likelihood for this tree was −56730.88.
Figure 2. Bayesian Tree Constructed with MrBayes on Third Positions and Introns Combined, Including ppc-B2 and ppc-C4
Branches leading to ppc-C4, determined by the presence of a serine at position 780, are in bold. Capital letters identify branches used in the positive selection tests. Bayesian support values greater than 0.5 are indicated for the principal branches. Support values for all branches are available in Supplemental Data. Subfamilies are indicated on the left of the tree. The three main Panicoideae tribes, Andropogoneae, Paniceae, and Centotheceae, are indicated. Aristi, Aristidoideae; Mi, Micrairoideae; Ar+Da, Arundinoideae + Danthonioideae; Cent, Centotheceae. In some Chloridoideae, both ppc-B2 and ppc-C4 are present, suggesting an ancestral gene-duplication event. On the right, the most frequent amino acid of each clade is shown for the 12 sites (positions indicated correspond to Zea mays PEPC; CAA33317) with a posterior probability greater than 0.999 of having evolved under positive selection in branches leading to ppc-C4 (Figure 3). Black triangles on the right indicate ppc-C4 lineages. Residues with similar biochemical properties are identically colored. For visual clarity, C4-specific amino acids are brightened.
Phenotypic convergence between distant lineages is a widespread feature and concerns morphological as well as physiological traits. The recurrent appearance of the same phenotype through convergent molecular evolution has already been demonstrated [14–20]. Some studies traced the convergence to different modifications of the same gene [15–17, 19] or to the same mutations taking place independently in different lineages [14, 18, 20]. However, these cases concerned only a small proportion of sites in a restricted number of lineages. Our study reports the first case of such a high level of molecular convergent evolution in up to eight distinct lineages. The observed amino acid transitions between non-C4 and C4 PEPC enzymes are all due to a single nucleotide change. This increases the probability of these mutations occurring by chance. The mutations that improve the encoded enzyme can later be fixed by natural selection. The presence of a non-C4 PEPC gene (i.e., ppc-B2) with a nucleotide sequence allowing the acquisition of C4-advantageous amino acids through simple single nucleotide changes likely favored recurrent evolution of the C4 pathway by allowing a rapid and efficient acquisition of a C4-specific PEPC, the key enzyme of this pathway.
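Whether an amino acid transition can be produced by a single nucleotide change can be checked directly against the standard genetic code. The sketch below is an illustration only: it works on the DNA alphabet, considers every codon of the starting amino acid rather than the codons actually observed in ppc-B2, and uses hypothetical function names.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def one_step_amino_acids(codon: str) -> set[str]:
    """Amino acids (or '*') encoded by the codons one substitution away."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                reachable.add(CODE[mutant])
    return reachable


def single_step(aa_from: str, aa_to: str) -> bool:
    """True if at least one codon of aa_from is one substitution from aa_to."""
    return any(aa_to in one_step_amino_acids(codon)
               for codon, aa in CODE.items() if aa == aa_from)


print(single_step("A", "S"))  # alanine to serine, as at position 780
print(single_step("A", "H"))  # a transition that needs two changes
```

A stricter check would start from the codon actually present in each ppc-B2 sequence, since a transition that is possible for one synonymous codon may require two changes from another.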
The sites under selection show different degrees of parallelism. For instance, residues at positions 531, 579, 761, 780, and 807 mutated to an identical amino acid in six to eight grass C4 PEPC lineages (parallel changes sensu stricto; Figures 2 and 3). In contrast, residues at positions 502, 596, and 625 changed to a different amino acid (Figures 2 and 3), suggesting that the C4 characteristics are conferred by the absence of the non-C4 amino acid at these positions rather than the presence of a C4-specific amino acid. Although the latter does not match the strict definition of parallel change, it corresponds to parallel genotypic adaptation [21] because the same locus (i.e., ppc-B2) evolved independently through similar changes to fulfil the same function (atmospheric CO2 fixation in mesophyll cells). Unfortunately, the effects of these different changes are difficult to predict because the described active sites and regulation targets of the PEPC [22, 23] are not affected. The alanine-to-serine transition (position 780, Figure 3) has been shown to alter the catalytic properties of the encoded enzyme [9, 11]. The histidine-to-asparagine transition that occurred at position 665 in C4 grasses as well as in several dicots (Figure 3) could have an important effect on protein folding because it creates a putative N-glycosylation site (positions 665–668) that is absent from non-C4 PEPCs. Serine at position 761 is part of a predicted casein kinase II phosphorylation site (positions 761–763) that disappears once this serine is mutated to an alanine, which is the case in C4 PEPCs. Breaking this phosphorylation site could have helped the acquisition of the C4-specific regulation pattern of the PEPC. This amino acid is also part of a putative N-myristylation site (positions 757–762), which works with either an alanine or a serine. Thus, the only single-nucleotide change able to break the phosphorylation site without altering the myristylation site was precisely a serine-to-alanine substitution (serine in non-C4 PEPC is encoded by a UCN codon). The effect of the other mutations is still unpredictable. The use of 3D structure predictions could help evaluate whether the C4-specific amino acids can putatively alter the enzyme structure and thus its catalytic properties.
Figure 3. Amino Acids Detected as Evolving under Positive Selection in Branches Leading to Genes Encoding C4 PEPC in Grasses
These sites were detected with a posterior probability (PP) greater than 0.999, 0.99, or 0.95. The amino acids are shown for the different C4 PEPC gene lineages (capital letters identify independent grass ppc-C4 lineages as identified on Figure 2). When one lineage exhibited different amino acids, the most abundant is written first. For grasses, the number of sequences included in each lineage is indicated (n), as is the photosynthetic type for nongrasses. Amino acids that differ between ppc-B2 and ppc-C4 are highlighted in blue. Amino acids that underwent the same changes in non-grass C4 PEPCs are in green.
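The site predictions discussed above (N-glycosylation, casein kinase II phosphorylation, N-myristylation) are pattern matches on the protein sequence. The sketch below shows how such a scan behaves when a serine is replaced by an alanine. The three regular expressions are assumptions written in the style of PROSITE patterns and should be checked against the database before any real use; the peptides are hypothetical and are not PEPC fragments.

```python
import re

# Assumed PROSITE-style patterns, expressed as regular expressions.
MOTIFS = {
    "N_glycosylation": r"N[^P][ST][^P]",
    "CK2_phosphorylation": r"[ST].{2}[DE]",
    "N_myristoylation": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def find_sites(protein: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif, overlaps included."""
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", protein)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical 8-residue peptide carrying overlapping motifs.
before = "GLAASVLE"
after = before[:4] + "A" + before[5:]  # serine -> alanine at index 4

print(find_sites(before))
print(find_sites(after))

# Hypothetical 4-residue peptides: histidine -> asparagine creates a motif.
print(find_sites("HGSA")["N_glycosylation"], find_sites("NGSA")["N_glycosylation"])
```

In the toy peptide the substitution removes the phosphorylation match while the myristylation match survives, which mirrors the argument made in the text for position 761. A pattern match is only a prediction, not evidence that the site is modified in vivo.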
Implications for Bioengineering
Twenty-one amino acids were detected, with high probability, to have undergone positive selection along the branches leading to grass ppc-C4 and purifying or neutral selection in other branches. These changes are thus likely to be important for the C4 function of the encoded enzyme. Their recurrent evolution in different lineages strongly supports their high adaptive significance, a fact that is reinforced by the similar or identical changes occurring at the same residues in very distant plant families (Figure 3). Knowledge of these C4 determinants opens promising opportunities for the molecular engineering of grass C3 crops, such as rice, barley, and wheat. This is especially relevant for the biotechnological efforts to incorporate some C4 characteristics in C3 crops [25–27]. Identification of the major C4 determinants has been performed in Flaveria through expression of chimerical enzymes and analysis of their catalytic properties. This approach allowed the detection of the alanine-to-serine transition (position 780 in maize). However, such a procedure can identify only changes having a detectable effect on the phenotype. The changes evidenced in our study probably have minor independent effects but, taken together, would help the optimization of C4 function. Testing this many residues is not feasible in an experimental framework. The use of phylogenetic inference to detect residues potentially important for the function of an enzyme is thus a feasible and powerful alternative that should be extended to other important enzymes.
Amplification of PEPC Genes
Samples from 111 grass species were taken, focusing on the PACCAD clade that contains all C4 grass species [1, 4]. Panicoideae, which contains several putatively independent C4 lineages, was especially densely sampled. DNAs (listed in Supplemental Data) were obtained either from aliquots provided by other teams or extracted from leaves dried in silica gel via the CTAB method. The photosynthetic type was attributed to each species according to the literature.
Genes encoding PEPC were obtained from genomic DNA via polymerase chain reaction (PCR). The primers were designed to amplify a segment of ppc-C4 genes as well as the ppc-B1 gene previously detected in Oryza sativa. Because of the length of the complete gene (more than 6000 bp in Zea mays, X15239), we focused on a segment from exon 8 (PEPC-1362-For: 5′-…) to exon 10 (PEPC-2701-Rev: 5′-…) that carries major C4 determinants. The PCR reaction mixture contained ~100 ng of genomic DNA template, 5 μl of 10X AccuPrime PCR Buffer II, 200 pmol of each dNTP, 20 pmol of each primer, 3 mmol of MgSO4, 2.5 μl (5% vol) of DMSO, and 1 unit of a proof-reading Taq polymerase (AccuPrime Taq DNA Polymerase High Fidelity, Invitrogen) in a total volume of 50 μl. The samples were incubated for 2 min at 94°C, followed by 35 cycles consisting of 30 s at 94°C, 30 s at 57°C (annealing temperature), and 2 min at 68°C. The last cycle was followed by a 20 min extension at …°C. Total PCR products were purified with the QIAquick Gel Extraction Kit (QIAGEN). To separate the different genes (or alleles) putatively amplified, purified PCR products were cloned into the pTZ57R/T vector with the InsT/Aclone PCR Product Cloning Kit (Fermentas) and PCR amplified with the M13 primers. Between 8 and 20 positive clones were then digested with TaqI restriction enzyme (Invitrogen). The degree of polymorphism for TaqI digestion products was high, allowing an unambiguous distinction of the different ppc gene lineages. For each species, inserts of each clone presenting a different restriction pattern were sequenced with the M13 primers with the Big Dye 3.1 Terminator cycle sequencing kit (Applied Biosystems), according to the provider's instructions, and separated on an ABI Prism 3100 genetic analyzer (Applied Biosystems). A segment of about 1500 bp, including ~40% of the total coding sequence and two introns, was sequenced. All sequences have been deposited in the EMBL database (accession numbers in Supplemental Data).
DNA Sequence Analyses
For PEPC-gene segments isolated from genomic DNA, exons were identified by homology with Zea mays, Sorghum bicolor, and Oryza sativa genes (X15239, X63756, and AK101274, respectively) and, when possible, according to the GT-AG rule. Coding sequences were then translated into amino acids and aligned with ClustalW [29]. Once retranslated into nucleotides, the alignment was checked visually. Nineteen grass and 23 nongrass ppc genes available on GenBank were added to the data set (Supplemental Data). A phylogenetic tree was inferred both by maximum likelihood via PhyML [30], DNAML [31], and PAUP* [32] (NNI branch swapping on 151 trees found during a first round of tree selection with 1000 random addition sequences with TBR branch swapping under the Parsimony criterion; this was needed to reduce computational time) and by Bayesian inference via MrBayes [33] (two runs of 10,000,000 generations with four chains, burn-in period of 2,000,000) under a GTR model with base frequencies, gamma shape parameter, and proportion of invariants estimated from the data (hereafter referred to as GTR+I+G). PhyML [30] and ProML [31] were further used to compute a phylogenetic tree based on the amino acid sequences under a JTT substitution model with a gamma shape parameter. For DNAML and ProML, the gamma shape parameter was fixed to the value estimated by PhyML. These analyses allowed the identification of the number of gene lineages present in grasses and their relationships to each other.
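The step in which the protein alignment is "retranslated into nucleotides" amounts to threading each unaligned coding sequence onto its gapped protein sequence. A minimal sketch follows, assuming the coding sequence is in frame, contains no stop codon, and corresponds exactly to the aligned protein; the function name and the example are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an unaligned coding sequence onto its gapped protein alignment.

    Each residue consumes one codon; each gap character '-' becomes '---'.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("coding sequence length does not match the protein")
    out, i = [], 0
    for ch in aligned_protein:
        if ch == "-":
            out.append("---")
        else:
            out.append(cds[i:i + 3])
            i += 3
    return "".join(out)


# Hypothetical example: protein M-AS aligned with one gap, CDS ATG GCT TCA.
print(thread_codons("M-AS", "ATGGCTTCA"))
```

Aligning at the protein level and then threading codons keeps gaps in multiples of three, so the reading frame is preserved for the codon-based analyses that follow.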
Further analyses included only the ppc-C4 lineage and its closest ancestor (hereafter named ppc-B2, Figure 1). More distantly related sequences were omitted to avoid saturation of fast-evolving nucleotides such as introns and third positions. To distinguish the phylogenetic information provided by the different parts of the sequences, two data sets were created. First, all coding positions were considered for a total of 1326 bp. Second, the third codon positions (442 bp) were combined with introns 8 and 9. The introns were extracted and aligned with ClustalW with gap opening and gap extension penalties set to 15 and 6.6 for the pairwise and multiple alignments. To avoid subjectivity, intron alignments were checked visually but not manually edited. Because of their fast evolutionary rate, introns are useless to resolve basal nodes but give a strong signal to infer the top nodes. Their use in combination with unequivocally aligned third positions appeared as the best way to obtain a supported tree only weakly affected by selective pressures. The substitution model used for the introns was the HKY model. All coding positions and third positions of codons were analyzed under a GTR+I+G model. Best-fit substitution models were determined with hierarchical likelihood ratio tests (LRT). Both data sets were analyzed by Bayesian inference with MrBayes 3.1 [33]. Each analysis was run twice for 10,000,000 generations. Sample frequency was set to 1000 generations. Prior distributions were left to their default values. The number of chains in each run was increased from four for the all-coding-positions analysis to six for the introns and third positions analysis because of convergence problems. Base frequency, which was the only parameter common to the substitution models of these two data sets, was optimized separately for each partition (option unlink).
To test for the action of positive selection at particular sites of the nucleotide sequence of the ppc genes along branches leading to ppc-C4, three different codon models [12, 13] were optimized on the topology obtained by combining third codon positions and introns via codeml [34]. The neutral model M1a allows ω (the dN/dS ratio) to vary among codons. This parameter is constant among the branches of the tree and its value is allowed to be either 1 (neutral) or smaller than 1 (purifying selection). The alternative model, model A, allows ω to vary among both sites and branches. It requires the specification of two branch types, the background branches in which selective pressures are either neutral or purifying and the foreground branches under positive selection (ω > 1). The last model, A′, allows ω to vary among sites and branches but not to be greater than 1. It is therefore identical to model A except that the ω value in foreground branches is fixed to 1 for sites that differ between foreground and background branches. Models were compared with LRT. Test 1 compares models M1a and A and thus tests for the occurrence of different selective pressures on the foreground branches. Test 2, which compares models A′ and A, is more conservative and specifically tests the significance of an ω value greater than 1 on the foreground branches.
Models A′ and A require an a priori identification of the foreground branches. All the branches leading to full C4 PEPC groups (identified by a serine at position 780, see Figure 2) were used simultaneously as foreground branches. To ensure that the results were not due to a bias in the tree used, the same procedure was repeated with topologies sampled during the Bayesian search. Trees were taken every 500,000 generations between 5,000,000 and 10,000,000 for a total of 11 additional topologies. By the Bayes Empirical Bayes approach [35], only codons with a posterior probability of being under positive selection greater than a given threshold (i.e., 0.95, 0.99, or 0.999) in all 12 analyses were considered as having evolved under positive selection during C4 evolution.
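The sampling scheme and the "in all 12 analyses" criterion can be written down in a few lines. The sketch below is illustrative: the function names are hypothetical, the site labels and posterior probabilities are invented, and only three analyses are shown where the study combined twelve.

```python
def sampled_generations(start: int, stop: int, step: int) -> list[int]:
    """Generations at which a topology is taken, both ends included."""
    return list(range(start, stop + 1, step))


def consistently_selected(pp_by_analysis: list[dict[int, float]],
                          threshold: float) -> set[int]:
    """Codons whose posterior probability exceeds threshold in every analysis."""
    if not pp_by_analysis:
        return set()
    selected = None
    for pp in pp_by_analysis:
        above = {site for site, p in pp.items() if p > threshold}
        selected = above if selected is None else selected & above
    return selected


gens = sampled_generations(5_000_000, 10_000_000, 500_000)
print(len(gens))  # 11 additional topologies

# Hypothetical posterior probabilities for three analyses (codon -> PP).
toy = [
    {10: 0.99, 20: 0.97, 30: 1.0},
    {10: 0.96, 20: 0.93, 30: 0.9995},
    {10: 0.98, 20: 0.99, 30: 0.9999},
]
print(sorted(consistently_selected(toy, 0.95)))
print(sorted(consistently_selected(toy, 0.999)))
```

Taking the intersection across topologies is conservative: a codon is dropped as soon as one sampled tree fails to support it, which is why the count can only decrease as the threshold rises.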
The most likely ancestral residue at position 780 was determined with codeml under an F3x4 model of codon substitution. The deduced amino acid was used to trace ppc-C4 on the phylogenetic trees.
Subsequent to these analyses, sequences from nongrass ppc-C4 and their related non-C4 PEPC genes available in GenBank were aligned to the grass DNA sequences. The amino acids corresponding to sites under positive selection in grasses were reported.
Five ﬁgures and two tables are available at http://www.
This work was funded by Swiss NSF grant 3100AO-105886/1. N.S. and V.S. were funded by the European Commission (Marie Curie EST "HOTSPOTS," contract MEST-CT-2005-020561). We thank the Swiss Institute of Bioinformatics for access to the Vital-IT cluster. The authors are especially thankful to F. Anthelme, Y. Bouchenak-Khelladi, V.R. Clark, M. Gonzalez, T.R. Hodkinson, J. Kissling, C. Lavergne, A. Persico, T. Renaud, P. Rondeau, S. Sunkkaew, A. Teerawatanakon, and Y. Wang, who provided either DNA aliquots or grass samples. N. Fumeaux at the herbarium of the botanical garden of Geneva helped with grass identification. Finally, O. Brönnimann, L. Büchi, M. Chapuisat, P.B. Pearman, E. Samaritani, and I. Sanders made useful comments on the earlier versions of the manuscript. We thank two anonymous reviewers for useful comments.
Received: May 3, 2007
Revised: June 4, 2007
Accepted: June 12, 2007
Published online: July 5, 2007
1. Sage, R.F. (2004). The evolution of C4 photosynthesis. New Phytol. 161, 341–370.
2. Lepiniec, L., Vidal, J., Chollet, R., Gadal, P., and Crétin, C. (1994). Phosphoenolpyruvate carboxylase: structure, regulation and evolution. Plant Sci. 99, 111–124.
3. Giussani, L., Cota-Sánchez, J.H., Zuloaga, F., and Kellogg, E.A. (2001). A molecular phylogeny of the grass subfamily Panicoideae (Poaceae) shows multiple origins of C4 photosynthesis. Am. J. Bot. 88, 1993–2012.
4. GPWG-Grass Phylogeny Working Group (2001). Phylogeny and subfamilial classification of the grasses (Poaceae). Ann. Mo. Bot. Gard. 88, 373–457.
5. Sanchez-Ken, J.G., Clark, L.G., Kellogg, E.A., and Kay, E.E. (2007). Reinstatement and emendation of subfamily Micrairoideae (Poaceae). Syst. Bot. 32, 71–80.
6. Stewart, C.B., Schilling, J.W., and Wilson, A.C. (1987). Adaptive evolution in the stomach lysozymes of foregut fermenters. Nature 330, 401–404.
7. Kriener, K., O'hUigin, C., Tichy, H., and Klein, J. (2000). Convergent evolution of major histocompatibility complex molecules in humans and New World monkeys. Immunogenetics 51, 169–
8. Savolainen, V., Chase, M.W., Salamin, N., Soltis, D.E., Soltis, P.E., Lopez, A.J., Fedrigo, O., and Naylor, G.J.P. (2002). Phylogeny reconstruction and functional constraints in organellar genomes: plastid atpB and rbcL sequences versus animal mitochondrion. Syst. Biol. 51, 638–647.
9. Bläsing, O.E., Westhoff, P., and Svensson, P. (2000). Evolution of C4 phosphoenolpyruvate carboxylase in Flaveria, a conserved serine residue in the carboxyl-terminal part of the enzyme is a major determinant for C4-specific characteristics. J. Biol. Chem. 275, 27917–27923.
10. Dong, L.Y., Masuda, T., Kawamura, T., Hata, S., and Izui, K. (1998). Cloning, expression, and characterization of a root-form phosphoenolpyruvate carboxylase from Zea mays: comparison with the C4-form enzyme. Plant Cell Physiol. 39, 865–873.
11. Svensson, P., Bläsing, O.E., and Westhoff, P. (2003). Evolution of C4 phosphoenolpyruvate carboxylase. Arch. Biochem. Biophys.
12. Yang, Z.H., and Nielsen, R. (2002). Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19, 908–917.
13. Zhang, J.Z., Nielsen, R., and Yang, Z.H. (2005). Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22, 2472–2479.
14. Andreev, D., Kreitman, M., Phillips, T.W., Beeman, R.W., and Ffrench-Constant, R.H. (1999). Multiple origins of cyclodiene insecticide resistance in Tribolium castaneum (Coleoptera: Tenebrionidae). J. Mol. Evol. 48, 615–624.
15. Mundy, N.I., Badcock, N.S., Hart, T., Scribner, K., Janssen, K., and Nadeau, N.J. (2004). Conserved genetic basis of a quantitative plumage trait involved in mate choice. Science 303, 1870–
16. Mundy, N.I. (2005). A window on the genetics of evolution: MC1R and plumage colouration in birds. Proc. R. Soc. Lond. B. Biol. Sci. 272, 1633–1640.
17. Protas, M.E., Hersey, C., Kochanek, D., Zhou, Y., Wilkens, H., Jeffery, W.R., Zon, L.I., Borowsky, R., and Tabin, C.J. (2006). Genetic analysis of cavefish reveals molecular convergence in the evolution of albinism. Nat. Genet. 38, 107–111.
18. Yokoyama, R., and Yokoyama, S. (1990). Convergent evolution of the red- and green-like visual pigment genes in fish, Astyanax fasciatus, and human. Proc. Natl. Acad. Sci. USA 87, 9315–9318.
19. Zakon, H.H., Lu, Y., Zwickl, D.J., and Hillis, D.M. (2006). Sodium channel genes and the evolution of diversity in communication signals of electric fishes: convergent molecular evolution. Proc. Natl. Acad. Sci. USA 103, 3675–3680.
20. Zhang, J.Z. (2006). Parallel adaptive origins of digestive RNases in Asian and African leaf monkeys. Nat. Genet. 38, 819–823.
21. Wood, T.E., Burke, J.M., and Rieseberg, L.H. (2005). Parallel genotypic adaptation: when evolution repeats itself. Genetica 123,
22. Kai, Y., Matsumura, H., Inoue, T., Terada, K., Nagara, Y., Yoshinaga, T., Kihara, A., Tsumura, K., and Izui, K. (1999). Three-dimensional structure of phosphoenolpyruvate carboxylase: a proposed mechanism for allosteric inhibition. Proc. Natl. Acad. Sci. USA 96, 823–828.
23. Kai, Y., Matsumura, H., and Izui, K. (2003). Phosphoenolpyruvate carboxylase: three-dimensional structure and molecular mechanisms. Arch. Biochem. Biophys. 414, 170–179.
24. Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S., Pagni, M., and Sigrist, C.J.A. (2006). The PROSITE database. Nucleic Acids Res. 34, D227–D230.
25. Matsuoka, M., Furbank, R.T., Fukayama, H., and Miyao, M. (2001). Molecular engineering of C4 photosynthesis. Annu. Rev. Plant Biol. 52, 297–314.
26. Miyao, M. (2003). Molecular evolution and genetic engineering of photosynthetic enzymes. J. Exp. Bot. 54, 179–189.
27. Raines, C.A. (2006). Transgenic approaches to manipulate the environmental responses of the C3 carbon fixation cycle. Plant Cell Environ. 29, 331–339.
28. Christin, P.A., Salamin, N., Savolainen, V., and Besnard, G. (2007). A phylogenetic study of the phosphoenolpyruvate carboxylase multigene family in Poaceae: understanding the molecular changes linked to C4 photosynthesis evolution. Kew Bull. 62, in press.
29. Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994). ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and matrix choice. Nucleic Acids Res. 22, 4673–4680.
30. Guindon, S., and Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.
31. Felsenstein, J. (2005). PHYLIP (Phylogeny Inference Package) version 3.6 (Seattle, WA: Department of Genome Sciences, University of Washington).
32. Swofford, D.L. (2002). PAUP*: Phylogenetic Analysis Using Parsimony (* and other methods), version 4.0b8 (Sunderland, MA: Sinauer Associates).
33. Ronquist, F., and Huelsenbeck, J.P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics
34. Yang, Z.H. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13,
35. Yang, Z.H., Wong, W.S.W., and Nielsen, R. (2005). Bayes empirical Bayes inference of amino acids sites under positive selection. Mol. Biol. Evol. 22, 1107–1118.
The accession numbers assigned to the sequences we submitted
to GenBank are from AM689877 to AM689901 and from AM690209