The Chloroplast Genomes of the Green Algae Pyramimonas, Monomastix, and
Pycnococcus Shed New Light on the Evolutionary History of Prasinophytes and
the Origin of the Secondary Chloroplasts of Euglenids
Monique Turmel,* Marie-Christine Gagnon,* Charley J. O’Kelly,† Christian Otis,* and
*Département de Biochimie et de Microbiologie, Université Laval, Québec (Québec), Canada; and †Botany Department, University
Because they represent the earliest divergences of the Chlorophyta and include the smallest known eukaryotes (e.g., the
coccoid Ostreococcus), the morphologically diverse unicellular green algae making up the Prasinophyceae are central to
our understanding of the evolutionary patterns that accompanied the radiation of chlorophytes and the reduction of cell
size in some lineages. Seven prasinophyte lineages, four of which exhibit a coccoid cell organization (no flagella nor
scales), were uncovered from analysis of nuclear-encoded 18S rDNA data; however, their order of divergence remains
unknown. In this study, the chloroplast genome sequences of the scaly quadriflagellate Pyramimonas parkeae (clade I),
the coccoid Pycnococcus provasolii (clade V), and the scaly uniflagellate Monomastix (unknown affiliation) were
determined, annotated, and compared with those previously reported for green algae/land plants, including two
prasinophytes (Nephroselmis olivacea, clade III and Ostreococcus tauri, clade II). The chlorarachniophyte Bigelowiella
natans and the euglenid Euglena gracilis, whose chloroplasts originate presumably from distinct green algal
endosymbionts, were also included in our comparisons. The three newly sequenced prasinophyte genomes differ
considerably from one another and from their homologs in overall structure, gene content, and gene order, with the
80,211-bp Pycnococcus and 114,528-bp Monomastix genomes (98 and 94 conserved genes, respectively) resembling the
71,666-bp Ostreococcus genome (88 genes) in featuring a significantly reduced gene content. The 101,605-bp
Pyramimonas genome (110 genes) features two conserved genes (rpl22 and ycf65) and ancestral gene linkages
previously unrecognized in chlorophytes as well as a DNA primase gene putatively acquired from a virus. The
Pyramimonas and Euglena cpDNAs revealed uniquely shared derived gene clusters. Besides providing unequivocal
evidence that the green algal ancestor of the euglenid chloroplasts belonged to the Pyramimonadales, phylogenetic
analyses of concatenated chloroplast genes and proteins elucidated the position of Monomastix and showed that the
Mamiellales, a clade comprising Ostreococcus and Monomastix, are sister to the Pyramimonadales + Euglena clade. Our
results also revealed that major reduction in gene content and restructuring of the chloroplast genome occurred in
conjunction with important changes in cell organization in at least two independent prasinophyte lineages, the
Mamiellales and the Pycnococcaceae.
The green plants (Viridiplantae) are divided among
two major lineages: the Chlorophyta, containing the bulk
of the extant green algae, and the Streptophyta, containing
the green algae belonging to the Charophyceae sensu
Mattox and Stewart (1984) and all land plants (Lewis
and McCourt 2004). It is thought that the first green plants
were unicellular green algae bearing nonmineralized or-
ganic scales on their cell body and/or their flagella (Mattox
was recognized that flagellated reproductive cells (zoo-
spores, gametes) of some taxa in both the Chlorophyta
and Streptophyta are covered by a layer of square-shaped
scales, which also occur as an underlayer in many prasino-
phytes. Free-living scaly flagellates have been ascribed
mainly to the Prasinophyceae, a nonmonophyletic class
representing the earliest divergences of the Chlorophyta
(Steinkotter et al. 1994; Nakayama et al. 1998; Fawley
et al. 2000; Guillou et al. 2004; Proschold and Leliaert
2007). This morphologically heterogeneous assemblage
of green algae gave rise to the three advanced classes des-
ignated as the Trebouxiophyceae, Ulvophyceae, and Chlor-
ophyceae (Lewis and McCourt 2004). Note that the scaly
biflagellate Mesostigma viride, traditionally classified
within the Prasinophyceae, has been formally excluded
from this class and placed in the Streptophyta (Marin
and Melkonian 1999; Lemieux et al. 2007; Rodriguez-Ezpeleta
et al. 2007). Prasinophytes have long fascinated
phycologists because their study has the potential
to shed light on the nature of the last common ancestor
of all green plants and on the origin of the advanced
The concept of the class Prasinophyceae has been un-
up and Throndsen (1988) (Sym and Pienaar 1993); in the
last few years, it has profoundly changed with the descrip-
tion of several new taxa and the analysis of environmental
sequences. Most prasinophytes are found in marine habi-
tats, and considerable diversity is observed with respect
to cell shape and size, flagella number and behavior, mitotic
as accessory photosynthetic pigments and storage products
(Melkonian 1990; O’Kelly 1992; Sym and Pienaar 1993;
Latasa et al. 2004). Some species lack flagella, others lack
(e.g., Ostreococcus tauri). The small-sized members of the
Prasinophyceae, particularly those belonging to three gen-
era of the Mamiellales (Micromonas, Bathycoccus, and
Ostreococcus), are prominent in the oceanic picoplankton
(comprising organisms less than 3 µm in diameter) (Guillou
et al. 2004). Included in this category is the smallest
Key words: prasinophyte green algae, euglenids, chloroplast genome
evolution, phylogenomics, secondary endosymbiosis, genome reduction,
horizontal DNA transfers.
Mol. Biol. Evol. 26(3):631–648. 2009
Advance Access publication December 12, 2008
© The Author 2008. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: email@example.com
1994). Phylogenetic studies using molecular data, in partic-
ular the nuclear-encoded small subunit (SSU) rRNA gene,
identified seven monophyletic groups of prasinophytes
at the base of the Chlorophyta (Steinkotter et al. 1994;
Nakayama et al. 1998; Fawley et al. 2000; Guillou et al.
2004); however, their order of divergence could not be re-
solved. Despite this uncertainty, it appears that the coccoid
form evolved more than once in the Prasinophyceae
(Fawley et al. 2000; Guillou et al. 2004). Coccoid cells
are distributed among four lineages (clade II, Mamiellales;
clade V, Pseudocourfieldiales, Pycnococcaceae; clade VI,
Prasinococcales; and clade VII, no order assigned to this
clade), two of which (clades II and V) exhibit both the
coccoid and flagellated cell organizations.
helpful to resolve problematic relationships among green
algae and land plants (Wolf et al. 2005; Qiu et al. 2006;
Jansen et al. 2007; Lemieux et al. 2007; Turmel et al.
2008) although the phylogenetic positions of some green
plant lineages have remained contentious (Pombert et al.
2005; Turmel et al. 2006; Lemieux et al. 2007). The only
two complete chloroplast DNA (cpDNA) sequences currently
available for prasinophytes, those of the scaly biflagellate
Nephroselmis olivacea (clade III, Pseudocourfieldiales,
Nephroselmidaceae) (Turmel et al. 1999b) and of the tiny
coccoid O. tauri (clade II, Mamiellales) (Robbens et al.
can be designated as ancestral and reduced derived, respec-
tively. Whereas the 200.8-kb Nephroselmis genome harbors
different conserved genes compared with about 138 genes
for the deepest branching streptophyte algae) and has re-
Ostreococcus genome, which is the most compact chloro-
phyte cpDNA known to date, displays a reduced set of 88
genes whose order is highly scrambled. As in most other
chloroplast genomes, two identical copies of a large inverted
repeat (IR) are separated by single-copy (SC) regions; how-
ever, the two prasinophyte genomes differ remarkably in
their quadripartite architectures. The Nephroselmis architec-
tural design closely resembles that found in all streptophyte
IR-containing cpDNAs: the SC regions are vastly unequal in
size, each SC region is characterized by a highly conserved
set of genes, and the rRNA operon encoded by the IR is transcribed
toward the small SC (SSC) region. In Ostreococcus,
the SC regions have essentially the same number of genes;
SC region, and the rRNA operon is transcribed away from
the latter SC region (see supplementary fig. 1, Supplemen-
tary Material online). This gene partitioning pattern is rem-
iniscent of that reported for the cpDNAs of the ulvophytes
Pseudendoclonium akinetum and Oltmannsiellopsis viridis
(Pombert et al. 2005, 2006).
To explore the relationships among prasinophyte lin-
eages and to better understand the mode of cpDNA evolu-
tion in the Prasinophyceae, we sequenced the cpDNAs of
the scaly quadriflagellate Pyramimonas parkeae (clade I,
Pyramimonadales), the coccoid Pycnococcus provasolii
(clade V, Pseudocourfieldiales, Pycnococcaceae), and the
scaly uniflagellate Monomastix (unknown affiliation) and
compared these genomes with those previously reported
for Nephroselmis (Turmel et al. 1999b), Ostreococcus
(Robbens et al. 2007), other chlorophytes (Wakasugi
et al. 1997; Maul et al. 2002; Pombert et al. 2005, 2006;
Bélanger et al. 2006; de Cambiaire et al. 2006, 2007;
Brouard et al. 2008), the deep-branching streptophytes
Mesostigma (Lemieux et al. 2000) and Chlorokybus atmo-
phyticus (Lemieux et al. 2007), the euglenid Euglena gra-
cilis (Hallick et al. 1993) and the chlorarachniophyte
Bigelowiella natans (Rogers et al. 2007). The latter photo-
synthetic eukaryotes, which presumably gained their chlor-
oplasts via independent secondary endosymbiotic events
(Rogers et al. 2007), were included in our comparisons
in an attempt to gain more detailed information about
the green algal donors of their chloroplasts. We found that
the three newly sequenced prasinophyte genomes differ
considerably from one another and from their previously
sequenced homologs at the overall structure, gene content,
and gene order levels, with both the Monomastix and
Pycnococcus genomes featuring a reduced pattern of
evolution. Our phylogenetic analyses of sequence data of-
fered significant insights into the phylogeny and evolution
of prasinophytes and provided unequivocal evidence that
the euglenid chloroplasts were secondarily acquired from
a member of the Pyramimonadales.
Materials and Methods
Strains and Culture Conditions
Pyramimonas parkeae (CCMP 726) and Pycnococcus provasolii
(CCMP 1203), two marine species, were obtained from the
Provasoli–Guillard National Center for Culture of Marine
Phytoplankton (West Boothbay Harbor, Maine) and grown
in K medium (Keller et al. 1987) under 12 h light–dark
cycles. Monomastix sp., a freshwater strain originally col-
lected by H. R. Preisig in New Zealand, originates from the
personal collection of C.J.O. This strain, which is available
upon request to M.T., was grown in modified Volvox me-
dium (McCracken et al. 1980) under 12 h light–dark cycles.
Cloning and Sequencing of Chloroplast Genomes
The complete cpDNA sequences of Pyramimonas,
Monomastix, and Pycnococcus were generated essentially
alga, A + T-rich organelle DNA was separated from nuclear
DNA by CsCl–bisbenzimide isopycnic centrifugation
of total cellular DNA (Turmel et al. 1999a). The organelle
DNA fraction was sheared by nebulization to produce
1,500 to 3,000-bp fragments that were subsequently cloned
into a plasmid vector, either pBluescript II KS+ or
WI). After hybridization of the resulting clones with the
original DNA used for cloning, plasmids from positive
clones were purified with the QIAprep 96 Miniprep kit
(Qiagen Inc., Mississauga, Canada) and sequenced using
universal primers. DNA assembly was carried out using
AUTOASSEMBLER 2.1.1 (Applied BioSystems, Foster
City, CA) or SEQUENCHER 4.2 (Gene Codes Corporation,
Ann Arbor, MI). Distinct contigs of cpDNA origin were or-
dered by polymerase chain reaction (PCR) amplification
encompassing uncloned regions were sequenced on both
Chloroplast Genome Analyses
Genes and all open reading frames (ORFs) larger than
et al. 2006). Secondary structures of group I and group II
introns were modeled according to Michel et al. (1989) and
Michel and Westhof (1990), respectively. Short repeats in
the Monomastix genome were identified using REPuter
2.74 (Kurtz et al. 2001), and the number of copies of each
repeat was determined with FINDPATTERNS of the
Genetics Computer Group package (Accelrys, San Diego,
CA). For all three newly sequenced prasinophyte genomes,
regions containing nonoverlapping repeated elements were
mapped with RepeatMasker (http://www.repeatmasker.
org/) running under the WU-Blast 2.0 search engine
(http://blast.wustl.edu/), using the repeats ≥30 bp identified
with REPuter as input sequences. Conserved gene clusters
exhibiting identical gene polarities in selected green algal
cpDNAs were identified using a custom-built program.
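The custom-built program is not described further in this section. As a minimal sketch only, and not the authors' implementation, conserved clusters with identical gene polarities could be detected along the following lines, assuming each genome is supplied as a linear list of signed gene names (a '+' or '-' prefix for the coding strand). The function names and the toy gene orders are hypothetical, and circularity of the map is ignored for brevity.

```python
def flip(block):
    """Reverse-complement a block of signed genes (reverse order, swap strands)."""
    return tuple(("-" if g[0] == "+" else "+") + g[1:] for g in reversed(block))


def canonical(block):
    """Give a block and its reverse complement one shared representation."""
    block = tuple(block)
    return min(block, flip(block))


def blocks_of_size(order, size):
    """All runs of `size` consecutive signed genes on a linear map, canonicalized."""
    return {canonical(order[i:i + size]) for i in range(len(order) - size + 1)}


def contains(big, small):
    """True if `small` (or its reverse complement) is a contiguous part of `big`."""
    n = len(small)
    windows = {big[i:i + n] for i in range(len(big) - n + 1)}
    return small in windows or flip(small) in windows


def shared_clusters(order_a, order_b, min_size=2):
    """Maximal gene runs present with the same relative polarities in both maps."""
    shared = []
    # Scan from the largest possible block down, so nested blocks can be skipped.
    for size in range(min(len(order_a), len(order_b)), min_size - 1, -1):
        common = blocks_of_size(order_a, size) & blocks_of_size(order_b, size)
        for block in sorted(common):
            if not any(contains(big, block) for big in shared):
                shared.append(block)
    return shared


# Toy gene orders (hypothetical, for illustration only).
genome_a = ["+psbD", "+psbC", "-rps4", "+atpA", "+atpF"]
genome_b = ["-atpF", "-atpA", "+rbcL", "+psbD", "+psbC"]
clusters = shared_clusters(genome_a, genome_b)
```

In this toy case the atpA/atpF pair is recovered even though it lies on opposite strands in the two maps, because a block and its reverse complement are treated as the same cluster; a pair whose relative polarity differs between genomes is not reported.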
Sequencing of the Monomastix 18S rRNA Gene and
The nuclear-encoded SSU rRNA gene was amplified
from total cellular DNA by PCR using the specific primers
NS1 (White et al. 1990) and 18L (Hamby and Zimmer
1991). The resulting PCR product was purified and se-
quenced directly using these primers and two internal pri-
mers. The Monomastix nuclear-encoded SSU rDNA
sequence was aligned manually against the alignment pre-
pared by Guillou et al. (2004) from 83 chlorophytes and 12
streptophytes. A data set of 1,663 positions was obtained
GBLOCKS 0.91b (Castresana 2000) and the same filtration
parameters employed by Guillou et al. (2004). Maximum
likelihood (ML) trees were inferred using Treefinder (ver-
sion of April 2008) (Jobb et al. 2004) with the best model
fitting the data [TN + I (proportion of invariable sites) + Γ
(four discrete rate categories)] under the Akaike informa-
tion criterion. Bootstrap values were calculated for 100
Phylogenetic Inferences from Whole-Genome Sequence
An amino acid data set and the corresponding nucle-
otide data set with first and second codon positions were
derived from the completely sequenced cpDNAs of
Bigelowiella (NC_008408), Euglena (NC_001603), and
22 green plants (species names and accession numbers, ex-
cept those for Oedogonium cardiacum [NC_011031] and
Leptosira terrestris [NC_009681], are provided in table
3 of Lemieux et al. 2007). These data sets were allowed
to contain missing data; however, limitations were imposed
to the proportion of missing data by selecting for analysis
the protein-coding genes that are shared by at least 14 taxa.
Seventy genes met this criterion: atpA, B, E, F, H, I, ccsA,
cemA, chlB, I, L, N, clpP, ftsH, infA, petA, B, D, G, L, psaA,
B, C, I, J, M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z,
rbcL, rpl2, 5, 14, 16, 20, 23, 32, 36, rpoA, B, C1, C2, rps2,
3, 4, 7, 8, 9, 11, 12, 14, 18, 19, tufA, ycf1, 3, 4, 12. The
amino acid data set was prepared as follows. The deduced
amino acid sequences from the 70 individual genes were
aligned using MUSCLE 3.7 (Edgar 2004), the ambiguously
aligned regions in each alignment were removed using
GBLOCKS 0.91b (Castresana 2000) with the –b2 option
(minimal number of sequences for a flank position) set
to 13, and the protein alignments were concatenated. To
obtain the nucleotide data set, the multiple sequence align-
ment of each protein was converted into a codon alignment,
the poorly aligned and divergent regions in each codon
alignment were excluded using GBLOCKS 0.91b with
the options –b2 = 13 and –t = c (the latter specifying that
selected sequences are complete codons), the individual co-
don alignments were concatenated, and finally third codon
positions were excluded with PAUP* 4.0b10 (Swofford
2003). Missing characters represented 5.9% and 5.8% of
the amino acid and nucleotide data sets, respectively.
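The data set assembly described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the pipeline actually used: it takes gene presence as simple lists per taxon, treats the per-gene alignments as already aligned and filtered, pads absent genes with '?' and measures missing data only as those padded characters. All function names and toy sequences are hypothetical.

```python
def select_shared_genes(gene_content, min_taxa=14):
    """Keep the genes present in at least `min_taxa` genomes."""
    counts = {}
    for genes in gene_content.values():
        for gene in set(genes):
            counts[gene] = counts.get(gene, 0) + 1
    return sorted(g for g, n in counts.items() if n >= min_taxa)


def drop_third_positions(codon_alignment):
    """Remove every third column from an in-frame codon alignment row."""
    return "".join(base for i, base in enumerate(codon_alignment) if i % 3 != 2)


def concatenate(alignments, taxa, genes):
    """Join per-gene alignment rows; a taxon lacking a gene gets '?' padding."""
    rows = {}
    for taxon in taxa:
        parts = []
        for gene in genes:
            block = alignments[gene]
            length = len(next(iter(block.values())))
            parts.append(block.get(taxon, "?" * length))
        rows[taxon] = "".join(parts)
    return rows


def missing_fraction(rows):
    """Proportion of '?' characters in the concatenated matrix."""
    total = sum(len(seq) for seq in rows.values())
    return sum(seq.count("?") for seq in rows.values()) / total


# Toy input (hypothetical): three taxa, two genes, threshold lowered to 2 taxa.
content = {"A": ["psbA", "rbcL"], "B": ["psbA"], "C": ["psbA", "rbcL"]}
alignments = {
    "psbA": {"A": "ATGGCA", "B": "ATGGCT", "C": "ATGGCC"},
    "rbcL": {"A": "ATGTCA", "C": "ATGTCC"},
}
genes = select_shared_genes(content, min_taxa=2)
matrix = concatenate(alignments, ["A", "B", "C"], genes)
first_second = {t: drop_third_positions(seq) for t, seq in matrix.items()}
```

Applying the threshold before concatenation is what bounds the proportion of missing data; with the threshold of 14 taxa used here, the reported proportions were 5.9% and 5.8%.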
Treefinder (version of April 2008) was used to per-
form the ML analyses and to identify the best model fitting
the data under the Akaike information criterion. The amino
acid data set was analyzed using the cpREV + F (observed
amino acid frequencies) + Γ (five categories) model of sequence
evolution. Trees were inferred from the nucleotide
data set using the GTR + Γ (five categories) model. Confidence
of branch points was estimated by 500 bootstrap
The Bayesian inference method was conducted using
MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003). The
model selected was cpREV + F + Γ for the inference from
the amino acid data set and GTR + Γ for the inference of
the nucleotide data set. Rates across sites were modeled on
a discrete gamma distribution with five categories. Two independent
Markov chain Monte Carlo runs, each consisting
of three heated chains in addition to the cold chain, were
carried out using the default parameters. For the analysis
of the nucleotide data set, the length of each run was 3 mil-
lion generations after a burn-in phase of 500,000 genera-
tions; for the amino acid data set, it was 1 million
generations after a burn-in phase of 150,000 generations.
Trees were sampled every 100 generations. Convergence
of the two independent runs was verified according to
the output of the ‘‘sump’’ command; this output was also
used to determine the burn-in phase. Posterior probability
values were estimated from the trees sampled from both
runs using the ‘‘sumt’’ command.
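To make the sampling arithmetic and the meaning of the posterior probability values concrete, the sketch below counts the trees retained per run and computes clade support as the fraction of pooled post-burn-in trees that contain a given clade. It is an illustration only and does not reproduce MrBayes code; it assumes that the run lengths quoted above exclude the burn-in phase, and it encodes each sampled tree as a set of clades, which is a hypothetical toy representation.

```python
def samples_per_run(generations, sample_every=100):
    """Number of trees saved over `generations` at a fixed sampling interval."""
    return generations // sample_every


def clade_support(runs, burnin_samples, clade):
    """Fraction of pooled post-burn-in trees that contain `clade`.

    Each run is a list of sampled trees; each tree is a set of clades and
    each clade is a frozenset of taxon names (toy encoding, assumed here).
    """
    pooled = [tree for run in runs for tree in run[burnin_samples:]]
    clade = frozenset(clade)
    return sum(clade in tree for tree in pooled) / len(pooled)


# Trees retained from both runs, under the assumption stated above.
retained_nucleotide = 2 * samples_per_run(3_000_000)
retained_protein = 2 * samples_per_run(1_000_000)

# Toy samples (hypothetical): two runs of four trees, first tree of each discarded.
x = frozenset({"Pyramimonas", "Euglena"})
y = frozenset({"Pyramimonas", "Monomastix"})
run1 = [{y}, {x}, {x}, {x}]
run2 = [{y}, {x}, {y}, {x}]
support_x = clade_support([run1, run2], burnin_samples=1, clade=x)
```

Pooling the two runs only makes sense once they have converged, which is why convergence was checked before the posterior probabilities were summarized.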
Reconstruction of Ancestral Character States
A data set of gene content was prepared from the chlo-
roplast genomes of the streptophytes Mesostigma and
Chlorokybus, the prasinophytes, and Euglena by coding
the presence and absence of genes as binary characters.
Gene order in each of these chloroplast genomes was con-
verted to all possible pairs of signed genes (i.e., taking into
account gene polarity) and a gene order data set was ob-
tained by coding as binary characters the presence/absence
of the ancestral gene pairs conserved in at least one strep-
tophyte and one prasinophyte. The gene content and gene
order data sets were merged to produce a data set of com-
bined ancestral characters. Losses of these characters on the
best tree topology inferred from sequence data were map-
ped using MacClade 4.08 (Maddison and Maddison 2000).
The most parsimonious reconstructions of ancestral charac-
ter states were inferred under the Dollo principle of parsi-
mony (Farris 1977).
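The coding of gene order as binary characters can be illustrated as follows. This is a minimal sketch under stated assumptions rather than the procedure actually used: genes are given as (name, strand) tuples with strand +1 or -1, only adjacent genes are paired, the map is treated as circular, and a pair is considered identical to its reverse complement. The function names and the three toy genomes are hypothetical, and the Dollo reconstruction itself is not shown.

```python
def canonical_pair(left, right):
    """Represent a signed gene pair and its reverse complement identically."""
    flipped = ((right[0], -right[1]), (left[0], -left[1]))
    return min((left, right), flipped)


def signed_pairs(order, circular=True):
    """Set of adjacent signed gene pairs in a gene order."""
    genes = list(order)
    if circular and len(genes) > 2:
        genes.append(genes[0])  # close the circle
    return {canonical_pair(a, b) for a, b in zip(genes, genes[1:])}


def ancestral_pair_matrix(genomes, streptophytes, prasinophytes):
    """Binary presence/absence of pairs found in >=1 streptophyte and >=1 prasinophyte."""
    pairs = {name: signed_pairs(order) for name, order in genomes.items()}
    in_strep = set().union(*(pairs[n] for n in streptophytes))
    in_pras = set().union(*(pairs[n] for n in prasinophytes))
    characters = sorted(in_strep & in_pras)
    matrix = {n: [int(c in pairs[n]) for c in characters] for n in genomes}
    return characters, matrix


# Toy genomes (hypothetical gene orders, for illustration only).
toy = {
    "Mesostigma": [("psbD", 1), ("psbC", 1), ("rps4", -1)],
    "Pyramimonas": [("rps4", 1), ("psbC", -1), ("psbD", -1)],
    "Ostreococcus": [("psbD", 1), ("psbC", 1), ("rps4", 1)],
}
characters, matrix = ancestral_pair_matrix(
    toy, streptophytes=["Mesostigma"], prasinophytes=["Pyramimonas", "Ostreococcus"]
)
```

In the toy data the second genome is the reverse complement of the first and therefore retains every pair, whereas inverting a single gene in the third genome breaks the two pairs that involve it. Such losses are the events that are then mapped on the tree.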
Results and Discussion
Pyramimonas cpDNA Features an Ancestral
Quadripartite Structure and a Large Repertoire of Genes
Of the three newly sequenced prasinophyte genomes,
only that of Pyramimonas displays a large IR (table 1). At
101,605 bp, this genome is 2-fold smaller than its Nephro-
selmis homolog, a size difference attributable to a much
shorter IR, gene losses, and a more compact gene organi-
zation. As shown in figure 1, the two copies of the IR se-
quence, each 13,057 bp in size and encoding 11 genes, are
separated by SC regions of 10,338 and 65,153 bp compris-
ing 12 and 76 genes, respectively. In this figure are color
coded the genes whose orthologs are usually found within
the IR, the SSC and large SC (LSC) regions in streptophyte
cpDNAs. It can be seen that the pattern of gene partitioning
among the SC regions of the Pyramimonas genome closely
resembles that observed for streptophytes. Considering that
the Pyramimonas IR is about 2-fold larger and encodes ad-
ditional genes relative to that of Mesostigma and that the IR
is known to contract and expand through gene conversion
events (Goulding et al. 1996), the observation that the ter-
mini of the Pyramimonas IR contain genes characteristic of
the adjacent SC regions is not surprising. The most impor-
tant deviation from the highly conserved partitioning pat-
tern displayed by streptophytes concerns the locations of
chlL and chlN. These two genes, which would be expected
to be present in the SSC region, lie within the IR near the
The Pyramimonas chloroplast genome encodes 110
conserved genes, that is, genes found in several other
cpDNAs and usually present in cyanobacteria. The products
of these genes consist of 81 proteins and 29 RNA species (2
rRNAs and 27 tRNAs) (table 2). The set of 27 tRNAs is suf-
ficient to decode all 61 sense codons provided that the tRNA
respective codon family through superwobble pairing
between the first position of the anticodon and the third
position of the codon (Rogalski et al. 2008). The size of
the Pyramimonas chloroplast gene complement closely
matches those observed for the trebouxiophytes Chlorella
vulgaris and Leptosira and for the ulvophytes Pseudendo-
clonium and Oltmannsiellopsis (de Cambiaire et al. 2007).
Although it is significantly reduced compared with its
chloroplast genes includes six ndh genes (ndhA and ndhD
ously found only in Nephroselmis in the Chlorophyta, as
well as two protein-coding genes reported here for the first
time in a chlorophyte chloroplast genome, rpl22 and ycf65
(supplementary table 1, Supplementary Material online).
The ycf65 gene is present in both Mesostigma and Chloro-
kybus but missing in the other investigated streptophytes,
whereas rpl22 shows a widespread distribution in the
Streptophyta and also resides in the Euglena chloroplasts.
Perhaps not surprisingly, most of the 22 chloroplast genes
present in Nephroselmis but absent in Pyramimonas are
also missing from some chlorophytes belonging to the
Trebouxiophyceae, Ulvophyceae, or Chlorophyceae (sup-
plementary table 1, Supplementary Material online). Only
five genes (cemA, petD, petL, psbM, and rrf) represent ex-
General Features of Prasinophyte cpDNAs
Conserved genes (no.)b
Fraction of genome (%)
Group I (no.)
Group II (no.)
Fraction of genome (%)
Average size (bp)
Short repeated sequencesd
Fraction of genome (%)
a Because Pycnococcus and Monomastix cpDNAs lack an IR, only the total sizes of these genomes are given.
b Conserved genes refer to free-standing coding sequences usually present in chloroplast genomes. Genes present in the IR were counted only once.
c In addition to conserved genes, all ORFs ≥100 codons were considered as gene sequences.
d Nonoverlapping repeat elements were mapped on each genome with RepeatMasker using the repeats ≥30 bp identified with REPuter as input sequences.
gene), are also lacking in the Ostreococcus and Euglena
chloroplasts. The analysis of the nuclear genome from
both O. tauri and Ostreococcus lucimarinus revealed that
(Derelle et al. 2006; Palenik et al. 2007; Robbens et al.
2007). Considering that these genes are essential for chlo-
roplast function, they are also likely to be nuclear-encoded
transfer has been documented for rrf, the possibility exists
and that its sequence has diverged beyond recognition.
We found two large ORFs that are not associated with
any introns, orf454 and orf510. For the orf510, present in the
LSC region near the IR, our Blast searches against the non-
redundant protein sequence database of the National Center
function for the potential encoded protein. However, the
product of the orf454 localized in the IR revealed sequence
similarity with the conserved domain of phage associated
DNA primases (COG3378, E-value = 1e-06). Interestingly,
in the course of the present study, we have found that
the orf389 in the Nephroselmis IR (Turmel et al. 1999b) also
encodes a putative protein with the conserved domain of
phage associated DNA primases (COG3378, E-value =
2e-12). Given that viruses have been observed in Pyramimonas
(Moestrup and Thomsen 1974; Sandaa et al. 2001)
and Nephroselmis (Nakayama et al. 2007), it is tempting
to speculate that the above-mentioned orf454 and orf389
originated from horizontal transfer of viral genes. There
are only a few documented cases of nonstandard, free-
standing chloroplast genes that were acquired via horizontal
gene transfer, and all these cases involve genes that partic-
Brouard et al. 2008; Cattolico et al. 2008). Like the orf454
and orf389, the two horizontally transferred genes identified
in the chlorophycean green alga Oedogonium cardiacum are
housed in the IR (Brouard et al. 2008).
In general, the conserved genes present in Pyramimo-
nas cpDNA are densely packed (table 1). Prominent excep-
tions are those in the regions containing the orf454 and
orf510 (fig. 1). There are two cases of overlapping genes
(psbC–psbD and ndhC–ndhK); for the remaining genes,
FIG. 1.—Gene map of Pyramimonas cpDNA. The two copies of the IR sequence are represented by thick lines. Genes (filled boxes) on the outside
of the map are transcribed in a clockwise direction. Coding sequences not commonly found in cpDNA are shown in gray. The single intron in atpB is
represented by an open box. The color code denotes the genomic regions containing the corresponding genes in the cpDNAs of Nephroselmis and
streptophytes: magenta, SSC; cyan, LSC; and yellow, IR. Given the variable gene content of the IR in these ancestral-type genomes, only the genes
invariably present in this region (i.e., those forming the rRNA operon) were represented in yellow. tRNA genes are indicated by the one-letter amino
acid code (Me, elongator methionine; Mf, initiator methionine) followed by the anticodon in parentheses.
intergenic spacers vary between 3 and 2,517 bp, with an
average size of 159 bp. Consistent with this high degree
of compaction, only a few short repeats, mostly direct repeats,
were identified (table 1); they are found mainly in the
large spacer adjacent to the orf510.
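The spacer statistics quoted above can be derived from gene coordinates as sketched below. This is an illustrative calculation only, assuming 1-based inclusive (start, end) coordinates on a circular map, no gene spanning the origin and no gene nested within another; overlapping neighbors give a negative value and are excluded from the summary, as in the text. The function names and coordinates are hypothetical.

```python
def intergenic_spacers(genes, genome_length):
    """Spacer lengths between consecutive genes on a circular map.

    `genes` holds (start, end) pairs, 1-based and inclusive. A negative
    value means that two neighboring genes overlap.
    """
    ordered = sorted(genes)
    spacers = [
        next_start - end - 1
        for (_, end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    # Close the circle: from the end of the last gene to the start of the first.
    spacers.append(genome_length - ordered[-1][1] + ordered[0][0] - 1)
    return spacers


def summarize(spacers):
    """Minimum, maximum and mean spacer length, ignoring overlapping genes."""
    kept = [s for s in spacers if s >= 0]
    return min(kept), max(kept), sum(kept) / len(kept)


# Toy coordinates (hypothetical) on a 1,000-bp circular genome.
toy_genes = [(1, 100), (104, 200), (190, 300), (500, 900)]
toy_spacers = intergenic_spacers(toy_genes, genome_length=1000)
smallest, largest, mean = summarize(toy_spacers)
```

Because a few long spacers can dominate the range, the mean is the more informative measure of compaction when genomes are compared.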
below), the Pyramimonas genome features a unique intron,
a group II intron in atpB. However, the Pyramimonas atpB
intron and those of Ostreococcus and Pycnococcus are in-
serted at different sites and carry distinct ORFs, indicating
that they arose from separate events of horizontal DNA
transfer. It should be pointed out here that the currently
available chloroplast genome data strongly support the no-
tion that no introns were present in the chloroplast of the
Gene Repertoires of Prasinophyte cpDNAs
a Only the genes that are missing in one or more genomes are indicated. A total of 80 genes are shared by all compared cpDNAs: atpA, B, E, F, H, I, clpP, ftsH, infA,
petA, B, G, psaA, B, I, psbA, B, C, D, E, F, H, I, J, K, L, N, T, Z, rbcL, rpl2, 5, 14, 16, 20, 23, 36, rpoA, C1, C2, rps2, 3, 4, 7, 8, 11, 12, 14, 18, 19, rrl, rrs, tufA, trnA(ugc),
C(gca), D(guc), E(uuc), F(gaa), G(ucc), H(gug), I(gau), K(uuu), L(uaa), L(uag), Me(cau), Mf(cau), N(guu), P(ugg), Q(uug), R(acg), R(ucu), S(gcu), S(uga), T(ugu), V(uac),
W(cca), Y(gua), ycf1, 3, 12.
b ycf20 is present as a pseudogene in Nephroselmis (Lemieux C, unpublished data); it is located downstream of ndhE and corresponds to orf111 in the gene map
reported by Turmel et al. (1999b).
common ancestor of all green plants (Turmel et al. 1999b;
group IIA intron is located within domain IV of the
intron secondary structure and carries the reverse tran-
scriptase (cd01651) and maturase (pfam01348) domains,
but not the endonuclease domain, of reverse transcriptases
encoded by group II introns. The endonuclease domain,
which carries out second-strand DNA cleavage during
group II intron mobility (Lambowitz and Zimmerly
2004), was most likely lost after the horizontal transfer
of the intron in the Pyramimonas chloroplast. The
orf608 product shares strong sequence similarity with re-
verse transcriptases encoded by the genomes of firmicute
bacteria and by the mitochondrial cox1 genes of fungi,
the brown alga Pylaiella littoralis, and the cryptophyte
Like Its Ostreococcus Homolog, Pycnococcus cpDNA
Has a Reduced Gene Content and Is Highly Compact
The Pycnococcus chloroplast genome is the smallest
and most compact of the three prasinophyte genomes se-
quenced during this study (table 1 and fig. 2). It is only
8.6 kb larger relative to Ostreococcus cpDNA and contains
10 additional conserved genes, for a total of 98 genes. In
terms of size, this gene repertoire, which consists of 65 pro-
tein genes and 33 RNA genes encoding 2 rRNAs, 30
tRNAs, and the RNA component of RNase P (table 2),
is similar to that observed for chlorophycean green algae
(Brouard et al. 2008). The tRNA complement includes
one tRNA species not previously documented in any chlor-
ophytes [tRNAPro(GGG)] but like its Ostreococcus homo-
log, lacks the tRNA species that reads the AUA codon [i.e.,
the tRNAIle(CAU) where C is modified posttranscription-
ally to lysidine]. As in Pyramimonas cpDNA, the 5S rRNA
gene was not detected. Moreover, the Pycnococcus genome
is missing the protein-coding genes psaJ and rpoB, which
are present in all other investigated chlorophytes. Although
cpDNAs all show a reduced gene content compared with
tial differences (table 2).
No vestigial IR region was identified in Pycnococcus
cpDNA. The genes generally found in this region are
FIG. 2.—Gene map of Pycnococcus cpDNA. Genes (filled boxes) on the outside of the map are transcribed in a clockwise direction. The single
intron in atpB is represented by an open box. The orf163 and orf175 revealed no detectable similarity with any known gene sequences. The genes
whose orthologs are found within the IR, SSC, and LSC regions in Nephroselmis and streptophyte cpDNAs are color coded in supplementary figure 2,
Supplementary Material online.
dispersed throughout the genome; in contrast, several
genes usually present within the SSC region in genomes
displaying an ancestral quadripartite structure [chlN, chlL,
ycf1, cysT, and trnP(ggg)] remained clustered together
(supplementary fig. 2, Supplementary Material online).
There are two cases of overlapping genes (ycf4–rnpB
and psbD–psbC); for the other coding regions, intergenic
spacers were found to vary from 0 to 383 bp, for an av-
erage length of 102 bp.
The Pycnococcus atpB intron shares with its Ostreo-
coccus counterpart the same insertion position and a large
ORF in domain IV that features the reverse transcriptase
(cd01651), maturase (pfam08388), and HNH endonuclease
(cd00085) domains of reverse transcriptases encoded by
group II introns. The Pycnococcus and Ostreococcus intron
ORFs share strong similarity with one another and with re-
verse transcriptase genes found in several cyanobacterial
species as well as in group II introns present in the mito-
chondrial large subunit (LSU) rRNA gene of the red alga
Porphyra purpurea (Burger et al. 1999) and the chloroplast
psbA genes of Chlamydomonas sp. CCMP 1619 (Odom
et al. 2004) and Euglena myxocylindracea (Sheveleva
and Hallick 2004).
The Monomastix Chloroplast Genome Has a Reduced
Gene Content but is Loosely Packed with Genes
Compared with its Pycnococcus homolog, the Mono-
mastix chloroplast genome is 34 kb larger, has a deficit
of four genes, and contains five additional introns (table 1,
fig. 3 and supplementary fig. 3, Supplementary Material online).
Its increased size is largely accounted for by the expansion
of intergenic spacers. The latter vary from 3 to
2,566 bp, for an average size of 524 bp, and contain a
myriad of short repeated sequences rich in G + C. The
94 conserved genes specify 64 proteins and 30 RNAs
(3 rRNAs, 26 tRNAs, and the RNA component of RNase
P) (table 2). The 26 tRNAs can decode all 61 sense codons
assuming that tRNAArg(ACG), where A is modified to
inosine, recognizes all four codons of the CGX family.
The reduced gene content of Monomastix is more like
the gene complement of Ostreococcus than that of
FIG. 3.—Gene map of Monomastix cpDNA. Genes (filled boxes) on the outside of the map are transcribed in a clockwise direction. Introns are
represented by open boxes. The orf122 and orf125 revealed no detectable similarity with any known gene sequences. The genes whose orthologs are
found within the IR, SSC, and LSC regions in Nephroselmis and streptophyte cpDNAs are color coded in supplementary figure 3, Supplementary
Pycnococcus (table 2). It features nine genes that are miss-
ing from Ostreococcus and lacks only three genes that are
present in this alga, including psaC, a gene shared by the
chloroplasts of all previously investigated chlorophytes.
Although short dispersed repeats were mapped predomi-
nantly to intergenic regions, a small fraction was found
within the coding regions of five genes (ftsH, rpoB, rpoC1,
intron 4) (supplementary fig. 4, Supplementary Material
online). This distribution pattern resembles those reported
for other chlorophyte cpDNAs rich in short repeats (Maul
et al. 2002; Pombert et al. 2005, 2006; Bélanger et al.
2006; de Cambiaire et al. 2006, 2007). Ranging from 19
to 58 nucleotides, the most abundant short dispersed repeats
of Monomastix were classified into four families (A
motifs; moreover, some repeats displaying partial sequen-
ces characteristic of distinct families were discerned (sup-
plementary fig. 5, Supplementary Material online). The
signed to six categories (named AB, AC, AD, A1D, A1B,
and BD), suggests they arose through recombination be-
tween regions carrying different repeats.
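The size and gene-content comparison that opens this section can be checked with simple arithmetic. The sketch below uses only the genome sizes and conserved-gene counts quoted in the abstract; the dictionary layout, the function names, and the bp-per-gene measure are illustrative assumptions rather than quantities reported in the study.

```python
# Genome sizes (bp) and conserved-gene counts as quoted in the abstract.
GENOMES = {
    "Monomastix": {"size_bp": 114_528, "genes": 94},
    "Pycnococcus": {"size_bp": 80_211, "genes": 98},
    "Ostreococcus": {"size_bp": 71_666, "genes": 88},
}


def size_difference_bp(larger, smaller):
    """Difference in genome size, in base pairs."""
    return GENOMES[larger]["size_bp"] - GENOMES[smaller]["size_bp"]


def gene_deficit(genome, reference):
    """Number of conserved genes that `genome` has fewer than `reference`."""
    return GENOMES[reference]["genes"] - GENOMES[genome]["genes"]


def bp_per_gene(name):
    """Crude packing measure (hypothetical helper): genome size per conserved gene."""
    entry = GENOMES[name]
    return entry["size_bp"] / entry["genes"]


if __name__ == "__main__":
    print(size_difference_bp("Monomastix", "Pycnococcus"))  # 34317 bp, i.e. about 34 kb
    print(gene_deficit("Monomastix", "Pycnococcus"))        # 4 genes
    for name in GENOMES:
        print(name, round(bp_per_gene(name)), "bp per conserved gene")
```

The bp-per-gene figure ignores gene lengths and introns, so it only gives a rough indication of how loosely a genome is packed.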
The Monomastix chloroplast genome contains a single
group II intron, located in trnK(uuu), and five group I in-
trons, one of which resides in psbA and four in the LSU
rRNA gene (rrl) (fig. 1). The IIB trnK intron is inserted
within the D arm of the tRNA secondary structure follow-
ing G23 and lacks an ORF. All other trnK(uuu) introns that
have been identified in streptophyte cpDNAs carry an in-
ternal ORF with a maturase domain (matK) and are inserted
within the anticodon loop (Turmel et al. 2006). In view of
their ability to encode a homing endonuclease, the five
Monomastix group I introns are likely to be mobile and
were probably captured via horizontal intracellular and/or
intercellular DNA transfer. The IA2 psbA intron, found
at position 525 relative to the corresponding Mesostigma
gene, specifies a potential homing endonuclease with the
GIY–YIG motif and has chloroplast homologs with the
same insertion site and highly similar endonuclease genes
in the ulvophytes Oltmannsiellopsis and Pseudendoclo-
nium and the chlorophycean green algae Oedogonium
and Chlamydomonas reinhardtii (Brouard et al. 2008).
The remaining four group I introns encode potential LAGLIDADG
homing endonucleases (Côté et al. 1993; Lucas
et al. 2001) and also share identical insertion sites with
a large number of chlorophyte (Lucas et al. 2001; Brouard
et al. 2008) and cyanobacterial (Haugen et al. 2007) introns.
The first and third LSU rDNA introns, whose insertion po-
sitions correspond to sites 1931 and 2500 in the E. coli 23S
rRNA, fall within subgroup IB4, whereas the second and
fourth introns inserted at sites 1951 and 2593 belong to
the IA3 family. Like its Chlamydomonas homolog I-CreI,
the Monomastix site-2593 intron-encoded homing endonu-
clease (I-MsoI) has been characterized at the 3D level in
the presence of its DNA target site, revealing that the two
FIG. 4.—Conservation of ancestral gene clusters in prasinophyte and Euglena cpDNAs. Ancestral clusters were defined as those containing genes
in the same order and polarity in at least one streptophyte and one prasinophyte. For each genome, the set of genes making up each of the identified
ancestral clusters is shown as black boxes connected by a horizontal line. Black boxes that are contiguous but not linked together indicate that the
corresponding genes are not adjacent on the genome. Gray boxes denote individual genes that have been relocated elsewhere on the chloroplast genome
and empty boxes denote missing genes. The relative polarities of the genes are not represented in this figure; for this information, consult the maps
shown in figures 1–3 or that previously reported for the Nephroselmis genome (Turmel et al. 1999b).
sites 1931, 2500, and 2593, the Monomastix mitochondrial
LSU rRNA gene features introns with similar structures and
ORFs as those found at identical sites in the chloroplast gene
group I introns were exchanged between different organellar
compartments in the Monomastix lineage. Evidence
supporting such intracellular exchanges of group I introns
has also been reported for the Nephroselmis (Turmel et al.
1999a) and Pseudendoclonium (Pombert et al. 2006)
Pyramimonas and Euglena cpDNAs Show Striking
Similarities in Gene Order
Gene orders in the three newly sequenced prasino-
phyte chloroplast genomes were compared with one an-
chlorophytes, the streptophytes Mesostigma and Chloroky-
bus, the euglenid Euglena, and the chlorarachniophyte
Bigelowiella. In all pairwise genome comparisons, except
that including Pyramimonas and Euglena, the vast majority
of the identified syntenic blocks were composed exclu-
sively of gene clusters commonly found in streptophytes
and chlorophytes. Ancestral clusters of this type display
substantial variability among the Euglena and prasinophyte
genomes (fig. 4). Clearly, the gene-rich genome of Neph-
roselmis exhibits the highest number of genes (94 genes)
mapping to clusters predating the split of the Chlorophyta
and Streptophyta. Breakpoints within ancestral clusters
proved to be too variable in positions to determine which
of the compared genomes are the most closely related. Note
that our comparisons of the Pyramimonas genome with
those of Mesostigma and Chlorokybus disclosed ancestral
gene linkages that had not been reported in any chlorophyte
cpDNA (e.g., psbH–petB–petD, R(ccg)–rbcL–atpB–atpE).
The ancestral rps2–atpI linkage detected in the Euglena
genome was also previously unrecognized in chlorophytes.
Comparison of gene orders in the Pyramimonas
and Euglena cpDNAs revealed striking similarities be-
tween these genomes. Almost two-thirds of the 87 genes
(56 genes) in Euglena cpDNA were found to be part of
collinear regions, for a total of 16 syntenic blocks.
Thirty-five of these genes form eight blocks that exhibit
gene linkages unique to Pyramimonas and Euglena
(fig. 5). Four blocks contain exclusively derived linkages,
whereas the remaining four also include ancestral gene
linkages present in chlorophytes and streptophytes
(the rpl23, rpl32, rps12, and rrs clusters). It is interesting
to note that in each of the latter four blocks, a pair of
adjacent genes was cleanly excised from the Euglena
genome following the formation of the derived linkages.
The syntenic block containing the triad psbK–ycf12–
psaM is not uniquely shared by the Pyramimonas and
Euglena chloroplasts. Being also present in Chlorella,
Pseudendoclonium, and Oltmannsiellopsis but not in
streptophytes, this derived cluster must have arisen
in prasinophytes and have been transmitted by vertical
descent to the trebouxiophyte and ulvophyte lineages.
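The notion of a syntenic block used in this section (genes adjacent in the same order and relative polarity in two genomes) can be made concrete with a short sketch. The code below is one possible way to detect such blocks and is not the procedure used in the study; the two gene orders are hypothetical toy inputs built from gene names mentioned in the text, and the genomes are treated as linear for simplicity even though chloroplast genomes are circular.

```python
def canonical_pair(left, right):
    """Strand-independent key for two adjacent genes.

    Each gene is (name, strand) with strand +1 or -1. Reading the same
    adjacency on the opposite strand reverses the order and flips both
    strands, so both readings map to the same key.
    """
    (g1, s1), (g2, s2) = left, right
    forward = ((g1, s1), (g2, s2))
    reverse = ((g2, -s2), (g1, -s1))
    return min(forward, reverse)


def adjacencies(order):
    """Set of canonical adjacent gene pairs in a linear gene order."""
    return {canonical_pair(a, b) for a, b in zip(order, order[1:])}


def syntenic_blocks(order_a, order_b):
    """Runs of two or more genes in order_a whose adjacencies also occur in order_b."""
    if not order_a:
        return []
    shared = adjacencies(order_a) & adjacencies(order_b)
    blocks, current = [], [order_a[0][0]]
    for left, right in zip(order_a, order_a[1:]):
        if canonical_pair(left, right) in shared:
            current.append(right[0])
        else:
            if len(current) > 1:
                blocks.append(current)
            current = [right[0]]
    if len(current) > 1:
        blocks.append(current)
    return blocks


# Hypothetical toy gene orders (not the real genome maps).
genome_a = [("psbK", 1), ("ycf12", 1), ("psaM", 1),
            ("rbcL", -1), ("atpB", 1), ("atpE", 1)]
genome_b = [("atpE", -1), ("atpB", -1), ("psbA", 1),
            ("psaM", -1), ("ycf12", -1), ("psbK", -1)]

if __name__ == "__main__":
    print(syntenic_blocks(genome_a, genome_b))
```

In this toy case the psbK, ycf12, psaM run and the atpB, atpE pair are recovered even though genome_b carries them on the opposite strand, which is the sense in which polarity is relative rather than absolute.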
Monomastix Occupies an Early-Diverging Branch of the
Mamiellales in 18S rDNA Trees
Monomastix has been historically affiliated with the
Prasinophyceae; however, the finding that its body scales
are not typical of those found in prasinophytes but are
more like those of the chrysophyte Chromulina placentula
(Manton 1967) led to the exclusion of this genus from the
Prasinophyceae (Melkonian 1990; Sym and Pienaar
1993). Very limited molecular information has been re-
ported so far for Monomastix, explaining why its phylo-
genetic status has remained enigmatic. In the present
study, we determined the sequence of the Monomastix
nuclear-encoded SSU rRNA gene and compared it with
those available for other prasinophytes and some
representatives of the Trebouxiophyceae, Ulvophyceae,
and Chlorophyceae. Trees inferred with ML unam-
biguously showed that Monomastix represents an early-
diverging lineage of the Mamiellales (clade II) (fig. 6).
This uniflagellate, which has nonprasinophyte scales,
was resolved as the first branch of this morphologically
diverse clade. An unquestionable affinity therefore
exists between Ostreococcus and Monomastix even
though these two taxa belong to different lineages of
FIG. 5.—Derived gene clusters uniquely shared by the Euglena and Pyramimonas cpDNAs. The genes shown as gray boxes represent the derived
the Mamiellales. The naked Ostreococcus is closely re-
lated to the scaly Bathycoccus and the clade uniting these
nonflagellated genera is sister to that containing the flag-
ellated genera Mamiella (two flagella), Mantoniella (one
flagellum), Micromonas (naked, one flagellum), and the
new genus represented by isolate RCC 391 (two flagella).
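Support for the nodes of the 18S rDNA tree is reported as bootstrap values. As a reminder of what one bootstrap replicate is, the sketch below resamples alignment columns with replacement; it is a generic illustration, not the authors' pipeline, the tree-inference step is left out entirely, and the three-taxon alignment is a hypothetical toy input.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one bootstrap pseudo-alignment.

    alignment: dict mapping taxon name to an aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that runs are reproducible.
    The same resampled column indices are applied to every taxon.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    columns = rng.choices(range(n_sites), k=n_sites)  # sampling with replacement
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}


# Hypothetical toy alignment.
toy_alignment = {
    "taxonA": "ACGTAC",
    "taxonB": "ACGAAC",
    "taxonC": "TCGAAT",
}

if __name__ == "__main__":
    replicate = bootstrap_replicate(toy_alignment, random.Random(1))
    for taxon, seq in replicate.items():
        print(taxon, seq)
```

A bootstrap value for a node is then the percentage of replicate trees, each inferred from one such pseudo-alignment, in which that node is recovered.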
FIG. 6.—Phylogenetic position of Monomastix among prasinophytes as inferred from nuclear-encoded SSU rDNA sequences. The figure presents
the best ML tree. Bootstrap values are shown on the corresponding nodes. The names of the taxa whose chloroplast genomes were examined in the
present study are shown on a black background. Clade numbering follows that of Guillou et al. (2004).
Chloroplast Phylogenomic Analyses Unite the
Pyramimonadales with the Mamiellales and Identify the
Pyramimonadales as the Source of the Euglenid
To explore the relationships among prasinophyte lin-
eages (in particular clades I, II, III, and V) as well as the
relationships of chlorophyte chloroplasts with the second-
arilyacquired chloroplasts of Bigelowiella and Euglena,we
generated data sets of 70 concatenated proteins and genes
(first and second codon positions) from completely se-
quenced chloroplast genomes and analyzed them using
the ML and Bayesian methods (fig. 7). As expected, both
the protein and gene trees identified a strongly supported
clade uniting the two representatives of the Mamiellales,
Monomastix, and Ostreococcus. This clade is sister to a
FIG. 7.—Phylogenies inferred from 70 concatenated chloroplast genes (first two codon positions) and their deduced amino acid sequences. (A) Best
ML tree inferred from the amino acid data set. (B) Best ML tree inferred from the nucleotide data set. The bootstrap values obtained in ML analyses and
the posterior probability values obtained in Bayesian analyses are shown on the left and right, respectively, on the corresponding nodes.
robust monophyletic group clustering the Pyramimonas
(scaly, four or eight flagella) and Euglena chloroplasts.
Although this sister relationship received 87% bootstrap
support in the protein ML tree (fig. 7A), exclusion of the
resulted in 97% bootstrap support for the Pyramimonas +
Monomastix + Ostreococcus clade (data not shown). In
all analyses, the scaly biflagellate Nephroselmis was sister
to all chlorophytes analyzed, whereas the position of the
naked, nonflagellated Pycnococcus remained equivocal.
The latter prasinophyte was resolved as sister to the core
chlorophytes in the protein tree (fig. 7A), but was sister
to the Mamiellales, Pyramimonadales, and euglenids in
the gene tree (fig. 7B). The protein and gene trees thus differ
only in the branching position of the core chlorophytes with
respect to the prasinophyte lineages.
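The nucleotide data set described above keeps only the first and second codon positions of each gene before concatenation. A minimal sketch of that preparation step is shown below, assuming that every gene alignment is in frame (gaps, if any, in multiples of three) and that every taxon is present for every gene; the function names, gene labels, and sequences are hypothetical.

```python
def first_two_codon_positions(cds):
    """Drop every third nucleotide from an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(cds[i:i + 2] for i in range(0, usable, 3))


def concatenate(gene_alignments, taxa):
    """Concatenate first and second codon positions across genes.

    gene_alignments: dict mapping gene name to a dict of taxon -> aligned CDS.
    Genes are joined in sorted name order so every taxon gets the same layout.
    """
    return {
        taxon: "".join(
            first_two_codon_positions(gene_alignments[gene][taxon])
            for gene in sorted(gene_alignments)
        )
        for taxon in taxa
    }


# Hypothetical toy data: the two taxa differ only at third codon positions.
toy_genes = {
    "geneA": {"taxon1": "ATGGCTTAA", "taxon2": "ATGGCATAA"},
    "geneB": {"taxon1": "ATGAAA", "taxon2": "ATGAAG"},
}

if __name__ == "__main__":
    print(concatenate(toy_genes, ["taxon1", "taxon2"]))
```

Because the toy sequences differ only at third positions, both taxa end up with identical concatenated strings, which illustrates why removing third positions reduces the influence of the fastest-evolving sites.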
Because phylogenetic analyses based on the whole-
genome approach are inherently associated with sparse
taxon sampling, they can lead to trees robustly supporting
an artifactual clustering of taxa (Brinkmann and Philippe
2008; Heath et al. 2008). Caution must therefore be exer-
cised in the interpretation of the observed topologies. In the
case of trees derived from complete genome sequences,
dent data to test topologies (Rokas 2006). In the present
study, the strong alliance we uncovered between the
Pyramimonas and Euglena chloroplasts is strengthened
of these algae (fig. 5). Based on this finding, we infer with
confidence that the green algal partner in the secondary
endosymbiosis that gave rise to euglenids was a member
of the Pyramimonadales. Euglenids are unicellular organ-
isms that belong to the Excavata, a supergroup of eukar-
yotes including diverse nonphotosynthetic groups like
diplomonads, retortamonads, parabasalids, oxymonads,
and jakobids (Baldauf et al. 2000; Baldauf 2008; Keeling et al.
2005). Euglenids are the only photosynthetic excavates,
and they are specifically related to a subgroup containing
the kinetoplastids and diplonemids (Triemer and Farmer
2007). Prior to our study, published data were consistent
with the notion that the euglenid chloroplasts evolved from
a green algal endosymbiont that was allied to prasinophytes
(Turmel et al. 1999b; Ishida et al. 1997; Rogers et al. 2007);
however, it remained unknown as to which of the mono-
phyletic groups of prasinophytes harbored the closest rel-
ative of the euglenid endosymbiont. In agreement with
our results, the ML tree that Ishida et al. (1997) inferred
from the amino acid sequences of elongation factor Tu
revealed a strongly supported clade clustering Pyramimo-
nas disomata and the euglenids E. gracilis and Astasia
longa; however, this Pyramimonas species was the only
prasinophyte sampled in this single-gene analysis. Likewise,
considering that P. parkeae is the unique representative
of the Pyramimonadales
phylogenomic study, there remain uncertainties about the
exact pyramimonadalean lineage that was the source of
the euglenid chloroplasts.
In the eukaryotic tree of life based on nuclear-encoded
genes, euglenids and chlorarachniophytes fall within dis-
tinct branches. Like euglenids, chlorarachniophytes belong
to a supergroup of eukaryotes that is primarily nonphoto-
synthetic, the Rhizaria (Keeling et al. 2005; Baldauf
2008). By robustly placing Bigelowiella at a separate posi-
tion from Euglena, our chloroplast phylogenomic analyses
strongly reinforce the hypothesis that the euglenid and
chlorarachniophyte chloroplasts trace back to two indepen-
dent secondary endosymbioses (Rogers et al. 2007; Taka-
hashi et al. 2007) (fig. 7). Although the chloroplast of
Bigelowiella was found to be sister to those of the ulvo-
phytes Pseudendoclonium and Oltmannsiellopsis in both
the protein and gene trees, broader sampling of core chlor-
ophytes will be required to pinpoint the closest green algal
relative of the chlorarachniophyte endosymbiont.
The most unexpected finding that emerged from our
study is the observation that the Pyramimonas + Euglena
clade is sister to the Monomastix + Ostreococcus clade. Although
the existence of a sister relationship between the
Pyramimonadales and Mamiellales has not been previously
documented, it is compatible with the resemblance that
these monophyletic groups display at the level of flagellar
scale structure (Melkonian 1984, 1990; O’Kelly 1992; Sym
and Pienaar 1993) and with the branching order inferred
from 18S rDNA data. The Pyramimonadales emerge just
et al. 1994; Nakayama et al. 1998; Fawley et al. 2000;
Guillou et al. 2004); however, these lineages form a weakly
supported clade in the ML tree recently reported by
Nakayama et al. (2007). No similarities were found at
the chloroplast gene order level that link the Pyramimona-
dales and Mamiellales to the exclusion of other chlorophyte
groups; however, losses of at least four genes (cemA, cysT,
petL, and rpl19) could be traced back unambiguously to the
(supplementary table 1, Supplementary Material online).
Because the Pyramimonadales and Mamiellales are
distinguished by prominent morphological differences,
the existence of a sister relationship between these lineages
has important implications for the evolution of prasino-
phytes. All members of the Pyramimonadales, which rep-
the Tasmanites (a fossil resembling the phycoma stages of
Cymbomonas, Pterosperma, and Halosphaera, which has
been found in Precambrian deposits), share a number of
synapomorphic characters and have at least four flagella
and a complex scaly covering consisting of three layers
of scales on the cell body and of two layers on the flagella
(Melkonian 1984, 1990; Sym and Pienaar 1993). The inter-
mediate scale layer on the cell body consists of spiderweb-
shaped scales in Pterosperma and is homologous to the
outer scale layer on the flagellum (the limulus scales)
and to the spiderweb scales of the Mamiellales. The limu-
loid scales of Cymbomonas are also reminiscent of the
spiderweb scales of the Mamiellales, particularly during
morphogenesis (Moestrup et al. 2003). Interestingly, an ap-
parent food-uptake apparatus is present in Cymbomonas,
which has been interpreted as a character inherited from a
phagotrophic ancestor of the green plants and subsequently
lost during evolution of the green algae (Moestrup et al.
2003). On the other hand, the members of the Mamiellales
show reduced morphological complexity and are character-
ized by a progressive simplification of cellular structure
the loss of scales (Nakayama et al. 1998). They lack an
underlayer of square-shaped scales (such scales are present
in most other prasinophyte lineages and the flagellate
reproductive cells of streptophytes) and no microtubular
flagellar roots are attached to the basal body no. 2. A sister
relationship between the Pyramimonadales and Mamiel-
lales implies that some of the cellular features displayed
by the Mamiellales were derived from the more complex
organization seen in the Pyramimonadales and presumably
in the common ancestor of all chlorophytes. In this context,
it is worth mentioning that the nature of the progenitor of
all green plants has generated intense debate and is still
controversial (Melkonian 1984; O’Kelly 1992; Sym and
Pienaar 1993). A better understanding of the relationships
among prasinophyte lineages will be required before one
can infer with confidence evolutionary scenarios of cellular
At present, the identity of the earliest-diverging chlor-
ophyte lineage remains uncertain. Intriguingly, the trees
inferred from 18S rDNA sequences (Guillou et al. 2004;
Nakayama et al. 2007) are in discordance with the chloro-
plast phylogenomic trees reported in this study with regards
to the position of the Nephroselmis genus (clade III). The
early-diverging position observed for the Nephroselmis
representative in chloroplast trees is in agreement with
the high degree of ancestral features found in the cpDNA
of this taxon (see fig. 8) but contrasts sharply with the much
later divergence observed for the genus in 18S rDNA trees.
In the latter trees, the branch occupied by Nephroselmis
species emerges near the lineage containing Pycnococcus
and Pseudocourfieldia marina, the clade VII containing only
picoplanktonic species, and the clade containing the core
chlorophytes (Chlorodendrales sensu [Melkonian 1990] +
Trebouxiophyceae + Ulvophyceae + Chlorophyceae).
Together, these lineages form a large clade that is well sup-
ported in ML analysis (fig. 6). Given the close relationship
observed on the basis of scale structure between Nephro-
et al. (2007) proposed that the common ancestor of the
clade containing Nephroselmis and the core chlorophytes
scales and rod-shaped scales) and cell body (square scales
and stellate scales). The above-mentioned discrepancy be-
tween nuclear and chloroplast trees highlights the need for
analysis of chloroplast genomes from additional prasino-
phytes. Sampling of chloroplast genomes from all seven
the exact position of Nephroselmis relative to the Pycnococ-
caceae, Pyramimonadales, and Mamiellales.
Losses of Multiple Ancestral cpDNA Characters in
Independent Prasinophyte Lineages are Correlated with
Major Cellular Remodeling
To trace some of the evolutionary changes that
occurred at the chloroplast genome level during the evolu-
tion of prasinophytes and euglenids, losses of 62 genes and
75 ancestral gene pairs were mapped on the tree topology
inferred from sequence data (fig. 8). In this analysis, the
core chlorophytes were excluded and the streptophytes
Mesostigma and Chlorokybus were used as outgroup.
Although multiple characters were lost in independent
lineages, a substantial fraction of losses are uniquely
shared. In particular, the monophyletic group containing
the Mamiellales + euglenids + Pyramimonadales and
the node linking the latter clade with the Pycnococcaceae
are supported by several changes that occurred only once.
Because the nuclear genome of just one prasinophyte genus
(Ostreococcus) has been decrypted so far (Derelle et al.
2006; Palenik et al. 2007), we cannot interpret our results
in terms of gene transfers from the chloroplast to the nu-
cleus. Most of the genes that vanished from the chloroplast
genome probably fall into this category; however, some
might have disappeared entirely from the cell because their
requirement is restricted to certain growth and physiolog-
ical conditions (e.g., the chl genes associated with chloro-
phyll synthesis in the dark, the cys genes involved in sulfate
and thiosulfate transport, and the ndh genes associated with
The chloroplast genome sustained important reduction
in gene content in at least three separate lineages, namely,
the lineages leading to Euglena, to the mamiellalean genera
In light of the close affinity of the Pyramimonas and
accompanied by extensive gene losses. Similar extinction of
numerous chloroplast genes has been associated with the
secondary endosymbiosis that involved the capture of a red
alga and generated the chloroplasts of heterokonts, crypto-
phytes, and haptophytes (Khan et al. 2007; Oudot-Le Secq
et al. 2007; Cattolico et al. 2008). With regards to the
Mamiellales, it appears that the common ancestor of
Monomastix and Ostreococcus had already experienced
multiple chloroplast gene losses (fig. 8), implying that these
events might have accompanied the simplification of cell or-
ganization that presumably coincided with the emergence of
the Mamiellales. Moreover, as indicated by the higher fre-
with the Monomastix lineage, part of the gene losses in the
former lineage were likely connected with the evolution of
the coccoid cell organization and the reduction in cell size.
Pycnococcus represents an independent coccoid lineage that
sustained considerable reduction of the chloroplast genome,
and as observed for Ostreococcus, there was strong pressure
to maintain a compact genome organization. In contrast, the
supplementary fig. 4, Supplementary Material online).
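The loss-mapping analysis summarized in figure 8 distinguishes losses that occurred once from convergent losses in two or more lineages. The sketch below illustrates that distinction under a Dollo-style assumption adopted here for illustration (each character is present in the common ancestor and can only be lost); it does not reproduce the authors' procedure. The nested-tuple tree is a simplified topology loosely following the relationships described in the text, and geneX and geneY, together with their presence sets, are hypothetical.

```python
# Simplified rooted topology as nested tuples; tips are taxon names.
TREE = (
    "Streptophytes",
    ("Nephroselmis",
     ("Pycnococcus",
      (("Pyramimonas", "Euglena"),
       ("Monomastix", "Ostreococcus")))),
)


def leaves(node):
    """All tip names below a node."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in leaves(child)]


def loss_events(node, present):
    """Clades (sorted tuples of tips) on whose stem branch the character is lost.

    A loss is placed on the most inclusive clade in which no tip retains
    the character, which minimizes the number of losses when regain is
    not allowed.
    """
    tips = leaves(node)
    if not any(tip in present for tip in tips):
        return [tuple(sorted(tips))]
    if isinstance(node, str):
        return []
    events = []
    for child in node:
        events.extend(loss_events(child, present))
    return events


def classify(tree, present):
    """Label a character as retained, a unique loss, or a convergent loss."""
    n = len(loss_events(tree, present))
    if n == 0:
        return "retained"
    return "unique loss" if n == 1 else "convergent loss"


# Hypothetical presence sets for two made-up characters.
gene_x = {"Streptophytes", "Nephroselmis", "Pycnococcus"}
gene_y = {"Streptophytes", "Nephroselmis", "Pyramimonas", "Monomastix"}

if __name__ == "__main__":
    print("geneX:", classify(TREE, gene_x), loss_events(TREE, gene_x))
    print("geneY:", classify(TREE, gene_y), loss_events(TREE, gene_y))
```

Under these toy inputs geneX is lost once, on the branch leading to the Pyramimonadales, euglenid, and Mamiellales tips, whereas geneY requires three independent losses.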
The pressure to maintain the ancestral quadripartite ar-
chitecture became relaxed during the evolution of prasino-
phytes and euglenids. The IR was lost a minimum of three
independent IR losses have been documented for the class
Trebouxiophyceae (de Cambiaire et al. 2007) and for land
plants (Palmer 1991; Raubeson and Jansen 2005). More
unexpected was our finding that the three examined IR-
containing prasinophyte cpDNAs differ significantly in
the distribution of their genes among the two SC regions
and in the orientation of the IR relative to these regions.
gene partitioning pattern observed for streptophytes and
some nongreen algae (Turmel et al. 1999b), the reduced ge-
Supplementary Material online) more like that observed for
the ulvophytes Pseudendoclonium and Oltmannsiellopsis
FIG. 8.—Losses of chloroplast genes and gene pairs during the evolution of prasinophytes and euglenids. Unique losses are indicated by squares,
whereas convergent losses in two or more lineages are indicated by triangles. Red and blue symbols refer to losses of genes and gene pairs, respectively.
Some gene pairs disappeared as a result of gene losses; those that were not correlated with any gene losses are denoted by dots. The number below each
taxon name indicates the total number of conserved genes in the chloroplast genome. Losses of the IR occurred in the three indicated lineages.
(Pombert et al. 2005, 2006). When the latter pattern was
identified in Pseudendoclonium, it was hypothesized that
it might represent an intermediate form between the highly
derived pattern found in the chlorophycean green alga
C. reinhardtii and the ancestral quadripartite structure
found in streptophytes, Nephroselmis, and probably early-
diverging trebouxiophytes, thus lending support to the
notion that the Ulvophyceae is sister to the Chlorophyceae
(Pombert et al. 2005). However, the great variability in the
quadripartite structure uncovered here for the Prasinophy-
ceae and recently reported for the Chlorophyceae (de
Cambiaire et al. 2006; Brouard et al. 2008) casts doubt
on the phylogenetic value of this genomic feature. Clearly,
these data indicate that chloroplast genome rearrange-
ments led to the exchanges of genes between opposite
SC regions on multiple occasions during the evolutionary
history of chlorophytes.
The chloroplast genome of prasinophytes exhibits
much more fluidity in gene content and arrangement than
anticipated from the earlier reports on the Nephroselmis
and Ostreococcus genomes. Major reduction and restruc-
turing of the chloroplast genome occurred in conjunction
with changes in cell organization in at least two lineages,
istence of a sister relationship between the Mamiellales and
Pyramimonadales, our study represents a significant step
toward a better understanding of prasinophyte evolution.
Furthermore, it offers for the first time compelling evidence
that the evolutionary history of the prasinophytes was di-
rectly linked with the acquisition of photosynthesis through
secondary endosymbiosis by a subgroup of excavates, the
euglenids. Two independent lines of evidence, trees in-
ferred from sequence data and the presence of uniquely
shared derived gene clusters, robustly support the notion
that the green algal ancestor of the euglenid chloroplasts
belonged to the Pyramimonadales. Although sampling of
Bigelowiella has not enabled us to pinpoint the green algal
donor of chlorarachniophyte chloroplasts, the inferred
trees strengthen the hypothesis that chloroplasts arose independently
in chlorarachniophytes and euglenids. Considering
that pyramimonadaleans are richer in ancestral
characters at the chloroplast genome level and exhibit
a more pronounced level of cell asymmetry and complexity
compared with the mamiellaleans, it is plausible that cell
asymmetry characterized the common ancestor of these lin-
eages. Consistent with the hypothesis that the common an-
architecture is the observation that Nephroselmis occupies
the earliest divergence of the Chlorophyta and displays the
highest conservation of ancestral characters. Future chloro-
plast genome investigations incorporating the Chloroden-
drales, the two picoplanktonic lineages not sampled in
the present study, and a broader range of taxa in each lin-
eage should resolve further the branching pattern of prasi-
nophyte lineages and clarify the number of separate events
that gave rise to coccoids and streamlining of the chloro-
Supplementary figures 1–5, supplementary table 1, the
data sets used in phylogenetic analyses, and the data set
used to infer the evolutionary scenario of character losses
are available at Molecular Biology and Evolution online
(http://mbe.oxfordjournals.org/). The fully annotated chlo-
roplast genome sequences of Monomastix, Pycnococcus
and Pyramimonas have been deposited in the GenBank da-
tabase under the accession numbers FJ493497, FJ493498,
and FJ493499, respectively. The GenBank accession num-
ber for the Monomastix 18S rDNA sequence determined in
this study is FJ493496.
We thank Mathieu Blais and Bertrand Caillier for their
assistance in cloning and sequencing the Pyramimonas
chloroplast genome. This study was supported by a grant
from the Natural Sciences and Engineering Research Coun-
cil of Canada (to M.T. and C.L.).
Baldauf SL. 2008. An overview of the phylogeny and diversity of
eucaryotes. J Syst Evol. 46:263–273.
Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF. 2000.
A kingdom-level phylogeny of eukaryotes based on combined
protein data. Science. 290:972–977.
Bélanger A-S, Brouard J-S, Charlebois P, Otis C, Lemieux C,
Turmel M. 2006. Distinctive architecture of the chloroplast
genome in the chlorophycean green alga Stigeoclonium
helveticum. Mol Genet Genomics. 276:464–477.
Brinkmann H, Philippe H. 2008. Animal phylogeny and large-
scale sequencing: progress and pitfalls. J Syst Evol. 46:
Brouard J-S, Otis C, Lemieux C, Turmel M. 2008. Chloroplast
DNA sequence of the green alga Oedogonium cardiacum
(Chlorophyceae): unique genome architecture, derived char-
acters shared with the Chaetophorales and novel genes
acquired through horizontal transfer. BMC Genomics. 9:290.
Burger G, Saint-Louis D, Gray MW, Lang BF. 1999. Complete
sequence of the mitochondrial DNA of the red alga Porphyra
purpurea. Cyanobacterial introns and shared ancestry of red
and green algae. Plant Cell. 11:1675–1694.
Castresana J. 2000. Selection of conserved blocks from multiple
alignments for their use in phylogenetic analysis. Mol Biol
Cattolico R, Jacobs M, Zhou Y, Chang J, Duplessis M,
Lybrand T, McKay J, Ong H, Sims E, Rocap G. 2008.
Chloroplast genome sequencing analysis of Heterosigma
akashiwo CCMP452 (West Atlantic) and NIES293 (West
Pacific) strains. BMC Genomics. 9:211.
Chevalier B, Turmel M, Lemieux C, Monnat RJ, Stoddard BL.
2003. Flexible DNA target site recognition by divergent
homing endonuclease isoschizomers I-CreI and I-MsoI. J Mol
Côté V, Mercier J-P, Lemieux C, Turmel M. 1993. The single
group-I intron in the chloroplast rrnL gene of Chlamydomo-
nas humicola encodes a site-specific DNA endonuclease
(I-ChuI). Gene. 129:69–76.
Courties C, Vaquer A, Troussellier M, Lautier J, Chretiennot-
Dinet MJ, Neveux J, Machado C, Claustre H. 1994. Smallest
eukaryotic organism. Nature. 370:255.
de Cambiaire J-C, Otis C, Lemieux C, Turmel M. 2006. The
complete chloroplast genome sequence of the chlorophycean
green alga Scenedesmus obliquus reveals a compact gene
organization and a biased distribution of genes on the two
DNA strands. BMC Evol Biol. 6:37.
de Cambiaire J-C, Otis C, Lemieux C, Turmel M. 2007. The
chloroplast genome sequence of the green alga Leptosira
terrestris: multiple losses of the inverted repeat and extensive
genome rearrangements within the Trebouxiophyceae. BMC
Derelle E, Ferraz C, Rombauts S, et al. (26 co-authors). 2006.
Genome analysis of the smallest free-living eukaryote Ostreo-
coccus tauri unveils many unique features. Proc Natl Acad Sci
Edgar RC. 2004. MUSCLE: multiple sequence alignment with
high accuracy and high throughput. Nucleic Acids Res.
Farris JS. 1977. Phylogenetic analysis under Dollo’s Law. Syst
Fawley MW, Yun Y, Qin M. 2000. Phylogenetic analyses of 18S
rDNA sequences reveal a new coccoid lineage of the
Prasinophyceae (Chlorophyta). J Phycol. 36:387–393.
Goulding SE, Olmstead RG, Morden CW, Wolfe KH. 1996. Ebb
and flow of the chloroplast inverted repeat. Mol Gen Genet.
Guillou L, Eikrem W, Chrétiennot-Dinet M-J, Le Gall F,
Massana R, Romari K, Pedrós-Alió C, Vaulot D. 2004.
Diversity of picoplanktonic prasinophytes assessed by direct
nuclear SSU rDNA sequencing of environmental samples and
novel isolates retrieved from oceanic and coastal marine
ecosystems. Protist. 155:193–214.
Hallick RB, Hong L, Drager RG, Favreau MR, Monfort A,
Orsat B, Spielmann A, Stutz E. 1993. Complete sequence of
Euglena gracilis chloroplast DNA. Nucleic Acids Res.
Hamby RK, Zimmer EA. 1991. Ribosomal RNA as a phyloge-
netic tool in plant systematics. In: Soltis P, Soltis D, Doyle J,
editors. Molecular systematics in plants. New York: Rout-
ledge, Chapman and Hall. p. 50–91.
Haugen P, Bhattacharya D, Palmer JD, Turner S, Lewis LA,
Pryer KM. 2007. Cyanobacterial ribosomal RNA genes with
multiple, endonuclease-encoding group I introns. BMC Evol
Heath TA, Hedtke SM, Hillis DM. 2008. Taxon sampling and the
accuracy of phylogenetic analyses. J Syst Evol. 46:239–257.
Ishida K, Cao Y, Hasegawa M, Okada N, Hara Y. 1997. The
origin of chlorarachniophyte plastids, as inferred from
phylogenetic comparisons of amino acid sequences of EF-Tu.
J Mol Evol. 45:682–687.
Jansen RK, Cai Z, Raubeson LA, et al. (16 co-authors). 2007.
Analysis of 81 genes from 64 plastid genomes resolves
relationships in angiosperms and identifies genome-scale
Jobb G, von Haeseler A, Strimmer K. 2004. TREEFINDER:
a powerful graphical analysis environment for molecular
phylogenetics. BMC Evol Biol. 4:18.
Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW,
Pearlman RE, Roger AJ, Gray MW. 2005. The tree of
eukaryotes. Trends Ecol Evol. 20:670–676.
culture of oceanic ultraphytoplankton. J Phycol. 23:633–638.
Khan H, Parks N, Kozera C, Curtis BA, Parsons BJ, Bowman S,
Archibald JM. 2007. Plastid genome sequence of the
cryptophyte alga Rhodomonas salina CCMP1319: lateral
transfer of putative DNA replication machinery and a test of
chromist plastid phylogeny. Mol Biol Evol. 24:1832–1842.
Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J,
Lambowitz AM, Zimmerly S. 2004. Mobile group II introns.
Annu Rev Genet. 38:1–35.
taxonomic groups in Prasinophyceae. J Phycol. 40:1149–1155.
Lemieux C, Otis C, Turmel M. 2000. Ancestral chloroplast
genome in Mesostigma viride reveals an early branch of green
plant evolution. Nature. 403:649–652.
Lemieux C, Otis C, Turmel M. 2007. A clade uniting the green
algae Mesostigma viride and Chlorokybus atmophyticus
represents the deepest branch of the Streptophyta in
chloroplast genome-based phylogenies. BMC Biol. 5:2.
Lewis LA, McCourt RM. 2004. Green algae and the origin of
land plants. Am J Bot. 91:1535–1556.
Lucas P, Otis C, Mercier J-P, Turmel M, Lemieux C. 2001.
Rapid evolution of the DNA-binding site in LAGLIDADG
homing endonucleases. Nucleic Acids Res. 29:960–969.
and character evolution. Sunderland (MA): Sinauer Associates.
Manton I. 1967. Electron microscopical observations on a clone
of Monomastix Scherffel in culture. Nova Hedwigia. 14:1–11.
Marin B, Melkonian M. 1999. Mesostigmatophyceae, a new
class of streptophyte green algae revealed by SSU rRNA
sequence comparisons. Protist. 150:399–417.
Mattox KR, Stewart KD. 1984. Classification of the green algae:
a concept based on comparative cytology. In: Irvine DEG,
John DM, editors. The systematics of the green algae.
London: Academic Press. p. 29–72.
Maul JE, Lilly JW, Cui L, dePamphilis CW, Miller W,
Harris EH, Stern DB. 2002. The Chlamydomonas reinhardtii
plastid chromosome: islands of genes in a sea of repeats. Plant
and ultrastructural evaluation of the taxonomic position of
Glaucosphaera vacuolata Korsch. New Phytol. 86:39–44.
Melkonian M. 1984. Flagellar apparatus ultrastructure in relation
to green algal classification. In: Irvine DEG, John DM, editors.
The systematics of the green algae. London: Academic Press.
Melkonian M. 1990. Phylum Chlorophyta. Class Prasinophyceae.
In: Margulis L, Corliss JO, Melkonian M, Chapman DJ,
editors. Handbook of protoctista. The structure, cultivation,
habitats and life histories of the eukaryotic microorganisms
and their descendants exclusive of animals, plants and fungi.
Boston: Jones and Bartlett Publishers. p. 600–607.
Michel F, Umesono K, Ozeki H. 1989. Comparative and
functional anatomy of group II catalytic introns – a review.
Michel F, Westhof E. 1990. Modelling of the three-dimensional
architecture of group I catalytic introns based on comparative
sequence analysis. J Mol Biol. 216:585–610.
Moestrup O, Inouye I, Hori T. 2003. Ultrastructural studies on
Cymbomonas tetramitiformis (Prasinophyceae). I. General structure,
scale microstructure, and ontogeny. Can J Bot. 81:657–671.
Moestrup O, Thomsen HA. 1974. An ultrastructural study of the
flagellate Pyramimonas orientalis with particular emphasis on
golgi apparatus activity and the flagellar apparatus. Proto-
Moestrup O, Throndsen J. 1988. Light and electron microscopical
studies on Pseudoscourfieldia marina, a primitive scaly
green flagellate (Prasinophyceae) with posterior flagella. Can J
Nakayama T, Marin B, Kranz HD, Surek B, Huss VAR, Inouye I,
Melkonian M. 1998. The basal position of scaly green
flagellates among the green algae (Chlorophyta) is revealed by
analyses of nuclear-encoded SSU rRNA sequences. Protist.
Nakayama T, Suda S, Kawachi M, Inouye I. 2007. Phylogeny
and ultrastructure of Nephroselmis and Pseudoscourfieldia
(Chlorophyta), including the description of Nephroselmis
anterostigmatica sp. nov. and a proposal for the Nephroselmidales
ord. nov. Phycologia. 46:680–697.
O’Kelly CJ. 1992. Flagellar apparatus architecture and the
phylogeny of “green algae”: chlorophytes, euglenoids,
glaucophytes. In: Menzel D, editor. The cytoskeleton of the
algae. Boca Raton: CRC Press. p. 315–345.
Odom OW, Shenkenberg DL, Garcia JA, Herrin DL. 2004.
A horizontally acquired group II intron in the chloroplast psbA
gene of a psychrophilic Chlamydomonas: in vitro self-splicing
Oudot-Le Secq M-P, Grimwood J, Shapiro H, Armbrust EV,
Bowler C, Green BR. 2007. Chloroplast genomes of the
diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana:
comparison with other plastid genomes of the red
lineage. Mol Genet Genomics. 277:427–439.
Palenik B, Grimwood J, Aerts A, et al. 2007. The tiny eukaryote
Ostreococcus provides genomic insights into the paradox of
Palmer JD. 1991. Plastid chromosomes: structure and evolution.
In: Bogorad L, Vasil K, editors. The molecular biology of
plastids. San Diego: Academic Press. p. 5–53.
DNA sequence of the green alga Oltmannsiellopsis viridis
reveals a distinctive quadripartite architecture in the chloroplast
genome of early diverging ulvophytes. BMC Biol. 4:3.
Pombert J-F, Otis C, Lemieux C, Turmel M. 2005. The
chloroplast genome sequence of the green alga Pseudendoclonium
akinetum (Ulvophyceae) reveals unusual structural
features and new insights into the branching order of
chlorophyte lineages. Mol Biol Evol. 22:1903–1918.
Proschold T, Leliaert F. 2007. Systematics of the green algae:
conflict of classic and modern approaches. In: Brodie J, Lewis
J, editors. Unravelling the algae: the past, present, and future
of algal systematics. Boca Raton: CRC Press, Taylor &
Francis. p. 123–153.
Qiu YL, Li LB, Wang B, et al. (21 co-authors). 2006. The
deepest divergences in land plants inferred from phylogenomic
evidence. Proc Natl Acad Sci USA. 103:15511–15516.
Raubeson LA, Jansen RK. 2005. Chloroplast genomes of plants.
In: Henry RJ, editor. Plant diversity and evolution: genotypic
and phenotypic variation in higher plants. Wallingford: CABI
Publishing. p. 45–68.
Robbens S, Derelle E, Ferraz C, Wuyts J, Moreau H, Van de
Peer Y. 2007. The complete chloroplast and mitochondrial
DNA sequence of Ostreococcus tauri: organelle genomes of
the smallest eukaryote are examples of compaction. Mol Biol
Rodriguez-Ezpeleta N, Philippe H, Brinkmann H, Becker B,
Melkonian M. 2007. Phylogenetic analyses of nuclear,
mitochondrial, and plastid multigene data sets support the
placement of Mesostigma in the Streptophyta. Mol Biol Evol.
Rogalski M, Karcher D, Bock R. 2008. Superwobbling facilitates
translation with reduced tRNA sets. Nat Struct Mol Biol.
Rogers MB, Gilson PR, Su V, McFadden GI, Keeling PJ. 2007.
The complete chloroplast genome of the chlorarachniophyte
Bigelowiella natans: evidence for independent origins of
chlorarachniophyte and euglenid secondary endosymbionts.
Mol Biol Evol. 24:54–62.
Rokas A. 2006. Genomics and the tree of life. Science.
Ronquist F, Huelsenbeck JP. 2003. MrBayes 3: Bayesian
phylogenetic inference under mixed models. Bioinformatics.
Sandaa RA, Heldal M, Castberg T, Thyrhaug R, Bratbak G.
2001. Isolation and characterization of two viruses with large
genome size infecting Chrysochromulina ericina (Prymnesiophyceae)
and Pyramimonas orientalis (Prasinophyceae).
to a chloroplast genome. Nucleic Acids Res. 32:803–810.
Steinkotter J, Bhattacharya D, Semmelroth I, Bibeau C,
Melkonian M. 1994. Prasinophytes form independent lineages
within the Chlorophyta: evidence from ribosomal RNA
sequence comparisons. J Phycol. 30:340–345.
Swofford DL. 2003. PAUP*. Phylogenetic analysis using
parsimony (*and other methods). Version 4. Sunderland
(MA): Sinauer Associates.
Sym SD, Pienaar RN. 1993. The class Prasinophyceae. In: Round
FE, Chapman DJ, editors. Progress in phycological research.
Bristol: Biopress Ltd. p. 281–376.
Takahashi F, Okabe Y, Nakada T, Sekimoto H, Ito M,
Kataoka H, Nozaki H. 2007. Origins of the secondary plastids
of Euglenophyta and Chlorarachniophyta as revealed by an
analysis of the plastid-targeting, nuclear-encoded gene psbO.
J Phycol. 43:1302–1309.
Triemer R, Farmer M. 2007. A decade of euglenoid molecular
phylogenetics. In: Brodie J, Lewis J, editors. Unravelling the
algae: the past, present, and future of algal systematics. Boca
Raton: CRC Press, Taylor & Francis. p. 315–330.
Turmel M, Brouard JS, Gagnon C, Otis C, Lemieux C. 2008.
Deep division in the Chlorophyceae (Chlorophyta) revealed
by chloroplast phylogenomic analyses. J Phycol. 44:739–750.
Turmel M, Lemieux C, Burger G, Lang BF, Otis C, Plante I,
Gray MW. 1999a. The complete mitochondrial DNA
sequences of Nephroselmis olivacea and Pedinomonas minor:
two radically different evolutionary patterns within green
algae. Plant Cell. 11:1717–1729.
Turmel M, Otis C, Lemieux C. 1999b. The complete chloroplast
DNA sequence of the green alga Nephroselmis olivacea:
insights into the architecture of ancestral chloroplast genomes.
Proc Natl Acad Sci USA. 96:10248–10253.
Turmel M, Otis C, Lemieux C. 2005. The complete chloroplast
DNA sequences of the charophycean green algae Staurastrum
and Zygnema reveal that the chloroplast genome underwent
extensive changes during the evolution of the Zygnematales.
BMC Biol. 3:22.
Turmel M, Otis C, Lemieux C. 2006. The chloroplast genome
algal relatives of land plants. Mol Biol Evol. 23:1324–1338.
Wakasugi T, Nagai T, Kapoor M, et al. (15 co-authors). 1997.
Complete nucleotide sequence of the chloroplast genome from
the green alga Chlorella vulgaris: the existence of genes
possibly involved in chloroplast division. Proc Natl Acad Sci
White TJ, Bruns T, Lee S, Taylor J. 1990. Amplification and
direct sequencing of fungal ribosomal RNA genes for
phylogenetics. In: Innis MA, Gelfand DH, Sninsky JJ, White
TJ, editors. PCR protocols: a guide to methods and
applications. San Diego: Academic Press. p. 315–322.
Wolf PG, Karol KG, Mandoli DF, et al. 2005. The first complete
chloroplast genome sequence of a lycophyte, Huperzia
lucidula (Lycopodiaceae). Gene. 350:117–128.
Martin Embley, Associate Editor
Accepted December 8, 2008