Contrasted histories of organelle and nuclear genomes underlying physiological diversification in a grass species

The Royal Society
Proceedings of the Royal Society B
Abstract

C4 photosynthesis evolved multiple times independently in angiosperms, but most origins are relatively old so that the early events linked to photosynthetic diversification are blurred. The grass Alloteropsis semialata is an exception, as this species encompasses C4 and non-C4 populations. Using phylogenomics and population genomics, we infer the history of dispersal and secondary gene flow before, during and after photosynthetic divergence in A. semialata. We further analyse the genome composition of individuals with varied ploidy levels to establish the origins of polyploids in this species. Detailed organelle phylogenies indicate limited seed dispersal within the mountainous region of origin and the emergence of a C4 lineage after dispersal to warmer areas of lower elevation. Nuclear genome analyses highlight repeated secondary gene flow. In particular, the nuclear genome associated with the C4 phenotype was swept into a distantly related maternal lineage probably via unidirectional pollen flow. Multiple intraspecific allopolyploidy events mediated additional secondary genetic exchanges between photosynthetic types. Overall, our results show that limited dispersal and isolation allowed lineage divergence, with photosynthetic innovation happening after migration to new environments, and pollen-mediated gene flow led to the rapid spread of the derived C4 physiology away from its region of origin.
royalsocietypublishing.org/journal/rspb
Research
Cite this article: Bianconi ME et al. 2020
Contrasted histories of organelle and nuclear
genomes underlying physiological
diversification in a grass species. Proc. R. Soc. B
287: 20201960.
http://dx.doi.org/10.1098/rspb.2020.1960
Received: 14 August 2020
Accepted: 20 October 2020
Subject Category:
Evolution
Subject Areas:
evolution, genomics, plant science
Keywords:
admixture, C4 photosynthesis, miombo woodlands, phylogenomics,
phylogeography, polyploidy
Author for correspondence:
Pascal-Antoine Christin
e-mail: p.christin@sheffield.ac.uk
Present address: Institut Botànic de Barcelona
(IBB, CSIC-Ajuntament de Barcelona), Passeig del
Migdia s.n., 08038 Barcelona, Catalonia, Spain.
Lancaster Environment Centre, Lancaster
University, Lancaster LA1 4YQ, UK.
Section for GeoGenetics, GLOBE Institute,
University of Copenhagen, Øster Farimagsgade 5,
building 7, 1353 København K, Denmark.
Electronic supplementary material is available
online at https://doi.org/10.6084/m9.figshare.
c.5195276.
Contrasted histories of organelle and
nuclear genomes underlying physiological
diversification in a grass species
Matheus E. Bianconi1, Luke T. Dunning1, Emma V. Curran1, Oriane Hidalgo2,,
Robyn F. Powell2, Sahr Mian2, Ilia J. Leitch2, Marjorie R. Lundgren1,,
Sophie Manzi3, Maria S. Vorontsova2, Guillaume Besnard3, Colin P. Osborne1,
Jill K. Olofsson1,¶ and Pascal-Antoine Christin1
1 Department of Animal and Plant Sciences, University of Sheffield, Western Bank, Sheffield S10 2TN, UK
2 Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK
3 Laboratoire Evolution and Diversité Biologique (EDB UMR5174), Université de Toulouse III Paul Sabatier,
CNRS, IRD, 118 route de Narbonne, 31062 Toulouse, France
MEB, 0000-0002-1585-5947; IJL, 0000-0002-3837-8186; CPO, 0000-0002-7423-3718;
P-AC, 0000-0001-6292-8734
1. Introduction
Most terrestrial plants assimilate carbon using the ancestral C3 photosynthetic
metabolism, but the efficiency of this pathway decreases in conditions that
limit CO2 availability within the leaf, such as warm, arid and saline habitats
[1]. The derived C4 metabolism boosts productivity in such conditions via the
concerted action of numerous enzymes and anatomical features that concentrate
CO2 at the site of photosynthetic carbon fixation [1]. Nowadays, C4 plants,
particularly those belonging to the grass and sedge families, dominate
tropical grasslands and savannahs, which they have shaped via feedback
with herbivores and fire [2,3]. The C4 metabolism evolved multiple times
independently over the past 30 Myr [1], and retracing the eco-evolutionary
dynamics linked to photosynthetic transitions is difficult for old C4 lineages.
However, a few lineages evolved the C4 trait relatively recently, offering
tractable systems to study the events leading to C4 evolution.
© 2020 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution
License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original
author and source are credited.
The grass Alloteropsis semialata is the only species known
to have genotypes with distinct photosynthetic pathways [4].
C4 accessions are distributed across the palaeotropics, while
C3 individuals are restricted to southern Africa (electronic
supplementary material, figure S1 [5]). In addition, individuals
performing a weak C4 pathway (C3+C4 individuals [6]) occur
in parts of Tanzania and Zambia [5], in the plant biogeographic
region referred to as Zambezian [7] or Central Zambezian
[8], which is associated with miombo woodlands [9]. Analyses
of plastid genomes have suggested that the species originated
in this region, and one lineage associated with the C3 type
then migrated to southern Africa, while a C4 lineage dispersed
across Africa and Asia/Oceania [10]. However, these
previous phylogenetic trees had limited resolution, and more
data are needed to firmly establish the exact origin of the
different lineages.
The different photosynthetic types of A. semialata are
associated with distinct ploidy levels in South Africa, but both
C4 and non-C4 diploids exist in other parts of Africa [4,10],
and nuclear genome analyses have found evidence of genetic
exchanges between lineages with different photosynthetic
types [11]. In addition, previously reported discrepancies
between mitochondrial and plastid genomes [12] might reflect
the footprint of intraspecific allopolyploidization, as previously
suggested based on cytological analyses [13]. However, the
history of nuclear exchanges and their effect on the spread
of different photosynthetic types through ecological and
geographical spaces remain to be formally established.
In this study, we analyse the organelle and nuclear genomes
of 69 accessions of A. semialata (plus six congeners) from
28 countries, covering the known species range and photo-
synthetic diversity, to establish the order of seed-mediated
range expansion and subsequent pollen-mediated admixture
of nuclear genomes. Organelle phylogenetic trees are used to
(i) identify the geographical and ecological origins of the
species and its subgroups, with a special focus on the C4
group. Analyses of nuclear genomes are then used to (ii) establish
the history of secondary genetic exchanges and their impact
on the sorting of photosynthetic types. Finally, genome size
estimates coupled with phylogenomics and population
genomics approaches are used to (iii) identify the origins of
polyploids and their relationship to photosynthetic divergence.
Our detailed genome biogeography analyses shed new light on
the historical factors that lead to functional diversity within a
single species.
2. Materials and methods
(a) Sampling, sequencing and data filtering
Whole-genome sequencing data were generated here for 26 accessions
of A. semialata and one of A. paniculata, and added to data for 48
accessions retrieved from previous studies (electronic supplementary
material, table S1). For herbarium samples, genomic
DNA (gDNA) was isolated using the BioSprint 15 DNA Plant
Kit (Qiagen). Libraries were prepared with 22–157 ng of gDNA
using the Illumina TruSeq Nano DNA LT Sample Prep kit
(Illumina, San Diego, CA, USA). Each sample was sequenced at
the GenoToul-GeT-PlaGE platform (Toulouse, France) as paired-
end reads on 1/24th of an Illumina HiSeq3000 lane. For fresh or
silica gel dried leaves, gDNA was isolated using the Plant
DNeasy Extraction kit (Qiagen). Libraries were constructed by
the respective sequencing centres and paired-end sequenced on
full, 1/6th or 1/12th lane of an Illumina HiSeq2500 at the Sheffield
Diagnostic Genetics Service (UK) and the Edinburgh Genomics
facility (UK; electronic supplementary material, table S1). The
expected sequencing depth ranged from 0.6× to 58.7× (median =
4.9×; electronic supplementary material, table S1). Raw Illumina
datasets were filtered before analysis using the NGSQC Toolkit
v. 2.3.3 [14] to remove reads with less than 80% of the bases with
Phred score above 20, reads containing ambiguous bases or
adaptor contamination. The retained reads were further trimmed
from the 3′ end to remove bases with Phred score below 20.
The quality of the filtered datasets was assessed using FastQC v.
0.11.9 [15].
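The filtering rules described above can be illustrated with a minimal Python sketch. This is not the NGSQC Toolkit's own code: the function names are hypothetical, Phred+33 quality encoding is assumed, and adapter screening is not modelled.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is an assumption)."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(seq, quality, min_q=20, min_fraction=0.8):
    """Keep a read with no ambiguous base and with at least
    min_fraction of its bases having a Phred score above min_q."""
    if not seq or "N" in seq.upper():
        return False
    scores = phred_scores(quality)
    good = sum(1 for q in scores if q > min_q)
    return good / len(scores) >= min_fraction


def trim_3prime(seq, quality, min_q=20):
    """Remove trailing (3' end) bases with a Phred score below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]
```

In this sketch a read is first tested with `passes_filter` and, if retained, shortened with `trim_3prime`; the order mirrors the text, where trimming is applied to the retained reads.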
(b) Genome sizing and carbon isotope analyses
Genome sizes of A. semialata accessions were retrieved from
previous studies or estimated by flow cytometry (electronic sup-
plementary material, table S1) following the one-step protocol of
[16] with minor modifications [17]. For fresh samples either the
Ebihara or GPB with 3% PVP nuclei isolation buffer was used,
whereas the CyStain PI Oxprotect buffer (Sysmex, Germany)
was used for silica-dried material. As internal calibration standards,
Petroselinum crispum 'Champion Moss Curled' (4.5 pg/2C)
or Oryza sativa IR36 (1.0 pg/2C) were used for diploids,
while Pisum sativum 'Ctirad' (9.09 pg/2C) was used for accessions
with C-values more than three times larger than any diploid. All
samples were analysed on a Sysmex Partec Cyflow SL3 flow
cytometer fitted with a 100 mW green solid-state laser (Cobalt
Samba, Sweden). Individuals with known chromosome numbers
and genome sizes [10] were used to assign ploidy levels based
on genome size estimates.
Photosynthetic types were established based on carbon iso-
tope ratios retrieved from previous studies or measured here as
previously described [10] (electronic supplementary material,
table S1). All individuals with values below −17‰ were classified
as non-C4. These were further distinguished between C3
and C3+C4 using previous anatomical and physiological data,
and/or expression levels of C4 pathway genes, where available
(electronic supplementary material, table S1).
(c) Assembly of organelle genomes and molecular
dating
Plastid and mitochondrial genome sequences were assembled
here using a reference-guided approach, except for 15 plastid gen-
omes retrieved from previous studies (electronic supplementary
material, table S1). The reference dataset consisted of the chromo-
some-level nuclear, mitochondrial and plastid genomes of an
Australian individual of A. semialata (AUS1-01 [10,18]). Paired-
end genomic reads were mapped to this reference using Bowtie2
v. 2.3.5 [19] with default parameters. Variant sites from the reads
uniquely mapped to each organelle were incorporated into
a majority consensus sequence using the mpileup function of
Samtools v. 1.9 [20] implemented in a bash-scripted pipeline [12].
Only sites covered by more than five times the expected sequencing
depth of the nuclear genome (electronic supplementary material,
table S1) were called, discarding potential organelle-nuclear
transfers. This approach produced sequences that were already
aligned to the organelle references. The plastid alignment was
manually combined with the 15 previous sequences using Aliview
v. 1.17.1 [21], after the latter were aligned using MAFFT v. 7.427
[22], and the second inverted repeat was removed. Both organelle
alignments were then trimmed to remove sites covered by less
than 90% of individuals using trimAl v. 1.4 [23]. The resulting
plastid and mitochondrial alignment lengths were 84 587 bp and
139 916 bp, respectively (available on Dryad: https://dx.doi.org/
10.5061/dryad.zs7h44j6v [24]).
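The site-coverage trimming step can be sketched as follows. This is an illustrative Python re-implementation of the rule stated above (sites covered by less than 90% of individuals are removed), not trimAl itself; the function name and the set of characters treated as missing data are assumptions.

```python
def trim_columns(alignment, min_coverage=0.9, missing="-N?"):
    """Drop alignment columns in which fewer than min_coverage of the
    sequences carry a called base.

    alignment: dict mapping sample name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all sequences must have the same aligned length")
    keep = []
    for i in range(length):
        # Count the samples with a called (non-missing) base at this site.
        called = sum(1 for n in names if alignment[n][i].upper() not in missing)
        if called / len(names) >= min_coverage:
            keep.append(i)
    return {n: "".join(alignment[n][i] for i in keep) for n in names}
```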
A time-calibrated phylogeny was obtained independently on
plastid and mitochondrial alignments using BEAST v. 1.8.4 [25].
The median ages estimated in [10] were used for secondary cali-
bration of the genus Alloteropsis (11.46 Ma for the crown node,
and 8.075 Ma for the split between A. angusta and A. semialata),
using a normal distribution with standard deviation of 0.0001.
While secondary calibrations are imperfect and can result in
younger estimates and underestimated uncertainties [26], they
are the only option in the absence of fossils of Alloteropsis and
provide accurate relative ages and indicative absolute ages.
The GTR + G + I substitution model was used, with a lognormal
uncorrelated relaxed clock and a constant population size
coalescent tree prior. Two analyses were run in parallel for
300 000 000 generations using the CIPRES Science Gateway 3.3,
with sampling every 20 000 generations. Convergence of runs
and effective sample sizes greater than 100 were confirmed
using Tracer v. 1.6 [27]. Median ages of trees sampled after a
burn-in period of 10% (plastid) and 25% (mitochondria) were
mapped on the maximum clade credibility tree.
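As a quick check of the sampling scheme, the number of trees retained per run can be worked out from the chain length, the sampling interval and the burn-in. The sketch below assumes that the initial state is not counted and that burn-in is applied as a percentage of the sampled trees; the function name is hypothetical.

```python
def retained_samples(generations, sample_every, burnin_percent):
    """Number of sampled trees left after discarding the burn-in."""
    total = generations // sample_every
    discarded = total * burnin_percent // 100
    return total - discarded


# Chain length and sampling interval as stated in the text.
plastid_trees = retained_samples(300_000_000, 20_000, 10)
mito_trees = retained_samples(300_000_000, 20_000, 25)
```

Under these assumptions each run yields 15 000 sampled trees, of which 13 500 (plastid) and 11 250 (mitochondria) remain after burn-in.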
(d) Phylogenetic analyses of the nuclear genome
Genome-wide nuclear markers were assembled using the genomic
data and combined into a multigene coalescent phylogeny, which
can identify different histories among genes. A total of 7408 single-
copy orthologues of Panicoideae were identified from the genomes
of A. semialata [18], Setaria italica, Panicum hallii and Sorghum bicolor
(from Phytozome v. 13 [28]) using OrthoFinder v. 2.3.3 [29].
Coding sequences of A. semialata were extracted and used as refer-
ences to assemble genes from all Alloteropsis accessions with the
approach described above for organelle genomes, except that
reads were mapped as unpaired to avoid discordant pairs where
mates mapped to non-exonic sequences. Sites covered by less
than 70% of individuals were trimmed using trimAl, and individ-
ual sequences shorter than 200 bp after trimming were discarded.
Only trimmed alignments longer than 500 bp and with taxon
occupancy greater than 95% were retained. A maximum likeli-
hood tree was then inferred on each of the 3553 retained
alignments using RAxML v. 8.2.4 [30], with a GTR + CAT substi-
tution model and 100 bootstrap pseudoreplicates. Gene trees
were summarized into a multigene coalescent phylogeny using
Astral v. 5.6.2 [31] after collapsing branches with bootstrap support
values below 30. The analysis was repeated with only confirmed
diploid individuals, and with all individuals but less stringent
missing data thresholds: (i) non-trimmed alignments, and (ii)
alignments trimmed to remove sites covered by less than 30% of
individuals.
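The retention criteria for the main analysis (sequences of at least 200 bp, alignments longer than 500 bp, taxon occupancy above 95%) can be expressed as a short Python sketch. The function and parameter names are hypothetical, and sequence length is assumed to be counted as called A/C/G/T bases after trimming.

```python
def filter_alignment(alignment, n_taxa, min_seq_len=200,
                     min_aln_len=500, min_occupancy=0.95):
    """Return the filtered alignment, or None if it fails the criteria.

    alignment: dict mapping sample name -> aligned, column-trimmed sequence.
    n_taxa: total number of accessions in the dataset.
    """
    def called_bases(seq):
        return sum(1 for base in seq.upper() if base in "ACGT")

    # Discard individual sequences shorter than min_seq_len.
    kept = {n: s for n, s in alignment.items() if called_bases(s) >= min_seq_len}
    if not kept:
        return None
    aln_len = len(next(iter(kept.values())))
    occupancy = len(kept) / n_taxa
    if aln_len > min_aln_len and occupancy > min_occupancy:
        return kept
    return None
```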
Using a similar approach to [32], we evaluated the probability
of observing the organelle topology solely based on incomplete
lineage sorting. A total of 100 000 gene trees were simulated in
Hybrid-Lambda [33], using the nuclear coalescent phylogeny for
diploid accessions, with terminal branches assigned an arbitrary
length of one and extended to make the tree ultrametric, and
branch lengths multiplied by two to reflect the smaller effective
population size of organelles in monoecious species [34]. Simu-
lations were performed using default parameters and repeated
using various combinations of mutation rates (from 2.5 × 10⁻⁵ to
10⁻⁴) and population sizes (from 100 to 50 000).
(e) Genetic structure
Principal component and individual-based admixture analyses
were performed on reads mapped to the whole nuclear
genome. Reads were sorted and indexed using Samtools v. 1.9,
and duplicates were removed using the function MarkDuplicates
from Picard tools v. 2.13.2 (http://broadinstitute.github.io/
picard/). Genotype likelihoods were estimated using ANGSD
v. 0.929 [35]. Sites covered by 70% or more of individuals and
with mapping and base quality scores of 30 or above were
retained, resulting in 11 439 variable sites. A covariance matrix
was estimated from the genotype likelihoods using PCAngsd
v. 0.98 [36]. Eigenvector decomposition was carried out using
the eigen function in R v. 3.4.4 to recover the principal com-
ponents of genetic variation. Individual admixture proportions
were estimated from genotype likelihoods using a maximum
likelihood approach with NGSadmix v. 32 [37]. NGSadmix was
run with numbers of ancestral populations (K) ranging from 1
to 10, with five replicates each and random starting seeds. The
Evanno method [38], as implemented in CLUMPAK [39], identified
the value of K that best describes the uppermost level of
structure.
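For readers unfamiliar with the Evanno method, the ΔK statistic can be sketched in Python as below. This is not CLUMPAK's code: it assumes the common formulation in which the absolute second-order difference of the mean log-likelihood across replicates is divided by the standard deviation of the replicates at K, and the input layout is hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(loglik):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    loglik: dict mapping K -> list of replicate log-likelihoods.
    """
    delta = {}
    for k in sorted(loglik):
        if k - 1 not in loglik or k + 1 not in loglik:
            continue  # delta K is undefined at the ends of the tested range
        sd = stdev(loglik[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(mean(loglik[k + 1]) - 2 * mean(loglik[k])
                          + mean(loglik[k - 1]))
        delta[k] = second_diff / sd
    return delta


def best_k(loglik):
    """Return the K with the largest delta K."""
    delta = evanno_delta_k(loglik)
    return max(delta, key=delta.get)
```

Because ΔK needs both neighbouring values, it cannot be evaluated for the smallest and largest K tested (here K = 1 and K = 10).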
(f) Genome composition
The multigene coalescence approach implemented above allows
for different histories of loci, but not for different histories of
alleles at each locus, as expected in diploid hybrids or allopoly-
ploids. Likewise, the population genomics tools used above are
not tailored for mixed ploidy datasets [40], and do not explicitly
consider known ploidy differences. We consequently established
the history of polyploids with phylogenetic trees of phased
alleles obtained with a newly developed pipeline. To reduce
risks of paralogy, we considered only near-universally single-
copy orthologues in land plants according to BUSCO v. 3.0.2
[41]. Out of 1202 such genes identified in A. semialata, 473 had
at least one exon longer than the average insert size of the
reads used here. The longest exon of each of these genes was
used. As a first step to verify that the genes were single-copy
and orthologous, we considered only the 26 individuals of
A. semialata that were diploid, including an F1 hybrid between
C3 (nuclear clade I) and C4 (nuclear clade IV) parents (electronic
supplementary material, table S1). These individuals were
sequenced as 250 bp paired-end reads (insert size = 550 bp)
with an expected sequencing depth between 4.7 and 58.8, and
we added four congeners (three A. angusta and one A. cimicina)
sequenced at similar depth to serve as outgroups. Reads were
mapped to the reference genome of A. semialata using Bowtie2
with default parameters, except the insert size (-X) that was
increased to 1100 bp. Mapped reads were then phased for the
473 exons using the phase function of Samtools v. 0.1.19,
where bases with quality below 20 were removed during hetero-
zygous calling (option -Q20), and reads with ambiguous phasing
were discarded (option -A). Sequences were then generated for
both alleles using custom bash scripts in which different depth
filters were applied for resequencing and high coverage (20× or
above) datasets (at least 3 and 10 reads covering each position
and at least 2 and 3 reads covering each variant for polymorphic
sites, respectively). Phased sequences shorter than 200 bp were
discarded. We only retained genes that: (i) included at least
one sequence for each of the two congeners; (ii) had at least
four sequences in each of the four nuclear clades of A. semialata
(see Results); and (iii) had at least 30 sequences in total (50% of
the possible maximum). A maximum likelihood tree was inferred
for each of the 300 genes that matched these criteria using
PhyML v. 20120412 [42] with a GTR + G + I model. The resulting
trees were rooted on A. cimicina and processed with custom
scripts to identify those genes for which one allele of the F1
hybrid was nested within the C3 clade I and one nested within
the C4 clade IV. The 120 genes not fulfilling this criterion were
discarded, as they might represent insufficiently variable genes
or include Alloteropsis-specific paralogs. The remaining 180 phy-
logenetically informative genes were retained for downstream
analysis.
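The gene-level retention criteria (i) to (iii) and the F1 hybrid check can be summarized in a short Python sketch. The group labels, function names and input layout are hypothetical and are not taken from the authors' custom scripts.

```python
from collections import Counter

CLADES = ("I", "II", "III", "IV")
CONGENERS = ("A. angusta", "A. cimicina")


def passes_gene_criteria(groups, min_per_clade=4, min_total=30):
    """groups: one label per phased sequence in a gene alignment,
    either a nuclear clade ('I' to 'IV') or a congener name."""
    counts = Counter(groups)
    has_congeners = all(counts[c] >= 1 for c in CONGENERS)          # criterion (i)
    has_clades = all(counts[c] >= min_per_clade for c in CLADES)    # criterion (ii)
    return has_congeners and has_clades and len(groups) >= min_total  # criterion (iii)


def f1_alleles_informative(f1_allele_clades):
    """True if one F1 allele is nested in clade I and the other in clade IV."""
    return sorted(f1_allele_clades) == ["I", "IV"]
```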
Because phasing is difficult with polyploids, we incorporated
each pair of reads from the polyploids into the phylogenetic tree
containing the diploid phased alleles and assigned them to the
clade in which they were positioned. These analyses were
conducted on five hexa- and dodecaploids of A. semialata sequenced
as 250-bp paired-end reads (electronic supplementary material,
table S1). Each pair of reads that fully overlapped with one of
the 180 exons, as determined using BEDTools v. 2.24.0 [43]
with the function 'intersect' (option -f 1.0), was separately added
to the respective gene alignment using the MAFFT option
'--add'. The paired reads were then merged, and each alignment
was trimmed to remove non-overlapping regions (max.
alignment length = 500 bp). Individual sequences shorter than
250 bp were subsequently discarded. A maximum likelihood tree
was then inferred as described above, and the read pair was
assigned to the nuclear clade in which it was nested. Read assignments
were not used in cases in which the C3 and C4 alleles from
the F1 hybrid were not correctly placed as a consequence of align-
ment trimming, or when the sister group of the reads was
composed of multiple lineages. The analyses were later repeated
with each diploid used as the focus individual, in which case the
phased alleles from the focal individual were removed from
the reference dataset.
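Once each read pair has been assigned to a clade, the genome composition of an individual reduces to a tally of assignments. The sketch below assumes one label per read pair, with `None` marking read pairs whose assignment was not used; the names are hypothetical.

```python
from collections import Counter


def clade_proportions(assignments):
    """Proportion of usable read pairs assigned to each nuclear clade.

    assignments: iterable of clade labels (e.g. 'I' to 'IV'), or None for
    read pairs whose assignment was not used.
    """
    counts = Counter(a for a in assignments if a is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {clade: n / total for clade, n in counts.items()}
```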
3. Results
(a) Genome sizes
Out of 38 samples with genome sizes available, 31 were within
25% of the range previously reported for A. semialata diploids
(1.78–2.77 pg/2C; electronic supplementary material, table S1).
One individual from Australia is possibly a tetraploid (3.65 pg/
2C), while three from Zambia and one from Mozambique have
genome sizes suggesting hexaploidy (5.35 and 6.71 pg/2C), as
previously reported for some South African populations
[10,13]. Finally, individuals from one Cameroonian population
had genome sizes suggesting dodecaploids (11.87 pg/2C;
electronic supplementary material, table S1). All polyploids
detected so far in A. semialata have carbon isotopic signatures
of C4 plants (electronic supplementary material, table S1).
(b) Time-calibrated organelle phylogenies
The organelle phylogenetic trees recovered the seven
major lineages reported in previous studies, as well as a
clear incongruence between the two organelles (figure 1
and electronic supplementary material, figure S2 [10–12]).
Indeed, six individuals, including one hexaploid and
one dodecaploid, from Cameroon, Democratic Republic of
Congo (DRC) and Zambia form a monophyletic group
within plastid lineage DE, but form a paraphyletic group
within mitochondrial lineage FG (electronic supplementary
material, figure S2).
The ages estimated on the mitochondrial and plastid
genomes were overall similar, and discrepancies can result
from the use of secondary calibrations. The first split within
A. semialata, which separates lineage FG from the Central
Zambezian region of Africa (in DRC, Zambia and Tanzania)
and the rest of the species (lineage ABCDE), was estimated at
2.8/1.8 Ma on the mitochondrial/plastid trees (95% HPD =
1.5–4.3/1.2–2.6). Within the FG group, accessions from DRC
diverge first, and accessions from the south of Tanzania are
nested within a paraphyletic Zambian clade (figure 1 and
electronic supplementary material, figure S2). Within the
ABCDE group, the separation between the non-C4 group ABC
and the exclusively C4 lineage DE is estimated at 2/1.7 Ma.
Accessions from the B and C clades are spread across northern
areas of the Central Zambezian region (Burundi, DRC, and Tan-
zania), with Zambian accessions derived from within part of
clade B (figure 1 and electronic supplementary material,
figure S2). Within clade A covering southern Africa, early divergence
from accessions from Mozambique and Zimbabwe likely
represents the footprint of a gradual migration to South Africa
between 1.4 and 0.3 Ma (figure 1 and electronic supplementary
material, figure S2).
[Figure 1 image: map of sampled African populations and time-calibrated mitochondrial tree. Map and legend labels: Rift Valley lakes, semi-deserts, tropical rain forest, miombo woodlands, Central Zambezian miombo, Asia–Oceania; photosynthetic type (non-C4: C3+C4 or C3; C4); ploidy (2x, 4x, 6x, 12x); time axis in Ma (0 to 4); organelle lineages A–G.]
Figure 1. Origin and dispersal of Alloteropsis semialata in Africa. The time-calibrated phylogenetic tree based on mitochondrial genomes is shown, with letters on
nodes (A–G) indicating the organelle lineages (see electronic supplementary material, figure S2 for details). White dots indicate nodes with posterior probabilities of
0.95 or above and grey bars represent 95% HPD intervals around estimated ages. For each sample, the photosynthetic type is indicated with a coloured square and
the ploidy level by the number of black dots. All sampled African populations are shown on the map, with circles coloured based on the group indicated on the right
of the phylogeny. Arrows indicate putative dispersal events. (Online version in colour.)
Within the C4 organelle lineage D, accessions from Asia,
Oceania and Madagascar are sister to a sample from Angola
(figure 1 and electronic supplementary material, figure S2).
The sister lineage E contains a subgroup spread east of the Cen-
tral Zambezian region (Tanzania, Malawi and Mozambique)
and South Africa, while the other subgroup contains one acces-
sion from Ethiopia that is sister to samples spread from Kenya
to Sierra Leone with very little divergence (figure 1 and elec-
tronic supplementary material, figure S2). The six individuals
with discordant mitochondria and plastids are placed as
sister to this clade E in the plastid phylogeny.
(c) Nuclear phylogeny
A multigene coalescent phylogeny was estimated based on
3553 nuclear markers. The four nuclear clades previously
defined within A. semialata [11] were retrieved with high sup-
port (figure 2), and similar relationships were obtained with
different thresholds for missing data (electronic supplementary
material, figure S3), and when only diploid samples were
included (electronic supplementary material, figure S4). The
monophyly of nuclear clade I, which corresponds to organelle
lineage A and is associated with C3 photosynthesis, is supported
here by almost all quartet trees. A lower proportion
of quartet trees (75%) support the monophyly of nuclear
clade II, which contains non-C4 accessions from the Central
Zambezian region (C3+C4; organelle lineages B and C). This
proportion is even lower for each of the nuclear clades III
and IV (66%), which contain the C4 accessions (figure 2).
However, the two alternative topologies occur at similar
frequencies at the base of each of the C4 clades (figure 2), which
is compatible with incomplete lineage sorting. The relationships
among the four clades vary, with the C3+C4 clade II
placed either as sister to clade III + IV (figure 2) or to clade I
(electronic supplementary material, figures S3 and S4), in
both cases with a similar number of quartets supporting the
alternative topology.
Five accessions are unplaced at the base of C4 nuclear clades
III plus IV with low quartet support values (figure 2 and electronic
supplementary material, figure S3). These include two
C4 Cameroonian accessions (one of which is a dodecaploid)
and another three accessions (two C4 and one non-C4) from
DRC and Tanzania, which are admixed between clades II and
III [11]. The exclusively C4 nuclear clade III is formed of accessions
from mitochondrial lineage FG (including those placed
within plastid lineage E) that are restricted to the Central Zambezian
region. Nuclear clade IV contains all C4 individuals from
mitochondrial lineage DE, but with relationships that differ
from the organelle genomes. The first splits within clade IV
lead to South African hexaploids, one Mozambican hexaploid
accession and one Angolan accession, while most accessions
cluster in one of two sister groups: one composed of all other
African accessions (including Madagascar) and one composed
exclusively of Asian and Oceanian accessions (figure 2 and
electronic supplementary material, figures S3 and S4).
In our simulations of gene trees based on the nuclear
coalescent phylogeny, only 5% of trees where the four nuclear
clades were monophyletic mirrored the organellar topology
(i.e. clade III sister to clades I + II + IV). Similar results were
obtained across the range of parameter values explored
here (coefficient of variation = 0.3–1.1%).
(d) Population structure and genome composition
A principal component analysis grouped individuals largely
according to their nuclear phylogenetic relationships (electro-
nic supplementary material, figure S5). Admixture analyses
identified four and seven clusters as good fits for the data,
and again retrieved groups that match the nuclear phylogeny
(electronic supplementary material, figure S6). The five
unplaced individuals were positioned in between clades in
the principal component analysis (electronic supplementary
material, figure S5) and showed mixed ancestry (electronic
supplementary material, figure S6).
[Figure 2 image: multigene coalescent tree of Alloteropsis accessions, with tips labelled by accession code (e.g. RSA5-03, ZAM1507-19, AUS1-01) and grouped into A. cimicina, A. paniculata, A. angusta and A. semialata nuclear clades I–IV. Legend: photosynthetic type (non-C4: C3+C4 or C3; C4); ploidy (2x, 4x, 6x, 12x); Central Zambezian region. Geographical labels: Cameroon, Angola, Madagascar, Eastern Africa, Western Africa, Southern Africa, Southeastern Asia, Oceania.]
Figure 2. Nuclear history of Alloteropsis. The multigene coalescent species tree was estimated from 3553 genome-wide nuclear markers. Pie charts, magnified for
key nodes, indicate the proportion of quartet trees that support the main (dark grey), first (pale blue) and second (light grey) alternative topologies. Dashed-line pie
charts indicate nodes with local posterior probability below 0.95. Branch lengths are in coalescent units, except the terminal branches, which are arbitrary. Roman
numbers I–IV denote the four main nuclear clades of A. semialata, which are indicated with coloured shades. Major geographical regions are indicated. (Online
version in colour.)
The genome composition analysis showed that the vast majority of
reads from the Asian/Oceanian individuals (more than 90%) were
assigned to the C4 nuclear clade IV (figure 3; electronic
supplementary material, table S2), confirming the expectations based
on the multigene coalescent phylogeny. Low levels of assignment to
other nuclear clades might represent incomplete lineage sorting or
methodological noise. Similarly high levels of assignment to the
expected group were observed in most individuals from C3 clade I and
C3+C4 clade II, but this number dropped to 82% in some individuals
from clade II and to 78% in the Zimbabwean sample from clade I
(figure 3; electronic supplementary material, table S2). The
proportion of reads of C4 diploids from Africa assigned to the
expected clade (either III or IV) was as low as 68%, with up to 18%
and 11% of reads assigned to the other C4 clade and to the non-C4
clade II, respectively. The ancestry of the polyploid individuals
differed between geographical regions. In particular, the reads from
the dodecaploid individual from Cameroon were almost equally spread
among the C4 clades III and IV and the non-C4 clade II (figure 3;
electronic supplementary material, table S2).
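As a minimal illustration of how per-individual assignment percentages of this kind can be derived, the following Python sketch normalises read counts per nuclear clade and flags individuals whose expected clade falls below a chosen cut-off. The function names, the default 90% threshold and the example counts are hypothetical; they are not taken from the study's pipeline or from table S2.

```python
from typing import Dict

# The four main nuclear clades of A. semialata, as labelled in figure 2.
CLADES = ("I", "II", "III", "IV")


def clade_proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-clade read counts into proportions that sum to 1."""
    total = sum(counts.get(clade, 0) for clade in CLADES)
    if total == 0:
        raise ValueError("no reads assigned to any clade")
    return {clade: counts.get(clade, 0) / total for clade in CLADES}


def below_expected(counts: Dict[str, int], expected: str,
                   threshold: float = 0.90) -> bool:
    """Return True when the expected clade receives less than `threshold`
    of the assigned reads (hypothetical cut-off, for illustration only)."""
    return clade_proportions(counts)[expected] < threshold


# Hypothetical counts for one individual expected to belong to clade IV.
example_counts = {"II": 10, "III": 20, "IV": 70}
example_props = clade_proportions(example_counts)
```

Such a summary only describes the assignment; deciding whether a low value reflects admixture, incomplete lineage sorting or noise requires the comparisons discussed in the text.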
4. Discussion
(a) Limited seed dispersal in the region of origin
The organelle genomes, which are mostly maternally inherited, track
the history of seed dispersal in plants. In A. semialata, four
organelle lineages capturing the earliest splits within the species
(B, C, F and G) are restricted to the Central Zambezian miombo
woodlands dominated by Brachystegia and Julbernardia trees (figure 1
and electronic supplementary material, figure S2) [9]. Brachystegia
was already present in eastern Africa during the late Oligocene [44]
and throughout the Miocene [45]; thus, the first divergence within
A. semialata (2–3 Ma) likely happened in this biome. The extent of
miombo woodlands varied with glaciation cycles [46,47], and
restrictions to dispersal during glacial maxima might have driven the
vicariance of organelle lineages FG and ABCDE in the Pleistocene, as
previously reported for other taxa occurring across the Great Rift
System [48]. Indeed, the order of splits within group FG is
compatible with an origin west of the Rift Valley lakes, while the
well-supported relationships within lineages B and C support their
origin to the east of the lakes (figure 1). The present-day
co-occurrence of FG and BC organelle groups probably follows a
migration beyond their refugia after the re-expansion of miombo
woodlands (figure 1) [47,49,50].
Despite having originated about 2 Ma, lineages FG and BC still occur
within a relatively small geographical region in central/eastern
Africa. The visible geographical structure of each of these lineages
in the organelle phylogenies (figure 1) further supports limited seed
dispersal. These two lineages occur in the wet miombo that occupies
the mountains separating the Zambezi and Congo basins. Variations in
elevation coupled with relatively dense tree cover might limit seed
dispersal for this species, whose seeds are spread mainly by gravity.
By contrast, the lineages that escaped this centre of origin bear the
footprint of a rapid geographical spread. Over the last million
years, lineage A migrated to the south of Africa, and the ancestor of
lineage DE reached the lowland surrounding the wet Central Zambezian
miombo, from where it rapidly spread around the world (figure 1). The
rapid migration of lineage DE was facilitated by the broader niche
conferred by its C4 photosynthetic type [10], but the concurrent
dispersal of C3 lineage A suggests that corridors of low elevation
east, north and south of the Central Zambezian region, coupled with
open grasslands and savannahs in these regions, facilitated the
long-distance spread of A. semialata seeds outside the Central
Zambezian region.
Figure 3. Genomic composition of Alloteropsis semialata. The proportions of alleles of single-copy exons assigned to each of the four nuclear clades of A. semialata, numbered and coloured as in figure 2, are represented by colours in bars (see electronic supplementary material, table S2). (Online version in colour.)
(b) Widespread pollen flow and sweep of the C4 nuclear genome
The organelle lineages are loosely associated with distinct
nuclear groups (figures 1 and 2), indicating that the split
of seed-transported organelle lineages was accompanied by a
reduction of nuclear exchanges. However, the nuclear structure
is less marked and numerous discrepancies between nuclear
and organelle phylogenies indicate secondary genetic exchanges
mediated by pollen. Such cytoplasmic-nuclear discordances are
widespread in plants and animals [51,52], and have revealed
complex patterns of lineage diversification [53–55].
The organelle phylogenies consistently identify two distinct C4
groups (FG and DE; figures 1, 2 and electronic supplementary
material, figure S2), while all C4 accessions are monophyletic in
nuclear analyses (figure 2) [11,18]. Our simulations show that such
patterns are unlikely to result solely from incomplete lineage
sorting. Instead, the sister group relationship between nuclear
clades III and IV, which are associated with divergent organelles,
suggests the swamping of one nuclear genome lineage by the other
(figure 4). The directionality of this exchange is unknown, but
repeated, unidirectional gene flow mediated by pollen must have
occurred in a region where only monoparentally inherited organelles
persisted [56], as previously reported for other taxa [57]. One of
these organelle lineages originates from the Central Zambezian
highlands (lineage FG), while the other originates from the lowlands
of eastern Africa (lineage DE; figure 1). Differences in elevation
along with predominantly easterly winds could have restricted
organelle transport via seed migration but favoured nuclear gene flow
via pollen movements from the lowlands to the west. Given that C4
genes are encoded by the nuclear genome, it is tempting to
hypothesize that an efficient C4 trait evolved following migration to
lower elevation, where higher temperatures increased selective
pressures for photosynthetic transitions. Pollen flow would then have
brought the C4 pathway to populations from the highlands, where the
sweep would have been mediated by selection for the derived pathway
with its broader ecological niche [10,58]. The marked incongruence
between organelle and nuclear phylogenies thus indicates that nuclear
genes encoding the C4 trait were rapidly spread to other habitats by
hijacking seeds of the same species.
(c) Recurrent hybridization and polyploidization
The comparison of nuclear genomes suggests episodic hybridization
between the different lineages. Similar proportions of quartet trees
place the C3+C4 clade II as sister to the C3 clade I and as sister to
the C4 clades III + IV (figure 2 and electronic supplementary
material, figures S3 and S4), which is not expected under incomplete
lineage sorting alone [59]. Instead, the patterns point to an ancient
episode of hybridization that might have brought some genes adapted
for the C4 pathway into a different genomic background (figure 4).
Indeed, for some genes encoding C4 enzymes, C3+C4 and C4 individuals
group together [6,11], highlighting the importance of hybridization
for photosynthetic diversification [60]. After this ancient
introgression, the C4 and C3+C4 types evolved mostly independently
despite their close geographical proximity, but African C4
individuals from both clades III and IV possess alleles that group
with C3+C4 individuals (figure 3), pointing to recurrent gene flow.
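The reasoning above relies on a property of the multispecies coalescent: around an internal branch, incomplete lineage sorting alone is expected to yield the two discordant quartet topologies at equal frequencies, so a marked imbalance between them hints at gene flow. A minimal sketch of such a check, assuming independent gene trees and using hypothetical counts (this is an illustration, not the analysis performed in the study), is an exact two-sided binomial test on the two alternative topologies. The function name is hypothetical.

```python
from math import comb


def minor_topology_asymmetry_p(n_alt1: int, n_alt2: int) -> float:
    """Exact two-sided binomial p-value for the null hypothesis that the
    two discordant quartet topologies are equally frequent (p = 0.5),
    as expected under incomplete lineage sorting alone."""
    n = n_alt1 + n_alt2
    if n == 0:
        raise ValueError("at least one discordant gene tree is required")
    k = max(n_alt1, n_alt2)
    # Upper-tail probability P(X >= k) for X ~ Binomial(n, 0.5).
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so the two-sided value is
    # twice the upper tail, capped at 1.
    return min(1.0, 2 * upper_tail)


# Hypothetical counts of gene trees supporting each alternative topology.
p_value = minor_topology_asymmetry_p(n_alt1=420, n_alt2=310)
```

A small p-value indicates only that the two alternative topologies are unequally supported; linked loci or gene tree estimation error can also produce imbalance, so the result is a prompt for further analysis rather than proof of hybridization.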
Besides polyploids occurring in South Africa [4,10,13], we report
here, for the first time, hexaploids from Zambia and Mozambique, and
a dodecaploid from Cameroon (electronic supplementary material,
table S1). The genomic compositions differ among the polyploids,
which are placed in different parts of the organelle phylogenies
(figure 3 and electronic supplementary material, figure S2). These
patterns suggest a minimum of three independent polyploidization
events: (i) admixture between C4 nuclear clades III and IV leading to
hexaploids from Mozambique and South Africa, (ii) contributions of
nuclear clades II, III and IV leading to the dodecaploid from
Cameroon, and (iii) contributions from nuclear clades II and IV into
nuclear clade III leading to Zambian hexaploids. The organelle
phylogenies suggest that polyploids might have arisen multiple times
in Zambia (electronic supplementary material, figure S2), although
secondary gene flow (e.g. via tetraploids) might also explain these
patterns. Our results, therefore, indicate multiple admixture events
between three nuclear lineages, with and without polyploidization.
5. Concluding remarks
We use phylogenomics of organelle and nuclear genomes to obtain a
detailed picture of the phylogeographic history of the grass
A. semialata, which is the only known species to encompass C3, C3+C4
and C4 populations. The phylogenetic trees of organelle genomes,
which are generally maternally inherited, indicate that seed
dispersal is limited in the Central Zambezian region where
A. semialata originated. One organelle lineage left this region via
the adjacent lowlands, where warmer temperatures might have created
the selective impetus for the emergence of C4 photosynthesis. Seed
dispersal was strongly accelerated outside of the Central Zambezian
region, which likely reflects landscape differences. This led to the
rapid spread of the organelle lineage associated with the newly
emerged C4 trait and, to a lesser degree, of a distinct organelle
lineage that migrated to southern Africa. Importantly, the patterns
of nuclear genome variation indicate that pollen-mediated transport
of biparentally transmitted genes occurred over longer distances and
recurrently among organelle lineages. This process allowed episodic
genetic exchanges between photosynthetic types, in some cases via
intraspecific allopolyploidization. In particular, our study reveals
an unprecedented instance of unidirectional gene flow from a C4 to a
non-C4 genome. We conclude that pollen-mediated exchanges of nuclear
genes between geographically isolated lineages allowed the rapid
spread of the novel C4 trait into populations inhabiting distant
regions.
Figure 4. Putative history of the C4 nuclear genome of Alloteropsis semialata. The C4 nuclear genome is shown in orange, on top of the seed history. Gene flow between lineages is indicated by horizontal connections. (Online version in colour.)
Data accessibility. Sequencing data were deposited in the NCBI database
under BioProject PRJNA666779. Organelle genome alignments are
available from the Dryad Digital Repository
(https://doi.org/10.5061/dryad.zs7h44j6v) [24].
Authors' contributions. M.E.B., L.T.D., J.K.O., C.P.O. and P.-A.C. designed
the study. M.E.B. did the phylogenetic analyses. L.T.D. did the allele
analyses. E.V.C. did the population genomics analyses. J.K.O., S.M.
and G.B. generated sequence data. O.H., R.P., S.M. and I.L. generated
the genome sizes. M.R.L. did isotope analyses. M.S.V. contributed
samples. M.E.B. and P.-A.C. wrote the manuscript with the help of all
co-authors.
Competing interests. We declare that we have no competing interests.
Funding. This study was funded by the European Research Council
(grant no. ERC-2014-STG-638333), the Royal Society (grant no.
RGF\EA\181050) and has benefited from Investissements d'Avenir grants
managed by the Agence Nationale de la Recherche (CEBA, ref.
ANR-10-LABX-25-01 and TULIP, ref. ANR-10-LABX-41). Edinburgh Genomics,
which contributed to the sequencing, is partly supported
through core grants from the NERC (grant no. R8/H10/56), MRC
(grant no. MR/K001744/1) and BBSRC (grant no. BB/J004243/1).
P.-A.C. is funded by a Royal Society University Research Fellowship
(grant no. URF\R\180022).
Acknowledgements. We thank Pauline Raimondeau for laboratory
support.
References
1. Sage RF, Sage TL, Kocacinar F. 2012 Photorespiration and the evolution of C4 photosynthesis. Annu. Rev. Plant Biol. 63, 19–47. (doi:10.1146/annurev-arplant-042811-105511)
2. Sage RF, Stata M. 2015 Photosynthetic diversity meets biodiversity: the C4 plant example. J. Plant Physiol. 172, 104–119. (doi:10.1016/j.jplph.2014.07.024)
3. Lehmann CER et al. 2019 Functional diversification enabled grassy biomes to fill global climate space. bioRxiv 583625. (doi:10.1101/583625)
4. Ellis RP. 1981 Relevance of comparative leaf anatomy in taxonomic and functional research on the South African Poaceae. DSc thesis, University of Pretoria, Pretoria, South Africa.
5. Lundgren MR et al. 2016 Evolutionary implications of C3–C4 intermediates in the grass Alloteropsis semialata. Plant Cell Environ. 39, 1874–1885. (doi:10.1111/pce.12665)
6. Dunning LT, Lundgren MR, Moreno-Villena JJ, Namaganda M, Edwards EJ, Nosil P, Osborne CP, Christin P-A. 2017 Introgression and repeated co-option facilitated the recurrent emergence of C4 photosynthesis among close relatives. Evolution 71, 1541–1555. (doi:10.1111/evo.13250)
7. Linder HP, de Klerk HM, Born J, Burgess ND, Fjeldså J, Rahbek C. 2012 The partitioning of Africa: statistically defined biogeographical regions in sub-Saharan Africa. J. Biogeogr. 39, 1189–1205. (doi:10.1111/j.1365-2699.2012.02728.x)
8. Droissart V et al. 2018 Beyond trees: biogeographical regionalization of tropical Africa. J. Biogeogr. 35, 1153–1167. (doi:10.1111/jbi.13190)
9. Burgess N, Hales JD, Underwood E, Dinerstein E, Olson D, Itoua I, Schipper J, Ricketts T, Newman K. 2004 Terrestrial ecoregions of Africa and Madagascar: a conservation assessment. Washington, DC: Island Press.
10. Lundgren MR et al. 2015 Photosynthetic innovation broadens the niche within a single species. Ecol. Lett. 18, 1021–1029. (doi:10.1111/ele.12484)
11. Olofsson JK et al. 2016 Genome biogeography reveals the intraspecific spread of adaptive mutations for a complex trait. Mol. Ecol. 25, 6107–6123. (doi:10.1111/mec.13914)
12. Olofsson JK et al. 2019 Population-specific selection on standing variation generated by lateral gene transfers in a grass. Curr. Biol. 29, 3921–3927. (doi:10.1016/j.cub.2019.09.023)
13. Liebenberg EJL, Fossey A. 2001 Comparative cytogenetic investigation of the two subspecies of the grass Alloteropsis semialata (Poaceae). Bot. J. Linn. Soc. 137, 243–248. (doi:10.1111/j.1095-8339.2001.tb01120.x)
14. Patel RK, Jain M. 2012 NGS QC toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE 7, e30619. (doi:10.1371/journal.pone.0030619)
15. Andrews S. 2010 FastQC: a quality control tool for high throughput sequence data. See http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
16. Doležel J, Greilhuber J, Suda J. 2007 Estimation of nuclear DNA content in plants using flow cytometry. Nat. Protoc. 2, 2233–2244. (doi:10.1038/nprot.2007.310)
17. Clark J et al. 2016 Genome evolution of ferns: evidence for relative stasis of genome size across the fern phylogeny. New Phytol. 210, 1072–1082. (doi:10.1111/nph.13833)
18. Dunning LT et al. 2019 Lateral transfers of large DNA fragments spread functional genes among grasses. Proc. Natl Acad. Sci. USA 116, 4416–4425. (doi:10.1073/pnas.1810031116)
19. Langmead B, Salzberg SL. 2012 Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359. (doi:10.1038/nmeth.1923)
20. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009 The sequence alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079. (doi:10.1093/bioinformatics/btp352)
21. Larsson A. 2014 AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30, 3276–3278. (doi:10.1093/bioinformatics/btu531)
22. Katoh K, Standley DM. 2013 MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. (doi:10.1093/molbev/mst010)
23. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009 trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973. (doi:10.1093/bioinformatics/btp348)
24. Bianconi ME et al. 2020 Data from: Contrasted histories of organelle and nuclear genomes underlying physiological diversification in a grass species. Dryad Digital Repository. (doi:10.5061/dryad.zs7h44j6v)
25. Drummond AJ, Rambaut A. 2007 BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214. (doi:10.1186/1471-2148-7-214)
26. Schenk JJ. 2016 Consequences of secondary calibrations on divergence time estimates. PLoS ONE 11, e0148228. (doi:10.1371/journal.pone.0148228)
27. Rambaut A, Suchard MA, Xie W, Drummond AJ. 2013 Tracer v1.6. See http://tree.bio.ed.ac.uk/software/tracer/
28. Goodstein DM et al. 2012 Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186. (doi:10.1093/nar/gkr944)
29. Emms DM, Kelly S. 2019 OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238. (doi:10.1186/s13059-019-1832-y)
30. Stamatakis A. 2014 RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313. (doi:10.1093/bioinformatics/btu033)
31. Zhang C, Rabiee M, Sayyari E, Mirarab S. 2018 ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinf. 19, 153. (doi:10.1186/s12859-018-2129-y)
32. Karimi N, Grover CE, Gallagher JP, Wendel JF, Ané C, Baum DA. 2020 Reticulate evolution helps explain apparent homoplasy in floral biology and pollination in baobabs (Adansonia; Bombacoideae; Malvaceae). Syst. Biol. 69, 462–478. (doi:10.1093/sysbio/syz073)
33. Zhu S, Degnan JH, Goldstien SJ, Eldon B. 2015 Hybrid-Lambda: simulation of multiple merger and Kingman gene genealogies in species networks and species trees. BMC Bioinf. 16, 292. (doi:10.1186/s12859-015-0721-y)
34. Birky CW, Maruyama T, Fuerst P. 1983 An approach to population and evolutionary genetic theory for genes in mitochondria and chloroplasts, and some results. Genetics 103, 513–527.
35. Korneliussen TS, Albrechtsen A, Nielsen R. 2014 ANGSD: analysis of next generation sequencing data. BMC Bioinf. 15, 356. (doi:10.1186/s12859-014-0356-4)
36. Meisner J, Albrechtsen A. 2018 Inferring population structure and admixture proportions in low-depth NGS data. Genetics 210, 719–731. (doi:10.1534/genetics.118.301336)
37. Skotte L, Korneliussen TS, Albrechtsen A. 2013 Estimating individual admixture proportions from next generation sequencing data. Genetics 195, 693–702. (doi:10.1534/genetics.113.154138)
38. Evanno G, Regnaut S, Goudet J. 2005 Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14, 2611–2620. (doi:10.1111/j.1365-294X.2005.02553.x)
39. Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. 2015 CLUMPAK: a program for identifying clustering modes and packaging population structure inferences across K. Mol. Ecol. Resour. 15, 1179–1191. (doi:10.1111/1755-0998.12387)
40. Meirmans PG, Liu S, Van Tienderen PH. 2018 The analysis of polyploid genetic data. J. Hered. 109, 283–296. (doi:10.1093/jhered/esy006)
41. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015 BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212. (doi:10.1093/bioinformatics/btv351)
42. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010 New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321. (doi:10.1093/sysbio/syq010)
43. Quinlan AR, Hall IM. 2010 BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842. (doi:10.1093/bioinformatics/btq033)
44. Vincens A, Tiercelin JJ, Buchet G. 2006 New Oligocene-early Miocene microflora from the southwestern Turkana Basin. Palaeoenvironmental implications in the northern Kenya Rift. Palaeogeogr. Palaeoclimatol. Palaeoecol. 239, 470–486. (doi:10.1016/j.palaeo.2006.02.007)
45. Yemane K, Robert C, Bonnefille R. 1987 Pollen and clay mineral assemblages of a late Miocene lacustrine sequence from the northwestern Ethiopian highlands. Palaeogeogr. Palaeoclimatol. Palaeoecol. 60, 123–141. (doi:10.1016/0031-0182(87)90028-9)
46. Beuning KRM, Zimmerman KA, Ivory SJ, Cohen AS. 2011 Vegetation response to glacial–interglacial climate variability near Lake Malawi in the southern African tropics. Palaeogeogr. Palaeoclimatol. Palaeoecol. 303, 81–92. (doi:10.1016/j.palaeo.2010.01.025)
47. Ivory SJ, Russell J. 2016 Climate, herbivory, and fire controls on tropical African forest for the last 60 ka. Quat. Sci. Rev. 148, 101–114. (doi:10.1016/j.quascirev.2016.07.015)
48. Mairal M, Sanmartín I, Herrero A, Pokorny L, Vargas P, Aldasoro JJ, Alarcón M. 2017 Geographic barriers and Pleistocene climate change shaped patterns of genetic variation in the Eastern Afromontane biodiversity hotspot. Sci. Rep. 7, 45749. (doi:10.1038/srep45749)
49. Vincens A. 1991 Late quaternary vegetation history of the South-Tanganyika basin. Climatic implications in South Central Africa. Palaeogeogr. Palaeoclimatol. Palaeoecol. 86, 207–226. (doi:10.1016/0031-0182(91)90081-2)
50. Dupont LM, Behling H, Kim JH. 2008 Thirty thousand years of vegetation development and climate change in Angola (Ocean Drilling Program Site 1078). Clim. Past 4, 107–124. (doi:10.5194/cp-4-107-2008)
51. Rieseberg LH, Soltis DE. 1991 Phylogenetic consequences of cytoplasmic gene flow in plants. Evol. Trends Plants 5, 65–84.
52. Toews DPL, Brelsford A. 2012 The biogeography of mitochondrial and nuclear discordance in animals. Mol. Ecol. 21, 3907–3930. (doi:10.1111/j.1365-294X.2012.05664.x)
53. Folk RA, Mandel JR, Freudenstein JV. 2017 Ancestral gene flow and parallel organellar genome capture result in extreme phylogenomic discord in a lineage of angiosperms. Syst. Biol. 66, 320–337.
54. Lee-Yaw JA, Grassa CJ, Joly S, Andrew RL, Rieseberg LH. 2019 An evaluation of alternative explanations for widespread cytonuclear discordance in annual sunflowers (Helianthus). New Phytol. 221, 515–526. (doi:10.1111/nph.15386)
55. Muñoz-Rodríguez P et al. 2018 Reconciling conflicting phylogenies in the origin of sweet potato and dispersal to Polynesia. Curr. Biol. 28, 1246–1256. (doi:10.1016/j.cub.2018.03.020)
56. Ray ML, Ray XA, Dickinson DB, Grossman M. 1979 A model for the genetic modification of wild plant species. J. Hered. 70, 309–316. (doi:10.1093/oxfordjournals.jhered.a109264)
57. Buggs RJA, Pannell JR. 2006 Rapid displacement of a monoecious plant lineage is due to pollen swamping by a dioecious relative. Curr. Biol. 16, 996–1000. (doi:10.1016/j.cub.2006.03.093)
58. Aagesen L, Biganzoli F, Bena J, Godoy-Bürki AC, Reinheimer R, Zuloaga FO. 2016 Macro-climatic distribution limits show both niche expansion and niche specialization among C4 Panicoids. PLoS ONE 11, e0151075. (doi:10.1371/journal.pone.0151075)
59. Sayyari E, Mirarab S. 2016 Fast coalescent-based computation of local branch support from quartet frequencies. Mol. Biol. Evol. 33, 1654–1668. (doi:10.1093/molbev/msw079)
60. Kadereit G, Bohley K, Lauterbach M, Tefarikis DT, Kadereit JW. 2017 C3–C4 intermediates may be of hybrid origin – a reminder. New Phytol. 215, 70–76. (doi:10.1111/nph.14567)
... Genome data, δ13C values, and population genetic analyses. For the genomic analyses, we compiled previously published double digest restriction-site associated DNA sequencing (ddRADSeq) data sets for Alloteropsis semialata (R. Br.) Hitchc. individuals that also had known δ13C values from field-collected leaves measured using mass spectrometry (Lundgren et al., 2015; Bianconi et al., 2020; Olofsson et al., 2021; Alenazi et al., 2023). Depending on the source of the δ13C values, these were either single measures (Lundgren et al., 2015; Bianconi et al., 2020), replicated if the δ13C values did not match other individuals of the population and genomic group (Olofsson et al., 2021), or medians of triplicate technical replicates if sufficient material was available (Alenazi et al., 2023). In total, the data set comprised 420 individuals collected from 87 populations across Africa and Asia (Supporting Information Table S1), representing the full range of photosynthetic types found in A. semialata (45 × C3; 132 × C3+C4; 243 × C4). ...
... Finally, we used whole-genome resequencing data (Bianconi et al., 2020) for 45 A. semialata individuals to determine whether the genes in the GWAS regions were evolving under positive selection. In short, the datasets were downloaded from NCBI sequence read archive and mapped to the reference genome using BOWTIE2, and consensus sequences generated using previously developed methods (Olofsson et al., 2016;Dunning et al., 2022), and a maximum likelihood phylogeny tree for each gene was inferred using RAXML (Stamatakis, 2014) with 100 bootstrap. ...
C4 photosynthesis is a complex trait requiring multiple developmental and metabolic alterations. Despite this complexity, it has independently evolved over 60 times. However, our understanding of the transition to C4 is complicated by the fact that variation in photosynthetic type is usually segregated between species that diverged a long time ago. Here, we perform a genome‐wide association study (GWAS) using the grass Alloteropsis semialata, the only known species to have C3, intermediate, and C4 accessions that recently diverged. We aimed to identify genomic regions associated with the strength of the C4 cycle (measured using δ¹³C), and the development of C4 leaf anatomy. Genomic regions correlated with δ¹³C include regulators of C4 decarboxylation enzymes (RIPK), nonphotochemical quenching (SOQ1), and the development of Kranz anatomy (SCARECROW‐LIKE). Regions associated with the development of C4 leaf anatomy in the intermediate individuals contain additional leaf anatomy regulators, including those responsible for vein patterning (GSL8) and meristem determinacy (GIF1). The parallel recruitment of paralogous leaf anatomy regulators between A. semialata and other C4 lineages implies the co‐option of these genes is context‐dependent, which likely has implications for the engineering of the C4 trait into C3 species.
... The photosynthetic types of A. semialata correspond to distinct genetic lineages that probably evolved during the Plio-Pleistocene (Lundgren et al., 2015; Bianconi et al., 2020a) from a common ancestor that used a weak C4 cycle (Dunning et al., 2017; Fig. 1). ...
... The initial divergence between A. semialata C4 and non-C4 types most likely happened in the Central Zambezian miombo woodlands of Africa, where the species originated c. 3 Ma (Lundgren et al., 2015; Bianconi et al., 2020a). The C3 lineage (clade I) later migrated to Southern Africa and a single C4 lineage (clade IV) spread across Africa, Madagascar, Southeast Asia, and Oceania. ...
... The Central Zambezian region remained occupied by another C4 lineage (clade III) and by C3+C4 populations (clade II). The lineages evolved largely in isolation, but repeated episodes of genetic exchange might have contributed to the expansion of the different photosynthetic types (Lundgren et al., 2015; Olofsson et al., 2016, 2021; Bianconi et al., 2020a). Currently, C4 plants overlap with C3 plants in Southern Africa and with C3+C4 ones in the Central Zambezian region, but when they appear mixed (growing close to each other), the C4 are polyploids and the non-C4 are diploids, and this ploidy difference probably prevents gene flow between them (Olofsson et al., 2021). ...
C4 photosynthesis is a key innovation in land plant evolution, but its immediate effects on population demography are unclear. We explore the early impact of the C4 trait on the trajectories of C4 and non‐C4 populations of the grass Alloteropsis semialata. We combine niche models projected into paleoclimate layers for the last 5 million years with demographic models based on genomic data. The initial split between C4 and non‐C4 populations was followed by a larger expansion of the ancestral C4 population, and further diversification led to the unparalleled expansion of descendant C4 populations. Overall, C4 populations spread over three continents and achieved the highest population growth, in agreement with a broader climatic niche that rendered a large potential range over time. The C4 populations that remained in the region of origin, however, experienced lower population growth, rather consistent with local geographic constraints. Moreover, the posterior transfer of some C4‐related characters to non‐C4 counterparts might have facilitated the recent expansion of non‐C4 populations in the region of origin. Altogether, our findings support that C4 photosynthesis provided an immediate demographic advantage to A. semialata populations, but its effect might be masked by geographic contingencies.
... These genes were acquired from at least nine different donors separated by 20-40 Myr of evolution and multiple speciation events that gave rise to thousands of descendant species Hibdige et al., 2021). Alloteropsis semialata originated in tropical Africa, where divergent genetic lineages and the sister species Alloteropsis angusta still occur (Bianconi et al., 2020). Previous analyses have identified laterally acquired genes that are present in multiple Alloteropsis species, that were either acquired before the divergence of the species or that were subsequently introgressed after their speciation (Olofsson et al., 2016Dunning et al., 2019). ...
... This includes three A. semialata (R. Br.) Hitchc. assemblies that, together with the previously sequenced genome from an Australian individual (accession AUS1; Clade IV; Dunning et al., 2019), encompass the four main nuclear clades within this species ( Fig. 1; Bianconi et al., 2020): one individual from South Africa (accession RSA5-3; Clade I), one from Tanzania (accession TAN1-04B; Clade II) and one from Zambia (accession ZAM1505-10; Clade III). We also generated an assembly for A. angusta Stapf from a Ugandan accession (AANG_UGA4). ...
... In all but two cases, a single reference gene was sufficient to recover all loci in the laterally acquired clade, with two nonoverlapping annotated sequences used as references where this was not possible. We then determined the presence of each of these representative genes in whole-genome short-read datasets belonging to 45 diploid Alloteropsis accessions (including the five individuals with reference genomes; Bianconi et al., 2020), using a combination of BLASTN searches (minimum alignment length 100 bp) and phylogenetic analyses. For each of the 45 datasets, putative reads corresponding to each laterally acquired gene were identified via a BLASTN analysis with default parameters. ...
Article
Lateral gene transfer (LGT) is the movement of DNA between organisms without sexual reproduction. The acquired genes represent genetic novelties that have independently evolved in the donor's genome. Phylogenetic methods have shown that LGT is widespread across the entire grass family, although we know little about the underlying dynamics. We identify laterally acquired genes in five de novo reference genomes from the same grass genus (four Alloteropsis semialata and one Alloteropsis angusta). Using additional resequencing data for a further 40 Alloteropsis individuals, we place the acquisition of each gene onto a phylogeny using stochastic character mapping, and then infer rates of gains and losses. We detect 168 laterally acquired genes in the five reference genomes (32–100 per genome). Exponential decay models indicate that the rate of LGT acquisitions (6–28 per Ma) and subsequent losses (11–24% per Ma) varied significantly among lineages. Laterally acquired genes were lost at a higher rate than vertically inherited loci (0.02–0.8% per Ma). This high turnover creates intraspecific gene content variation, with a preponderance of them occurring as accessory genes in the Alloteropsis pangenome. This rapid turnover generates standing variation that can ultimately fuel local adaptation.
... The photosynthetic variation existing in A. semialata was discovered based on leaf anatomical surveys and measurements of carbon isotopes independently by Ellis (1974) and Brown (1975). The differences between photosynthetic types within this species have been repeatedly studied since then, focusing on the ecological (Ripley et al., 2007, 2010b; Ibrahim et al., 2008; Osborne et al., 2008; Bateman and Johnson, 2011; Lundgren et al., 2015), cytogenetic (Frean and Marks, 1988; Liebenberg and Fossey, 2001; Lundgren et al., 2015; Bianconi et al., 2020; Olofsson et al., 2021), physiological (Frean et al., 1980, 1983a; Lundgren et al., 2016) and biochemical variation (Ueno and Sentoku, 2006; Phansopa et al., 2020), and more recently, on evolutionary and genomic aspects of C4 photosynthesis (Ibrahim et al., 2009; Christin et al., 2012; Lundgren et al., 2015; Olofsson et al., 2016, 2021; Dunning et al., 2017, 2019a; Bianconi et al., 2018, 2020; Curran et al., 2022). In this review, we consolidate the knowledge accumulated on this study system. ...
... It has been suggested that A. papillosa is a possible hybrid, because the linear leaf-blades resemble A. semialata while the inflorescence and spikelet characters are reminiscent of A. cimicina (Clayton and Renvoize, 1982). The divergence of samples assigned to A. cimicina and A. paniculata was estimated as ~2.5 Ma (Olofsson et al., 2016; Bianconi et al., 2020), and these two species will be discussed jointly as 'A. cimicina' in this review. ...
Article
Background: Numerous groups of plants have adapted to CO2 limitations by independently evolving C4 photosynthesis. This trait relies on concerted changes in anatomy and biochemistry to concentrate CO2 within the leaf and thereby boost productivity in tropical conditions. The ecological and economical importance of C4 photosynthesis has motivated intense research, often relying on comparisons between distantly related C4 and non-C4 plants. The photosynthetic type is fixed in most species, with the notable exception of the grass Alloteropsis semialata. This species includes populations exhibiting the ancestral C3 state in southern Africa, intermediate populations in the Zambezian region and C4 populations spread around the paleotropics. Scope: We compile here the knowledge on the distribution and evolutionary history of the Alloteropsis genus as a whole and discuss how this has furthered our understanding of C4 evolution. We then present a chromosome-level reference genome for a C3 individual and compare the genomic architecture to that of a C4 accession of A. semialata. Conclusions: Alloteropsis semialata represents one of the best systems to investigate the evolution of C4 photosynthesis as the genetic and phenotypic variation provides a fertile ground for comparative and population-level studies. Preliminary comparative genomic investigations show the C3 and C4 genomes are highly syntenic, and have undergone a modest amount of gene duplication and translocation since the different photosynthetic groups diverged. The background knowledge and publicly available genomic resources make Alloteropsis semialata a great model for further comparative analyses of photosynthetic diversification.
... A notable example of introgression, potentially explained by gene flow through polyploid bridges, pertains to genes encoding the C4 photosynthetic pathway in Alloteropsis semialata, which have been laterally acquired from the polyploid Setaria palmifolia complex, most likely in tropical Africa, where both species co-occur (46). Most of the non-C4 individuals in A. semialata are diploids, with C4 individuals ranging from diploid to dodecaploid levels (45,75). Although triploid hybrids have not been reported so far for A. semialata, the most plausible path through which genetic information coding for C4 biochemistry could have introgressed into the diploid level of the species would be through compatible crosses that required the presence of a triploid intermediate, from which a cross of the type 3x × 2x would allow the information to spread down to the diploid level. ...
... An alternative hypothesis may relate the presence of C4 diploid individuals to the process of diploidization. However, despite the fact that diploidization is a slow process measured at macroevolutionary scales (33), there is direct experimental evidence that crosses between photosynthetic types in diploid A. semialata are viable (75). That is, following introgression from polyploid S. palmifolia, information coding for C4 biochemistry had to necessarily travel through the different ploidy levels in A. semialata to reach the diploid level, where C4 and non-C4 individuals are then capable of exchanging genetic material. ...
Article
Hybridization blurs species boundaries and leads to intertwined lineages resulting in reticulate evolution. Polyploidy, the outcome of whole genome duplication (WGD), has more recently been implicated in promoting and facilitating hybridization between polyploid species, potentially leading to adaptive introgression. However, because polyploid lineages are usually ephemeral states in the evolutionary history of life it is unclear whether WGD-potentiated hybridization has any appreciable effect on their diploid counterparts. Here, we develop a model of cytotype dynamics within mixed-ploidy populations to demonstrate that polyploidy can in fact serve as a bridge for gene flow between diploid lineages, where introgression is fully or partially hampered by the species barrier. Polyploid bridges emerge in the presence of triploid organisms, which despite critically low levels of fitness, can still allow the transfer of alleles between diploid states of independently evolving mixed-ploidy species. Notably, while marked genetic divergence prevents polyploid-mediated interspecific gene flow, we show that increased recombination rates can offset these evolutionary constraints, allowing a more efficient sorting of alleles at higher-ploidy levels before introgression into diploid gene pools. Additionally, we derive an analytical approximation for the rate of gene flow at the tetraploid level necessary to supersede introgression between diploids with nonzero introgression rates, which is especially relevant for plant species complexes, where interspecific gene flow is ubiquitous. Altogether, our results illustrate the potential impact of polyploid bridges on the (re)distribution of genetic material across ecological communities during evolution, representing a potential force behind reticulation.
... A notable example of introgression, potentially explained by gene flow through polyploid bridges, pertains to genes encoding the C4 photosynthetic pathway in Alloteropsis semialata, which have been laterally acquired from the polyploid Setaria palmifolia complex, most likely in tropical Africa, where both species co-occur (42). Most of the non-C4 individuals in A. semialata are diploids, with C4 individuals ranging from diploid to dodecaploid levels (41,70). Although triploid hybrids have not been reported so far for A. semialata, the most plausible path through which genetic information coding for C4 biochemistry could have introgressed into the diploid level of the species would be through compatible crosses that required the presence of a triploid intermediate, from which a cross of the type 3x × 2x would allow the information to spread down to the diploid level. ...
... An alternative hypothesis may relate the presence of C4 diploid individuals to the process of diploidization. However, despite the fact that diploidization is a slow process measured at macroevolutionary scales (11), there is direct experimental evidence that crosses between photosynthetic types in A. semialata are viable (70). That is, following introgression from polyploid S. palmifolia, information coding for C4 biochemistry had to necessarily travel through the different ploidy levels in A. semialata to reach the diploid level, where C4 and non-C4 individuals are then capable of exchanging genetic material. ...
Preprint
Many organisms have more than two sets of chromosomes, due to whole genome duplication (WGD), and are thus polyploid. Despite usually being an ephemeral state in the history of life, polyploidy is widely recognized as an important source of genetic novelty over macroevolutionary scales. More recently, polyploidy has also been shown to facilitate interspecific gene flow, circumventing reproductive barriers between their diploid ancestors. Yet, the implications of WGD-linked introgression on community-level evolutionary dynamics remain unknown. Here, we develop a model of cytotype dynamics within mixed-ploidy populations to demonstrate that polyploidy can in fact serve as a bridge for gene flow between diploid lineages, where introgression is fully or partially hampered by the species barrier. Polyploid bridges emerge in the presence of triploid organisms, which despite critically low levels of viability, can still allow the transfer of alleles between diploid states of independently evolving mixed-ploidy species. Notably, while marked genetic divergence prevents WGD-mediated interspecific gene flow, we show that increased recombination rates can offset these evolutionary constraints, which allows a more efficient sorting of alleles at higher-ploidy levels before introgression into diploid gene pools. Additionally, we derive an analytical approximation for the rate of gene flow at the tetraploid level necessary to supersede introgression between diploids with non-zero introgression rates, which is especially relevant for plant species complexes, where interspecific gene flow is ubiquitous. Altogether, our results illustrate the potential impact of polyploid bridges on evolutionary change within and between mixed-ploidy populations.
... In another study, genes encoding C4 PEPCs were investigated in two phylogenetically distant C4 species, Eleocharis baldwinii and E. vivipara (Besnard et al., 2009). Unexpectedly, C4 PEPC genes have recently diverged between the two Eleocharis species, supporting the idea that one of them has borrowed this core C4 gene from the other lineage by hybridization or by a horizontal transfer event, as widely reported in grasses (Bianconi et al., 2020). As E. vivipara belongs to a monotypic C4 lineage, the C4 PEPC genes were probably obtained from the more diverse E. baldwinii clade (Roalson et al., 2010; Larridon et al., 2021). ...
Article
The recently published study by Liu et al. (2024) on a high quality, chromosome‐level genome of Eleocharis vivipara provides new insight into the multiple evolution of C4 photosynthesis in Cyperaceae and in particular in Eleocharis. The species studied has the rare feature of alternately using C3 photosynthesis underwater and C4 photosynthesis on land (Ueno et al., 1988), making it an exciting model to better understand the genetic control and evolution of the C4 trait and, in particular, the evolutionary challenge to switch from C3 to C4 photosynthesis from the aquatic to the terrestrial environment. This may imply both the control of genes involved in the C4 pathway and deep cellular anatomical changes. Alternately using C3 or C4 photosynthesis may also lead to evolutionary trade‐offs (e.g., optimization of photosynthetic enzymes in contrasting C3 and C4 biochemical contexts). Maintaining C3 and C4 genes may therefore be necessary; hybridization (e.g., allopolyploidization) between non‐C4 and C4 taxa could have been involved to favor the emergence of such facultative photosynthetic strategy...
... The inclusion of the mitogenome in phylogenetic analysis has been increasingly applied in the angiosperm [22][23][24]. It is the diversity of genomic data that has brought the discordance of organelle and nuclear signalling into focus [25,26]. Cytonuclear discordance refers to the incongruence between the evolutionary histories of nuclear and cytoplasmic genomes within a species or a group of species. ...
Article
Background: Caryodaphnopsis, a group of tropical trees (ca. 20 spp.) in the family Lauraceae, has an amphi-Pacific disjunct distribution: ten species are distributed in Southeast Asia, while eight species are restricted to tropical rainforests in South America. Previously, phylogenetic analyses using two nuclear markers resolved the relationships among the five species from Latin America. However, the phylogenetic relationships between the species in Asia remain poorly known. Results: Here, we first determined the complete mitochondrial genome (mitogenome), plastome, and the nuclear ribosomal cistron (nrDNA) sequences of C. henryi with lengths of 1,168,029 bp, 154,938 bp, and 6495 bp, respectively. We found 2233 repeats and 368 potential SSRs in the mitogenome of C. henryi and 50 homologous DNA fragments between its mitogenome and plastome. Gene synteny analysis revealed a mass of rearrangements in the mitogenomes of Magnolia biondii, Hernandia nymphaeifolia, and C. henryi and only six conserved clustered genes among them. In order to reconstruct relationships for the ten Caryodaphnopsis species in Asia, we created three datasets: one for the mitogenome (coding genes and ten intergenic regions), another for the plastome (whole genome), and the other for the nuclear ribosomal cistron. All of the 22 Caryodaphnopsis individuals were divided into four, five, and six different clades in the phylogenies based on mitogenome, plastome, and nrDNA datasets, respectively. Conclusions: The study showed phylogenetic conflicts within and between nuclear and organellar genome data of Caryodaphnopsis species. The sympatric Caryodaphnopsis species in Hekou and Malipo SW China may be related to the incomplete lineage sorting, chloroplast capture, and/or hybridization, which mixed the species as a complex in their evolutionary history.
... Our results suggest that Brachystegia evolved recently, as did most African geoxyles, potentially reflecting the development of fire-prone and high-precipitation savannas in Africa (Maurin et al., 2014). Such results challenge previous views on the origin of the miombo because it is usually perceived as much older than our divergence dating analysis suggested (e.g., Senut et al., 2018; Bianconi et al., 2020). Nevertheless, we have to keep in mind that miombo-like vegetation sharing the same physiognomy could have existed before the Brachystegia diversification. ...
Article
Premise: Phylogenetic approaches can provide valuable insights on how and when a biome emerged and developed using its structuring species. In this context, Brachystegia Benth, a dominant genus of trees in miombo woodlands, appears as a key witness of the history of the largest woodland and savanna biome of Africa. Methods: We reconstructed the evolutionary history of the genus using targeted-enrichment sequencing on 60 Brachystegia specimens for a nearly complete species sampling. Phylogenomic inferences used supermatrix (RAxML‐NG) and summary-method (ASTRAL‐III) approaches. Conflicts between species and gene trees were assessed, and the phylogeny was time‐calibrated in BEAST. Introgression between species was explored using Phylonet. Results: The phylogenies were globally congruent regardless of the method used. Most of the species were recovered as monophyletic, unlike previous plastid phylogenetic reconstructions where lineages were shared among geographically close individuals independently of species identity. Still, most of the individual gene trees had low levels of phylogenetic information and, when informative, were mostly in conflict with the reconstructed species trees. These results suggest incomplete lineage sorting and/or reticulate evolution, which was supported by network analyses. The BEAST analysis supported a Pliocene origin for current Brachystegia lineages, with most of the diversification events dated to the Pliocene‐Pleistocene. Conclusions: These results suggest a recent origin of species of the miombo, congruently with their spatial expansion documented from plastid data. Brachystegia species appear to behave potentially as a syngameon, a group of interfertile but still relatively well-delineated species, an aspect that deserves further investigations.
Article
C4 plants are expected to have faster stomatal movements than C3 species because they tend to have smaller guard cells. However, little is known about how the evolution of C4 photosynthesis influences stomatal dynamics in relation to guard cell size and environmental factors. We studied photosynthetically diverse populations of the grass Alloteropsis semialata, showing that the origin of C4 photosynthesis in this species was associated with a shortening of stomatal guard and subsidiary cells. However, for a given cell size, C4 and C3–C4 intermediate individuals had similar or slower light‐induced stomatal opening speeds than C3 individuals. Conversely, when exposed to decreasing light, stomata in C4 plants closed as fast as those in non‐C4 plants. Polyploid formation in some C4 plants led to larger stomatal cells and was associated with slower stomatal opening. Conversely, diversification of C4 diploid plants into wetter environments was associated with an acceleration of stomatal opening. Overall, there was a significant relationship between light‐saturated photosynthesis and stomatal opening speed in the C4 plants, implying that photosynthetic energy production was limiting for stomatal opening. Stomatal dynamics in this wild grass therefore arise from the evolving interplay between photosynthetic physiology and the size and biochemical function of stomatal complexes.
Article
Global change impacts on the Earth System are typically evaluated using biome classifications based on trees and forests. However, during the Cenozoic, many terrestrial biomes were transformed through the displacement of trees and shrubs by grasses. While grasses comprise 3% of vascular plant species, they are responsible for more than 25% of terrestrial photosynthesis. Critically, grass dominance alters ecosystem dynamics and function by introducing new ecological processes, especially surface fires and grazing. However, the large grassy component of many global biomes is often neglected in their descriptions, thereby ignoring these important ecosystem processes. Furthermore, the functional diversity of grasses in vegetation models is usually reduced to C3 and C4 photosynthetic plant functional types, omitting other relevant traits. Here, we compile available data to determine the global distribution of grassy vegetation and key traits related to grass dominance. Grassy biomes (where > 50% of the ground layer is covered by grasses) occupy almost every part of Earth’s vegetated climate space, characterising over 40% of the land surface. Major evolutionary lineages of grasses have specialised in different environments, but species from only three grass lineages occupy 88% of the land area of grassy vegetation, segregating along gradients of temperature, rainfall and fire. The environment occupied by each lineage is associated with unique plant trait combinations, including C3 and C4 photosynthesis, maximum plant height, and adaptations to fire and aridity. There is no single global climatic limit where C4 grasses replace C3 grasses. Instead this ecological transition varies biogeographically, with continental disjunctions arising through contrasting evolutionary histories.
Article
Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder's high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics. Each output is benchmarked on appropriate real or simulated datasets, and where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder's comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
Article
Evidence of eukaryote-to-eukaryote lateral gene transfer (LGT) has accumulated in recent years [1-14], but the selective pressures governing the evolutionary fate of these genes within recipient species remain largely unexplored [15, 16]. Among non-parasitic plants, successful LGT has been reported between different grass species [5, 8, 11, 16-19]. Here, we use the grass Alloteropsis semialata, a species that possesses multigene LGT fragments that were acquired recently from distantly related grass species [5, 11, 16], to test the hypothesis that the successful LGT conferred an advantage and were thus rapidly swept into the recipient species. Combining whole-genome and population-level RAD sequencing, we show that the multigene LGT fragments were rapidly integrated in the recipient genome, likely due to positive selection for genes encoding proteins that added novel functions. These fragments also contained physically linked hitchhiking protein-coding genes, and subsequent genomic erosion has generated gene presence-absence polymorphisms that persist in multiple geographic locations, becoming part of the standing genetic variation. Importantly, one of the hitchhiking genes underwent a secondary rapid spread in some populations. This shows that eukaryotic LGT can have a delayed impact, contributing to local adaptation and intraspecific ecological diversification. Therefore, while short-term LGT integration is mediated by positive selection on some of the transferred genes, physically linked hitchhikers can remain functional and augment the standing genetic variation with delayed adaptive consequences.
Preprint
Global change impacts on the Earth System are typically evaluated using biome classifications based on trees and forests. However, during the Cenozoic, many terrestrial biomes were transformed through the displacement of trees and shrubs by grasses. While grasses comprise 3% of vascular plant species, they are responsible for more than 25% of terrestrial photosynthesis. Critically, grass dominance alters ecosystem dynamics and function by introducing new ecological processes, especially surface fires and grazing. However, the large grassy component of many global biomes is often neglected in their descriptions, thereby ignoring these important ecosystem processes. Furthermore, the functional diversity of grasses in vegetation models is usually reduced to C3 and C4 photosynthetic plant functional types, omitting other relevant traits. Here, we compile available data to determine the global distribution of grassy vegetation and key traits related to grass dominance. Grassy biomes (where > 50% of the ground layer is covered by grasses) occupy almost every part of Earth’s vegetated climate space, characterising over 40% of the land surface. Major evolutionary lineages of grasses have specialised in different environments, but species from only three grass lineages occupy 88% of the land area of grassy vegetation, segregating along gradients of temperature, rainfall and fire. The environment occupied by each lineage is associated with unique plant trait combinations, including C3 and C4 photosynthesis, maximum plant height, and adaptations to fire and aridity. There is no single global climatic limit where C4 grasses replace C3 grasses. Instead this ecological transition varies biogeographically, with continental disjunctions arising through contrasting evolutionary histories.
Significance statement: Worldviews of vegetation generally focus on trees and forests but grasses characterize the ground layer over 40% of the Earth’s vegetated land surface. This omission is important because grasses transform surface-atmosphere exchanges, biodiversity and disturbance regimes. We looked beneath the trees to produce the first global map of grass-dominated biomes. Grassy biomes occur in virtually every climate on Earth. However, three lineages of grasses are much more successful than others, characterizing 88% of the land area of grassy biomes. Each of these grass lineages evolved ecological specializations related to aridity, freezing and fire. Recognizing the extent and causes of grass dominance beneath trees is important because grassy vegetation plays vital roles in the dynamics of our biosphere and human wellbeing.
Article
A fundamental tenet of multicellular eukaryotic evolution is that vertical inheritance is paramount, with natural selection acting on genetic variants transferred from parents to offspring. This lineal process means that an organism’s adaptive potential can be restricted by its evolutionary history, the amount of standing genetic variation, and its mutation rate. Lateral gene transfer (LGT) theoretically provides a mechanism to bypass many of these limitations, but the evolutionary importance and frequency of this process in multicellular eukaryotes, such as plants, remains debated. We address this issue by assembling a chromosome-level genome for the grass Alloteropsis semialata, a species surmised to exhibit two LGTs, and screen it for other grass-to-grass LGTs using genomic data from 146 other grass species. Through stringent phylogenomic analyses, we discovered 57 additional LGTs in the A. semialata nuclear genome, involving at least nine different donor species. The LGTs are clustered in 23 laterally acquired genomic fragments that are up to 170 kb long and have accumulated during the diversification of Alloteropsis. The majority of the 59 LGTs in A. semialata are expressed, and we show that they have added functions to the recipient genome. Functional LGTs were further detected in the genomes of five other grass species, demonstrating that this process is likely widespread in this globally important group of plants. LGT therefore appears to represent a potent evolutionary force capable of spreading functional genes among distantly related grass species.
Article
Cytonuclear discordance is commonly observed in phylogenetic studies, yet few studies have tested whether these patterns reflect incomplete lineage sorting or organellar introgression. Here, we used whole‐chloroplast sequence data in combination with over 1000 nuclear single‐nucleotide polymorphisms to clarify the extent of cytonuclear discordance in wild annual sunflowers (Helianthus), and to test alternative explanations for such discordance. Our phylogenetic analyses indicate that cytonuclear discordance is widespread within this group, both in terms of the relationships among species and among individuals within species. Simulations of chloroplast evolution show that incomplete lineage sorting cannot explain these patterns in most cases. Instead, most of the observed discordance is better explained by cytoplasmic introgression. Molecular tests of evolution further indicate that selection may have played a role in driving patterns of plastid variation – although additional experimental work is needed to fully evaluate the importance of selection on organellar variants in different parts of the geographic range. Overall, this study represents one of the most comprehensive tests of the drivers of cytonuclear discordance and highlights the potential for gene flow to lead to extensive organellar introgression in hybridizing taxa.
Article
Background: Evolutionary histories can be discordant across the genome, and such discordances need to be considered in reconstructing the species phylogeny. ASTRAL is one of the leading methods for inferring species trees from gene trees while accounting for gene tree discordance. ASTRAL uses dynamic programming to search for the tree that shares the maximum number of quartet topologies with input gene trees, restricting itself to a predefined set of bipartitions. Results: We introduce ASTRAL-III, which substantially improves the running time of ASTRAL-II and guarantees polynomial running time as a function of both the number of species (n) and the number of genes (k). ASTRAL-III limits the bipartition constraint set (X) to grow at most linearly with n and k. Moreover, it handles polytomies more efficiently than ASTRAL-II, exploits similarities between gene trees better, and uses several techniques to avoid searching parts of the search space that are mathematically guaranteed not to include the optimal tree. The asymptotic running time of ASTRAL-III in the presence of polytomies is [Formula: see text] where D=O(nk) is the sum of degrees of all unique nodes in input trees. The running time improvements enable us to test whether contracting low support branches in gene trees improves the accuracy by reducing noise. In extensive simulations, we show that removing branches with very low support (e.g., below 10%) improves accuracy while overly aggressive filtering is harmful. We observe on a biological avian phylogenomic dataset of 14K genes that contracting low support branches greatly improve results. Conclusions: ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species. With ASTRAL-III, low support branches can be removed, resulting in improved accuracy.
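The quartet criterion mentioned in this abstract can be illustrated with a small brute-force sketch. It only counts how many four-taxon topologies a candidate species tree shares with a set of gene trees; it is not ASTRAL's dynamic programming search, and the nested-tuple tree encoding and the function names (`clades`, `quartet_topology`, `quartet_score`) are assumptions made purely for illustration.

```python
from itertools import combinations


def clades(tree):
    """Return (leaf_set, clade_list) for a tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade subtended by this internal node
    return leaves, found


def quartet_topology(clade_list, quartet):
    """Return the split (pair | pair) induced on four taxa, or None if unresolved."""
    q = frozenset(quartet)
    for clade in clade_list:
        inside = clade & q
        if len(inside) == 2:  # this branch separates two taxa from the other two
            return frozenset([inside, q - inside])
    return None


def quartet_score(species_tree, gene_trees):
    """Count the quartet topologies shared between a species tree and the gene trees."""
    sp_leaves, sp_clades = clades(species_tree)
    score = 0
    for gene_tree in gene_trees:
        gt_leaves, gt_clades = clades(gene_tree)
        shared = sorted(sp_leaves & gt_leaves)
        for quartet in combinations(shared, 4):
            topo = quartet_topology(gt_clades, quartet)
            if topo is not None and topo == quartet_topology(sp_clades, quartet):
                score += 1
    return score


# Hypothetical five-taxon example: one concordant and one discordant gene tree.
species = ((("A", "B"), "C"), ("D", "E"))
discordant = ((("A", "C"), "B"), ("D", "E"))
print(quartet_score(species, [species, discordant]))
```

Enumerating every quartet scales with the fourth power of the number of taxa, so this form is only practical for toy inputs; it is meant to make the optimisation target concrete, not to reproduce the method's efficiency.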
Baobabs (Adansonia) are a cohesive group of tropical trees with a disjunct distribution in Australia, Madagascar, and continental Africa, and diverse flowers associated with two pollination modes. We used custom targeted sequence capture in conjunction with new and existing phylogenetic comparative methods to explore the evolution of floral traits and pollination systems while allowing for reticulate evolution. Our analyses suggest that relationships in Adansonia are confounded by reticulation, with network inference methods supporting at least one reticulation event. The best supported hypothesis involves introgression between A. rubrostipa and core Longitubae, both of which are hawkmoth pollinated with yellow/red flowers, but there is also some support for introgression between the African lineage and Malagasy Brevitubae, which are both mammal-pollinated with white flowers. New comparative methods for phylogenetic networks were developed that allow maximum-likelihood inference of ancestral states and were applied to study the apparent homoplasy in floral biology and pollination mode seen in Adansonia. This analysis supports a role for introgressive hybridization in morphological evolution even in a clade with highly divergent and geographically widespread species. Our new comparative methods for discrete traits on species networks are implemented in the software PhyloNetworks.
We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
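As a rough illustration of what working directly on genotype likelihoods can mean, the sketch below computes the posterior expected genotype (dosage) at one biallelic site from three genotype likelihoods and an allele frequency. It assumes a Hardy-Weinberg prior and a known frequency, both simplifications introduced here; the function name is hypothetical and this is not the PCAngsd implementation.

```python
def expected_genotype(likelihoods, freq):
    """Posterior mean count of the alternate allele (0..2) at one site.

    likelihoods: (L0, L1, L2), likelihood of the reads given 0, 1 or 2
                 copies of the alternate allele.
    freq:        alternate-allele frequency used as prior (assumed known;
                 Hardy-Weinberg proportions are an assumption of this sketch).
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    prior = ((1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2)
    weights = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("likelihoods and prior have no overlapping support")
    posterior = [w / total for w in weights]
    # Expected dosage: 0 * P(g=0) + 1 * P(g=1) + 2 * P(g=2)
    return posterior[1] + 2.0 * posterior[2]
```

With uninformative likelihoods the result falls back to the prior mean of 2 × freq, which is why low-depth samples are pulled toward the frequency estimate rather than toward a hard genotype call.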
The sweet potato is one of the world's most widely consumed crops, yet its evolutionary history is poorly understood. In this paper, we present a comprehensive phylogenetic study of all species closely related to the sweet potato and address several questions pertaining to the sweet potato that remained unanswered. Our research combined genome skimming and target DNA capture to sequence whole chloroplasts and 605 single-copy nuclear regions from 199 specimens representing the sweet potato and all of its crop wild relatives (CWRs). We present strongly supported nuclear and chloroplast phylogenies demonstrating that the sweet potato had an autopolyploid origin and that Ipomoea trifida is its closest relative, confirming that no other extant species were involved in its origin. Phylogenetic analysis of nuclear and chloroplast genomes shows conflicting topologies regarding the monophyly of the sweet potato. The process of chloroplast capture explains these conflicting patterns, showing that I. trifida had a dual role in the origin of the sweet potato, first as its progenitor and second as the species with which the sweet potato introgressed so one of its lineages could capture an I. trifida chloroplast. In addition, we provide evidence that the sweet potato was present in Polynesia in pre-human times. This, together with several other examples of long-distance dispersal in Ipomoea, negates the need to invoke ancient human-mediated transport as an explanation for its presence in Polynesia. These results have important implications for understanding the origin and evolution of a major global food crop and question the existence of pre-Columbian contacts between Polynesia and the American continent.