Access to this full-text is provided by Springer Nature.
Content available from BMC Genomics
This content is subject to copyright. Terms and conditions apply.
R E S E A R C H A R T I C L E Open Access
Genome sequence of Apostasia ramifera
provides insights into the adaptive
evolution in orchids
Weixiong Zhang
1†
, Guoqiang Zhang
2,3,4†
, Peng Zeng
1†
, Yongqiang Zhang
2,3,4,5
, Hao Hu
1
, Zhongjian Liu
5
and
Jing Cai
6*
Abstract
Background: The Orchidaceae family is one of the most diverse among flowering plants and serves as an important
research model for plant evolution, especially “evo-devo”study on floral organs. Recently, sequencing of several orchid
genomes has greatly improved our understanding of the genetic basis of orchid biology. To date, however, most
sequenced genomes are from the Epidendroideae subfamily. To better elucidate orchid evolution, greater attention
should be paid to other orchid lineages, especially basal lineages such as Apostasioideae.
Results: Here,wepresentagenomesequenceofApostasia ramifera, a terrestrial orchid species from the
Apostasioideae subfamily. The genomes of A. ramifera and other orchids were compared to explore the
genetic basis underlying orchid species richness. Genome-based population dynamics revealed a continuous
decrease in population size over the last 100 000 years in all studied orchids, although the epiphytic orchids
generally showed larger effective population size than the terrestrial orchids over most of that period. We
also found more genes of the terpene synthase gene family, resistant gene family, and LOX1/LOX5 homologs
in the epiphytic orchids.
Conclusions: This study provides new insights into the adaptive evolution of orchids. The A. ramifera genome
sequence reported here should be a helpful resource for future research on orchid biology.
Keywords: Orchidaceae, Apostasia ramifera, Comparative genomics, Adaptive evolution
Background
The Orchidaceae family is one of the largest among
flowering plants, with many species exhibiting great or-
namental value due to their colorful and distinctive
flowers. At present, there are more than 28 000 orchid
species assigned to 763 genera [1]. According to their
phylogeny, orchids can be divided into five subfamilies,
i.e., Apostasioideae, Vanilloideae, Cypripedioideae, Epi-
dendroideae, and Orchidoideae. It has been proposed
that whole-genome duplication occurred in the ancestor
of all orchid species, which contributed to their survival
under significant climatic change [2,3]. Orchids are a di-
verse and widespread family of flowering plants. Notably,
several orchid species with specialized floral structures,
such as labella and gynostemia, appear to have co-
evolved with animal pollinators to facilitate reproductive
success. In addition to their role in research on evolution
and pollination biology, orchids are invaluable to the
horticultural industry due to their elegant and distinctive
flowers [4].
© The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if
changes were made. The images or other third party material in this article are included in the article's Creative Commons
licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons
licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain
permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the
data made available in this article, unless otherwise stated in a credit line to the data.
* Correspondence: jingcai@nwpu.edu.cn
Weixiong Zhang, Guoqiang Zhang, and Peng Zeng are co-first authors
†
Weixiong Zhang, Guoqiang Zhang and Peng Zeng contributed equally to
this work.
6
School of Ecology and Environment, Northwestern Polytechnical University,
710129 Xi’an, China
Full list of author information is available at the end of the article
Zhang et al. BMC Genomics (2021) 22:536
https://doi.org/10.1186/s12864-021-07852-3
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
The genome sequences of several orchid species have
been published recently, thereby greatly improving our
understanding of orchid biology and evolution. The first
reported orchid genome (Phalaenopsis equestris) showed
evidence of an ancient whole-genome duplication event
in the orchid lineage and revealed that expansion of
MADS-box genes may be related to the diverse morph-
ology of orchid flowers [2]. The subsequent publication
of other orchid genome sequences, such as that of
Dendrobium officinale,Dendrobium catenatum,Phalae-
nopsis aphrodite,Apostasia shenzhenica, and Vanilla
planifolia, has provided data for further investigations
on the genetic mechanisms underlying orchid species
richness [3,5–8].
The Apostasioideae subfamily consists of terrestrial or-
chid species [9]. Species within Apostasioideae exhibit
various primitive traits, such as radially symmetrical
flowers and no labella, supporting the placement of this
subfamily as a sister clade to all other orchids [10]. These
primitive features are considered ancient characteristics of
the orchid lineage [10]. Thus, Apostasioideae species can
serve as an important outgroup for evolutionary study of
all other orchid subfamilies. Recently, Zhang et al. [3]pub-
lished the A. shenzhenica genome and identified an
orchid-specific whole-genome duplication event as well as
changes in the MADS-box gene family associated with dif-
ferent orchid characteristics. This is the first (and only)
genome reported for the Apostasioideae subfamily, with
most currently published genomes belonging to the Epi-
dendroideae subfamily. Obtaining genomes for other or-
chid lineages, especially basal lineages, will greatly
facilitate our understanding of orchid evolution. Here, we
performed de novo assembly and analysis of the Apostasia
ramifera genome sequence, the second Apostasia genome
after A. shenzhenica. Comparative genomics were carried
out with six other published orchid genomes to provide
insight into orchid evolution.
Results
Genome sequencing and assembly
The genomic DNA of A. ramifera was sequenced using
the Illumina Hiseq 2000 platform. Sequencing of five li-
braries with different insert sizes ranging from 250 to 5
000 bp generated more than 57 Gb of clean data, account-
ing for 156X of the genome sequence (Additional file 1,
Table S1). Based on the clean reads, we generated a
365.59-Mb long assembly with a scaffold N50 of
287.45 kb (Table 1and Additional file 1, Table S2). To as-
sess the quality of the final assembly, clean reads were
mapped to the genome sequence, resulting in a mapping
ratio of 99.7%. The completeness of the gene regions in
the assembly was examined by BUSCO (Benchmarking
Universal Single-Copy Orthologs) assessment [11]. In
total, 94.9 % (1 304/1 375) of the universal single-copy
orthologs were found in our assembly (Additional file 1,
Table S3).
Genome annotation
Using both de novo and library-based repetitive sequence
annotation, 164.49 Mb of repetitive elements were un-
covered, accounting for 44.99 % of the total assembly
(Additional file 1, Table S4). The proportion of repetitive
DNA in A. ramifera was similar to that in A. shenzhe-
nica (43.74 %) but less than that in P. equestris (62 %)
and D. catenatum (78 %). Among the repetitive se-
quences, transposable elements (TEs) were the most
abundant (43.1 %), among which long terminal repeats
(LTR) were dominant, accounting for 24.07 % of the
total genome (Additional file 1, Table S5 and Fig. S1).
The protein-coding gene models were predicted
through a combination of de novo and homology-based
annotation. In total, 22 841 putative genes were identified
in the A. ramifera genome, similar to that in A. shenzhe-
nica (21 831) but less than that in V. planifolia (28 279),
P. equestris (29 545), and D. catenatum (29 257) (Add-
itional file 1, Table S6). Further functional annotation of
the predicted genes was carried out by homology searches
against various databases, including Gene Ontology (GO),
Kyoto Encyclopedia of Genes and Genomes (KEGG),
SwissProt, TrEMBL, nr database, and InterPro. Results
showed that 19 551 (85.6 %) predicted genes could be an-
notated (Additional file 1, Table S7). In addition, we iden-
tified 40 microRNA, 616 transfer RNA, 1 450 ribosomal
RNA, and 108 small nuclear RNA genes in the A. ramifera
genome (Additional file 1,TableS8).
Synteny comparison based on gene annotations of A.
ramifera and A. shenzhenica identified 927 synteny
blocks with an average block size of 12.89 genes (Add-
itional file 1, Table S9). A total of 11 950 gene pairs were
covered by these synteny blocks, accounting for 61 and
66 % of the genome sequences of A. ramifera and A.
shenzhenica, respectively (Additional file 1, Table S9).
The high co-linearity between their genomes suggested a
close relationship between these two species.
Table 1 Statistics related to A. ramifera genome assembly
Feature Summary
Genome Size 365 588 417 bp
Scaffold N50 287 449 bp
Contig N50 30 765 bp
Longest Scaffold 1 388 560 bp
GC Rate 33.38 %
Repeat Content 44.99 %
BUSCO Assessment 94.9 %
Gene Number 22 841
Zhang et al. BMC Genomics (2021) 22:536 Page 2 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Gene family identification
Gene family identification was carried out for the predicted
protein-coding genes in A. ramifera, together with genes
from other species, including P. equestris,P. aphrodite,D.
officinale,D. catenatum,A. shenzhenica,V. planifolia,As-
paragus officinalis,andOryza sativa. A total of 19 422 puta-
tive genes in the A. ramifera assembly were assigned to 13
251 gene families (Fig. 1AandAdditionalfile1, Table S10).
The remaining 3 419 genes could not be grouped with other
genes and were considered orphans. Among the compared
species, 266 gene families were only shared by orchid species.
KEGG and GO enrichment analyses of those orchid-specific
gene families revealed various significantly enriched pathways
and terms, including ‘Stilbenoid, diarylheptanoid and gin-
gerol biosynthesis’(ko00945), ‘Zeatin biosynthesis’(ko00908),
‘Flavonoid biosynthesis’(ko00941), ‘Circadian rhythm - plant’
(ko04712), ‘Regulation of gene expression’(GO:0010468),
and ‘Aromatic compound biosynthetic process’(GO:
0019438) (Additional file 1,TableS11andS12).Further-
more, a total of 1 145 gene families were specifically ex-
panded in Apostasia (see Methods), and were significantly
enriched in several pathways, such as ‘Ribosome biogenesis
in eukaryotes’(ko03008), ‘mRNA surveillance pathway’
(ko03015) and ‘Plant-pathogen interaction’(ko04626) (Add-
itional file 1, Table S13 and S14).
Phylogenetic analysis
We constructed a phylogenetic tree using MrBayes with
gene sequences of 381 single copy genes shared by 16
plant species, including A. ramifera. The divergence
times among these species were estimated using PAML
MCMCTree based on our phylogeny. Results showed
that the Apostasia species separated from other orchids
82 million years ago (Fig. 1B), consistent with previously
published results [3]. The divergence time between A.
ramifera and A. shenzhenica was estimated to be 8 mil-
lion years ago (Fig. 1B). Gene family expansions and
contractions on each phylogenetic branch of the 16 spe-
cies were estimated using CAFE [12] (Fig. 1B). We fur-
ther carried out GO/KEGG enrichment analyses on the
significantly expanded gene families in A. ramifera and
found some functionally enriched pathways and terms,
including ‘Zeatin biosynthesis’(ko00908), Glyceropho-
spholipid metabolism (ko00564), ‘Flavin adenine di-
nucleotide binding’(GO:0050660), and ‘UDP-N-
acetylmuramate dehydrogenase activity’(GO:0008762)
(Additional file 1, Table S15 and S16). In addition, the
significantly contracted gene families were enriched in
‘Homologous recombination’(ko03440), ‘Glycosphingo-
lipid biosynthesis’(ko00604), ‘Transferase activity, trans-
ferring phosphorus-containing groups’(GO:0016772),
and ‘Transferase activity’(GO:0016740) (Additional file
1, Table S17 and S18).
History of orchid population size
Population size history is important for understanding
the underlying mechanisms leading to current patterns
of species and population diversity [13]. Several investi-
gations on orchid population size have been published
[14,15]. Here, the pairwise sequential Markovian coales-
cent (PSMC) model, which uses the coalescent approach
to estimate population size changes [13], was applied to
infer population size history based on the genome
381
450
1816
1728
1250
248
73
58
1039
178
116
70 704
124
178
397
67
42
94
244
101
168
114
144
106
395
386
342
648
801
8301
Ash
Ara
Dca
Peq
Vpl
AGene Families
0255075100125150175200
Asparagus officinalis
Vitis vinifera
Arabidopsis thaliana
million
y
ears a
g
o
Amborella trichopoda
Apostasia ramifera
Phoenix dactylifera
Spirodela polyrhiza
Ananas comosus
Populus trichocarpa
Dendrobium catenatum
Oryza sativa
Apostasia shenzhenica
Brachypodium distachyon
Sorghum bicolor
Phalaenopsis equestris
Musa acuminata
126
157
130
192
118
180
42
137
82
53
104
109
44
8
125
+685/-1470
+556/-3892
+616/-1363
+963/-707
+275/-832
+479/-858
+1334/-2240
+880/-582
+637/-696
+629/-869
+1219/-1323
+4069/-973
+1585/-3253
+400/-1333
+5099/-334
+1951/-1504
/
MRCA
(11,080)
+2 /-7
+7/-338
+189/-87
+3/-377
+168/-823
+280/-647
+323/-1079
+994 /-605
+126/-132
+236/-438
+324 /-201
+16/-0
+885 /-639
+514 /-144
BExpansion Contraction
Fig. 1 Gene family and phylogenetic relationship analysis. (A) Venn diagram showing distribution of shared gene families among five orchid
species, i.e., A. ramifera (Ara), A. shenzhenica (Ash), P. equestris (Peq), D. catenatum (Dca), and V. planifolia (Vpl). (B) Phylogenetic tree showing
relationship and divergence times for 16 species. Purple bars at internal nodes represent 95% confidence interval of divergence times. Numbers
of expanded and contracted gene families are presented as green and red values, respectively. MRCA, most recent common ancestor
Zhang et al. BMC Genomics (2021) 22:536 Page 3 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
sequences of seven orchid species, i.e., A. ramifera,A.
shenzhenica,P. equestris,P. aphrodite,D. officinale,D.
catenatum, and V. planifolia. For the Apostasia species,
population size changed between 10 000 and 250 000
years ago, with similar population dynamics (Fig. 2).
Earlier history could not be recovered because the low-
level heterozygosity of the genome sequences of A. rami-
fera and A. shenzhenica provided limited information on
ancient changes in population size. For the other or-
chids, population size histories showed similar patterns,
especially D. catenatum,D. officinale, and P. equestris
(Fig. 2). First, a period of population growth was ob-
served for each of these orchid species. Then, all orchid
populations experienced a severe contraction (bottle-
neck) over the last 100 000 years, from which they have
not recovered (Fig. 2). During the reporting period (10
000 to 250 000 years ago), the Apostasia species had the
smallest population size compared to other orchid spe-
cies. The population size of Vanilla was slightly higher
than that of Apostasia, but lower than that of all Epiden-
droideae orchids.
Gene family evolutionary analysis
MADS-box transcription factors
In plants, MADS-box transcription factors are involved
in various developmental processes, such as floral devel-
opment, flowering control, and root growth. All MADS-
box gene family members are categorized as type I or
type II based on their gene tree. Using HMMER software
and a MADS-box domain profile (PF00319), we identi-
fied 30 putative MADS-box genes in the A. ramifera
genome, fewer than that detected in the other sequenced
orchids (Additional file 1, Table S19). Phylogenetic ana-
lysis of the putative MADS-box genes revealed that 23
belonged to the type II MADS-box clade (Fig. 3A),
fewer again than that found in other orchids, e.g., A.
shenzhenica (27 members) [3], V. planifolia (30 mem-
bers, Additional file 1, Fig. S2A), P. equestris (29) [2],
and D. catenatum (35) [5]. Compared to P. equestris,
there were fewer members in the A-class, B-class, E-
class, and AGL6-class in A. ramifera and V. planifolia
(Additional file 1, Table S19). In contrast, there were
more SVP-class, ANR1-class, and AGL12-class members
in A. ramifera and V. planifolia than in P. equestris
(Additional file 1, Table S19).
Type I MADS-box transcription factors are involved in
plant reproduction and endosperm development [16].
Here, we identified seven and six type I MADS-box
genes in A. ramifera and V. planifolia, respectively
(Fig. 3B and Additional file 1, Fig. S2B and Table S19).
Phylogenetic analysis showed that genes in the Mβ-class
were absent in A. ramifera and V. planifolia, (Fig. 3B
and Additional file 1, Fig. S2B).
Terpene synthase (TPS) gene family
In plants, TPS family members are responsible for the
biosynthesis of terpenoids, which are involved in various
physiological processes in plants such as primary metab-
olism and development [17]. The architecture of the
TPS gene family is proposed to be modulated by natural
selection for adaptation to specific ecological niches
[18]. We used both terpene_synth and terpene_synth_C
domains to search for TPS genes in the orchid genomes.
A small TPS gene family size was observed in the two
Apostasia species compared with the other orchids stud-
ied (Fig. 4). Only eight and six copies of TPS genes were
found in A. shenzhenica and A. ramifera, respectively
(Fig. 4and Additional file 1, Table S20). A small TPS
family size in Apostasia may indicate a loss of chemical
0
10
20
30
40
50
60
70
10
410510
610
7
Effective population size (x104
Years (g=4, =0.5x10-8)
)
P. aphrodite
D. catenatum
A. ramifera
A. shenzhenica
P. equestris
D. officinale
V. planifolia
Fig. 2 Population size histories of seven orchid species, including P. aphrodite (yellow), D. catenatum (green), P. equestris (purple), D. officinale
(dark blue), V. planifolia (pink), A. shenzhenica (light blue) and A. ramifera (red), between 10 000 and 10 million years ago. Generation times of
orchids were assumed to be four years, and mutation rate per generation was 0.5 × 10
−8
Zhang et al. BMC Genomics (2021) 22:536 Page 4 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
diversity of terpenoid compounds. To resolve the phylo-
genetic relationship of TPS genes in orchids, a gene tree
was constructed using the TPS gene sequences derived
from orchids and Arabidopsis. Phylogenetic analysis
showed that four TPS subfamilies were found in Aposta-
sia (Fig. 4). In Apostasia, members of both TPS-c and
TPS-f subfamilies, which encode enzymes responsible
for the synthesis of 20-carbon diterpenes, were lost
(Fig. 4and Additional file 1, Table S20). In addition,
fewer members of TPS-a and TPS-b subfamilies were
observed in Apostasia compared with other orchids
(Fig. 4and Additional file 1, Table S20). Genes from
these two subfamilies are reportedly involved in the bio-
synthesis of 10- and 15-carbon volatile terpenoids [19],
which are the components of floral scent.
Pathogen resistance genes
Pathogen resistance-related genes are closely associated
with plant fitness and adaptive evolution [20]. Here, the
NB-ARC domain profile was used to search for Rgenes
in the predicted gene models of A. ramifera and other
orchids, including A. shenzhenica,V. planifolia,P. eques-
tris,P. aphrodite,D. catenatum, and D. officinale.We
identified 71 Rgenes in A. ramifera and 66 in A. shenz-
henica, considerably fewer than that found for P. eques-
tris (114), P. aphrodite (109), D. officinale (172), D.
catenatum (182), and V. planifolia (86) (Fig. 5). Thus,
the size of the Rgene family varied greatly among the
different Orchidaceae genera (Fig. 5).
In Apostasia, in addition to the small Rgene family
size, we also discovered lower copy numbers in both the
NAC and WRKY gene families (Fig. 5), which are known
to play important roles in plant immune response [21,
22]. We identified 55 and 64 NAC transcription factor
members in A. ramifera and A. shenzhenica, respectively,
markedly fewer than that found in Dendrobium,Phalae-
nopsis, and Vanilla (77 to 113) (Fig. 5). We also identi-
fied 56 and 50 WRKY transcription factors in A.
ramifera and A. shenzhenica, respectively, again fewer
than that found in other orchids (64 to 83) (Fig. 5).
Apostasia LOX1/LOX5 genes may contribute to lateral root
development, an important trait for terrestrial growth
LOX1 and LOX5 are involved in the development of
lateral roots in Arabidopsis,andlossofthesetwo
genes causes a significant increase in lateral root
emergence [23]. Here, we searched the homologs of
LOX1 and LOX5 in six published orchid genomes
using protein sequences from Arabidopsis as the
query, and then constructed a gene tree to elucidate
the phylogenetic relationship among these genes. We
detected multiple copies of LOX1/LOX5 homologs in
AGL70
AGL24
Ara020644
AGL104
FUL
CAL
Ara016347
Ara009011
AGL15
SHP1
FLC
AGL33
AGL21
AGL66
AGL94
Ara000690
SOC1
Ara005769
MAF2
AGL12
AGL16
Ara005184
AGL14
STK
Ara006720
Ara001077
SEP4
AGL19
AGL6
Ara003595
TT16
SEP3
FLM
AGL18
Ara011141
AGL69
SEP2
AGL67
Ara012716
AGL71
SHP2
SVP
Ara005262
SEP1
Ara005659
AGL13
AGL68
Ara003524
Ara014487
ANR1
AGL63
Ara017345
AP1
Ara018206
AGL79
AGL72
Ara016558
Ara000222
AP3
Ara009096
Ara000709
AGL65
Ara009203
PI
AGL17
AG
FYF
Ara005076
99
78
100
99
70
80
89
100
95
73
99
62
98
100
54
90
100
61
75
59
79
92
99
72
69
100
59
92
76
95
100
50
99
70
59
100
69
56
99
98
100
A
AGL47
Ara005203
AGL39
AGL48
AGL51
AGL23
AGL85
AGL96
AGL60
AGL97
AGL101
AGL81
AGL56
AGL78
AGL55
AGL35
AGL26
AGL82
AGL95
AGL36
AGL45
AGL93
AGL54
AGL58
AGL29
AGL86
AGL80
Ara021968
AGL84
PHE2
AGL61
AGL53
AGL46
AGL49
AGL59
AGL99
Ara003954
AGL52
AGL34
AGL28
AGL43
AGL57
Ara022801
AGL83
Ara020411
AGL62
AGL98
AGL75
AGL91
Ara005764
AGL73
Ara005112
AGL64
AGL74
AGL100
AGL90
AGL40
AGL102
AGL50
AGL89
AGL103
AGL92
AGL77
AGL87
PHE1
AGL76
85
90
99
57
99
79
62
74
55
100
57
66
75
82
90
56
99
83
59
99
52
97
99
79
70
83
71
94
100
100
100
90
86
100
60
100
72
99
99
86
75
97
100
95
100
100
100
98
100
86
B
FLC
MIKC*
SVP
ANR1
C/D
SOC1
A
AGL12
AGL6
E
B
AGL15
Fig. 3 Phylogenetic analysis of MADS-box genes in A. ramifera. (A) Type II MADS-box genes. (B) Type I MADS-box genes. Neighbor-joining gene
trees were constructed using MADS-box genes from A. ramifera and Arabidopsis. Genes from A. ramifera are marked in red. Different MADS-box
classes are indicated. Numbers above branches are bootstrap support values of at least 50
Zhang et al. BMC Genomics (2021) 22:536 Page 5 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
R
NAC
WRKY
109 77 64
114 82 65
172 113 83
182 91 71
86 92 77
66 64 50
71 55 56
Phalaenopsis aphrodite
Phalaenopsis equestris
Dendrobium officinale
Dendrobium catenatum
Vanilla planifolia
Apostasia shenzh enica
Apostasia ram ifera
R
NAC
WRKY
40
80
120
160
Fig. 5 Number of members of Rgenes and NAC and WRKY gene families in different orchids. These gene families are marked in blue, green, and
yellow, respectively. Sizes of circles are directly proportional to number of members in gene family
Ash010893
Vpl002448
AT4G15870
Dca005188
Ash001839
Dca011215
Vpl024694
Vpl023468
Dca022838
Vpl000430
Dca011214
XP_020580217.1
XP_020576698.1
AT4G16740
Vpl013757
PAXXG215100
Dca003139
AT4G20200
Dca000725
PAXXG249710
PAXXG379100
AT3G14540
Ara016901
Vpl001457
Dca020940
AT1G79460
Ash001894
PAXXG344580
Dca016979
Vpl019210
Vpl022696
XP_020590622.1
XP_020584124.1
PAXXG276750
Ara019716
PAXXG034410
Ara004686
Dca026890
XP_020591710.1
PAXXG278350
Ara008027
PAXXG024450
Dca003295
PAXXG010370
Ara013999
Dca017971
AT3G25830
XP_020596455.1
Ash000699
Vpl014635
PAXXG215110
PAXXG045650
PAXXG276820
AT4G13300
AT4G02780
Vpl000975
AT3G25820
XP_020588804.1
XP_020584121.1
AT1G33750
AT1G61120
Vpl013173
PAXXG352310
Dca003141
PAXXG024540
XP_020590461.1
Dca025698
XP_020590463.1
Vpl003795
Dca018407
AT4G20230
PAXXG149140
AT3G25810
AT5G48110
27
7
1
2
0lp
V
AT5G23960
Ash014324
XP_020597358.1
AT3G29190
Ash010892
Dca008309
AT4G13280
Dca007747
AT2G23230
XP_020590464.1
AT1G31950
PAXXG276740
Dca003142
AT3G29110
XP_020588788.1
PAXXG034430
XP_020599757.1
AT1G66020
Ash010138
Vpl019259
XP_020576641.1
PAXXG024480
Dca026369
AT3G14520
Vpl017783
Vpl014945
XP_020576699.1
PAXXG010350
XP_020579525.1
Dca019411
Dca019412
Ara010433
Vpl008182
PAXXG034420
AT3G14490
XP_020588364.1
PAXXG049850
Vpl008741
PAXXG276730
AT3G32030
Dca007746
Vpl012224
Dca026570
XP_020586098.1
Dca000723
688
9
0
0l
p
V
Vpl012059
AT2G24210
AT4G16730
Ash013010
XP_020590460.1
Dca018946
AT1G70080
XP_020598459.1
PAXXG370420
AT1G61680
0
1
49
2G3
TA
AT1G48800
XP_020576697.1
Dca000724
Vpl003138
PAXXG346400
Vpl016604
AT5G44630
AT4G20210
Dca013782
100
99
98
100
100
100
98
67
100
71
100
100
98
100
100
100
100
100
94
99
100
51
100
100
100
100
91
85
100
77
100
100
100
99
81
51
99
61
100
59
69
100
98
100
100
100
100
100
100
100
78
100
100
100
100
54
91
100
51
100
99
92
100
71
100
99
65
100
100
85
95
96
100
84
80
78
89
78
94
86
9
9
100
9
9
100
100
100
100
100
100
81
92
100
98
100
99
81
99
99
100
99
64
100
100
68
100
96
100
72
85
100
100
99
100
100
100
87
100
100
100
100
100
97
72
96
50
Arabidopsis thaliana
Apostasia ramifera
Apostasia shenzhenica
Dendrobium catenatum
Phalaenopsis equestris
Phalaenopsis aphrodit
e
Vanilla planifolia
TPS-a
TPS-b
TPS-c
TPS-e
TPS-f
TPS-g
Fig. 4 Phylogenetic tree for TPS genes predicted in six orchid species and Arabidopsis. Numbers above branches are bootstrap support values of
at least 50
Zhang et al. BMC Genomics (2021) 22:536 Page 6 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
the epiphytic orchid genomes (Fig. 6and Additional
file 1, Table S21). However, only one homologous
gene was found in A. ramifera,andtheLOX1/LOX5
homologs were completely lost in A. shenzhenica
(Fig. 6and Additional file 1, Table S21). We also
found one copy of the LOX1/LOX5 genes in the
hemi-epiphytic orchid V. planifolia (Fig. 6and Add-
itional file 1,TableS21).
Discussion
With worldwide distribution, orchids are one of the lar-
gest flowering plant families and their extraordinary di-
versity provides an excellent opportunity to explore
plant evolution. Certain evolutionary adaptations in or-
chids, e.g., pollinium, labella and epiphytism, are pro-
posed to have played key roles in their adaptive
evolution and radiation. However, the genetic basis
underlying those innovations remains incompletely
known. In the current study, we sequenced the genome
of A. ramifera, a basal Apostasioideae lineage terrestrial
orchid, and carried out comparative genomic analyses of
seven orchid genomes including that of A. ramifera. Sev-
eral gene families related to adaptations in orchids (e.g.,
MADS-box, pathogen resistance, TPS,andLOX genes)
were compared among different orchid lineages.
MADS-box transcription factors
Compared with other orchids, we found smaller gene
families in the B- and E-classes of type II MADS genes
in Apostasia and Vanilla. Genes in these classes of type
II MADS are involved in floral development [24]. Fur-
thermore, it has been proposed that small size in these
gene families may be related to the maintenance of the
ancestral state in Apostasia flowers, which exhibit radial
symmetry and no specialized labellum [3]. However,
small gene families in the B- and E-classes of the type II
MADS family were also found in V. planifolia, which has
bilaterally symmetrical flower petals and a specialized la-
bellum. These results indicate that members in the B-
and E-classes may not contribute to the different flower
morphologies found among Apostasioideae and other
orchids.
Recent research has suggested that genes from the
MIKC* family are involved in pollen development [25,
26]. Here, we found a MIKC*P-subclass member in the
A. ramifera genome. Furthermore, P- and S-subclasses
Ash000329
AT3G45140
XP_020571523.1
Dca023278
Vpl013972
Ash011707
Ash003356
Dca022825
XP_020592800.1
XP_020577224.1
AT1G72520
Ara021787
Vpl004147
PAXXG023340
Dca016964
XP_020574744.1
XP_020590849.1
PAXXG088720
Vpl009145
Dca016356
Ara004227
Ara003798
XP_020586977.1
Dca020949
Ash010227
Ara006511
XP_020580260.1
Dca016452
PAXXG056000
AT1G55020
Dca004884
AT3G22400
Ash018896
Dca003527
AT1G67560
PAXXG180070
Ara016944
Ara009668
8
83
020lpV
PAXXG098190
AT1G17420
Dca003526
PAXXG354990
100
81
100
100
100
100
100
100
100
100
100
99
100
96
100
100
54
100
99
100
100
99
100
96
99
76
100
100
100
100
100
92
67
100
100
100
73
100
99
Fig. 6 LOX gene tree showing LOX1/LOX5 genes in orchids. Phylogenetic analysis was conducted using LOX gene sequences from A. ramifera,A.
shenzhenica,D. catenatum,P. equestris,P. aphrodite,V. planifolia, and Arabidopsis. Branches leading to orchid LOX1/LOX5 genes are marked in
green. Numbers above branches are bootstrap support values of at least 50
Zhang et al. BMC Genomics (2021) 22:536 Page 7 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
members of MIKC* were identified in A. shenzhenica,
while P-subclass genes were lost in P. equestris [3]. It
has been proposed that loss of P-subclass genes is asso-
ciated with the evolution of pollinia [3]. However, both
P- and S-subclass members have been identified in the
genome assembly of P. aphrodite [7] and V. planifolia
(Additional file 1, Fig. S2). Thus, loss of MIKC* genes in
some orchids might not be relevant to the evolution of
pollinia.
A lack of Mβgenes has been reported in some orchid
genomes, including A. shenzhenica,P. equestris, and D.
catenatum [2,3,5]. Here, we found that the Mβgene
was also absent in A. ramifera and V. planifolia. Zhang
et al. [3] suggested that loss of Mβ-class type I MADS-
box transcription factors is related to the absence of
endosperm in the seeds of all orchids. However, Mβ
genes have been discovered in the genome of P. aphro-
dite and transcriptome of Orchis italica [7,27]. Thus, in-
stead of Mβgenes, other genes or mechanisms may
contribute to the absence of endosperm in orchid seeds.
TPS gene family
In comparison to that in Apostasia, more members in
the TPS-a and TPS-b clades of the TPS gene family were
found in Vanilla,Dendrobium, and Phalaenopsis.Mem-
bers of these clades are involved in the biosynthesis of
volatile terpenoids, which are the components of floral
scent [19]. In addition, it has been proposed that expan-
sion of TPS subfamilies may promote the emergence of
novel compounds [18]. As the flowers of orchids in the
Epidendroideae and Vanilloideae subfamilies are highly
adapted to animal pollination via many pollination syn-
dromes, including the development of various volatile
compounds, this result may provide new insight into the
genetic basis of adaptation to insect pollination in epi-
phytic orchids. Gene duplication and divergence are
more effective ways of evolving new enzymatic functions
than de novo evolution of a new gene [18]. Thus, more
members of the TPS-a and TPS-b subfamilies may facili-
tate the emergence of novel volatile compounds, which
may contribute to their adaptation to diverse animal pol-
linators via the production of diverse flower scents.
Lateral root development
For higher land-based plants, roots play a significant role
in their successful colonization of the terrestrial environ-
ment by providing mechanical support as well as water
and nutrient uptake from the soil (or air for epiphytic
plants) [28]. Root architecture, i.e., the spatial
organization of roots, also has a significant impact on
the functional performance of the root system and is im-
portant for plant survival [28,29]. Environmental fac-
tors, such as water and nutrient availability, contribute
to the shaping of root architecture [30]. Significant root
system differences have been reported between Aposta-
sia and other orchids [3]. Among them, branch roots
have been found in Apostasia but not in epiphytic or-
chids, such as Phalaenopsis [3]. In land plants, the for-
mation of lateral roots plays a crucial role in root
architecture, uptake, and anchoring. Following their
adaptation to soil-free environments, however, various
orchids have lost the ability to develop lateral roots, in-
stead forming specialized root structures, such as spongy
epidermis, to help preserve nutrients. Zhang et al. [3] re-
ported that variation in the copy number of ANR1 sub-
family MADS-box genes results in different lateral root
formation between A. shenzhenica and epiphytic P.
equestris and D. catenatum. However, as the develop-
ment of lateral roots is a complicated process that in-
volves intricate regulation and phytohormone
interactions [31,32], the genetic mechanisms controlling
the emergence of lateral roots in orchids await further
investigation. In this study, we found fewer copies of the
LOX1/LOX5 homologous genes in the Apostasia species
and hemi-epiphytic V. planifolia than that in the epi-
phytic orchids. Given the function of LOX1 and LOX5 in
Arabidopsis [23], we propose that copy number variation
in these genes may contribute to the differences in lat-
eral root development between terrestrial and epiphytic
orchids. In addition, according to the phylogenetic rela-
tionship of LOX genes in orchids, there are six different
subclades of LOX genes in the common ancestor of or-
chids. The variation in copy number among the different
orchid lineages may be due to the various degrees of
gene retention, rather than gene duplication.
Conclusions
In this study, we performed de novo assembly and ana-
lysis of the genome of A. ramifera, a terrestrial orchid
from the Apostasioideae subfamily. We revealed the
population size histories of different orchid species and
discovered a continuous decrease in population size
from the genomes of these species over the last 100 000
years. In addition, the gene family size and subfamily
architecture of TPS genes varied greatly among species
from different orchid subfamilies, which may be associ-
ated with the adaptive evolution of orchids. Genes asso-
ciated with pathogen resistance were significantly
reduced in the genomes of Apostasia compared with
that of other orchids. In Apostasia, we also found genes
that were likely involved in the regulation of lateral root
development, which is an important trait for terrestrial
growth. The A. ramifera genome sequence reported here
should be an important resource for further investiga-
tions on orchid biology. Comparative genomics analysis
of A. ramifera and other orchids should provide new in-
sights into the adaptive evolution of these species.
Zhang et al. BMC Genomics (2021) 22:536 Page 8 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Methods
Sample preparation and sequencing
The A. ramifera samples were collected from Jianfeng
Mountain, Hainan Province, China. No permission was
required to collect these samples. The formal identifica-
tion of plant material was conducted by Prof. Zhongjian
Liu. A voucher specimen of the material was deposited
at the National Orchid Conservation Center of China
under deposition number Liu.JZ6475. For genome se-
quencing, we collected fresh leaves from A. ramifera. Ex-
traction of genomic DNA was carried out using the
modified cetyltrimethylammonium bromide protocol
[33]. Five DNA libraries with different insert sizes were
constructed using an Illumina library construction kit
(NEB DNA Library Rapid Prep Kit for Illumina) and
then sequenced using the Illumina HiSeq 2000 platform.
After filtering the raw reads according to sequencing
quality and adaptor contamination, a total of 57.4 Gb of
clean data were retained for assembling the Apostasia
genome.
Genome assembly
The estimated genome size of A. ramifera was
332.24 Mb according to k-mer frequency distribution.
Only one peak was observed in the k-mer distribution,
indicating high homozygosity of the Apostasia genome.
For genome assembly, SOAPdenovo2 [34] was used for
contig construction and scaffolding, and GapCloser was
used for extending the length of the final contigs. In
total, 57.4 Gb of clean reads derived from the DNA li-
braries with five insert sizes (Additional file 1, Table S1)
were used by SOAPdenovo2 assembler and GapCloser
for de novo genome assembly.
Repeat annotation
Repeat sequences consist of tandem repeats, such as
small and micro-satellite DNA, and interspersed repeats
(also known as transposable elements, TEs). In the A.
ramifera genome, tandem repeat sequences were identi-
fied by TRF software [35]. Identification of TEs was con-
ducted by homology searches of the RepBase database
[36] and de novo prediction. Briefly, RepeatMasker [37]
and RepeatProteinMask [38] were applied to identify
TEs in the Apostasia genome with a RepBase-derived li-
brary of known repeat elements. For de novo prediction,
we used RepeatModeler and LTR-FINDER [39] to con-
struct a de novo repetitive element library for the A.
ramifera genome. RepeatMasker was then applied to
search the genome for TEs with the constructed data-
base. Finally, these results were combined, and the re-
dundant sequences were removed to generate a
complete repeat annotation.
Gene and non-coding RNA prediction
Because previously published work on the V. planifolia draft
genome [8] did not include gene prediction, we carried out
protein-coding gene prediction for the A. ramifera and V.
planifolia genomes. Firstly, we used AUGUSTUS [40]and
GlimmerHMM [41] to generate the de novo predicted gene
sets for our assembly and the V. planifolia genome (BioPro-
ject: PRJNA507095). Protein sets derived from five plant ge-
nomes, including Arabidopsis thaliana,Phalaenopsis
equestris,Oryza sativa,Sorghum bicolor,andZea mays,were
then applied to search against the Apostasia and Vanilla ge-
nomes using TBLASTN with an E-value cutoff of 1e-5 and
minimum query coverage of 25 %. GeneWise [42]wasused
to annotate the gene structures. The RNA-seq datasets
(SRR1509356, SRR1509370, and SRR1509674) for V. planifo-
lia were downloaded from NCBI SRA, and were de novo as-
sembled by Trinity software. Vanilla transcripts were applied
to annotate the V. planifolia genome using the PASA pro-
gram. The annotation results derived from different methods
were then integrated to generate integrated protein-coding
gene sets for A. ramifera and V. planifolia with the MAKER
[43]program.
Non-coding RNAs do not translate into protein se-
quences but exert significant roles in cellular metabol-
ism, and include microRNAs (miRNAs), transfer RNAs
(tRNAs), ribosomal RNAs (rRNAs), and small nuclear
RNAs (snRNAs). Here, we applied previously described
methods to search for non-coding RNAs in the Aposta-
sia genome [3]. The miRNA- and snRNA-coding genes
were predicted using INFERNAL [44] and the tRNA-
coding genes were identified using tRNAscan-SE [45].
Genes encoding rRNAs were annotated by searching the
genome with the rRNA sequences of Arabidopsis.
Functional annotation
Functional analysis of the predicted genes in the Aposta-
sia genomes was performed by searching their protein-
coding regions against sequences derived from publicly
available databases, including Gene Ontology (GO) [46,
47], Kyoto Encyclopedia of Genes and Genomes (KEGG)
[48], SwissProt [49], TrEMBL [49], non-redundant (nr)
protein database, and InterProScan [50].
Gene family identification
Gene family clustering was conducted using OrthoFinder[51]
with complete protein sets from seven species, including P.
equestris,P. aphrodite,D. officinale,D. catenatum,A. shenz-
henica,A. officinalis,andO. sativa,aswellasthepredicted
protein sequences from A. ramifera. To limit the disturbance
of alternative splicing variants on gene family clustering, the
longest transcript of each gene was selected for analysis.
Gene families in which the number of genes from Apostasia
(including A. ramifera and A. shenzhenica) was 1.5 times
Zhang et al. BMC Genomics (2021) 22:536 Page 9 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
higher than that from other orchids were considered ex-
panded in Apostasia.
Phylogenetic analysis
To build a high-confidence phylogenetic tree, we constructed
a multi-species protein set containing protein sequences
from A. ramifera and 15 other species, including 11 mono-
cots (Spirodela polyrhiza,D. catenatum,P. equestris,A.
shenzhenica,A. officinalis,Ananas comosus,Musa acumi-
nata,Phoenix dactylifera,Brachypodium distachyon,S. bi-
color,andO. sativa), three eudicots (Vitis vinifera,A.
thaliana,andPopulus trichocarpa), and the outgroup
Amborella trichopoda. Protein sequences that contained less
than 50 amino acids were removed from the constructed
dataset. The pairwise similarities between protein sequences
were calculated through all-against-all BLASTP with cutoff
criteria: i.e., (i) E-value < 1e-5, (ii) query coverage > 30 %, (iii)
alignment identify > 30%. The results were then entered into
OrthoMCL [52] (v2.0.9) to construct orthologous groups. In
total, 381 single-copy gene families shared by all 16 species
were applied to construct a species tree using MrBayes [53]
with the GTR + invgamma model. PAML MCMCTree [54]
was used to estimate the species divergence times with the
following time calibrations: (i) O. sativa and B. distachyon di-
vergence time (40–54 million years ago) [55], (ii) P. tricho-
carpa and A. thaliana divergence time (100–120 million
years ago) [56], (iii) lower boundary of monocot and eudicot
divergence time (140 million years ago) [57], and (iv) upper
boundary for angiosperm divergence time (200 million years
ago) [58]. Gene family expansions or contractions were iden-
tified using CAFÉ [12].
Heterozygosity analysis and estimation of effective
population size
Identification of heterozygous loci was performed via a
previously described method [59]. Briefly, clean reads
were aligned to the genome sequence of A. ramifera
using the BWA tool [60]. Duplicate reads were then re-
moved by Picard. SAMtools [61] was used for calling
heterozygous loci, and bcftools was used for generating
consensus sequences. The effective population sizes of
the orchid species were estimated using the PSMC pro-
gram [13]. The parameters for PSMC analysis were set
to default except for -g 4 and -u 0.5 × 10
−8
.
Identification of MADS-box, TPS,NAC,WRKY,R, and LOX
genes
The hidden Markov model profiles [62] were applied to
search for MADS-box (Pfam Accession: PF00319), TPS
(Pfam Accession: PF01397 and PF03936), NAC (Pfam Acces-
sion: PF02365), WRKY (Pfam Accession: PF03106), and R
(Pfam Accession: PF00931) genes using HMMER [63]
(v3.2.1). MADS-box genes in A. thaliana reported in [3]were
used to reconstruct gene trees with the MADS-box genes
identified in A. ramifera and V. planifolia. EvolView [64]was
used to visualize the number of members in the NAC,
WRKY and Rgene families for the selected species. For the
TPS genes, the protein sequences that possessed both Pfam
domains and contained more than 500 amino acids were
considered as functional genes and used for further analysis.
To identify LOX genes, protein sequences of the LOX gene
family in A. thaliana (Gene ID: AT1G55020, AT1G72520,
AT1G67560, AT1G17420, AT3G22400, and AT3G45140)
were used to search for homologous genes in orchids. The
identified protein sequences of each gene family were aligned
using MUSCLE [65](v3.8.31)withdefaultsettings.MEGA7
[66] was then used to construct an unrooted neighbor-
joining tree for each gene family with 500 bootstrap
replicates.
Abbreviations
TEs: Transposable elements.; GO: Gene Ontology.; KEGG: Kyoto Encyclopedia
of Genes and Genomes.; PSMC: Pairwise sequential Markov coalescent.;
TPS: Terpene synthase.
Supplementary Information
The online version contains supplementary material available at https://doi.
org/10.1186/s12864-021-07852-3.
Additional file 1.
Acknowledgements
Not applicable.
Authors’contributions
G.Z., H.H., Z.L., and J.C. designed and managed the project. J.C. and W.Z.
wrote and revised the manuscript. W.Z., G.Z., P.Z., and Y.Z. contributed to
genome sequencing and data analysis. The final manuscript was read and
approved by all authors.
Funding
This study was funded by the Science and Technology Development Fund
Macau SAR (File no. 031/2017/A1) to H.H., Talents Team Construction Fund
of Northwestern Polytechnical University (NWPU) to J.C., Fundamental
Research Funds for the Central Universities (3102019JC007) to J.C., and
National Thousand Youth Talents Plan to J.C. The funders played no role in
the study.
Availability of data and materials
Raw data and the genome assembly from this study were deposited in NCBI
under the BioProject ID: PRJNA635894. The datasets supporting the
conclusions of this article are included within the article and its additional
files.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1
State Key Laboratory of Quality Research in Chinese Medicine, Institute of
Chinese Medical Sciences, University of Macau, 999078 Macau, China.
2
Key
Laboratory of National Forestry and Grassland Administration for Orchid
Zhang et al. BMC Genomics (2021) 22:536 Page 10 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Conservation and Utilization, 518114 Shenzhen, China.
3
Shenzhen Key
Laboratory for Orchid Conservation and Utilization, 518114 Shenzhen, China.
4
National Orchid Conservation Center of China and Orchid Conservation and
Research Center of Shenzhen, 518114 Shenzhen, China.
5
Key Laboratory of
NFGA for Orchid Conservation and Utilization, Fujian Agriculture and Forestry
University, 350002 Fuzhou, China.
6
School of Ecology and Environment,
Northwestern Polytechnical University, 710129 Xi’an, China.
Received: 19 June 2020 Accepted: 23 June 2021
References
1. Christenhusz MJM, Byng JW. The number of known plants species in the
world and its annual increase. Phytotaxa. 2016;261(3).
2. Cai J, Liu X, Vanneste K, Proost S, Tsai WC, Liu KW, et al. The genome
sequence of the orchid Phalaenopsis equestris. Nat Genet. 2015;47(1):65–72.
3. Zhang GQ, Liu KW, Li Z, Lohaus R, Hsiao YY, Niu SC, et al. The Apostasia
genome and the evolution of orchids. Nature. 2017;549(7672):379–83.
4. Wei S, Shih C-C, Chen N-H, Tung S-J, editors. Value chain dynamics in the
Taiwan orchid industry. I International Orchid Symposium 878; 2010.
5. Zhang GQ, Xu Q, Bian C, Tsai WC, Yeh CM, Liu KW, et al. The Dendrobium
catenatum Lindl. genome sequence provides insights into polysaccharide
synthase, floral development and adaptive evolution. Sci Rep. 2016;6:19029.
6. Yan L, Wang X, Liu H, Tian Y, Lian J, Yang R, et al. The Genome of
Dendrobium officinale Illuminates the Biology of the Important Traditional
Chinese Orchid Herb. Mol Plant. 2015;8(6):922–34.
7. Chao YT, Chen WC, Chen CY, Ho HY, Yeh CH, Kuo YT, et al. Chromosome-
level assembly, genetic and physical mapping of Phalaenopsis aphrodite
genome provides new insights into species adaptation and resources for
orchid breeding. Plant Biotechnol J. 2018;16(12):2027–41.
8. Hu Y, Resende MF, Bombarely A, Brym M, Bassil E, Chambers AH. Genomics-
based diversity analysis of Vanilla species using a Vanilla planifolia draft
genome and Genotyping-By-Sequencing. Scientific reports. 2019;9(1):1–16.
9. Kocyan A, Qiu Y-L, Endress P, Conti E. A phylogenetic analysis of
Apostasioideae (Orchidaceae) based on ITS, trnL-F and matK sequences.
Plant Syst Evol. 2004;247(3–4):203–13.
10. Kocyan A, Endress PK. Floral structure and development of Apostasia and
Neuwiedia (Apostasioideae) and their relationships to other Orchidaceae. Int
J Plant Sci. 2001;162(4):847–67.
11. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM.
BUSCO: assessing genome assembly and annotation completeness with
single-copy orthologs. Bioinformatics. 2015;31(19):3210–2.
12. De Bie T, Cristianini N, Demuth JP, Hahn MW. CAFE: a computational tool
for the study of gene family evolution. Bioinformatics. 2006;22(10):1269–71.
13. Li H, Durbin R. Inference of human population history from individual
whole-genome sequences. Nature. 2011;475(7357):493–6.
14. Gillman MP, Dodd M. The variability of orchid population size. Botanical
journal of the Linnean Society. 1998;126(1–2):65–74.
15. Alexandersson R, Ågren J. Population size, pollinator visitation and fruit
production in the deceptive orchid Calypso bulbosa. Oecologia. 1996;107(4):
533–40.
16. Masiero S, Colombo L, Grini PE, Schnittger A, Kater MM. The emerging
importance of type I MADS box transcription factors for plant reproduction.
Plant Cell. 2011;23(3):865–72.
17. Pazouki L, Niinemets Ü. Multi-substrate terpene synthases: their occurrence
and physiological significance. Front Plant Sci. 2016;7:1019.
18. Karunanithi PS, Zerbe P. Terpene synthases as metabolic gatekeepers in the
evolution of plant terpenoid chemical diversity. Frontiers in plant science.
2019;10:1166.
19. Chen F, Tholl D, Bohlmann J, Pichersky E. The family of terpene synthases in
plants: a mid-size family of genes for specialized metabolism that is highly
diversified throughout the kingdom. Plant J. 2011;66(1):212–29.
20. Tian D, Traw M, Chen J, Kreitman M, Bergelson J. Fitness costs of R-gene-
mediated resistance in Arabidopsis thaliana. Nature. 2003;423(6935):74–7.
21. Yuan X, Wang H, Cai J, Li D, Song F. NAC transcription factors in plant
immunity. Phytopathology Research. 2019;1(1):1–13.
22. Pandey SP, Somssich IE. The role of WRKY transcription factors in plant
immunity. Plant physiology. 2009;150(4):1648–55.
23. Vellosillo T, Martínez M, López MA, Vicente J, Cascón T, Dolan L, et al.
Oxylipins produced by the 9-lipoxygenase pathway in Arabidopsis regulate
lateral root development and defense responses through a specific
signaling cascade. Plant Cell. 2007;19(3):831–46.
24. Tsai W-C, Chen H-H. The orchid MADS-box genes controlling floral
morphogenesis. The Scientific World Journal. 2006;6:1933–44.
25. Liu Y, Cui S, Wu F, Yan S, Lin X, Du X, et al. Functional conservation of
MIKC*-Type MADS box genes in Arabidopsis and rice pollen maturation.
Plant Cell. 2013;25(4):1288–303.
26. Kwantes M, Liebsch D, Verelst W. How MIKC* MADS-box genes originated and
evidence for their conserved function throughout the evolution of vascular plant
gametophytes. Molecular biology evolution. 2012;29(1):293–302.
27. Valoroso MC, Censullo MC, Aceto S. The MADS-box genes expressed in the
inflorescence of Orchis italica (Orchidaceae). PloS one. 2019;14(3).
28. Smith S, De Smet I. Root system architecture: insights from Arabidopsis and
cereal crops. The Royal Society; 2012.
29. Li X, Zeng R, Liao H. Improving crop nutrient efficiency through root
architecture modifications. Journal of integrative plant biology. 2016;58(3):
193–202.
30. Kiba T, Krapp A. Plant nitrogen acquisition under low availability: regulation
of uptake and root architecture. Plant Cell Physiol. 2016;57(4):707–14.
31. Du Y, Scheres B. Lateral root formation and the multiple roles of auxin. J
Exp Bot. 2018;69(2):155–67.
32. Fukaki H, Tasaka M. Hormone interactions during lateral root formation.
Plant molecular biology. 2009;69(4):437.
33. Murray M, Thompson WF. Rapid isolation of high molecular weight plant
DNA. Nucleic acids research. 1980;8(19):4321–6.
34. Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al. SOAPdenovo2: an
empirically improved memory-efficient short-read de novo assembler.
Gigascience. 2012;1(1):2047–217X-1-18.
35. Benson G. Tandem repeats finder: a program to analyze DNA sequences.
Nucleic acids research. 1999;27(2):573–80.
36. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J.
Repbase Update, a database of eukaryotic repetitive elements. Cytogenet
Genome Res. 2005;110(1–4):462–7.
37. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive
elements in genomic sequences. Current protocols in bioinformatics. 2009;
25(1):4. 10. 1–4. 4.
38. Tempel S. Using and understanding RepeatMasker. Mobile Genetic
Elements: Springer; 2012. pp. 29–51.
39. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length
LTR retrotransposons. Nucleic acids research. 2007;35(suppl_2):W265-W8.
40. Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS:
ab initio prediction of alternative transcripts. Nucleic acids research. 2006;
34(suppl_2):W435-W9.
41. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open
source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20(16):2878–9.
42. Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome
research. 2004;14(5):988–95.
43. Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database
management tool for second-generation genome projects. BMC Bioinform.
2011;12(1):491.
44. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments.
Bioinformatics. 2009;25(10):1335–7.
45. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of
transfer RNA genes in genomic sequence. Nucleic acids research. 1997;25(5):
955–64.
46. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene
ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
47. Consortium GO. The gene ontology resource: 20 years and still GOing
strong. Nucleic acids research. 2019;47(D1):D330-D8.
48. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes.
Nucleic acids research. 2000;28(1):27–30.
49. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its
supplement TrEMBL in 2000. Nucleic acids research. 2000;28(1):45–8.
50. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, et al.
InterPro: the integrative protein signature database. Nucleic acids research.
2009;37(suppl_1):D211-D5.
51. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole
genome comparisons dramatically improves orthogroup inference accuracy.
Genome biology. 2015;16(1):157.
52. Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for
eukaryotic genomes. Genome research. 2003;13(9):2178–89.
Zhang et al. BMC Genomics (2021) 22:536 Page 11 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
53. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic
trees. Bioinformatics. 2001;17(8):754–5.
54. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular
biology evolution. 2007;24(8):1586–91.
55. Initiative IB. Genome sequencing and analysis of the model grass
Brachypodium distachyon. Nature. 2010;463(7282):763.
56. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al.
The genome of black cottonwood, Populus trichocarpa (Torr. & Gray).
science. 2006;313(5793):1596–604.
57. Chaw S-M, Chang C-C, Chen H-L, Li W-H. Dating the monocot–dicot
divergence and the origin of core eudicots using whole chloroplast
genomes. Journal of molecular evolution. 2004;58(4):424–41.
58. Magallón S, Hilu KW, Quandt D. Land plant evolutionary timeline: gene
effects are secondary to fossil constraints in relaxed clock estimation of age
and substitution rates. Am J Bot. 2013;100(3):556–73.
59. Chaw S-M, Liu Y-C, Wu Y-W, Wang H-Y, Lin C-YI, Wu C-S, et al. Stout
camphor tree genome fills gaps in understanding of flowering plant
genome evolution. Nature plants. 2019;5(1):63–73.
60. Li H, Durbin R. Fast and accurate short read alignment with Burrows–
Wheeler transform. bioinformatics. 2009;25(14):1754–60.
61. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The
sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):
2078–9.
62. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The
Pfam protein families database in 2019. Nucleic acids research. 2019;47(D1):
D427-D32.
63. Eddy SR. Accelerated profile HMM searches. PLoS computational biology.
2011;7(10).
64. Subramanian B, Gao S, Lercher MJ, Hu S, Chen W-H. Evolview v3: a
webserver for visualization, annotation, and management of phylogenetic
trees. Nucleic acids research. 2019;47(W1):W270-W5.
65. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and
high throughput. Nucleic acids research. 2004;32(5):1792–7.
66. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics
analysis version 7.0 for bigger datasets. Molecular biology evolution. 2016;
33(7):1870–4.
Publisher’sNote
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Zhang et al. BMC Genomics (2021) 22:536 Page 12 of 12
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
Available via license: CC BY 4.0
Content may be subject to copyright.