Chloroplast genomes and nuclear sequences reveal the interspecific relationships of Crataegus bretschneideri C. K. Schneid. and related species in China

https://doi.org/10.1007/s11295-022-01556-9
ORIGINAL ARTICLE
Chloroplast genomes andnuclear sequences reveal theinterspecific
relationships ofCrataegus bretschneideri C. K. Schneid. andrelated
species inChina
XiaoZhang1· XinyuSun2· TongLi1· JianWang1· MiliaoXue1· ChaoSun1· WenxuanDong1
Received: 27 January 2022 / Revised: 3 April 2022 / Accepted: 13 May 2022
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022
Abstract
Crataegus bretschneideri C. K. Schneid. is one of the species cultivated in China. Due to its unclear taxonomic classification status, the conservation and utilization of this germplasm resource have been limited. In this study, we analyzed the chloroplast genomes and nuclear sequences to reveal the taxonomic relationships among C. bretschneideri and related species. We assembled the chloroplast genomes of C. bretschneideri and related species and varieties, including C. maximowiczii C. K. Schneid., C. maximowiczii var. ninganensis S. Q. Nie & B. J. Jen., C. pinnatifida Bunge, and C. pinnatifida var. major N. E. Br. The lengths of the chloroplast genomes ranged from 159,644 bp (C. bretschneideri) to 159,947 bp (C. pinnatifida var. major). The five Crataegus chloroplast genomes had similar features and possessed 86 to 88 protein-coding genes, 37 tRNA genes, and eight rRNA genes, which were arranged in the same order. Eight mutation hotspot regions (matK, psaB, accD, petA, clpP, trnD-GUC, psbH-petB, and trnN-GUU-trnR-ACG) could be used as potential molecular markers for further studies of Crataegus genetic diversity. Phylogenetic analyses based on 17 chloroplast genomes of Crataegus and Amelanchier indicated that C. bretschneideri was related to C. maximowiczii and C. maximowiczii var. ninganensis. However, the phylogenetic trees constructed from nuclear sequences of 36 Crataegus accessions reflected a closer relationship between C. bretschneideri and C. pinnatifida. Furthermore, divergence time estimation suggested that C. bretschneideri and C. maximowiczii diverged in the late Miocene and that speciation of C. pinnatifida occurred during the middle to late Miocene. These findings revealed that C. bretschneideri is an independent species and may be of hybrid origin.
Keywords Crataegus· C. bretschneideri· Chloroplast genome· Comparative genomics· Interspecific relationships
Communicated by V. Decroocq
* Wenxuan Dong
dongwx63@syau.edu.cn
Xiao Zhang
zhangxiao8866@syau.edu.cn
Xinyu Sun
sunxinyu612@163.com
Tong Li
litong0327@126.com
Jian Wang
botelongma@163.com
Miliao Xue
1250164964@qq.com
Chao Sun
sun13940536629@163.com
1 College of Horticulture, Shenyang Agricultural University, Shenyang 110866, Liaoning Province, China
2 Tonghua Horticulture Institute, Tonghua 134001, Jilin Province, China
Published online: 25 May 2022
Tree Genetics & Genomes (2022) 18: 24
Introduction
The genus Crataegus belongs to the Rosaceae family. It is widely distributed in northern temperate zones in eastern North America, Europe, and East Asia (Phipps 1990; Chang et al. 2002; Xu et al. 2016). Over 12 Crataegus species are used as herbal drugs or drug materials worldwide (Chang et al. 1986). More than 150 compounds, including steroids, triterpenoids, flavonoids, and organic acids, have been identified and isolated from Crataegus plants, and these compounds benefit the endocrine, digestive, and cardiovascular systems of the human body (Wu et al. 2014; Nazhand et al. 2020; Zhang et al. 2020a).
China, a major center of origin of Crataegus, has a long history of hawthorn cultivation (Guo and Jiao 1995). A total of 20 species and seven varieties of Crataegus are widely distributed across China (Dong and Li 2015). Among these, C. bretschneideri C. K. Schneid., C. maximowiczii C. K. Schneid., C. maximowiczii var. ninganensis S. Q. Nie & B. J. Jen., C. sanguinea Pall., and some populations of C. pinnatifida Bunge are naturally distributed in Northeast China (Dong and Li 2015; Du et al. 2019). As a cultivated species, C. bretschneideri has many excellent characteristics, such as high yield and cold tolerance. Its mature fruit is sweet, flavorsome, and brightly colored, and it is popular among local consumers. Because most C. bretschneideri accessions are polyploid, their original parents are still unclear. C. bretschneideri has been recognized as a taxonomically challenging species (Dong and Li 2015).
Morphological traits are important indices for identifying Crataegus species (Dickinson et al. 1996). However, the traditional classification of Crataegus based on morphological traits has been contested, and such traits are influenced by the environment (Christensen 1984; Gosler et al. 1994). The morphological traits of the fruits and leaves of C. bretschneideri and C. pinnatifida are very similar (Supplementary Fig. 1). According to the “Repertorium specierum novarum regni vegetabilis” (Friderico 1903), C. bretschneideri is a synonym of C. pinnatifida var. major N. E. Br. However, Sokolov (1954) regarded C. bretschneideri as a synonym of C. pinnatifida. Moreover, some plant taxonomy websites, such as “The Plant List” (http://www.theplantlist.org/tpl1.1/record/rjp-18388) and the “Chinese Field Herbarium” (CFH, http://www.cfh.ac.cn/1372879.sp?AspxAutoDetectCookieSupport=1), define C. bretschneideri as a synonym of C. pinnatifida rather than a true species.
Several studies have examined the interspecific relationships of some Chinese Crataegus accessions. Ten polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) markers amplified the same bands from chloroplast DNA (cpDNA) of C. bretschneideri, C. maximowiczii, C. sanguinea, C. kansuensis E. H. Wilson, and C. dahurica Koehne ex C. K. Schneid., indicating that these species are closely related genetically (Wu et al. 2008). C. bretschneideri, C. pinnatifida, and C. pinnatifida var. major N. E. Br. were clustered into the same branch of a phylogenetic tree constructed with ten apple simple sequence repeat (SSR) markers (Zhang et al. 2008). Recently, SSRs and specific locus-amplified fragment sequencing (SLAF-seq) were used to clarify the origin of Chinese Crataegus and identify germplasm resources (Du et al. 2019; Zhang et al. 2021). The results generated from SSRs and SLAF-seq revealed that C. bretschneideri had a close relationship with C. pinnatifida. However, because these earlier investigations relied on a limited number of molecular markers, the complex origin of C. bretschneideri has not been fully revealed.
The chloroplast genome has been a primary target of plant phylogeny and evolution studies (Daniell et al. 2021). cpDNA barcode regions, such as psbA-trnH, trnS-trnG, trnH-rpl2, rpl16, atpF-atpH, trnL-trnF, rpl20-rps12, matK, and rbcL, have been used to estimate the phylogeny of Crataegus (Verbylaitė et al. 2006; Lo et al. 2009; Zarrei et al. 2015; Brown et al. 2016; Emami et al. 2018). Next-generation sequencing has been applied to sequence complete plastid genomes, and large phylogenetic trees across green plants have been constructed (Gitzendanner et al. 2018). The complete or partial chloroplast genomes of several Crataegus species have been published, including C. pinnatifida var. major, C. chungtienensis W. W. Smith, C. marshallii Eggleston, Crataegus sp., C. pinnatifida, C. kansuensis, and C. hupehensis Sarg. (Zhang et al. 2017; Liu et al. 2019; He et al. 2020; Zhang et al. 2020b; Hu et al. 2021). With such abundant chloroplast genome data, comparative genomic analysis should be an effective tool for revealing the interspecific relationships among C. bretschneideri and related species.
In this study, five Crataegus chloroplast genomes (C. bretschneideri, C. maximowiczii, C. maximowiczii var. ninganensis, C. pinnatifida, and C. pinnatifida var. major) were newly sequenced and compared with published Crataegus and Amelanchier chloroplast genomes. We analyzed potential molecular markers, phylogenetic relationships, and divergence times based on these comparisons. In addition, the phylogenetic relationships of eight species and two varieties of Crataegus native to China were evaluated by sequencing two nuclear regions: the nrDNA internal transcribed spacer region (ITS) and intron 1 of the floral meristem identity control protein gene LEAFY. These results will provide a reliable basis for revealing the classification status and possible origin of C. bretschneideri.
Materials andmethods
Plant materials
C. bretschneideri, C. maximowiczii, C. maximowiczii var. ninganensis, C. pinnatifida, and C. pinnatifida var. major were used for chloroplast genome sequencing. In total, 36 accessions of Crataegus were sampled: C. bretschneideri (9 accessions), C. maximowiczii (3 accessions), C. maximowiczii var. ninganensis (1 accession), C. sanguinea (3 accessions), C. altaica (Loudon) Lange (2 accessions), C. pinnatifida (7 accessions), C. pinnatifida var. major (4 accessions), C. hupehensis (3 accessions), C. scabrifolia (Franch.) Rehder (3 accessions), and C. songarica K. Koch (1 accession). All 36 Crataegus accessions were subjected to ITS and LEAFY sequencing. Voucher specimen information for these accessions is provided in Table 1. Young, healthy, fresh Crataegus leaves were collected from the National Fruit-tree Germplasm Resources, Shenyang Hawthorn Repository in 2019. All samples were immediately frozen in liquid nitrogen and stored at −80 °C.
Chloroplast genome sequencing andassembly
For genomic DNA extraction, 1 g of frozen leaf material was
ground using the cetyl-trimethylammonium bromide method
(Du etal. 2019). The harvested DNA was detected via aga-
rose gel electrophoresis and quantified using an Agilent
2100 bioanalyzer. The DNA concentration (> 30 ng·μL−1)
Table 1 Biogeographic region and botanical characteristics of Crataegus accessions
*These accessions were used for chloroplast genomes sequencing
# Botanical characteristics of Crataegus taxa were taken “Flora of China 9: 111–117. 2003” as a reference
Taxon ID Biogeographic region Botanical characteristics#
Crataegus bretschneideri C. K. Schneid. JF1H Northeast, China It is mainly distributed in Northeast China and is similar to
C. pinnatifida. Leaves lobed, fruit globose, red, 26 mm in
diameter, pyrenes 3–5.
JF2H Northeast, China
JF3H Northeast, China
JF4H Northeast, China
ZF1H* Northeast, China
ZF2H Northeast, China
ZF3H Northeast, China
SF1H Northeast, China
CH Northeast, China
Crataegus maximowiczii C. K. Schneid. MSZ1H Northeast, China Leaves pubescent on both surfaces. Leaf basally cuneate or
broadly cuneate, occasionally truncate; fruit globose, red or
purplish-brown.
MSZ2H Northeast, China
MSZ3H* Northeast, China
Crataegus maximowiczii var. ninganensis
S. Q. Nie & B. J. Jen.
NASZ* Northeast, China It is a variety of C. maximowiczii. Stipules serrate; pedicel
glabrous; fruit persistent sepals pilose.
Crataegus sanguinea Pall. LNSZ1H Northeast, China It is native to the extreme north of China. Pedicel and peduncle
glabrous. Leaf basally cuneate; fruit red, 1 mm in diameter.
LNSZ2H Northeast, China
LNSZ3H Northeast, China
Crataegus altaica (Loudon) Lange AETSZ2H Northwest, China Leaves deeply pinnatifid to more than 1/2 width of the blade.
Fruit golden-yellow, 8–10 mm in diameter, pyrenes 4 or 5;
leaves glabrous or slightly pubescent.
AETSZ3H Northwest, China
Crataegus pinnatifida Bunge NMGSLH North, China It is known as the Chinese hawthorn which originated in North
China. Leaves deeply pinnatifid to more than 1/2 width of the
blade. Leaves truncate or broadly cuneate, with 3–5 pairs of
lobes, pubescent along midvein and lateral veins; fruit red, 15
mm in diameter, pyrenes 3–5.
WTSSLH North, China
YR5H Northeast, China
YP6H Northeast, China
YP8H Northeast, China
1541SLH Northeast, China
MDFSLH* Northeast, China
Crataegus pinnatifida var. major N. E. Br. QJX* Northeast, China It is a variety of C. pinnatifida. Leaves lobed. Leaves truncate
or broadly cuneate, with 3–5 pairs of lobes, pubescent along
midvein and lateral veins; fruit red, 20–30 mm in diameter,
pyrenes 3–5.
MPSZ North, China
XLZR North, China
MYDJX North, China
Crataegus hupehensis Sarg. HBSZ1H Central, China Leaves lobed or not divided, lateral veins extending to apices
of lobes or teeth only. Fruit red, rarely yellow; inflorescence
pubescent or glabrous.
HBSZ2H Central, China
HBSZ3H Central, China
Crataegus scabrifolia (Franch.) Rehder YNSZ1H Southwest, China The tree is apparently not cultivated outside China. Leaves
lobed or not divided, lateral veins extending to apices of
lobes or teeth only. Fruit red, rarely yellow.
YNSZ2H Southwest, China
YNSZ3H Southwest, China
Crataegus songarica K. Koch ZGESZ Northwest, China Leaves lobed or not divided, lateral veins extending to apices
of lobes or teeth only. Pulp yellow; pyrenes 2 or 3, smooth on
2 inner sides.
Page 3 of 16 24Tree Genetics & Genomes (2022) 18: 24
1 3
was quantified using a NanoDrop 2000 spectrophotometer,
and fragmentation was achieved using a Covaris instrument.
The fragmented DNA was purified and end-repaired, and
the fragment sizes were determined via gel electrophoresis.
After purification, an A-tailing was done at the 3 end of
the DNA fragments and then adaptors were ligated to the
end of the DNA fragments using the T4 DNA ligase. PCR
was done to amplify the adaptor-ligated DNA for the con-
struction of the sequencing library. The PCR products were
used to produce short-insert (400 bp) libraries, and libraries
sizes were detected using an Agilent 2100 bioanalyzer. The
control library was used to test the quality of sequencing.
We sequenced the complete chloroplast genomes of C.
bretschneider, related species, and varieties using the Illu-
mina NovaSeq platform (Illumina, USA) in paired-end (2 ×
150 bp) sequencing mode (LC-Bio Biotechnologies Co., Ltd,
Hangzhou, Zhejiang, China). Raw reads were filtered using
SOAPec v2.01 (Luo etal. 2012) to obtain high-quality reads.
The chloroplast genomes were assembled with A5-MiSeq
v20150522 (Coil etal. 2015) and SPAdes v3.9.0 (Bankevich
etal. 2012) using clean data. Then the filtered reads were
assembled using BLAST to C. kansuensis (MF784433), with
an >80% match cutoff and gaps filled by filtered reads with
90% similarity over 50% of the gap length.
Chloroplast genome annotation
The whole chloroplast genomes were annotated using Plastid Genome Annotator (PGA; Qu et al. 2019) and CPGAVAS2 (Shi et al. 2019) with the default parameters. Subsequently, all tRNAs were verified with tRNAscan-SE v2.0 (Chan and Lowe 2019). The annotated structure diagrams of the Crataegus chloroplast genomes were drawn using CHLOROPLOT (Greiner et al. 2019). The GC content was calculated using EditSeq (Burland 1999).
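GC content is the percentage of G and C bases among the unambiguous bases of a sequence. The study computed it with EditSeq; the Python function below is only an illustrative sketch of the same calculation, and the function name `gc_content` is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence.

    Gaps and ambiguous bases (e.g. N) are ignored; this is an
    assumption of the sketch, not a documented EditSeq behavior.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Toy sequence: 4 of 10 bases are G or C.
    print(gc_content("ATGCGCTTAA"))  # 40.0
```

Applied separately to the LSC, SSC, and IR coordinates of a genome, the same function would give region-level values comparable to those reported per region.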
Repeat structure andmicrosatellite analyses
The repeat structures, including forward, reverse, comple-
mentary, and palindromic repeats, were identified using
REPuter online program (Kurtz etal. 2001). The REPuter
parameters were set to a minimal repeat size of ≥ 30 bp and
a Hamming distance of 3 (90% or greater sequence iden-
tity). Tandem repeats were identified using Tandem Repeats
Finder v4.07b (Benson 1999), and the alignment parameters
match, mismatch, and indels were set to 2, 7, and 7, respec-
tively. The minimum alignment scores to report repeats and
maximum period size were 70 bp and 500 bp, respectively.
The SSRs within these chloroplast genomes were detected
using MISA-web (Beier etal. 2017). When the SSR motif
length was 1, 2, 3, 4, 5, and 6, the minimum numbers of
repeats in the SSR search parameters were 10, 5, 4, 3, 3,
and 3, respectively. The maximum sequence length between
two SSRs for registration as a compound SSR was 100 bp.
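The SSR thresholds above can be illustrated with a small regular-expression search. This is a simplified sketch, not the MISA-web implementation: it reports only perfect repeats, does not merge neighbors into compound SSRs, and the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples, 0-based starts."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        pos = 0
        while (m := pattern.search(seq, pos)) is not None:
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
                pos = m.end()
            else:
                # e.g. 'TT' x 6 is really a mononucleotide run; skip it.
                pos = m.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    toy = "G" + "T" * 12 + "C" + "AT" * 6 + "G" + "TTTA" * 3 + "C"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

The primitive-motif check keeps a run such as T×12 from also being counted as a dinucleotide (TT) or trinucleotide (TTT) repeat.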
Sequence divergence analysis
Seventeen chloroplast genomes were aligned using MAFFT v7 (Katoh and Standley 2013) with the FFT-NS-2 strategy. Based on the alignment, sliding-window DNA polymorphism analyses were performed in DnaSP v5 (Librado and Rozas 2009) to estimate the nucleotide diversity (Pi) of these chloroplast genomes. The window length was set to 600 bp, with a step size of 200 bp.
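Nucleotide diversity (Pi) in a window is the average proportion of differing sites over all pairs of aligned sequences. The sketch below shows the idea with the window and step sizes from the text; it is not DnaSP's algorithm (in particular, gaps are handled per pair here, which is an assumption of this example), and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    valid = diffs = 0
    for x, y in zip(a, b):
        if x in "ACGT" and y in "ACGT":
            valid += 1
            diffs += x != y
    return diffs / valid if valid else 0.0


def sliding_pi(alignment, window=600, step=200):
    """Yield (start, end, pi) for each window along an alignment."""
    seqs = [s.upper() for s in alignment]
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(range(len(seqs)), 2))
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        total = sum(
            pairwise_diff(seqs[i][start:end], seqs[j][start:end])
            for i, j in pairs
        )
        yield start, end, total / len(pairs)


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT"]
    for start, end, pi in sliding_pi(toy, window=2, step=2):
        print(start, end, round(pi, 4))
```

Windows whose Pi exceeds a chosen cutoff can then be flagged as candidate mutational hotspots.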
Divergence time estimation
The published chloroplast genomes of Crataegus and Amelanchier used in this study were downloaded from GenBank as follows: C. chungtienensis (KY419947), C. hupehensis (MW201730), C. kansuensis (MF784433), C. marshallii (MK920293), C. pinnatifida var. major (KY419945), C. pinnatifida (MN102356), Crataegus sp. (MK920294), Mespilus germanica L. (MK920295), A. alnifolia (Nutt.) Nutt. ex M.Roem. (MN068255), A. ovalis Medik. (MK920297), A. sanguinea (Pursh) DC. (MN068262), and A. spicata (Lam.) K. Koch (MK920292).
The divergence times of the 17 species and varieties were estimated with BEAST2 (Bouckaert et al. 2014). The Amelanchier species were selected as the outgroup, and the divergence node of Crataegus and Amelanchier was constrained using a lognormal distribution with an offset of 45 Mya (Mya = million years ago) and a mean and standard deviation of 0.5 (Lo et al. 2009; Lo and Donoghue 2012). The detailed BEAST2 parameter settings followed Kim et al. (2020).
ITS andLEAFY intron 1 sequencing
We sequenced the ITS and LEAFY intron 1 of 36 accessions
representing eight species and two varieties from Crataegus.
The primers used in this study were listed in Table2. PCR
amplification was performed in a reaction mixture with a
final volume of 20 μL consisting of 1 μL of template DNA
(40–50 ng), 10 μL of Takara ExTaq® (RR001A), and 2 μL
of primers. The PCR conditions were as follows: initial
denaturation at 94 °C for 3 min; followed by 35 cycles of
30 s at 94 °C, 30 s at 50–70 °C, 1 min at 72 °C; and a final
extension of 10 min at 72 °C. PCR amplification was carried
out in a thermal cycler (Applied Biosystems, USA). PCR
products were separated on a 1.5% agarose gel in 5 × TBE
(Tris-borate-EDTA) buffer and submitted to Sangon Biotech
(Co., Ltd., Shanghai, China) for sequencing. All sequences
were aligned using MAFFT v7 with the FFT-NS-2 module.
Phylogenetic analyses
Seventeen chloroplast genomes of Crataegus and Amelanchier were visualized using mVISTA (Frazer et al. 2004). IQ-TREE 2 (Minh et al. 2020) was used to build a maximum likelihood (ML) tree with the TVM+F+R3 model and 1000 bootstrap replicates. MrBayes v3.2.7a (Ronquist et al. 2012) was used to build a Bayesian inference (BI) tree. The parameter settings took Ramans method (2020) as a reference. The ML and BI trees were visualized using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). Multiple sequence alignment files of the 36 Crataegus accessions generated with MAFFT v7 were converted to rdf files using DnaSP 5.0. Network 10.2 software (Bandelt et al. 1999) was used to draw the haplotype networks of the ITS and LEAFY intron 1 sequences using the median-joining method.
Results
Chloroplast genomes features ofCrataegus species
andvarieties
Five Crataegus complete chloroplast genomes sequenced
ranged from 159,644 (C. bretschneideri, ZF1H) to 159,947
bp (C. pinnatifida var. major, QJX) in length, with differ-
ences ranging from 101 to 303 bp (Fig.1, Table3, Supple-
mentary Fig.6–9). The structures of Crataegus chloroplast
genomes were similar to that of most terrestrial plants. The
genomes contained the typical quadripartite structure that
included the inverted repeats a (IRa) and inverted repeats
b (IRb) regions (26,170–26,387 bp) separated by large sin-
gle copy region (LSC, 87,694–88,303 bp) and small single
copy region (SSC, 19,140–19,273 bp). The GC contents of
the complete genomes ranged from 36.60 to 36.68%, 34.31
to 34.37% in the LSC regions, 42.64 to 42.76% in the IR
regions, and 32.73 to 32.91% in the SSC region, revealing
the high level of similarity among different Crataegus spe-
cies and varieties.
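In a quadripartite chloroplast genome, the total length is the sum of the LSC, the SSC, and the two IR copies. As a small worked check, the sketch below uses the C. bretschneideri values from Table 3, where the IR size is the combined length of both copies; the variable names are hypothetical.

```python
# Region lengths (bp) for C. bretschneideri (ZF1H), copied from Table 3.
lsc = 87_694        # large single copy region
ssc = 19_256        # small single copy region
ir_total = 52_694   # IRa + IRb combined, as reported in Table 3
genome = 159_644    # reported genome size

ir_copy = ir_total // 2                    # length of one IR copy
reconstructed = lsc + ssc + 2 * ir_copy    # quadripartite sum

print(ir_copy)                   # 26347
print(reconstructed == genome)   # True
```

The single-copy IR length of 26,347 bp falls within the per-copy range given in the text.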
Overall, the five Crataegus chloroplast genomes encoded 109–113 unique genes, including 76–79 protein-coding genes, 29 tRNA genes, and four rRNA genes (Table 3). The representative annotated chloroplast genomes, including gene number, order, and names, are illustrated as circular maps in Fig. 1 and Supplementary Figs. 6–9. Seven protein-coding genes (rps19, rpl23, rpl2, ndhB, ycf2, rps12, and rps7), seven tRNA genes (trnR-ACG, trnL-CAA, trnI-CAU, trnI-GAU, trnA-UGC, trnV-GAC, and trnN-GUU), and four rRNA genes (rrn16, rrn23, rrn4.5, and rrn5) were duplicated in the IR regions. The CPGAVAS2 annotations identified four tRNA genes (trnG-GCC, trnG-UCC, trnfM-CAU, and trnS-GCU) that may be trnG-UCC, trnG-GCC, trnM-CAU, and trnS-GGA, respectively.
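The unique-gene counts follow from Table 3 by subtracting the duplicated copies (the numbers in parentheses) from the totals. The snippet below is a short illustration of that arithmetic; the dictionary name is hypothetical and the values are copied from Table 3.

```python
# (total, duplicated) counts per genome, copied from Table 3.
# Order of categories: all genes, protein-coding, tRNA, rRNA.
TABLE3_COUNTS = {
    "C. maximowiczii": [(133, 22), (88, 10), (37, 8), (8, 4)],
    "C. maximowiczii var. ninganensis": [(135, 22), (89, 10), (37, 8), (8, 4)],
    "C. pinnatifida": [(131, 22), (86, 10), (37, 8), (8, 4)],
    "C. pinnatifida var. major": [(131, 22), (86, 10), (37, 8), (8, 4)],
    "C. bretschneideri": [(131, 22), (86, 10), (37, 8), (8, 4)],
}

# Unique genes = total genes minus duplicated copies.
unique = {
    taxon: [total - dup for total, dup in rows]
    for taxon, rows in TABLE3_COUNTS.items()
}

all_genes = [counts[0] for counts in unique.values()]
protein = [counts[1] for counts in unique.values()]

print(min(all_genes), max(all_genes))  # 109 113
print(min(protein), max(protein))      # 76 79
```

The tRNA and rRNA columns give 29 and 4 unique genes in every genome, matching the counts in the text.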
Among the 113 unique genes identified (excluding 22 duplicated genes), eight protein-coding genes (atpF, ndhA, ndhB, petB, petD, rpoC1, rpl16, and rpl2) and six tRNA genes (trnA-UGC, trnG-GCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) had one intron, and two protein-coding genes (clpP and ycf3) contained two introns (Table 4). In addition, 12 genes (trnK-UUU, rps16, trnG-GCC, atpF, rpoC1, ycf3, trnL-UAA, trnV-UAC, clpP, petB, petD, and rpl16) that contained one or two introns were distributed in the LSC region, eight genes (four duplicated genes: rpl2, ndhB, trnI-GAU, and trnA-UGC) with one intron were located in the IR regions, and one gene (ndhA) was in the SSC region.
Variations intheborder regions
The adjacent genes and border regions of five Crataegus
chloroplast genomes were analyzed, and C. kansuensis
(MF784433) was used as a reference (Fig.2). Although
the general genomes structures, including the order and
number of genes, were relatively conserved, six Cratae-
gus chloroplast genomes exhibited visible differences at
the LSC/IRb and IRa/LSC borders. The LSC/IRb borders
differed significantly among the six Crataegus chloroplast
genomes. The IRb region expanded into the rpl19 gene
with 120 bp in the IRb region for C. maximowiczii and
C. kansuensis, while it generated distances of 25 bp and
29 bp to the junction in C. bretschneideri and C. pinnati-
fida, respectively. The IRb region expanded into the rpl2
gene by 26 bp in the LSC regions for C. maximowiczii var.
ninganensis. The IRa/LSC borders also presented several
Table 2 Information of nuclear
sequences primers used in this
study
ITS, internal transcribed spacer region; LEAFY, floral meristem identity control protein
Gene Forward and reverse primers Tm Reference sequence
ITS1-5.8S-ITS2 F: 5-TCC TCC GCT TAT TGA TAT GC-3
R: 5-GGA AGG AGA AGT CGT AAC AAGG-3
58 °C Crataegus laevi-
gata (Poir.) DC.
(EU500466)
LEAFY intron 1 F: 5-GGA TCC RGA TGC CTT CTC TGC GAA CTT
GTT CAA GTG G-3
R: 5-GTT CTT TTT GCC ACG CGC CAC CTC CCC
CGG -3
70 °C Crataegus sp.
(EU500483)
Page 5 of 16 24Tree Genetics & Genomes (2022) 18: 24
1 3
differences among the six Crataegus chloroplast genomes.
The rpl2 gene was located close to the IRa/LSC junction
with distances of 2 bp, 43 bp, and 47 bp in C. maximowic-
zii var. ninganensis, C. pinnatifida, and C. bretschneideri,
respectively. The trnH gene was located close to the junc-
tion with a distance of 57 bp in C. maximowiczii. The IRa
region expanded into the rpl19 gene by 1 bp in the LSC
region for C. kansuensis. Unlike the above junction, the
SSC/IRa and IRb/SSC junctions were relatively conserved.
The yfc1 gene crossed the SSC/IRa junction and expanded
to the same length in the SSC region (4557 bp) and the IRa
region (1074 bp) in all six Crataegus chloroplast genomes.
The ycf1 gene was located in the IRb region next to the
junction, with no gaps in C. kansuensis. The ndhF gene
was located near the junction, with no gaps. It crossed the
IRb/SSC junction and expanded by 12 bp in the IRb region
in C. maximowiczii, C. maximowiczii var. ninganensis., C.
pinnatifida, C. pinnatifida var. major, C. bretschneideri,
and C. kansuensis. The variation in these boundary regions
was responsible for the differences in length of the six
Crataegus chloroplast genomes and their LSC, IR, and
SSC regions.
Fig. 1 Gene map of the Crataegus bretschneideri C. K. Schneid. chloroplast genome. Genes shown outside of the outer circle are transcribed clockwise and those inside are transcribed counterclockwise. Genes belonging to different functional groups are color-coded. The dashed area in the inner circle indicates the GC content of the chloroplast genome
Table 3 Statistics on the basic features of five Crataegus chloroplast genomes
Feature | Crataegus maximowiczii C. K. Schneid. | Crataegus maximowiczii var. ninganensis S. Q. Nie & B. J. Jen. | Crataegus pinnatifida Bunge | Crataegus pinnatifida var. major N. E. Br. | Crataegus bretschneideri C. K. Schneid.
Genome size (bp) | 159,945 | 159,916 | 159,749 | 159,947 | 159,644
LSC size (bp) | 87,904 | 88,303 | 87,841 | 87,946 | 87,694
SSC size (bp) | 19,273 | 19,273 | 19,140 | 19,234 | 19,256
IR size (bp) | 52,768 | 52,340 | 52,768 | 52,768 | 52,694
Number of total genes | 133 (22) | 135 (22) | 131 (22) | 131 (22) | 131 (22)
Protein coding genes | 88 (10) | 89 (10) | 86 (10) | 86 (10) | 86 (10)
tRNA genes | 37 (8) | 37 (8) | 37 (8) | 37 (8) | 37 (8)
rRNA genes | 8 (4) | 8 (4) | 8 (4) | 8 (4) | 8 (4)
Duplicated genes in IR | 17 | 17 | 17 | 17 | 17
GC content (%) | 36.60 | 36.60 | 36.68 | 36.63 | 36.61
GC content in LSC (%) | 34.34 | 34.31 | 34.41 | 34.37 | 34.34
GC content in SSC (%) | 32.81 | 32.73 | 32.91 | 32.88 | 32.76
GC content in IR (%) | 42.65 | 42.76 | 42.72 | 42.64 | 42.73
The numbers in parentheses indicate the duplicated genes in chloroplast genomes. LSC, large single-copy region; SSC, small single-copy region; IR, inverted repeat regions; tRNA, transfer RNA; rRNA, ribosomal RNA
Table 4 Genes identified in five Crataegus chloroplast genomes
Category of genes | Group of genes | Name of genes
Genes for photosynthesis | Photosystem I | psaA, psaB, psaC, psaI, psaJ
Genes for photosynthesis | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ, ᵇycf3
Genes for photosynthesis | Cytochrome b/f complex | petA, ᵃpetB, ᵃpetD, petG, petL, petN
Genes for photosynthesis | NADH-dehydrogenase | ᵃndhA, ᵃndhB, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Genes for photosynthesis | ATP synthase | atpA, atpB, atpE, ᵃatpF, atpH, atpI
Genes for photosynthesis | Rubisco | rbcL
Transcription and translation related genes | DNA dependent RNA polymerase | rpoA, rpoB, ᵃrpoC1, rpoC2
Transcription and translation related genes | Ribosome (large subunit) | rpl14, ᵃrpl16, ᵃrpl2, rpl20, rpl22, rpl23, rpl23, rpl32, rpl33, rpl36
Transcription and translation related genes | Ribosome (small subunit) | rps11, rps12, rps12, rps14, rps15, rps16, rps18, rps19, rps2, rps3, rps4, rps7, rps8
RNA genes | Ribosomal RNA | rrn4.5S, rrn5S, rrn16S, rrn23S
RNA genes | Transfer RNA | trnH-GUG, trnK-UUU, trnQ-UUG, trnG-GCC, trnR-UCU, trnC-GCA, trnD-GUC, trnY-GUA, trnE-UUC, trnT-GGU, trnS-UGA, trnG-UCC, trnfM-CAU, trnT-UGU, trnL-UAA, trnF-GAA, trnV-UAC, trnM-CAU, trnW-CCA, trnP-UGG, trnL-UAG, ᶜtrnS-GCU, ᶜtrnL-CAA, ᶜtrnV-GAC, ᶜtrnI-GAU, ᶜtrnI-CAU, ᶜtrnA-UGC, ᶜtrnR-ACG, ᶜtrnN-GUU
Other genes | Acetyl-CoA-carboxylase | accD
Other genes | c-type cytochrome synthesis gene | ccsA
Other genes | Envelope membrane protein | cemA
Other genes | Protease | ᵇclpP
Other genes | Translational initiation factor | infA
Other genes | Maturase | matK
Function-unknown genes | Conserved open reading frames | ycf1, ycf2, ycf4
ᵃ Genes containing a single intron; ᵇ Genes containing two introns; ᶜ Two gene copies in the IR regions
Fig. 2 Comparison of the LSC, IRs, and SSC border regions of five Crataegus chloroplast genomes. The chloroplast genome of Crataegus kansuensis E. H. Wilson is considered as a reference. LSC, large single copy region; SSC, small single copy region; IRa, inverted repeats a region; IRb, inverted repeats b region
Fig. 3 Type and distribution of repeated sequences and SSRs in five Crataegus chloroplast genomes. a Repeat types number. b Number of repeat sequences by length. c SSR type number. d Number of identified SSR motifs. Mono., Di., Tri., Tetra., and Penta. represent mononucleotide, dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide short sequence repeats
Repeat sequences andmicrosatellites assays
The repeat structures within each Crataegus genome, including
forward, reverse, complementary, and palindromic repeats, were
identified. Generally, 39–49 repeat sequences were observed,
including 18–28 forward repeat sequences, 17–22 palindromic
repeat sequences, and 2–4 reverse repeat sequences (Fig.3,
Supplementary Table1). One complementary repeat sequence
was identified in C. pinnatifida var. major. The lengths of the
repeat sequences in these chloroplast genomes ranged from 14
to 95 bp, with sequences between 31 and 39 bp representing
the majority (38.78–65.31%). There were relatively few repeat
sequences shorter than 30 bp (10.20–16.33%), from 40 to 49 bp
(6.12–18.37%), from 50 to 59 bp (4.08–12.24%), and 60 bp or
longer (4.08–12.24%). The most frequent length of repeats in
C. pinnatifida var. major was 32 bp. Meanwhile, 28–39 tandem
repeats were also detected in the genomes, with repeat lengths
of 30 to 150 bp (Fig.3a, b, Supplementary Table2). The repeat
sequences were mainly distributed in the LSC region. Several
protein-coding genes and tRNA genes, such as ycf1, ycf2, ndhA,
clpP, trnS-GCU
, and trnT-UGU
contained repeat sequences
(Supplementary Table1, Supplementary Table2).
We also analyzed the mononucleotide, dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide short sequence repeats (SSRs, or microsatellites) (Fig. 3c, d, Supplementary Table 3). In general, 68–73 microsatellites were predicted in each of the five Crataegus chloroplast genomes. Mononucleotides were the most common microsatellites in each genome, and most of these were T repeats, ranging in number from 27 in C. maximowiczii and C. maximowiczii var. ninganensis to 30 in C. pinnatifida var. major. Dinucleotides were the second most numerous microsatellites, and the main dinucleotide type was AT, with seven in C. maximowiczii, C. maximowiczii var. ninganensis, and C. pinnatifida and six in C. pinnatifida var. major and C. bretschneideri. Trinucleotide SSRs (TAA) were detected only in C. pinnatifida and C. pinnatifida var. major. Each chloroplast genome contained three to six tetranucleotide SSRs, of which at least two were of the TTTA type. In addition, a pentanucleotide SSR of the type ATTTA was predicted in C. bretschneideri. As with the repeat sequences, 73% or more of the SSRs were distributed in the LSC region, followed by the SSC and IR regions. Several protein-coding and tRNA genes contained SSRs, including matK, atpF, rpoC1, ndhK, cemA, clpP, ycf1, ycf3, trnG-GCC, and trnL-UAA (Supplementary Table 3).
Sequence divergence andmutational hotspots
analyses
DNA polymorphism analyses were conducted to determine the nucleotide diversity (Pi) of the LSC, SSC, and IR regions of the five Crataegus chloroplast genomes (Supplementary Fig. 2). The SSC region showed the highest nucleotide diversity (0.003023), followed by the LSC region (0.002695) and the IR region (0.000586). The most diverse stretch lay in the LSC region between 50,000 and 60,000 bp. In total, 16 coding genes and non-coding sequences had high variability (Pi > 0.007). Twelve mutational hotspots were located in the LSC region, namely matK, trnQ-UUG-psbK, atpI-rps2, ndhJ, atpB, atpA, trnD-GUC, psaB, accD, petA, clpP, and psbH-petB; three hotspots were located in the SSC region (ycf1, ndhD, and trnL-UAG); and trnN-GUU-trnR-ACG was the hotspot in the SSC/IRb boundary region. Of these, six genes (matK, psaB, accD, petA, clpP, and trnD-GUC) and two intergenic spacers (psbH-petB and trnN-GUU-trnR-ACG) exhibited the highest variability (Pi > 0.01), and these hotspots could be regarded as potential molecular markers for phylogenetic analyses.
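Nucleotide diversity (Pi) is the average number of pairwise differences per site among the aligned sequences. The sketch below computes Pi for an alignment and over sliding windows. The window and step sizes (600 bp and 200 bp) are assumed defaults for illustration only, since the settings used in the study are not stated in this section, and the function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site; columns with gaps or ambiguity codes are skipped."""
    aligned = [s.upper() for s in aligned]
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    sites = [i for i in range(length) if all(s[i] in "ACGT" for s in aligned)]
    if len(aligned) < 2 or not sites:
        return 0.0
    pairs = list(combinations(aligned, 2))
    differences = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs)
    return differences / (len(pairs) * len(sites))


def sliding_pi(aligned, window=600, step=200):
    """Return (window_start, Pi) tuples along the alignment (0-based starts)."""
    length = len(aligned[0])
    last_start = max(length - window, 0)
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in aligned]))
        for start in range(0, last_start + 1, step)
    ]
```

Windows whose Pi exceeds a chosen cutoff (for example the 0.007 and 0.01 thresholds quoted above) can then be flagged as candidate hotspots.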
Phylogenetic analysis anddivergence time
estimation ofCrataegus andAmelanchier
Five newly sequenced chloroplast genomes of Crataegus and 12 chloroplast genomes from the NCBI database were used to evaluate the phylogenetic relationships among the genera Crataegus and Amelanchier (Supplementary Fig. 3). The ML tree and BI tree were highly congruent (Supplementary Fig. 4). Overall, 17 species and varieties were classified into three major clades. Crataegus and Amelanchier were separated into two clades. In the Crataegus clade, 11 species and varieties were divided into two distinct subclades. C. pinnatifida, C. pinnatifida var. major, and C. hupehensis clustered together and showed close relationships; Mespilus germanica was monophyletic. Furthermore, C. maximowiczii, C. maximowiczii var. ninganensis, and C. bretschneideri exhibited a close relationship and clustered into a common subclade, forming a sister group to the subclade formed by C. marshallii, Crataegus sp., C. chungtienensis, and C. kansuensis.
To estimate the divergence times for Crataegus species and varieties, the chloroplast genomes of the Amelanchier group (A. alnifolia, A. ovalis, A. sanguinea, and A. spicata) were selected as the outgroup. The clades recovered in the dated tree were the same as those in the ML and BI trees (Fig. 4, Supplementary Fig. 4). The two main clades were estimated to have diverged approximately 44.989 Mya (middle Eocene). Mespilus germanica and Crataegus diverged around 31.171 Mya (early Oligocene). C. pinnatifida, C. hupehensis, and C. pinnatifida var. major diverged around 15.542 Mya (middle Miocene). C. bretschneideri and C. maximowiczii diverged around 7.867 Mya (late Miocene).
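The epoch assignments quoted above can be checked against the geologic timescale given in the Fig. 4 caption. The sketch below is a simple lookup using those epoch boundaries; the function name is hypothetical, and the finer early/middle/late subdivisions are not modeled.

```python
# Epoch boundaries in Mya, taken from the timescale listed in the Fig. 4 caption.
EPOCHS = [
    ("Pleistocene", 0.01, 1.81),
    ("Pliocene", 1.81, 5.33),
    ("Miocene", 5.33, 23.03),
    ("Oligocene", 23.03, 33.90),
    ("Eocene", 33.90, 55.80),
]


def epoch_for(age_mya):
    """Return the epoch containing a node age, or None if it is out of range."""
    for name, younger_bound, older_bound in EPOCHS:
        if younger_bound <= age_mya < older_bound:
            return name
    return None


# Median node ages reported in the text.
for age in (44.989, 31.171, 15.542, 7.867):
    print(age, epoch_for(age))
```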
Phylogenetic analyses ofCrataegus accessions
based onnuclear sequences
The ITS and LEAFY intron 1 sequences were used to reveal the relationships among 36 Crataegus accessions, which belong to eight species and two varieties. For each marker, the ML and BI trees had the same structure (Fig. 5). The ITS sequences resolved three major clades within Crataegus, labeled A–C (Fig. 5a). Clade A was made up of C. maximowiczii, C. maximowiczii var. ninganensis, C. sanguinea, and C. altaica; clade B contained C. scabrifolia, three C. bretschneideri accessions, two C. pinnatifida var. major accessions, one C. hupehensis accession, and one C. pinnatifida accession; clade C included six C. pinnatifida accessions, six C. bretschneideri accessions, two C. pinnatifida var. major accessions, two C. hupehensis accessions, one C. maximowiczii accession, and C. songarica. The LEAFY intron 1 sequences also resolved three major clades within Crataegus, labeled A–C (Fig. 5b). Clade A included C. hupehensis, C. pinnatifida var. major, C. scabrifolia, and four C. pinnatifida accessions; clade B included C. maximowiczii, C. maximowiczii var. ninganensis, C. sanguinea, and C. altaica; clade C was a sister group of clade B, and it contained C. bretschneideri, C. songarica, and three C. pinnatifida accessions.
The median-joining haplotype networks constructed using Network 10.0 contained 21 and 30 active haplotypes for the ITS and LEAFY intron 1 sequences, respectively (Supplementary Fig. 5, Supplementary Table 4). In the ITS network, Hap_1 was the principal haplotype and comprised C. bretschneideri, C. pinnatifida, C. pinnatifida var. major, C. songarica, and C. hupehensis accessions (Supplementary Fig. 5a). Hap_2–6 were derived from Hap_1. The haplotypes generated from the LEAFY intron 1 sequences fell mainly into three groups (Supplementary Fig. 5b): group 1 contained Hap_10, Hap_11, Hap_12, Hap_14, and Hap_15, which mainly consisted of C. maximowiczii, C. sanguinea, and C. altaica; group 2 consisted mainly of C. bretschneideri (Hap_1, Hap_2, Hap_4, and Hap_13) and some C. pinnatifida (Hap_3 and Hap_5); group 3 contained Hap_6, Hap_7, Hap_8, and Hap_16, which consisted mainly of C. pinnatifida, C. pinnatifida var. major, C. songarica, and C. hupehensis accessions.
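The first step behind a haplotype network is to collapse identical aligned sequences into shared haplotypes. The sketch below shows that step only; it does not build the median-joining network itself. The function name and the labeling rule (most frequent haplotype first) are assumptions for illustration and need not match the numbering produced by Network 10.0.

```python
from collections import defaultdict


def collapse_haplotypes(aligned):
    """Group accessions that share an identical aligned sequence.

    `aligned` maps accession name -> aligned sequence. Returns a dict mapping
    a haplotype label to its list of accessions, most frequent haplotype first.
    """
    groups = defaultdict(list)
    for accession, sequence in aligned.items():
        groups[sequence.upper()].append(accession)
    # Order by descending frequency, breaking ties by the first member's name.
    ordered = sorted(groups.values(), key=lambda members: (-len(members), members[0]))
    return {f"Hap_{index}": members for index, members in enumerate(ordered, start=1)}
```

With toy input in which three accessions share one sequence and a fourth differs at one site, the function returns one three-member haplotype and one singleton.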
Discussion
Fig. 4 Divergence time estimation for Crataegus and Amelanchier based on the chloroplast genomes. The number at each node represents the median divergence time, and the node bars represent the 95% HPD (highest posterior density). The GenBank accession numbers are as follows: C. chungtienensis (KY419947), C. hupehensis (MW201730), C. kansuensis (MF784433), C. marshallii (MK920293), C. pinnatifida var. major (KY419945), C. pinnatifida (MN102356), Crataegus sp. (MK920294), Mespilus germanica (MK920295), A. alnifolia (MN068255), A. ovalis (MK920297), A. sanguinea (MN068262), and A. spicata (LMK920292). The ruler on the lower left represents the geologic timescale: Paleogene (23.03–66 Mya); Eocene (33.90–55.80 Mya); OLI (Oligocene, 23.03–33.90 Mya); Neogene (0–23.03 Mya); Miocene (5.33–23.03 Mya); Pliocene (1.81–5.33 Mya); PLE (Pleistocene, 0.01–1.81 Mya)
The complete chloroplast genome is remarkably conserved in size and structure, with a relatively slow evolution rate involving few gains and losses (Grewe et al. 2013). The five newly sequenced Crataegus chloroplast genomes had typical quadripartite structures with LSC, SSC, and IR regions. The structures of these genomes were similar to those of the previously reported chloroplast genomes of C. pinnatifida (He et al. 2020) and C. hupehensis (Hu et al. 2021). The chloroplast genomes of vegetable and fruit species are 120–160 kb in length, while those of cereal species are 110–140 kb in length (Daniell et al. 2021). In this study, the chloroplast genomes were conserved and similar in length, ranging from 159,644 bp in C. bretschneideri to 159,947 bp in C. pinnatifida var. major. The five Crataegus chloroplast genomes encoded 131/135 genes. The LSC regions were 87,694–88,303 bp, and the SSC regions were 19,140–19,273 bp. Each of the inverted repeat regions (IRa/IRb) was 26,170–26,384 bp (Table 3). The GC contents and composition of C. bretschneideri and related species were very similar, indicating that these Crataegus chloroplast genomes were relatively conserved.
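In a quadripartite plastome, the total length equals the LSC plus the SSC plus two copies of the IR. As a rough consistency check, the sketch below combines the extreme region lengths reported above. These combinations do not correspond to any single genome; they only bound the possible totals, and the function name is hypothetical.

```python
def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Extreme region lengths (bp) reported for the five genomes.
lower_bound = plastome_length(87_694, 19_140, 26_170)  # 159,174 bp
upper_bound = plastome_length(88_303, 19_273, 26_384)  # 160,344 bp

# The reported genome sizes should fall within these bounds.
reported_min, reported_max = 159_644, 159_947
print(lower_bound <= reported_min <= reported_max <= upper_bound)
```

The reported genome sizes (159,644–159,947 bp) fall inside the 159,174–160,344 bp interval implied by the region lengths.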
Fig. 5 Phylogenetic trees of Crataegus accessions using maximum
likelihood (ML) and Bayesian inference (BI) based on ITS (a) and
LEAFY (b) sequences. The midpoints in ML and BI analyses are
listed above the branches (ML/BI), and the root is positioned at the
midpoint between the two longest branches. The color of each acces-
sion represents the different accession of Crataegus.
Fluctuations in IR length are the main contributors to increases or decreases in the cpDNA sizes of many angiosperms (Wolf et al. 2010). Dynamic expansion of IRs in groups such as Geraniaceae (Weng et al. 2014) and mimosoid legumes (Dugas et al. 2015), and contraction of IRs in the Lauraceae (Song et al. 2015), have previously been reported. Our results showed that the length of the IR regions among the five Crataegus chloroplast genomes did not change significantly, ranging only from 26,168 to 26,384 bp. Two junctions across the genomes showed high similarity, particularly for the ndhF gene located at the border between the IRb and SSC regions, and ycf1 at the border between the IRa and SSC regions (Fig. 2). mVISTA results (Supplementary Fig. 3) indicated that the LSC and SSC regions were relatively more divergent than the IR regions.
Previous researchers have found that larger and more complex repeat sequences contribute more to sequence rearrangements and the evolution of the chloroplast genome (Huang et al. 2014; Weng et al. 2014). We identified four types of repeat sequences in the five Crataegus chloroplast genomes (Supplementary Tables 1 and 2). There were significant differences in the number and position of dispersed repeats among the genomes. The repeat sequences were mainly distributed in the LSC region, and C. bretschneideri had the lowest number of dispersed repeats (37) and tandem repeats (37). The copy number of SSRs is highly polymorphic among chloroplast genomes, and SSRs can be employed as molecular markers in population genetics, phylogeography, and species identification (Xue et al. 2012; Wang et al. 2013). In this study, 68–73 microsatellites were predicted in each of the five genomes (Supplementary Table 3), and 73% or more of these SSRs were distributed in the LSC region. Furthermore, the SSRs were composed mainly of thymine (T) or adenine (A) repeats. These results were consistent with those of related studies and indicate that SSRs contribute to the AT richness of plastid genomes (Nie et al. 2012). Our findings also agreed with previous reports that the protein-coding genes clpP and ycf1 contain both repeat sequences and SSRs (Curci et al. 2015; Zhao et al. 2015; Li et al. 2020a). Overall, these polymorphic sites may be considered as potential molecular markers in further studies of species delimitation and phylogeny in Crataegus.
Multigenome comparisons can facilitate the exploration of mutational hotspots, which can be used for interspecies discrimination and phylogenetic studies at the species level (Yang et al. 2018; Abdullah et al. 2019). Several plastid DNA markers generated from coding and non-coding regions with higher levels of variation could be applied to resolve the phylogenetic problems of different plant species. Previous studies have succeeded with the coding gene ycf1 in Debregeasia (Wang et al. 2020); accD in Artemisia (Kim et al. 2020); and the first and second exons of clpP, the first intron of clpP, the first exon of atpF, the second intron of ycf3, matK, and ndhF in Abelmoschus (Li et al. 2020b). In this study, we proposed a set of eight divergent coding genes and non-coding sequences (Pi > 0.01) from Crataegus, including matK, psaB, accD, petA, clpP, trnD-GUC, psbH-petB, and trnN-GUU-trnR-ACG. These mutational hotspots could help resolve taxonomic discrepancies and provide genetic barcodes for the genus Crataegus.
Phylogenetic relationships in Rosaceae have been problematic because of frequent hybridization, apomixis, presumed rapid radiation, and complex historical diversification (Xue et al. 2019). The chloroplast genome is typically maternally inherited. A growing number of studies have used the complete chloroplast genome to evaluate phylogenetic relationships among plants (Daniell et al. 2021). In this study, 17 chloroplast genomes were used to construct the ML tree and BI tree (Supplementary Fig. 4). Clade B included the East Asian species C. maximowiczii, C. bretschneideri, C. chungtienensis, and C. kansuensis and the eastern North American species C. marshallii and M. germanica. These results supported Phipps’s hypothesis (1990) that Crataegus migrated eastward from East Asia to North America. In addition, Mespilus and Crataegus are sister genera in the Rosaceae tribe Pyreae based on phylogenetic analyses of nuclear sequences and intergenic cpDNA regions (Lo et al. 2007; Talent et al. 2008). The phylogenetic trees in this study also reflect the close relationship between Mespilus and Crataegus (Supplementary Fig. 4). However, Phipps (2016) clarified the morphological distinction between Mespilus and Crataegus and argued for the retention of a monotypic Mespilus.
The divergence times of 17 Crataegus and Amelanchier accessions were estimated. One constraint was based on the oldest fossil record of Amelanchier leaves, from the Middle Eocene (approximately 40 Mya) around One Mile Creek, Princeton, British Columbia (Wolfe and Wehr 1988). Another constraint was that Amelanchier and Crataegus were expected to have diverged around 45 Mya (Lo and Donoghue 2012). Our results indicated that Amelanchier and Crataegus diverged around 44.989 Mya and that Mespilus germanica and Crataegus diverged around 31.171 Mya. These two divergence times were quite similar to those reported by Lo and Donoghue (2012). Monsoon weather patterns during the Miocene affected Asian vegetation due to the slightly warmer and wetter climate (Su et al. 2013). In the late Miocene (5.333–11.63 Mya), intraspecific migration and interspecific divergence events occurred among Crataegus species (Wen et al. 2016; Du et al. 2019). The divergence time of C. pinnatifida originating in Southwest China was around 15.542 Mya, while that of C. pinnatifida originating in Northeast China was around 7.148 Mya. This result indicated that C. pinnatifida may have migrated from Southwest to Northeast China. Moreover, C. bretschneideri and C. maximowiczii diverged around 7.867 Mya. This divergence time overlapped with that of C. pinnatifida originating in Northeast China.
Hybrids are often found in areas where different species overlap and crossbreed (Bugaj-Nawrocka et al. 2020). Interspecific hybridization in Crataegus often occurs where the distribution areas of different species overlap (Talent and Dickinson 2007). Conflicts among chloroplast, ITS, and LEAFY intron 1 sequence data suggested that three species (C. marshallii, Crataegus spathulata Michx., and Crataegus phaenopyrum (L. f.) Medik.) from the southeastern USA were hybrids derived from European and North American ancestors (Lo et al. 2009). Phipps (2005) also posited that hybridization is a potential explanatory factor for speciation in Crataegus. C. bretschneideri, C. maximowiczii, and some C. pinnatifida originated in Northeast China. Several studies have considered C. bretschneideri a variety of C. pinnatifida (Dai et al. 2007; Guo and Jiao 1995). Du et al. (2019) found that C. bretschneideri was closely related to C. pinnatifida based on SSRs and SLAF-seq data and that gene flow occurred from C. maximowiczii to C. bretschneideri. In this study, the structure of the chloroplast phylogenetic tree was similar to that of Wu et al. (2008). C. bretschneideri and C. maximowiczii were closely related from the perspective of maternal inheritance (Fig. 4). However, the phylogenetic trees constructed from LEAFY intron 1 and ITS sequences showed that C. bretschneideri and C. pinnatifida had a closer relationship (Fig. 5). C. bretschneideri and C. pinnatifida accessions shared the same haplotype in the network constructed from ITS sequences (Supplementary Fig. 5). In contrast, the LEAFY intron 1 haplotype of C. bretschneideri occupied an independent position in the middle of the network, linked with group 1 haplotypes (C. maximowiczii and C. sanguinea) and group 3 haplotypes (MDFSLH, C. pinnatifida). Overall, our study indicated that C. bretschneideri is an independent species. C. maximowiczii may be the maternal origin of C. bretschneideri, and MDFSLH (C. pinnatifida) may be the paternal origin.
Conclusion
The comparative analyses of five Crataegus chloroplast genomes have provided rich genome data for further studies of Crataegus genetic diversity. In total, 403 repeat sequences, 352 SSRs, and 8 mutational hotspots could be applied to the development of molecular markers related to Crataegus phylogeny. Furthermore, the chloroplast genomes and nuclear sequences supported the proposal that C. bretschneideri is an independent species and may be of hybrid origin. The present work will promote the identification and conservation of Crataegus in the future.
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11295-022-01556-9.
Funding This work was supported by “The Conservation and Utiliza-
tion of Crop Germplasm Resource–Hawthorn (Project Nos. 19190178;
19200357).”
Declarations
Conflict of interest The authors declare no competing interests.
Data archiving statement The raw sequences of chloroplast genomes reported in this paper have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2017) in the National Genomics Data Center (Nucleic Acids Res 2021), China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences, under accession number CRA004494 and are publicly accessible at https://ngdc.cncb.ac.cn/gsa. All the nuclear sequences have been uploaded to GenBank (https://www.ncbi.nlm.nih.gov/genbank/) with the accession numbers MZ688339–MZ688374 and MZ686456–MZ686491.
References
Abdullah SI, Mehmood F, Ali Z, Malik MS, Waseem S, Mirza B, Ahmed I, Waheed MT (2019) Comparative analyses of chloroplast genomes among three Firmiana species: identification of mutational hotspots and phylogenetic relationship with other species of Malvaceae. Plant Gene 19:100199. https://doi.org/10.1016/j.plgene.2019.100199
Bandelt HJ, Forster P, Röhl A (1999) Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16:37–48. https://doi.org/10.1093/oxfordjournals.molbev.a026036
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. https://doi.org/10.1089/cmb.2012.0021
Beier S, Thiel T, Münch T, Scholz U, Mascher M (2017) MISA-web: a web server for microsatellite prediction. Bioinformatics 33:2583–2585. https://doi.org/10.1093/bioinformatics/btx198
Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27:573–580. https://doi.org/10.1093/nar/27.2.573
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10:e1003537. https://doi.org/10.1371/journal.pcbi.1003537
Brown JA, Beatty GE, Finlay CMV, Montgomery WI, Tosh DG, Provan J (2016) Genetic analyses reveal high levels of seed and pollen flow in hawthorn (Crataegus monogyna Jacq.), a key component of hedgerows. Tree Genet Genomes 12:58. https://doi.org/10.1007/s11295-016-1020-0
Bugaj-Nawrocka A, Sawka-Gądek N, Chłond D (2020) Prediction of hybridisation zones of selected species of the genus Platymeris (Hemiptera: Reduviidae) supported by laboratory crossbreeding. Austral Entomol 59:323–336. https://doi.org/10.1111/aen.12452
Burland TG (1999) DNASTAR’s Lasergene sequence analysis software. In: Misener S, Krawetz SA (eds) Bioinformatics methods and protocols. Humana Press, Totowa. https://doi.org/10.1385/1-59259-192-2:71
Chan PP, Lowe TM (2019) tRNAscan-SE: searching for tRNA genes in genomic sequences. In: Kollmar M (ed) Gene prediction: methods and protocols. Springer, New York. https://doi.org/10.1007/978-1-4939-9173-0_1
Chang HM, But PPH, Yao SC (1986) Pharmacology and applications of Chinese materia medica. World Sci. https://doi.org/10.1142/0284
Chang Q, Zuo Z, Harrison F, Chow MSS (2002) Hawthorn. J Clin Pharmacol 42:605–612. https://doi.org/10.1177/00970002042006003
Christensen KI (1984) The morphological variation of some Crataegus populations (Rosaceae) in Greece and Yugoslavia. Nord J Bot 4:585–595. https://doi.org/10.1111/j.1756-1051.1984.tb01983.x
Coil D, Jospin G, Darling AE (2015) A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data. Bioinformatics 31:587–589. https://doi.org/10.1093/bioinformatics/btu661
Curci PL, Paola DD, Danzi D, Vendramin GG, Sonnante G (2015) Complete chloroplast genome of the multifunctional crop globe artichoke and comparison with other Asteraceae. PLoS One 10:e0120589. https://doi.org/10.1371/journal.pone.0120589
Dai H, Zhang Z, Zhou C, Li H, Guo X (2007) Establishment and optimization of ISSR system in Crataegus spp. J Fruit Sci:3
Daniell H, Jin S, Zhu XG, Gitzendanner MA, Soltis DE, Soltis PS (2021) Green giant—a tiny chloroplast genome with mighty power to produce high-value proteins: history and phylogeny. Plant Biotechnol J 19:430–447. https://doi.org/10.1111/pbi.13556
Dickinson TA, Belaoussoff S, Love RM, Muniyamma M (1996) North American black-fruited hawthorns. I. Variation in floral construction, breeding system correlates, and their possible evolutionary significance in Crataegus sect. Douglasii Loudon. Folia Geobot 31:355–371. https://doi.org/10.1007/BF02815380
Dong W, Li Z (2015) The science and practice of Chinese fruit tree: Hawthorn. Science Press, Shanxi
Du X, Zhang X, Bu H, Zhang T, Lao Y, Dong W (2019) Molecular analysis of evolution and origins of cultivated hawthorn (Crataegus spp.) and related species in China. Front Plant Sci 10:443. https://doi.org/10.3389/fpls.2019.00443
Dugas DV, Hernandez D, Koenen EJM, Schwarz E, Straub S, Hughes CE, Jansen RK, Nageswara-Rao M, Staats M, Trujillo JT, Hajrah NH, Alharbi NS, Al-Malki AL, Sabir JS, Bailey CD (2015) Mimosoid legume plastome evolution: IR expansion, tandem repeat expansions and accelerated rate of evolution in clpP. Sci Rep 5:16958. https://doi.org/10.1038/srep16958
Emami A, Shabanian N, Rahmani MS, Khadivi A, Mohammad-Panah N (2018) Genetic characterization of the Crataegus genus: implications for in situ conservation. Sci Hortic 231:56–65. https://doi.org/10.1016/j.scienta.2017.12.014
Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Res 32:W273–W279. https://doi.org/10.1093/nar/gkh458
Friderico F (1903) Crataegus bretschneideri. Repertorium specierum novarum regni vegetabilis. 3:233
Gitzendanner MA, Soltis PS, Wong GKS, Ruhfel BR, Soltis DE (2018) Plastid phylogenomic analysis of green plants: a billion years of evolutionary history. Am J Bot 105:291–301. https://doi.org/10.1002/ajb2.1048
Gosler AG, Kelly CK, Blakey JK (1994) Phenotypic plasticity in leaf morphology of Crataegus monogyna (Rosaceae): an experimental study with taxonomic implications. Bot J Linn Soc 115:211–219. https://doi.org/10.1111/j.1095-8339.1994.tb01779.x
Greiner S, Lehwark P, Bock R (2019) OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res 47:W59–W64. https://doi.org/10.1093/nar/gkz238
Grewe F, Guo W, Gubbels EA, Hansen AK, Mower JP (2013) Complete plastid genomes from Ophioglossum californicum, Psilotum nudum, and Equisetum hyemale reveal an ancestral land plant genome structure and resolve the position of Equisetales among monilophytes. BMC Evol Biol 13:8. https://doi.org/10.1186/1471-2148-13-8
Guo T, Jiao P (1995) Hawthorn (Crataegus) resources in China. HortScience 30:1132–1134. https://doi.org/10.21273/HORTSCI.30.6.1132
He SL, Xie J, Yang Y, Tian Y (2020) Chloroplast genome for Crataegus pinnatifida (Rosaceae) and phylogenetic analyses with its coordinal species. Mitochondrial DNA Part B 5:2097–2098. https://doi.org/10.1080/23802359.2019.1667273
Hu G, Zheng S, Pan Q, Dong N (2021) The complete chloroplast genome of Crataegus hupehensis Sarg. (Rosaceae), a medicinal and edible plant in China. Mitochondrial DNA Part B 6:315–317. https://doi.org/10.1080/23802359.2020.1866464
Huang H, Shi C, Liu Y, Mao SY, Gao LZ (2014) Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol Biol 14:151. https://doi.org/10.1186/1471-2148-14-151
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780. https://doi.org/10.1093/molbev/mst010
Kim GB, Lim CE, Kim JS, Kim K, Lee JH, Yu HJ, Mun JH (2020) Comparative chloroplast genome analysis of Artemisia (Asteraceae) in East Asia: insights into evolutionary divergence and phylogenomic implications. BMC Genomics 21:415. https://doi.org/10.1186/s12864-020-06812-7
Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R (2001) REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res 29:4633–4642. https://doi.org/10.1093/nar/29.22.4633
Li C, Zhao Y, Xu Z, Yang G, Peng J, Peng X (2020a) Initial characterization of the chloroplast genome of Vicia sepium, an important wild resource plant, and related inferences about its evolution. Front Genet 11:73. https://doi.org/10.3389/fgene.2020.00073
Li J, Ye GY, Liu HL, Wang ZH (2020b) Complete chloroplast genomes of three important species, Abelmoschus moschatus, A. manihot and A. sagittifolius: genome structures, mutational hotspots, comparative and phylogenetic analysis in Malvaceae. PLoS One 15:e0242591. https://doi.org/10.1371/journal.pone.0242591
Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451–1452. https://doi.org/10.1093/bioinformatics/btp187
Liu BB, Hong DY, Zhou SL, Xu C, Dong WP, Johnson G, Wen J (2019) Phylogenomic analyses of the Photinia complex support the recognition of a new genus Phippsiomeles and the resurrection of a redefined Stranvaesia in Maleae (Rosaceae). J Syst Evol 57:678–694. https://doi.org/10.1111/jse.12542
Lo EYY, Donoghue MJ (2012) Expanded phylogenetic and dating analyses of the apples and their relatives (Pyreae, Rosaceae). Mol Phylogenet Evol 63:230–243. https://doi.org/10.1016/j.ympev.2011.10.005
Lo EYY, Stefanović S, Dickinson TA (2007) Molecular reappraisal of relationships between Crataegus and Mespilus (Rosaceae, Pyreae)—Two genera or one? Syst Bot 32:596–616. https://doi.org/10.1600/036364407782250562
Lo EYY, Stefanović S, Christensen KI, Dickinson TA (2009) Evidence for genetic association between East Asian and western North American Crataegus L. (Rosaceae) and rapid divergence of the eastern North American lineages based on multiple DNA sequences. Mol Phylogenet Evol 51:157–168. https://doi.org/10.1016/j.ympev.2009.01.018
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C et al (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1:18. https://doi.org/10.1186/2047-217X-1-18
Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Von Haeseler A, Lanfear R (2020) IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37:1530–1534. https://doi.org/10.1093/molbev/msaa015
Nazhand A, Lucarini M, Durazzo A, Zaccardelli M, Cristarella S, Souto SB, Silva AM, Severino P, Souto EB, Santini A (2020) Hawthorn (Crataegus spp.): an updated overview on its beneficial properties. Forests 11:564. https://doi.org/10.3390/f11050564
Nie X, Lv S, Zhang Y, Du X, Wang L, Biradar SS, Tan X, Wan F, Weining S (2012) Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora). PLoS One 7:e36869. https://doi.org/10.1371/journal.pone.0036869
Phipps JB (1990) Mespilus canescens, a new rosaceous endemic from Arkansas. Syst Bot 15:26–32. https://doi.org/10.2307/2419013
Phipps JB (2005) A review of hybridization in North American hawthorns. Another look at “the Crataegus problem”. Ann Missouri Bot Gard 92:113–126. https://www.jstor.org/stable/3298651
Phipps JB (2016) Studies in Mespilus, Crataegus, and × Crataemespilus (Rosaceae), II. The academic and folk taxonomy of the medlar, Mespilus germanica, and hawthorns, Crataegus (Rosaceae). Phytotaxa 260:25–35. https://doi.org/10.11646/phytotaxa.260.1.3
Qu XJ, Moore MJ, Li DZ, Yi TS (2019) PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods 15:50. https://doi.org/10.1186/s13007-019-0435-7
Raman G, Park KT, Kim JH, Park S (2020) Characteristics of the completed chloroplast genome sequence of Xanthium spinosum: comparative analyses, identification of mutational hotspots and phylogenetic implications. BMC Genomics 21:855. https://doi.org/10.1186/s12864-020-07219-0
Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61:539–542. https://doi.org/10.1093/sysbio/sys029
Shi L, Chen H, Jiang M, Wang L, Wu X, Huang L, Liu C (2019) CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Res 47:W65–W73. https://doi.org/10.1093/nar/gkz345
Sokolov SJ (1954) Trees and shrubs of the U. S. S. R. III. Angiospermae, Families Trochodendraceae-Rosaceae, Izdateljstvo Akademii Nauk SSSR, Moskva Leningrad Press
Song Y, Dong W, Liu B, Xu C, Yao X, Gao J, Corlett RT (2015) Comparative analysis of complete chloroplast genome sequences of two tropical trees Machilus yunnanensis and Machilus balansae in the family Lauraceae. Front Plant Sci 6:662. https://doi.org/10.3389/fpls.2015.00662
Su T, Jacques FMB, Spicer RA, Liu YS, Huang YJ, Xing YW, Zhou ZK (2013) Post-Pliocene establishment of the present monsoonal climate in SW China: evidence from the late Pliocene Longmen megaflora. Clim Past 9:1911–1920. https://doi.org/10.5194/cp-9-1911-2013
Talent N, Dickinson TA (2007) The potential for ploidy level increases and decreases in Crataegus (Rosaceae, Spiraeoideae, tribe Pyreae). Can J Bot 85:570–584. https://doi.org/10.1139/B07-028
Talent N, Eckenwalder JE, Lo EYY, Christensen KI, Dickinson TA (2008) Proposal to conserve the names Crataegus L. against Mespilus L. (Rosaceae). Taxon 57:1007–1008. https://doi.org/10.1002/tax.573042
Verbylaitė R, Ford-Lloyd B, Newbury J (2006) The phylogeny of woody Maloideae (Rosaceae) using chloroplast trnL-trnF sequence data. Biologija 52:60–63
Wang S, Shi C, Gao LZ (2013) Plastid genome sequence of a wild woody oil species, Prinsepia utilis, provides insights into evolutionary and mutational patterns of Rosaceae chloroplast genomes. PLoS One 8:e73946. https://doi.org/10.1371/journal.pone.0073946
Wang RN, Milne RI, Du XY, Liu J, Wu ZY (2020) Characteristics and mutational hotspots of plastomes in Debregeasia (Urticaceae). Front Genet 11:729. https://doi.org/10.3389/fgene.2020.00729
Wen J, Nie ZL, Ickert-Bond SM (2016) Intercontinental disjunctions between eastern Asia and western North America in vascular plants highlight the biogeographic importance of the Bering land bridge from late Cretaceous to Neogene. J Syst Evol 54:469–490. https://doi.org/10.1111/jse.12222
Weng ML, Blazier JC, Govindu M, Jansen RK (2014) Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats, and nucleotide substitution rates. Mol Biol Evol 31:645–659. https://doi.org/10.1093/molbev/mst257
Wolf PG, Roper JM, Duffy AM (2010) The evolution of chloroplast genome structure in ferns. Genome 53:731–738. https://doi.org/10.1139/G10-061
Wolfe JA, Wehr W (1988) Rosaceous Chamaebatiaria-like foliage from the Paleogene of western North America. Aliso: J Syst Florist Bot 12:177–200. https://doi.org/10.5642/aliso.19881201.14
Wu F, Zhang Z, Dai H, Zhang Y, Chang L (2008) Genetic relationships of some hawthorns (Crataegus spp.) derived from cp DNA PCR-RFLP. J Shenyang Agric Univ 39:664–668
Wu J, Peng W, Qin R, Zhou H (2014) Crataegus pinnatifida: Chemical
constituents, pharmacology, and potential applications. Molecules
19:1685–1712. https:// doi. org/ 10. 3390/ molec ules1 90216 85
Xu J, Zhao Y, Zhang X, Zhang L, Hou Y, Dong W (2016) Transcrip-
tome analysis and ultrastructure observation reveal that hawthorn
fruit softening is due to cellulose/hemicellulose degradation. Front
Plant Sci 7:1524. https:// doi. org/ 10. 3389/ fpls. 2016. 01524
Xue J, Wang S, Zhou SL (2012) Polymorphic chloroplast microsatel-
lite loci in Nelumbo (Nelumbonaceae). Am J Bot 99:e240–e244
Xue S, Shi T, Luo W, Ni X, Iqbal S, Ni Z, Huang X, Yao D, Shen Z,
Gao Z (2019) Comparative analysis of the complete chloroplast
genome among Prunus mume, P. armeniaca, and P. salicina. Hor-
tic. Res 6:89. https:// doi. org/ 10. 1038/ s41438- 019- 0171-1
Yang Z, Zhao T, Ma Q, Liang L, Wang G (2018) Comparative genom-
ics and phylogenetic analysis revealed the chloroplast genome
variation and interspecific relationships of Corylus (Betulaceae)
species. Front Plant Sci 9:927. https:// doi. org/ 10. 3389/ fpls. 2018.
00927
Zarrei M, Talent N, Kuzmina M, Lee J, Lund J, Shipley PR, Stefanović
S, Dickinson TA (2015) DNA barcodes from four loci provide
poor resolution of taxonomic groups in the genus Crataegus. AoB
Plant 7:plv045. https:// doi. org/ 10. 1093/ aobpla/ plv045
Zhang Y, Dai H, Zhang Q, Li H, Zhang Z (2008) Assessment of genetic
relationship in Crataegus genus by the apple SSR primers. J Fruit
Sci 25:521–525. https:// doi. org/ 10. 13925/j. cnki. gsxb. 2008. 04. 037
Zhang SD, Jin JJ, Chen SY, Chase MW, Soltis DE, Li HT, Yang JB, Li
DZ, Yi TS (2017) Diversification of Rosaceae since the Late Cre-
taceous based on plastid phylogenomics. New Phytol 214:1355–
1367. https:// doi. org/ 10. 1111/ nph. 14461
Zhang LL, Zhang LF, Xu JG (2020a) Chemical composition, antibac-
terial activity and action mechanism of different extracts from
hawthorn (Crataegus pinnatifida Bge.). Sci Rep 10:8876. https://
doi. org/ 10. 1038/ s41598- 020- 65802-7
Tree Genetics & Genomes (2022) 18:24
Zhang X, Wang Y, Wang M, Yuan Q, Huang L (2020b) The complete
chloroplast genome of the Crataegus kansuensis (Rosaceae): char-
acterization and phylogeny. Mitochondrial DNA Part B 5:2920–
2921. https:// doi. org/ 10. 1080/ 23802 359. 2020. 17923 68
Zhang X, Du X, Sun X, Wang J, Dong W (2021) Construction of
molecular identity for partial Crataegus resources based on SSR
markers. J Shenyang Agric Univ 52:153–159. https:// doi. org/ 10.
3969/j. issn. 1000- 1700. 2021. 02. 004
Zhao Y, Yin J, Guo H, Zhang Y, Xiao W, Sun C, Wu J, Qu X, Yu J,
Wang X, Xiao J (2015) The complete chloroplast genome provides
insight into the evolution and polymorphism of Panax ginseng.
Front Plant Sci 5:696. https:// doi. org/ 10. 3389/ fpls. 2014. 00696
Publisher’s note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.