Content available from Scientific Reports
Characterization of the complete chloroplast genome and development of molecular markers of Salix
Pu Wang1,2, Jiahui Guo1,2, Jie Zhou1 & Yixuan Wang1
Salix, an economically and ecologically multifunctional tree genus widely distributed in China,
includes the five ornamental species sequenced in this study, which are highly beneficial for
phytoremediation because of their ability to absorb heavy metals. This research used high-throughput
sequencing to obtain chloroplast genome sequences of Salix, analyze their gene composition and
structural characteristics, and identify potential molecular markers, laying a foundation for Salix
identification and resource classification. Chloroplast DNA was extracted from the leaves of Salix
argyracea, Salix dasyclados, Salix eriocephala, Salix integra ‘Hakuro Nishiki’, and Salix suchowensis
using an optimized CTAB method. Sequencing was conducted on the Illumina NovaSeq PE150
platform, and bioinformatics tools were employed to compare the structural features and variations
within the chloroplast genomes of Salix. The analysis revealed high similarity among the chloroplast
genome sequences of the five Salix species and identified 276, 269,
270, 273, and 273 SSR loci, respectively, along with unique simple repeat sequences in each variety.
Comparison of chloroplast genomes across 22 Salix highlighted variations in regions such as matK-trnQ,
ndhC-trnV, psbE-petL, rpl36-rps8, and ndhB-rps7, which may serve as valuable molecular markers
for willow resource classification studies. The chloroplast genome sequencing and structural
analysis reported here not only enrich the genetic resources of Salix but also form a critical basis for the
development of molecular markers and the exploration of interspecific phylogeny in the genus.
Keywords Salix, Chloroplast genome, Structure characteristics, Locus of variation, Molecular markers
The chloroplast (cp) is an essential organelle for photosynthesis and energy supply in green plant cells1–3. It is
vital for starch synthesis, nitrogen metabolism, sulfate reduction, and fatty acid synthesis4,5. Chloroplast DNA
(cpDNA) is a single, circular molecule with four regions, namely a large single-copy (LSC) region, a small
single-copy (SSC) region, and two copies of an inverted repeat (IRa and IRb)6,7. Due to its small size, highly
conserved structure, low substitution rate, and haploid nature, cpDNA has become an ideal tool for studies of
diversity and evolution at lower taxonomic levels8–10. Chloroplast DNA is maternally inherited, thus providing
essential information for molecular markers, breeding of new varieties, and plant phylogeny11–13.
Willow is a collective term for Salix and Chosenia arbutifolia (Pall.) A. Skv. in the Salicaceae family. The
genus Salix is composed of 520 species with worldwide distribution14,15. The taxonomy and phylogeny of Salix
based on traditional morphological characteristics have been controversial and unreliable because of the genus's
dioecious reproduction, simple flowers, large intraspecific phenotypic variation, frequent hybridization, and
easy propagation16–18. Argus (1997) recognized four subgenera within the North American Salix species: Salix,
Longifoliae, Vetrix, and Chamaetia. Ohashi proposed a classification system for the willow genus based on
Japanese plants, dividing it into six subgenera: Pleuradenia, Chosenia, Protitea, Chamaetia, Salix, and
Vetrix19,20. The classification and placement of willows have been debated for a long time, and views on
their evolutionary relationships still differ.
This study selected five willows with high ornamental value and high biomass, namely Salix argyracea,
Salix dasyclados, Salix eriocephala, Salix integra ‘Hakuro Nishiki’, and Salix suchowensis. The
complete chloroplast (cp) genomes of S. argyracea, S. dasyclados, S. eriocephala, S. integra ‘Hakuro Nishiki’,
and S. suchowensis were assembled de novo and characterized. The 16 available cp genomes of
other Salix species and Chosenia arbutifolia were also annotated. Potential molecular markers that could be
used for interspecific identification were mined by analyzing simple sequence repeat (SSR) markers, repetitive
sequences, nucleotide diversity, positively selected genes, and highly divergent regions. The aim was to lay the
foundation for further research on the phylogeny, species identification, and evolution of the willow genus,
and to provide reference material for the development of DNA barcodes in the genus.
1Jiangsu Academy of Forestry, Nanjing, China. 2These authors contributed equally to this work. email:
zjwin718@126.com
Scientific Reports | (2024) 14:28528
| https://doi.org/10.1038/s41598-024-79604-8
www.nature.com/scientificreports
Materials and methods
Plant materials
The five Salix taxa (S. argyracea, S. dasyclados, S. eriocephala, S. integra ‘Hakuro Nishiki’, and S. suchowensis)
are preserved in the willow collection at Jiangsu Academy of Forestry (31.861947°N,
118.777145°E). The voucher specimens were deposited at the herbarium of Jiangsu Academy of Forestry under
the voucher numbers P102, P126, 87, P646, and P63, respectively. Fresh leaves were collected for DNA isolation
and library construction. Genomic sequencing was performed on the Illumina NovaSeq PE150 platform (San
Diego, CA, USA).
CpDNA sequencing and de novo assembly
The raw sequencing data were filtered with fastp (version 0.20.0, https://github.com/OpenGene/fastp)
to obtain clean data. De novo assembly was then performed with SPAdes v3.10.1
(http://cab.spbu.ru/software/spades/) to obtain the complete pseudo genome. Five high-quality, complete Salix
cp genomes were deposited in NCBI under the following accession numbers: MT551159 (S. argyracea),
MT551160 (S. dasyclados), MT551161 (S. eriocephala), MT551162 (S. integra ‘Hakuro Nishiki’), and
MT551163 (S. suchowensis).
Chloroplast gene annotation and chloroplast mapping
The cpDNA coding sequences (CDS) were annotated with Prodigal v2.6.3 (ref. 21). The rRNA and tRNA genes were
predicted with HMMER v3.1b2 (http://hmmer.org/) and ARAGORN v1.2.38 (ref. 22). The sequences were submitted to
NCBI for final annotation with BLAST v2.6 (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The cp genome maps of the five
Salix species were drawn with OGDRAW23. Chloroplast SSRs, ranging from mono- to octa-nucleotide repeats, were
identified using MISA v1.0 (ref. 24). RSCU was analyzed with MEGA 7 (ref. 25). The sequences were aligned with MAFFT
v7.427 (https://mafft.cbrc.jp/alignment/software/), and the synonymous and nonsynonymous substitution rates
were calculated with KaKs_Calculator v2.0 (https://sourceforge.net/projects/kakscalculator2/). Nucleotide
diversity (Pi) was calculated with DnaSP v5 (https://dnasp.software.informer.com/5.1/)26. CGView
(http://stothard.afns.ualberta.ca/cgview_server/) was used with default parameters for comparative analysis of
chloroplast genome structure among closely related species.
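As an illustration of the nucleotide diversity (Pi) statistic mentioned above, the following minimal Python sketch computes the average number of pairwise differences per site over the gap-free columns of an alignment. It is not DnaSP's implementation: the function name, the complete-deletion treatment of gaps, and the toy alignment are assumptions made for this example only.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site over an alignment (toy Pi)."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    # Assumption: drop every column that has a gap or ambiguous base
    # in any sequence (complete deletion).
    valid = [i for i in range(length)
             if all(s[i].upper() in "ACGT" for s in aligned_seqs)]
    if not valid:
        raise ValueError("no comparable sites")
    pairs = list(combinations(aligned_seqs, 2))
    total_diff = sum(
        sum(1 for i in valid if a[i].upper() != b[i].upper())
        for a, b in pairs
    )
    return total_diff / (len(pairs) * len(valid))


# Hypothetical four-sequence alignment, not data from this study.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACG-ACGTAC"]
pi = nucleotide_diversity(toy)
print(f"Pi = {pi:.4f}")
```

In this toy case the gapped column is discarded, leaving nine sites and three pairwise differences across six sequence pairs.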
Identification of simple sequence repeat markers
The genomic sequences were analyzed to identify potential microsatellites (SSRs, i.e., mono-, di-, tri-, tetra-,
penta-, and hexanucleotide repeats) using MISA software (http://pgrc.ipk-gatersleben.de/misa/) with thresholds
of 10 repeat units for mononucleotide SSRs and five repeat units for di-, tri-, tetra-, penta-, and hexanucleotide
SSRs. The web-based software REPuter (http://bibiserv.techfak.uni-bielefeld.de/reputer/)27 was used to analyze
the repeat sequences, which included forward, reverse, complement, palindromic, and tandem repeats with
minimal lengths of 30 bp and edit distances of less than 3 bp.
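The SSR thresholds described above (10 repeat units for mononucleotides, five for di- to hexanucleotides) can be sketched with a regular-expression scan. This is an illustrative approximation, not the MISA program itself: the function name `find_ssrs`, the toy sequence, and the rule for discarding motifs that are repeats of a shorter unit are assumptions of this sketch, and compound SSRs are not handled.

```python
import re

# Minimum repeat counts per motif length, as stated in the Methods:
# 10 for mononucleotides, 5 for di- to hexanucleotides.
MIN_REPEATS = {1: 10, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) tuples."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Assumption: skip motifs that are repeats of a shorter unit,
            # e.g. "AA" or "ATAT", so each SSR is reported once.
            if any(unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len)):
                continue
            hits.append((m.start() + 1, motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical sequence containing (A)12, (AT)6 and (TTGATA)5.
toy = "GC" + "A" * 12 + "CGC" + "AT" * 6 + "GGC" + "TTGATA" * 5 + "C"
for start, motif, count in find_ssrs(toy):
    print(f"({motif}){count} at position {start}")
```

Lowering an entry in `MIN_REPEATS` (for example, to 3 for hexanucleotides) would report shorter hexanucleotide tracts; the thresholds are kept in one dictionary so they can be changed without touching the scan.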
Phylogenetic analysis and genome homology analysis
The multiple alignment of the cp genomes of 32 species was conducted with MAFFT v7.427 for phylogenetic
analysis. The phylogenetic tree was constructed using the maximum-likelihood (ML) method28–30 in RAxML
v8.2.10 (ref. 31).
The cp genomes for the following species were retrieved from the NCBI database: Chosenia arbutifolia
(NC_036718.1), S. babylonica (NC_028350.1), S. chaenomeloides (NC_037422.1), S. tetrasperma (NC_035744.1),
S. hypoleuca (NC_037423.1), S. interior (NC_024681.1), S. magnifica (NC_037424.1), S. minjiangensis
(NC_037425.1), S. oreinoma (NC_035743.1), S. paraplesia (NC_037426.1), S. purpurea (NC_029693.1), S.
rehderiana (NC_037427.1), S. rorida (NC_037428.1), S. taoensis (NC_037429.1), S. tetrasperma (NC_035744.1),
S. gracilistyla (NC_043878.1), S. koriyanagi (NC_044419.1), Eucalyptus spathulata (NC_022400.1), Quercus
bawanglingensis (NC_046583.1), Populus cathayana (NC_040874.1), Populus yunnanensis (MK267299.1),
Populus tremula (KP861984.1), Populus alba (NC_008235.1), Populus balsamifera (NC_024735.1), Populus
fremontii (NC_024734.1), Populus trichocarpa (NC_009143.1), and Populus euphratica (NC_024747.1).
Results
Characterization of chloroplast genomes in Salix
Using the Illumina NovaSeq PE150 platform, 20 913 346, 19 041 713, 22 544 659, 18 602 676, and 20 582 680
paired-end clean reads were obtained for S. argyracea, S. dasyclados, S. eriocephala, S. integra, and S. suchowensis,
respectively, with GC content ranging from 36.67% to 36.71% (Table 1). After de novo assembly, the complete
cp genomes were 155 605 bp, 155 763 bp, 155 552 bp, 155 538 bp, and 155 550 bp in size, respectively (Table 1
and Fig. 1). The genomes exhibited a typical quadripartite structure with an LSC region (84 414–84 588 bp), an SSC
region (16 214–16 275 bp), and two IRs (27 384–27 479 bp each) (Table 1 and Fig. 1). The slight size differences in
the SSC region and IRs indicate expansion of these regions between species. The GC content of the IR, LSC, and SSC
regions was about 42%, 34%, and 31%, respectively (Table 1).
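The quadripartite structure implies that each genome size equals LSC + SSC + 2 × IR. The short check below, written as a hedged sketch with the region sizes transcribed from the genome summary table of this paper, confirms that arithmetic for the five genomes; the dictionary layout and function name are illustrative choices.

```python
# Region sizes (bp) transcribed from the genome summary table.
GENOMES = {
    "S. argyracea": {"LSC": 84468, "SSC": 16219, "IR": 27459, "total": 155605},
    "S. dasyclados": {"LSC": 84588, "SSC": 16217, "IR": 27479, "total": 155763},
    "S. eriocephala": {"LSC": 84414, "SSC": 16220, "IR": 27459, "total": 155552},
    "S. integra 'Hakuro Nishiki'": {"LSC": 84495, "SSC": 16275, "IR": 27384, "total": 155538},
    "S. suchowensis": {"LSC": 84418, "SSC": 16214, "IR": 27459, "total": 155550},
}


def quadripartite_total(lsc, ssc, ir):
    """Genome length implied by one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


for name, g in GENOMES.items():
    computed = quadripartite_total(g["LSC"], g["SSC"], g["IR"])
    status = "matches" if computed == g["total"] else "differs from"
    print(f"{name}: {computed} bp {status} the reported {g['total']} bp")
```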
Among the annotated genes, 14 (ndhA, ndhB, petB, petD, atpF, rpl16, rpl2, rpoC1, trnA-UGC, trnG-GCC, trnI-GAU,
trnK-UUU, trnL-UAA, trnV-UAC) have one intron, and 3 genes (ycf3, rps12, clpP) have two introns (Table 1). The
gene rps12 is located in the IR region, while ycf3 and clpP are located in the LSC region. The ndhA gene, with only
one intron, is located in the SSC region, while the other genes are located in the LSC and IR regions (Table 2).
Repetitive sequences and cpSSR analysis
Relative synonymous codon usage (RSCU) was used to evaluate codon usage frequency. Codon usage bias
is an indicator of natural selection, mutation, and genetic fluctuation. In the five Salix species, Arg, Leu,
and Ser are the most frequent amino acids. Trp is the only amino acid whose codon exhibits no bias (RSCU = 1.00)
in the five cp genomes (Fig. 2A). There are 19–26 forward repeats, 5–7 reverse repeats, 5–7 complement repeats,
and 15–19 palindromic repeats in the five species (Fig. 2B). The total number is lower in S. integra than in the
other four species. The largest repeat sequence was 104 bp, in the IGS-rpl16 region of S. argyracea.
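For reference, the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it for a toy coding sequence under the standard genetic code; it is not the MEGA 7 procedure, and the function name and input sequence are assumptions for illustration. An amino acid encoded by a single codon, such as Trp (UGG), necessarily has RSCU = 1 under this definition.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Map each codon of every observed amino acid to its RSCU value."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    # Group synonymous codons by the amino acid (or stop) they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(family)  # equal-usage expectation
        for c in family:
            result[c] = counts[c] / expected
    return result


# Hypothetical CDS: ATG TTA TTA TTG CTT TGG TGG TAA
toy_cds = "ATGTTATTATTGCTTTGGTGGTAA"
values = rscu(toy_cds)
for codon in ("TTA", "TTG", "CTT", "CTC", "TGG"):
    print(codon, CODON_TABLE[codon], round(values[codon], 2))
```

Values above 1 indicate codons used more often than expected under equal usage, and values below 1 indicate codons used less often.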
SSRs were found in different regions of the five species, and most of them were located in the LSC region.
The total numbers of SSRs differed slightly (Table 3); SSR-related primers and results are available in
the supplementary materials. Mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs were found in the
five species. The SSRs (A)18, (C)12, (G)10, (AT)11, and (TAATAT)3 were unique to S. dasyclados, and (T)19 and
(TTGATA)3 were found only in S. integra ‘Hakuro Nishiki’. S. eriocephala, S. suchowensis, and S. argyracea shared
the same di-, tri-, tetra-, and penta-nucleotide SSRs, each repeated three times, and these species lacked unique
SSRs. (A)16, (G)13, and (G)12 were present in S. eriocephala. (A)16 and (G)13 were lost, but (G)12 was present, in
S. suchowensis and S. argyracea. The two latter species differed in the number of repeats at (T)15 and
(T)16 (Fig. 3). These specific SSRs provide valuable information for Salix taxonomy.
Variation among the cp genomes
The following highly divergent regions among the 22 species were detected by Mauve (http://darlinglab.org/mauve):
matK–trnQ, ndhC–trnV, psbE–petL, rpl36–rps8, and ndhB–rps7 (Fig. 4). They are potentially suitable
markers for species delimitation within Salix.
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| trnK-UUU | LSC | 38,37,37,37,37 | 2544,2524,2545,2545,2545 | 37,36,36,36,36 | | |
| trnG-GCC | LSC | 23 | 697,695,697,697,697 | 48 | | |
| atpF | LSC | 144 | 740,742,739,741,741 | 399 | | |
| rpoC1 | LSC | 453 | 763,773,762,763,764 | 1617 | | |
| ycf3 | LSC | 126 | 680,678,679,679,680 | 228 | 725 | 153 |
| trnL-UAA | LSC | 35 | 585,586,586,586,586 | 50 | | |
| trnV-UAC | LSC | 39 | 609 | 35 | | |
| clpP | LSC | 71 | 586,584,585,584,585 | 292 | 838 | 228 |
| petB | LSC | 6 | 811 | 642 | | |
| petD | LSC | 9 | 778,780,779,779,778 | 489 | | |
| rpl16 | LSC | 9 | 1114,1143,1114,1120,1114 | 399 | | |
| rpl2 | IRb | 396 | 668 | 435 | | |
| ndhB | IRb | 777 | 682 | 756 | | |
| trnI-GAU | IRb | 37 | 949 | 35 | | |
| trnA-UGC | IRb | 38 | 802 | 35 | | |
| rps12 | IRb | 114 | - | 30 | 539 | 231 |
| rps12 | IRa | 114 | - | 231 | 539 | 30 |
| trnA-UGC | IRa | 38 | 802 | 35 | | |
| trnI-GAU | IRa | 37 | 949 | 35 | | |
| ndhB | IRa | 777 | 682 | 756 | | |
| rpl2 | IRa | 396 | 668 | 435 | | |
| ndhA | SSC | 552 | 1112,1115,1114,1107,1108 | 543 | | |

Table 1. Genes with exons and introns annotated in the chloroplast genomes of five Salix species. 1. Multiple
numbers in a cell give the lengths in S. argyracea, S. dasyclados, S. eriocephala, S. integra ‘Hakuro
Nishiki’, and S. suchowensis, respectively. A single number means the same length in the five species. 2.
LSC, large single-copy region. SSC, small single-copy region. IR, inverted repeat region.

The border junctions between IR/SSC and IR/LSC were compared for the 21 Salix species and Chosenia
arbutifolia. In all the species, rpl22 was located at the LSC/IRb junction; rps19 was located within the IRb, 52 bp
from the LSC region; ycf1 and ndhF were in the IRb/SSC region; ycf1 and trnN were at the SSC/IRa boundary;
and rps19/trnH were in the IRa/SSC region. Fifteen species, including Chosenia arbutifolia, shared the same
border genes and the same junction length. The remaining seven species exhibited fragment deletions or site
variation in the border region. At the LSC/IRb border, the length of rpl22 in the LSC region (348–350 bp) and
in the IRb region (50–52 bp) varied slightly among S. tetrasperma, S. babylonica, and S. interior. At the IRb/SSC
junction, ycf1 exhibited long fragment deletions in S. integra ‘Hakuro Nishiki’ (1673 bp in the IRb and 25 bp
in the LSC region) and in S. chaenomeloides (31 bp in the SSC region), but there was only a 9 bp deletion in
S. minjiangensis (1739 bp in the IRb region). The gene ndhF was conserved in all species except S. integra
‘Hakuro Nishiki’, in which it had a relatively short total length, with only 2238 bp in the SSC region. At the
SSC/IRa border, ycf1 was 3694–3721 bp long in the IRa region in S. interior, S. chaenomeloides, S. tetrasperma,
S. babylonica, and S. paraplesia. In the other species, ycf1 was shorter in the IRa region, at 3676 bp.
The distance of ycf1 and trnN from the SSC/IRa border was also different in S. integra ‘Hakuro
Nishiki’. In S. tetrasperma, S. chaenomeloides, S. brachista, and S. babylonica, a gene insertion was present at the
IRa/SSC junction (Fig. 5). The lengths of rps19 and trnH were conserved.
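The junction comparison above can be made concrete with a small coordinate sketch. Assuming a linearized genome laid out in the order LSC, IRb, SSC, IRa (an assumption about orientation, consistent with the junction names used here), the junction positions follow from the region lengths, and a gene spanning a junction can be split into the base pairs falling on each side. The region lengths are those reported for S. argyracea; the gene coordinates are hypothetical and chosen only so that the split resembles the rpl22 case.

```python
def junctions(lsc, ir, ssc):
    """Last base (1-based) of each region, assuming LSC, IRb, SSC, IRa order."""
    jlb = lsc            # LSC/IRb junction
    jsb = jlb + ir       # IRb/SSC junction
    jsa = jsb + ssc      # SSC/IRa junction
    jla = jsa + ir       # IRa/LSC junction (end of the linearized genome)
    return {"LSC/IRb": jlb, "IRb/SSC": jsb, "SSC/IRa": jsa, "IRa/LSC": jla}


def split_at_junction(gene_start, gene_end, junction):
    """Return (bp left of, bp right of) a junction for a 1-based inclusive gene."""
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    left = max(0, min(gene_end, junction) - gene_start + 1)
    right = max(0, gene_end - max(gene_start - 1, junction))
    return left, right


# Region lengths reported for S. argyracea.
j = junctions(lsc=84468, ir=27459, ssc=16219)
print(j)

# Hypothetical coordinates for a gene straddling the LSC/IRb junction.
left_bp, right_bp = split_at_junction(84121, 84518, j["LSC/IRb"])
print(f"{left_bp} bp in the LSC, {right_bp} bp in the IRb")
```

Comparing such left and right lengths across species is one way to express the expansion or contraction of the IR at each border.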
Phylogenetic analysis
The chloroplast genome sequences of 8 poplar species, 20 willow species, Eucalyptus spathulata, and
Quercus bawanglingensis were aligned, and the phylogenetic relationships among species were inferred using
the poplars, Eucalyptus spathulata, and Quercus bawanglingensis as outgroups. The phylogenetic tree divides
the willow genus into 3 branches, and the 20 willow species cluster together, indicating a close relationship
between poplars and willows (Fig. 6).
Fig. 1. Gene map of the five Salix chloroplast genomes. Genes shown outside the circle are transcribed
clockwise, and those inside are transcribed counterclockwise. The gray circle depicts GC content. The known
functional genes are marked with colored bars.
Discussion
The Salicaceae chloroplast genome structure is usually highly conserved, with sizes ranging from 150 to 159 kb32.
The results revealed that the structure and synteny of the Salix cp genomes were highly conserved; all have the
typical quadripartite, double-stranded structure, comprising one LSC region, one SSC region, and two IR regions
(IRa and IRb).
Chloroplast SSRs have been widely used in population genetics, polymorphism investigations, and
evolutionary biology33–35. The number of SSRs (269–276) in the cp genomes of the five Salix species was similar
to that reported for other species, but greater than that in S. wilsonii. The presence of mono-, di-, tri-, tetra-, and
pentanucleotide repeats was confirmed. The number of poly(A)/(T) repeats was far greater than the number of
poly(G)/(C) repeats, which is consistent with other angiosperms36. The SSRs (A)18, (C)12, (G)10,
(AT)11, and (TAATAT)3 were unique to S. dasyclados, and (T)19 and (TTGATA)3 were found only in S. integra
‘Hakuro Nishiki’. S. eriocephala, S. suchowensis, and S. argyracea shared the same di-, tri-, tetra-, and
penta-nucleotide SSRs, each repeated three times, and these species lacked unique SSRs. (A)16, (G)13, and (G)12
were present in S. eriocephala. (A)16 and (G)13 were lost, but (G)12 was present, in S. suchowensis and
S. argyracea. Hexanucleotide repeats (TTGATA)3, (TAATAT)3, and (TTGATA)4 were found only in S. dasyclados and
S. integra ‘Hakuro Nishiki’, whereas (T)19 and (TTGATA)3 were unique to S. integra ‘Hakuro Nishiki’ and located
in ycf3 and rpl16.
Repetitive sequences have been found to participate in cp genome rearrangement and sequence variation37–39. The
gain or loss of repetitive sequences located in the intergenic spacer (IGS) regions and in the protein-coding
genes ndhA, rpl16, and psbL makes the five species distinct from others. The genes clpP and ycf1 were commonly
found to contain repeat sequences in other Salix species. The repeats and SSRs identified in the Salix cp genomes
can be used to develop lineage-specific markers for studying the evolution and taxonomy of the genus Salix.
Extension and contraction of the border regions are regarded as the main reasons for differences in the
length of chloroplast genomes40–42. The sequence and structural variation of chloroplast genes provide a basis
for studying plant evolution43,44. In addition, comparative analysis showed that the willow chloroplast
genomes differed in regions such as matK–trnQ, ndhC–trnV, psbE–petL, rpl36–rps8, and ndhB–rps7.
This indicates that the different sizes of the cp genomes are mainly due to the shrinkage and expansion of the IR,
LSC, and SSC regions, similar to previously reported findings45,46.
The results of this study indicate that there are sequence insertions and deletions in the IR/SSC and IR/LSC
boundary genes, in both the gene coding regions and the intergenic spacer regions of the Salix chloroplast genome.
Meanwhile, the deletion in ycf1 in the SSC region at the SSC/IRa border and the insertion at the IRa/LSC junction
placed the old-world species within one subclade47. The pseudo-infA, pseudo-ycf68, orf42, and orf56 present
in S. wilsonii were lost in the Salix species sequenced here. The IRb/SSC boundary genes in the chloroplast genomes
of willows are highly conserved, while the IRa/SSC boundary genes in the chloroplast genomes of poplars are
highly conserved, showing significant differences48.
The taxonomy and systematic phylogeny of the genus Salix have been obscure because of its dioecious
reproduction, common natural hybridization, large intraspecific phenotypic variation, and scarcity of
informative morphological characteristics49–51. Chen conducted a phylogenetic study of the Salicaceae family
using certain genes or gene fragments from the chloroplast genome, such as rbcL, atpB-rbcL, and trnD-T52. In
this study, the ycf1, psaI, ycf2, rpoC2, rpl22, atpF, and ndhF genes were found to be under positive selection in
the analysis of the evolutionary direction of protein-coding genes in the Salix chloroplast genome, providing new
evidence for further in-depth research on Salix phylogeny. Traditional taxonomy suggests that the Salicaceae
can be divided into three genera, namely Populus, Salix, and Chosenia (C. arbutifolia)53.
| Genome features | S. argyracea | S. dasyclados | S. eriocephala | S. integra ‘Hakuro Nishiki’ | S. suchowensis |
|---|---|---|---|---|---|
| Genome size (bp) | 155 605 | 155 763 | 155 552 | 155 538 | 155 550 |
| LSC size (bp) | 84 468 | 84 588 | 84 414 | 84 495 | 84 418 |
| SSC size (bp) | 16 219 | 16 217 | 16 220 | 16 275 | 16 214 |
| IR size (bp) | 27 459 | 27 479 | 27 459 | 27 384 | 27 459 |
| Number of genes (number of unigenes) | 131 (77) | 131 (77) | 131 (77) | 131 (77) | 131 (77) |
| tRNA genes | 37 | 37 | 37 | 37 | 37 |
| rRNA genes | 8 | 8 | 8 | 8 | 8 |
| mRNA genes | 86 | 86 | 86 | 86 | 86 |
| Duplicated genes in IR | 36 | 36 | 36 | 36 | 36 |
| GC content of LSC (%) | 34.44 | 36.67 | 34.45 | 34.41 | 34.44 |
| GC content of IR (%) | 41.87 | 41.86 | 41.87 | 41.93 | 41.87 |
| GC content of SSC (%) | 30.98 | 31.00 | 31.00 | 30.91 | 30.99 |
| GC content (%) | 36.7 | 36.67 | 36.71 | 36.69 | 36.7 |
| Total reads | 20 913 346 | 19 041 713 | 22 544 659 | 18 602 676 | 20 582 680 |
| Assembled reads | 506 120 | 1 681 045 | 2 921 757 | 1 979 809 | 2 099 682 |
| Average insert size (bp) | 1008.44 | 3293.85 | 5613.27 | 3887.25 | 4125.15 |

Table 2. Summary characteristics of the five Salix chloroplast genomes. LSC, large single-copy region. SSC,
small single-copy region. IR, inverted repeat region.
Salix species and Chosenia arbutifolia cluster within large monophyletic branches. One branch consists of S.
wilsonii, S. babylonica, S. tetrasperma, and S. interior; the other species form another branch. At the same time,
the ycf1 deletion in the SSC region at the SSC/IRa junction and the insertion at the IRa/LSC junction group
the old-world species into a subclade. Conserved boundary genes group the other species under the genus Salix.
Fig. 2. The relative synonymous codon usage (RSCU) of amino acids and repeat sequences. (A) Amino acid
usage frequency calculated by RSCU. (B) Repeat sequence analysis of chloroplast genomes of the five Salix
species for positive selection.
Conclusion
Using the Illumina NovaSeq PE150 platform, we successfully sequenced and assembled the complete chloroplast
genomes of S. argyracea, S. dasyclados, S. eriocephala, S. integra ‘Hakuro Nishiki’, and S. suchowensis, and
compared them with chloroplast genomes of other genera. The results show that the chloroplast sequences and
gene arrangements of the five Salix species are highly conserved, with sizes, overall structures, gene orders, and
contents similar to those of other genera. By comparing the chloroplast genomes of Salix species, differences
in regions such as matK-trnQ, ndhC-trnV, psbE-petL, rpl36-rps8, and ndhB-rps7 were identified, which can
serve as important molecular markers for willow resource classification research. The phylogenetic relationships
strongly support the known classification of the Salicaceae family. Furthermore, the high conservation of the
entire Salix cpDNA sequences reinforces the concept of shared evolutionary history among these species. These
genes provide a promising avenue for further research to deepen our understanding of Salix evolution.
Fig. 3. The number of SSR repeats in the five Salix species.
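| Species | Region | Exon | Intron | Intergenic | Total number of markers in different regions | Total markers | Proportion |
|---|---|---|---|---|---|---|---|
| S. dasyclados | LSC | 31 | 25 | 124 | 180 | 276 | 14.10% |
| | SSC | 19 | 6 | 13 | 38 | | 65.20% |
| | IR | 34 | 6 | 18 | 58 | | 21.00% |
| S. argyracea | LSC | 30 | 23 | 120 | 173 | 269 | 13.80% |
| | SSC | 19 | 6 | 13 | 38 | | 65.20% |
| | IR | 34 | 6 | 18 | 58 | | 21.00% |
| S. eriocephala | LSC | 30 | 24 | 120 | 174 | 270 | 14.10% |
| | SSC | 19 | 6 | 13 | 38 | | 64.40% |
| | IR | 34 | 6 | 18 | 58 | | 21.50% |
| S. ‘Hakuro Nishiki’ | LSC | 32 | 25 | 120 | 177 | 273 | 13.90% |
| | SSC | 19 | 6 | 13 | 38 | | 64.80% |
| | IR | 34 | 6 | 18 | 58 | | 21.20% |
| S. suchowensis | LSC | 32 | 25 | 120 | 173 | 273 | 14.10% |
| | SSC | 19 | 6 | 13 | 38 | | 64.30% |
| | IR | 34 | 6 | 18 | 58 | | 21.60% |

Table 3. Simple sequence repeats (SSRs) found in five Salix species.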
Fig. 4. The alignment of 21 Salix species and Chosenia arbutifolia. The long, red rectangle represents the
similarity among the genomes. The white bar indicates the annotated gene coding sequences.
Fig. 5. Comparison of the borders of large single-copy (LSC), small single-copy (SSC), and inverted repeat
(IR) regions among the 22 chloroplast genomes of 21 Salix species and Chosenia arbutifolia.
Data availability
The raw sequencing data for the Illumina and Nanopore platforms and the mitogenome sequences have been
deposited in NCBI (https://www.ncbi.nlm.nih.gov/) with accession numbers MT551159 (S. argyracea),
MT551160 (S. dasyclados), MT551161 (S. eriocephala), MT551162 (S. integra ‘Hakuro Nishiki’), and MT551163
(S. suchowensis).
Received: 5 July 2024; Accepted: 11 November 2024
References
1. Lee, J. et al. The complete chloroplast genome sequence of Zanthoxylum piperitum. Mitochondrial DNA A DNA Mapp. Seq. Anal.
27, 3525–3526. https://doi.org/10.3109/19401736.2015.1074201 (2016).
2. Liu, X. F., Zhu, G. F., Li, D. M. & Wang, X. J. Complete chloroplast genome sequence and phylogenetic analysis of Spathiphyllum
“Parrish”. PLoS ONE 14, e0224038. https://doi.org/10.1371/journal.pone.0224038 (2019).
3. Xia, M. & Li, Y. Complete chloroplast genome sequence of Adenostemma lavenia (Asteraceae) and phylogenetic analysis with
related species. Mitochondrial DNA B Resour. 6, 2134–2136. https://doi.org/10.1080/23802359.2021.1944369 (2021).
4. Prabhudas, S. K., Prayaga, S., Madasamy, P. & Natarajan, P. Shallow whole genome sequencing for the assembly of complete
chloroplast genome sequence of Arachis hypogaea L. Front. Plant Sci. 7, 1106. https://doi.org/10.3389/fpls.2016.01106 (2016).
5. Jo, I. H. et al. Complete chloroplast genome of the inverted repeat-lacking species Vicia bungei and development of polymorphic
simple sequence repeat markers. Front Plant. Sci. 13, 891783. https://doi.org/10.3389/fpls.2022.891783 (2022).
6. Li, X. et al. Complete chloroplast genome sequence of Magnolia grandiflora and comparative analysis with related species. Sci.
China Life Sci. 56, 189–198. https://doi.org/10.1007/s11427-012-4430-8 (2013).
7. Hao, J. et al. The complete chloroplast genome sequence of Plectranthus hadiensis (Lamiaceae) and phylogenetic analysis.
Mitochondrial DNA B Resour. 8, 1049–1053. https://doi.org/10.1080/23802359.2023.2262689 (2023).
8. Xue, S. et al. Comparative analysis of the complete chloroplast genome among Prunus mume, P. armeniaca, and P. salicina. Hortic
Res. 6, 89. https://doi.org/10.1038/s41438-019-0171-1 (2019).
9. Lin, J., Lin, Z., Chen, Y. & Xu, H. The complete chloroplast genome sequence of Lemna turionifera (Araceae). Mitochondrial DNA
B Resour. 9, 971–975. https://doi.org/10.1080/23802359.2024.2384577 (2024).
10. Li, X. Y. Complete chloroplast genome sequence of Mahonia duclouxiana (Berberidaceae), a medicinal plant in China.
Mitochondrial DNA B Resour. 6, 3023–3024. https://doi.org/10.1080/23802359.2021.1978888 (2021).
Fig. 6. The phylogenetic tree based on the 32 complete chloroplast genome sequences.
11. Njuguna, A. W. et al. Comparative analyses of the complete chloroplast genomes of Nymphoides and Menyanthes species
(Menyanthaceae). Aquatic Bot. 156, 73–81. https://doi.org/10.1016/j.aquabot.2019.05.001 (2019).
12. Cui, Y. et al. Complete chloroplast genome and comparative analysis of three Lycium (Solanaceae) species with medicinal and
edible properties. Gene Rep. 17, 100464. https://doi.org/10.1016/j.genrep.2019.100464 (2019).
13. Zhang, Z. et al. The complete chloroplast genome sequence and phylogenetic relationship analysis of Eomecon chionantha, one
species unique to China. J. Plant Res. 137, 575–587. https://doi.org/10.1007/s10265-024-01539-y (2024).
14. Villette, C., Maurer, L. & Heintz, D. Investigation of xenobiotics metabolism in Salix alba Leaves via Mass spectrometry imaging.
J. Vis. Exp. https://doi.org/10.3791/61011 (2020).
15. Gulyaev, S. et al. The phylogeny of Salix revealed by whole genome re-sequencing suggests different sex-determination systems in
major groups of the genus. Ann. Bot. 129, 485–498. https://doi.org/10.1093/aob/mcac012 (2022).
16. Wu, J. et al. Phylogeny of Salix subgenus Salix s.l. (Salicaceae): Delimitation, biogeography, and reticulate evolution. BMC Evol.
Biol. 15, 31. https://doi.org/10.1186/s12862-015-0311-7 (2015).
17. Kersten, B. et al. Genome sequences of Populus tremula chloroplast and mitochondrion: Implications for holistic poplar breeding.
PLoS ONE 11, e0147209. https://doi.org/10.1371/journal.pone.0147209 (2016).
18. Ren, W. et al. The chloroplast genome of Salix floderusii and characterization of chloroplast regulatory elements. Front Plant Sci.
13, 987443. https://doi.org/10.3389/fpls.2022.987443 (2022).
19. Ohashi, H. A systematic enumeration of Japanese Salix (Salicaceae). J. Jpn. Bot. 75, 1–41 (2000).
20. Qiao, S. et al. Responses of growth and photosynthesis to alkaline stress in three willow species. Sci. Rep. 14, 14672.
https://doi.org/10.1038/s41598-024-65004-5 (2024).
21. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119.
https://doi.org/10.1186/1471-2105-11-119 (2010).
22. Laslett, D. & Canback, B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids
Res. 32, 11–16. https://doi.org/10.1093/nar/gkh152 (2004).
23. Greiner, S., Lehwark, P. & Bock, R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical
visualization of organellar genomes. Nucleic Acids Res. 47, W59–W64. https://doi.org/10.1093/nar/gkz238 (2019).
24. Thiel, T., Michalek, W., Varshney, R. K. & Graner, A. Exploiting EST databases for the development and characterization of
gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422.
https://doi.org/10.1007/s00122-002-1031-0 (2003).
25. Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol.
Evolut. 33, 1870–1874. https://doi.org/10.1093/molbev/msw054 (2016).
26. Librado, P. & Rozas, J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452.
https://doi.org/10.1093/bioinformatics/btp187 (2009).
27. Kurtz, S. et al. REPuter: e manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633–4642. h t t p s : /
/ d o i . o r g / 1 0 . 1 0 9 3 / n a r / 2 9 . 2 2 . 4 6 3 3 (2001).
28. Luo, Z. et al. Molecular characteristics and phylogenetic definition on the complete chloroplast genome of Petrocodon longitubus.
Plant Biotechnol. Rep. https://doi.org/10.1007/s11816-024-00919-z (2024).
29. Miao, X. et al. Assembly and comparative analysis of the complete mitochondrial and chloroplast genome of Cyperus stoloniferus
(Cyperaceae), a coastal plant possessing saline-alkali tolerance. BMC Plant Biol. 24, 628.
https://doi.org/10.1186/s12870-024-05333-9 (2024).
30. Shen, Z. et al. The complete chloroplast genome sequence of the medicinal moss Rhodobryum giganteum (Bryaceae, Bryophyta):
Comparative genomics and phylogenetic analyses. Genes 15, 900 (2024).
31. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–
1313. https://doi.org/10.1093/bioinformatics/btu033 (2014).
32. Alzahrani, D. A., Yaradua, S. S., Albokhari, E. J. & Abba, A. Complete chloroplast genome sequence of Barleria prionitis,
comparative chloroplast genomics and phylogenetic relationships among Acanthoideae. BMC Genomics 21, 393.
https://doi.org/10.1186/s12864-020-06798-2 (2020).
33. Phumichai, C., Phumichai, T. & Wongkaew, A. Novel chloroplast microsatellite (cpSSR) markers for genetic diversity assessment
of cultivated and Wild Hevea Rubber. Plant Mol. Biol. Rep. 33, 1486–1498. https://doi.org/10.1007/s11105-014-0850-x (2015).
34. Honig, J. A. et al. Classification of bentgrass (Agrostis) cultivars and accessions based on microsatellite (SSR) markers. Genet.
Resour. Crop Evol. 63, 1139–1160. https://doi.org/10.1007/s10722-015-0307-6 (2016).
35. López, K. E. R., Armijos, C. E., Parra, M. & Torres, M. d. L. The first complete chloroplast genome sequence of Mortiño (Vaccinium
floribundum) and comparative analyses with other Vaccinium species. Horticulturae 9, 302 (2023).
36. Melotto-Passarin, D. M., Tambarussi, E. V., Dressano, K., De Martin, V. F. & Carrer, H. Characterization of chloroplast DNA
microsatellites from Saccharum spp and related species. Genet. Mol. Res. 10, 2024–2033. https://doi.org/10.4238/vol10-3gmr1019
(2011).
37. Bai, D., Luo, X. & Yang, Y. Complete chloroplast genome sequence of “Field Muskmelon”, an invasive weed to China. Mitochondrial
DNA B Resour. 6, 3352–3353. https://doi.org/10.1080/23802359.2021.1994888 (2021).
38. Bozkurt, A., Kaymaz, Y., Ateş, D. & Tanyolaç, M. B. The complete sequence of Lens tomentosus chloroplast genome. Acta
Physiologiae Plantarum 46, 2. https://doi.org/10.1007/s11738-023-03628-2 (2023).
39. Zhao, M., Wu, Y. & Ren, Y. Complete Chloroplast Genome Sequence Structure and Phylogenetic Analysis of Kohlrabi (Brassica
oleracea var. gongylodes L.). Genes (Basel) 15. https://doi.org/10.3390/genes15050550 (2024).
40. Lloyd Evans, D., Joshi, S. V. & Wang, J. Whole chloroplast genome and gene locus phylogenies reveal the taxonomic placement and
relationship of Tripidium (Panicoideae: Andropogoneae) to sugarcane. BMC Evol. Biol. 19, 33.
https://doi.org/10.1186/s12862-019-1356-9 (2019).
41. Long, J., Tian, Y., Zhang, J. & Wang, Z. The complete chloroplast genome sequence of Olea dioica Roxb, 1820 (Oleaceae).
Mitochondrial DNA B Resour. 9, 748–752. https://doi.org/10.1080/23802359.2024.2366373 (2024).
42. Ahmad, W. et al. Complete chloroplast genome sequencing and comparative analysis of threatened dragon trees Dracaena serrulata
and Dracaena cinnabari. Sci. Rep. 12, 16787. https://doi.org/10.1038/s41598-022-20304-6 (2022).
43. Wei, F. et al. The complete chloroplast genome sequence of the medicinal plant Sophora tonkinensis. Sci. Rep. 10, 12473.
https://doi.org/10.1038/s41598-020-69549-z (2020).
44. Wu, F. Y., Ma, S. C., Ye, P. M., Ye, H. & Ma, J. L. The complete chloroplast genome sequence of Camellia zhaiana (Theaceae), a
critically endangered species from China. Mitochondrial DNA B Resour. 6, 2425–2426.
https://doi.org/10.1080/23802359.2021.1955027 (2021).
45. Kim, K. J. & Lee, H. L. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative
analysis of sequence evolution among 17 vascular plants. DNA Res. 11, 247–261. https://doi.org/10.1093/dnares/11.4.247 (2004).
46. Guo, L., Zhai, J. & Gu, Y. The complete chloroplast genome sequence of Isoetes baodongii (Isoetaceae). Mitochondrial DNA B
Resour. 9, 667–671. https://doi.org/10.1080/23802359.2024.2356128 (2024).
47. Huang, Y., Wang, J., Yang, Y., Fan, C. & Chen, J. Phylogenomic analysis and dynamic evolution of chloroplast genomes in Salicaceae.
Front Plant Sci. 8, 1050. https://doi.org/10.3389/fpls.2017.01050 (2017).
48. Chen, Y., Hu, N. & Wu, H. Analyzing and characterizing the chloroplast genome of Salix wilsonii. Biomed. Res. Int. 2019, 5190425.
https://doi.org/10.1155/2019/5190425 (2019).
49. Percy, D. M. et al. Understanding the spectacular failure of DNA barcoding in willows (Salix): Does this result from a trans-specific
selective sweep? Mol. Ecol. 23, 4737–4756. https://doi.org/10.1111/mec.12837 (2014).
50. Marinček, P. et al. Challenge accepted: Evolutionary lineages versus taxonomic classification of North American shrub willows
(Salix). Am. J. Bot. 111, e16361. https://doi.org/10.1002/ajb2.16361 (2024).
51. Nie, L. et al. Complete chloroplast genome sequence of the medicinal plant Arctium lappa. Genome 63, 53–60.
https://doi.org/10.1139/gen-2019-0070 (2020).
52. Jia-Hui, C., Hang, S. & Yong-Ping, Y. Cladistic analysis of the genus Salix (Salicaceae). Acta Botanica Yunnanica (2008).
53. Zhou, J., Jiao, Z., Guo, J., Wang, B. S. & Zheng, J. Complete chloroplast genome sequencing of five Salix species and its application
in the phylogeny and taxonomy of the genus. Mitochondrial DNA B Resour. 6, 2348–2352.
https://doi.org/10.1080/23802359.2021.1950055 (2021).
Author contributions
Conceptualization, P.W. and J.Z.; Methodology, J.Z. and J.H.G.; Validation, J.Z. and Y.X.W.; Writing the original
draft, J.Z.; Supervision, J.Z.; Funding acquisition, J.Z.
Funding
This research was funded by the Independent Research Projects of Jiangsu Academy of Forestry [ZZKY202201] and the
Key Research and Development Plan Projects of the National Forestry and Grassland Administration (GLM[2021]83).
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at
https://doi.org/10.1038/s41598-024-79604-8.
Correspondence and requests for materials should be addressed to J.Z.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives
4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in
any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide
a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have
permission under this licence to share adapted material derived from this article or parts of it. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence
and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to
obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.
© The Author(s) 2024